
How to bin non-scalar data?

Posted 9 years ago

I have a list of pairs that I want to bin according to the first element only.

Since my data is 2-dimensional, BinLists wants 2 bin specifications, one for each dimension. I was hoping to cope with it, at the price of superfluous brackets in the result, with BinLists[data,1,Infinity] but Infinity is understandably rejected here. My second try was then BinLists[data,1,{{-Infinity,+Infinity}}] since both 1 and {{-Infinity,Infinity}} are valid bin specifications. Unfortunately, it appears that BinLists requires all of its bin specifications to be of the same type (either dx, or {xmin,xmax,dx}, or {{b1,b2,...,bn}}).

Of course, I could give any bin specification for the second element, and then flatten the data but this seems quite inelegant, as well as computationally inefficient. An interesting option could be a function BinListsBy, in the spirit of SortBy and related functions, to which I could pass as an argument the function First to achieve my goal.
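The "dummy bin for the second element, then remove the extra level" workaround can be sketched as follows. This is a minimal illustration on made-up integer pairs; the ranges {0, 10, 1} and {0, 10, 10} are assumptions, chosen so that every second element falls into one single bin.

```mathematica
(* made-up example pairs; both elements lie in [0, 10) *)
pairs = {{1, 5}, {2, 6}, {4, 7}, {8, 1}};
(* x bins [0,1), [1,2), ..., [9,10); a single y bin [0,10) spanning every second element *)
nested = BinLists[pairs, {0, 10, 1}, {0, 10, 10}];
(* nested[[i, j]] holds the pairs in x bin i and y bin j; keep the only y bin *)
byFirst = nested[[All, 1]]
```

The extra level is removed with a single Part, but the y range has to be known in advance for the single bin to cover all the data.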

So here are my questions:

  • Does anyone see a way to solve my problem other than those that I mentioned?
  • Is there a reason why BinLists only accepts bin specifications of the same type?
POSTED BY: Virgile Andreani
Posted 9 years ago

After some thinking, it occurred to me that the functionality of the built-in BinLists could be carried over to a user-defined BinListsBy if BinLists itself were used in the implementation.

The function BinListsBy defined below bins the first-level items in data using the values obtained when the function f is applied to each item. BinListsBy[data, f, opts] takes 2 arguments followed by an optional sequence opts. f is a function; when it is mapped onto the first level of data, it must return a list of values acceptable to BinLists. opts is passed directly to BinLists, so it carries the bin specification (for example 3 for a bin width of 3) together with any options. The built-in function BinLists does the real work. (An attached notebook contains the function and examples.)

EDIT: Function modified and file replaced

Define the BinListsBy function

BinListsBy[data_, f_, opts___] := Module[{binBy, binning, select},
  (* f /@ data must be a list acceptable to BinLists *)
  binBy = f /@ data;
  (* bin the f values; opts (bin specification and options) go straight to BinLists *)
  binning = Union /@ BinLists[binBy, opts];
  (* select the data elements whose f value is in the list l *)
  select[l_] := Select[data, MemberQ[l, f[#]] &];
  (* use select to bin the original data according to the binned f values *)
  select /@ binning
  ]
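The Select/MemberQ step rescans all of data once per bin. A possible variant, sketched here under the assumption that f returns a single number per item (scalar keys), still lets BinLists choose the bins but looks the items up through a GroupBy association instead. The name binListsByFast is hypothetical, and unlike BinListsBy, items inside each bin come out grouped by key value (then in data order) rather than strictly in data order.

```mathematica
(* hypothetical variant of BinListsBy; assumes f returns one number per item *)
binListsByFast[data_List, f_, spec__] :=
 Module[{byKey, keyBins},
  (* association: key value -> items with that key, in data order *)
  byKey = GroupBy[data, f];
  (* let BinLists bin the key values themselves *)
  keyBins = BinLists[f /@ data, spec];
  (* for each bin, join the items of every distinct key it contains *)
  Map[Join @@ Lookup[byKey, Union[#]] &, keyBins]
  ]
```

With the string data in this post it would be called as binListsByFast[data, firstCharNum, 3].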

Make up some strings as data

In[11]:= data = Table[
  StringJoin@
   Table[FromLetterNumber[
     RandomInteger[{1, 26}]], {RandomInteger[{3, 7}]}],
  {100}
  ]

Out[11]= {"kyiixs", "yblgx", "hjjwffz", "fhmt", "cmdgq", "czya", \
"tirnp", "ykajxq", "qyjgpf", "voge", "dxghgt", "fexr", "bueexkv", \
"yunxf", "ysbf", "pilpc", "mzsolsq", "dtixsu", "qpfzr", "xlqe", \
"mskqarr", "brqeyg", "zmrrs", "czhgqw", "shx", "iiwu", "rwcwnge", \
"piwix", "wdprv", "vpluzh", "bfherb", "pzqb", "fbvhvd", "wzr", \
"ghviirv", "pds", "fwfaogx", "hljp", "uda", "npmh", "cerfxpo", \
"cdkk", "vspiq", "qwxrly", "xxenpd", "ivwvitr", "fgio", "agki", \
"hkcvouh", "jafsf", "resj", "cgrphv", "bqyqvhe", "eutp", "qwhmjzz", \
"dubopa", "xqh", "yglbfg", "ods", "mfgmcg", "ufokyy", "wkrauf", \
"vyccac", "kioicj", "pnavibb", "izhiekh", "ypvlstb", "hkkszt", \
"vixrt", "nhtjky", "fdayiml", "aiy", "efzzu", "vbjpcvv", "uruykdr", \
"zchwbj", "mdhfdqn", "aduvvjc", "xub", "lyb", "xrjmfi", "buxdt", \
"znirgo", "lgfab", "faxnq", "olkph", "hswtjy", "kccmed", "ernc", \
"vfulbb", "dtd", "fomket", "sma", "ivrbog", "oaod", "oljcalq", \
"gpzfixq", "rmueo", "vlnlali", "sykvt"}

This function returns the letter number of first character in a string

In[12]:= firstCharNum[str_] := LetterNumber@Characters[str][[1]]

In[13]:= firstCharNum[data[[1]]]

Out[13]= 11

Bin the data by the LetterNumber of each string's leading character, with a bin width of 3. (TableForm doesn't really work in the Forum.)

In[14]:= BinListsBy[data, firstCharNum, 3] // TableForm

Out[14]//TableForm= TableForm[{{
  "tirnp", "shx", "rwcwnge", "resj", "sma", "rmueo", "sykvt"}, {
  "bueexkv", "brqeyg", "bfherb", "agki", "bqyqvhe", "aiy", "aduvvjc", 
   "buxdt"}, {
  "kyiixs", "iiwu", "ivwvitr", "jafsf", "kioicj", "izhiekh", "kccmed",
    "ivrbog"}, {
  "mzsolsq", "mskqarr", "npmh", "mfgmcg", "nhtjky", "mdhfdqn", "lyb", 
   "lgfab"}, {
  "cmdgq", "czya", "dxghgt", "dtixsu", "czhgqw", "cerfxpo", "cdkk", 
   "cgrphv", "eutp", "dubopa", "efzzu", "ernc", "dtd"}, {
  "qyjgpf", "pilpc", "qpfzr", "piwix", "pzqb", "pds", "qwxrly", 
   "qwhmjzz", "ods", "pnavibb", "olkph", "oaod", "oljcalq"}, {
  "voge", "wdprv", "vpluzh", "wzr", "uda", "vspiq", "ufokyy", 
   "wkrauf", "vyccac", "vixrt", "vbjpcvv", "uruykdr", "vfulbb", 
   "vlnlali"}, {
  "yblgx", "ykajxq", "yunxf", "ysbf", "xlqe", "zmrrs", "xxenpd", 
   "xqh", "yglbfg", "ypvlstb", "zchwbj", "xub", "xrjmfi", "znirgo"}, {
  "hjjwffz", "fhmt", "fexr", "fbvhvd", "ghviirv", "fwfaogx", "hljp", 
   "fgio", "hkcvouh", "hkkszt", "fdayiml", "faxnq", "hswtjy", 
   "fomket", "gpzfixq"}}]
POSTED BY: David Keith
Posted 9 years ago

You're welcome, Virgile. But I do think your BinListsBy idea is a good one; alternatively, it would be nice if BinLists could accept an optional level spec and a function to be applied to data elements before establishing bins and sorting the input into them. That way we would have the full capabilities of BinLists without the need to write complicated functions for GatherBy.
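For comparison, the GatherBy route for fixed-width bins can be sketched as below, on made-up pairs; the bin width 3 and the name binKey are assumptions for illustration. It shows two things BinLists provides and GatherBy does not: empty bins are simply omitted, and groups come out in order of first appearance unless they are sorted afterwards.

```mathematica
pairs = {{8, 2}, {2, 4}, {9, 6}, {1, 1}, {7, 10}};
(* key: index of the width-3 bin [0,3), [3,6), [6,9), [9,12) holding the first element *)
binKey = Floor[First[#]/3] &;
(* groups appear in order of first appearance; the empty bin [3,6) is missing *)
gathered = GatherBy[pairs, binKey]
(* sort the groups by their key to get ascending bin order *)
ordered = SortBy[gathered, binKey[First[#]] &]
```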

POSTED BY: David Keith
Posted 9 years ago

It is also worth noting that if the pairs consist of approximate numbers, rounded values can be used to gather pairs whose first elements round to the same multiple of a chosen step (0.5 below):

In[4]:= realPairs = RandomReal[{1, 10}, {30, 2}]

Out[4]= {{5.21667, 6.47609}, {9.31783, 1.36685}, {1.64424, 
  5.90676}, {1.63877, 4.69312}, {8.20514, 9.13797}, {1.43469, 
  7.53501}, {7.32208, 7.86721}, {9.79302, 3.65095}, {3.37067, 
  1.09812}, {4.44002, 8.87933}, {2.05523, 8.75433}, {1.15523, 
  3.88057}, {1.87054, 2.36079}, {7.08747, 7.347}, {1.53206, 
  2.1205}, {6.06558, 6.74602}, {9.85321, 1.59023}, {8.23373, 
  8.98786}, {2.92257, 5.85836}, {4.70867, 9.13353}, {9.7615, 
  7.22106}, {2.21636, 5.98876}, {9.25288, 9.94328}, {8.56543, 
  6.00081}, {9.32182, 2.52546}, {2.91193, 3.88105}, {2.05716, 
  9.22982}, {2.02069, 6.72177}, {5.95095, 1.592}, {9.22202, 3.78433}}

In[6]:= GatherBy[realPairs, Round[#[[1]], .5] &]

Out[6]= {{{5.21667, 6.47609}}, {{9.31783, 1.36685}, {9.25288, 
   9.94328}, {9.32182, 2.52546}}, {{1.64424, 5.90676}, {1.63877, 
   4.69312}, {1.43469, 7.53501}, {1.53206, 2.1205}}, {{8.20514, 
   9.13797}, {8.23373, 8.98786}}, {{7.32208, 7.86721}}, {{9.79302, 
   3.65095}, {9.85321, 1.59023}, {9.7615, 7.22106}}, {{3.37067, 
   1.09812}}, {{4.44002, 8.87933}, {4.70867, 9.13353}}, {{2.05523, 
   8.75433}, {1.87054, 2.36079}, {2.21636, 5.98876}, {2.05716, 
   9.22982}, {2.02069, 6.72177}}, {{1.15523, 3.88057}}, {{7.08747, 
   7.347}}, {{6.06558, 6.74602}, {5.95095, 1.592}}, {{2.92257, 
   5.85836}, {2.91193, 3.88105}}, {{8.56543, 6.00081}}, {{9.22202, 
   3.78433}}}
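Note that Round[x, 0.5] groups values nearest the same multiple of 0.5, so each group covers an interval centred on that multiple (roughly 1.25 to 1.75 around 1.5, with the endpoints depending on Round's tie-breaking), whereas BinLists-style bins start at the multiple. Floor[x, 0.5] gives the latter alignment. A small sketch on made-up numbers:

```mathematica
xs = {1.2, 1.3, 1.7, 1.8};
(* Round: groups centred on multiples of 0.5 *)
roundGroups = GatherBy[xs, Round[#, 0.5] &]
(* Floor: groups [1., 1.5) and [1.5, 2.), aligned like BinLists bins *)
floorGroups = GatherBy[xs, Floor[#, 0.5] &]
```

Applied to the pairs above, the Floor version would read GatherBy[realPairs, Floor[First[#], 0.5] &].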
POSTED BY: David Keith

Thank you for your answer; this is what I was looking for.

POSTED BY: Virgile Andreani
Posted 9 years ago

You could use GatherBy:

(The pure function just presents the first part of each pair to GatherBy.)

In[2]:= pairs = RandomInteger[{1, 10}, {30, 2}]

Out[2]= {{8, 2}, {9, 6}, {7, 10}, {2, 4}, {9, 9}, {10, 2}, {8, 
  10}, {3, 1}, {10, 2}, {2, 4}, {8, 5}, {1, 1}, {2, 8}, {10, 9}, {10, 
  10}, {5, 7}, {4, 6}, {3, 6}, {1, 6}, {6, 10}, {2, 5}, {3, 4}, {8, 
  10}, {2, 10}, {10, 8}, {4, 8}, {1, 7}, {5, 4}, {6, 4}, {8, 10}}

In[3]:= GatherBy[pairs, #[[1]] &]

Out[3]= {{{8, 2}, {8, 10}, {8, 5}, {8, 10}, {8, 10}}, {{9, 6}, {9, 
   9}}, {{7, 10}}, {{2, 4}, {2, 4}, {2, 8}, {2, 5}, {2, 10}}, {{10, 
   2}, {10, 2}, {10, 9}, {10, 10}, {10, 8}}, {{3, 1}, {3, 6}, {3, 
   4}}, {{1, 1}, {1, 6}, {1, 7}}, {{5, 7}, {5, 4}}, {{4, 6}, {4, 
   8}}, {{6, 10}, {6, 4}}}
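When the first elements are exact values, First can be passed to GatherBy directly, and GroupBy returns the same groups as an Association keyed by the first element, which makes lookup and ordering by key straightforward. A small sketch on made-up pairs:

```mathematica
pairs = {{8, 2}, {9, 6}, {8, 10}, {2, 4}};
(* GatherBy accepts First directly; no pure function needed *)
gatheredPairs = GatherBy[pairs, First]
(* GroupBy gives an Association: first element -> pairs with that first element *)
groups = GroupBy[pairs, First]
(* keep only the second elements, with the keys in ascending order *)
secondByFirst = KeySort[GroupBy[pairs, First -> Last]]
```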
POSTED BY: David Keith