How to bin non-scalar data?

Posted 9 years ago
POSTED BY: Virgile Andreani
Posted 9 years ago

After some thinking, it occurred to me that the functionality of the built-in BinLists could be carried over to a user-defined BinListsBy if BinLists itself were used in the implementation.

The function BinListsBy defined below bins the first-level items in data using the values obtained when the function f is applied to each item. BinListsBy[data, f, options] takes 2 arguments and an optional sequence of options. f is a function; when f is mapped onto the first level of data, the resulting list of values must be acceptable to BinLists. options is an optional sequence that is passed directly to BinLists (in the example below, the bin width 3). The built-in function BinLists does the real work. (An attached notebook contains the function and examples.)

EDIT: Function modified and file replaced

Define the BinListsBy function

BinListsBy[data_, f_, opts___] := Module[{binBy, binning, select},
  (* f /@ data must be a list acceptable to BinLists *)
  binBy = f /@ data;
  (* bin the binBy values; Union removes repeated values within each bin *)
  binning = Union /@ BinLists[binBy, opts];
  (* select the data elements for which f[element] is in the list l *)
  select[l_] := Select[data, MemberQ[l, f[#]] &];
  (* use select to bin the original data according to the binning lists *)
  select /@ binning
  ]
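
To illustrate the two steps inside BinListsBy, here is a hand-run trace on a few made-up pairs, with f = First and an explicit bin specification {0, 6, 3} (both chosen only for illustration). BinLists bins the f-values, Union removes repeated values, and Select then recovers the original pairs:

```mathematica
(* illustrative data: pairs to be binned by their first element *)
samplePairs = {{1, "a"}, {5, "b"}, {2, "c"}};

(* step 1: bin the f-values; here First /@ samplePairs is {1, 5, 2} *)
sampleBinning = Union /@ BinLists[First /@ samplePairs, {0, 6, 3}]
(* {{1, 2}, {5}} *)

(* step 2: for each bin, keep the pairs whose first element is in that bin *)
Table[Select[samplePairs, MemberQ[bin, First[#]] &], {bin, sampleBinning}]
(* {{{1, "a"}, {2, "c"}}, {{5, "b"}}} *)
```

Note that select scans all of data once per bin and evaluates f again on every element each time, so the work grows with the number of bins times the length of data; for data of the size used in this thread that does not matter.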

Make up some strings as data

In[11]:= data = Table[
  StringJoin@
   Table[FromLetterNumber[
     RandomInteger[{1, 26}]], {RandomInteger[{3, 7}]}],
  {100}
  ]

Out[11]= {"kyiixs", "yblgx", "hjjwffz", "fhmt", "cmdgq", "czya", \
"tirnp", "ykajxq", "qyjgpf", "voge", "dxghgt", "fexr", "bueexkv", \
"yunxf", "ysbf", "pilpc", "mzsolsq", "dtixsu", "qpfzr", "xlqe", \
"mskqarr", "brqeyg", "zmrrs", "czhgqw", "shx", "iiwu", "rwcwnge", \
"piwix", "wdprv", "vpluzh", "bfherb", "pzqb", "fbvhvd", "wzr", \
"ghviirv", "pds", "fwfaogx", "hljp", "uda", "npmh", "cerfxpo", \
"cdkk", "vspiq", "qwxrly", "xxenpd", "ivwvitr", "fgio", "agki", \
"hkcvouh", "jafsf", "resj", "cgrphv", "bqyqvhe", "eutp", "qwhmjzz", \
"dubopa", "xqh", "yglbfg", "ods", "mfgmcg", "ufokyy", "wkrauf", \
"vyccac", "kioicj", "pnavibb", "izhiekh", "ypvlstb", "hkkszt", \
"vixrt", "nhtjky", "fdayiml", "aiy", "efzzu", "vbjpcvv", "uruykdr", \
"zchwbj", "mdhfdqn", "aduvvjc", "xub", "lyb", "xrjmfi", "buxdt", \
"znirgo", "lgfab", "faxnq", "olkph", "hswtjy", "kccmed", "ernc", \
"vfulbb", "dtd", "fomket", "sma", "ivrbog", "oaod", "oljcalq", \
"gpzfixq", "rmueo", "vlnlali", "sykvt"}

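If reproducible test data is wanted, the random generator can be seeded before the strings are made. The sketch below is an assumed variant rather than the code used above: it draws letters with RandomChoice from CharacterRange["a", "z"], and the seed 1234 and the name seededData are arbitrary choices, so the strings it produces will differ from the list shown above.

```mathematica
(* arbitrary seed: re-running these lines gives the same 100 strings *)
SeedRandom[1234];
(* each string has 3 to 7 random lowercase letters *)
seededData = Table[
   StringJoin[RandomChoice[CharacterRange["a", "z"], RandomInteger[{3, 7}]]],
   {100}];
```
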
This function returns the letter number of the first character in a string:

In[12]:= firstCharNum[str_] := LetterNumber@Characters[str][[1]]

In[13]:= firstCharNum[data[[1]]]

Out[13]= 11

Bin the data by the LetterNumber of the leading character, in bins of width 3. (TableForm doesn't render well in the Forum.)

In[14]:= BinListsBy[data, firstCharNum, 3] // TableForm

Out[14]//TableForm= TableForm[{{
  "tirnp", "shx", "rwcwnge", "resj", "sma", "rmueo", "sykvt"}, {
  "bueexkv", "brqeyg", "bfherb", "agki", "bqyqvhe", "aiy", "aduvvjc", 
   "buxdt"}, {
  "kyiixs", "iiwu", "ivwvitr", "jafsf", "kioicj", "izhiekh", "kccmed",
    "ivrbog"}, {
  "mzsolsq", "mskqarr", "npmh", "mfgmcg", "nhtjky", "mdhfdqn", "lyb", 
   "lgfab"}, {
  "cmdgq", "czya", "dxghgt", "dtixsu", "czhgqw", "cerfxpo", "cdkk", 
   "cgrphv", "eutp", "dubopa", "efzzu", "ernc", "dtd"}, {
  "qyjgpf", "pilpc", "qpfzr", "piwix", "pzqb", "pds", "qwxrly", 
   "qwhmjzz", "ods", "pnavibb", "olkph", "oaod", "oljcalq"}, {
  "voge", "wdprv", "vpluzh", "wzr", "uda", "vspiq", "ufokyy", 
   "wkrauf", "vyccac", "vixrt", "vbjpcvv", "uruykdr", "vfulbb", 
   "vlnlali"}, {
  "yblgx", "ykajxq", "yunxf", "ysbf", "xlqe", "zmrrs", "xxenpd", 
   "xqh", "yglbfg", "ypvlstb", "zchwbj", "xub", "xrjmfi", "znirgo"}, {
  "hjjwffz", "fhmt", "fexr", "fbvhvd", "ghviirv", "fwfaogx", "hljp", 
   "fgio", "hkcvouh", "hkkszt", "fdayiml", "faxnq", "hswtjy", 
   "fomket", "gpzfixq"}}]
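
The groups in the output above (a-b, c-e, f-h, ..., x-z) indicate that, given only a bin width, BinLists placed the bin edges at multiples of 3. If other edges are wanted, an explicit {min, max, dx} specification can be given instead. Because BinListsBy hands its trailing arguments straight to BinLists, the same spec could be passed as BinListsBy[data, firstCharNum, {1, 28, 3}]. Here it is applied directly to the letter numbers 1 to 26, with edges chosen for illustration so that the bins start at a, d, g, and so on:

```mathematica
(* bins [1, 4), [4, 7), ..., [25, 28) applied to the letter numbers 1..26 *)
letterBins = BinLists[Range[26], {1, 28, 3}];

(* convert the numbers back to letters to see which letters share a bin *)
Map[FromLetterNumber, letterBins, {2}]
(* {{"a", "b", "c"}, {"d", "e", "f"}, {"g", "h", "i"}, {"j", "k", "l"},
    {"m", "n", "o"}, {"p", "q", "r"}, {"s", "t", "u"}, {"v", "w", "x"},
    {"y", "z"}} *)
```
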
POSTED BY: David Keith
Posted 9 years ago

You're welcome, Virgile. But I do think your BinListsBy idea is a good one; it would be nice if BinLists could accept an optional level spec and a function to be applied to the data elements before establishing the bins and sorting the input into them. That way we would have the full capabilities of BinLists without having to construct complicated functions for GatherBy.
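
For the simple case of fixed-width bins, a GatherBy version can be sketched with Floor. The helper binByWidth below is hypothetical (not a built-in): it puts each item into the bin [x0 + k dx, x0 + (k + 1) dx) that contains f[item]. Unlike BinLists, it returns only the non-empty bins, so the position of a group in the result no longer corresponds to a fixed interval.

```mathematica
(* binByWidth is a hypothetical helper, not a built-in function.
   It gathers the items of data whose f-values fall in the same bin
   [x0 + k dx, x0 + (k + 1) dx) and sorts the groups by bin index k.
   Empty bins are not returned. *)
binByWidth[data_List, f_, x0_, dx_] := Module[{binIndex},
  binIndex = Floor[(f[#] - x0)/dx] &;
  SortBy[GatherBy[data, binIndex], binIndex[First[#]] &]
  ]

binByWidth[{{1, "a"}, {5, "b"}, {2, "c"}, {7, "d"}}, First, 0, 3]
(* {{{1, "a"}, {2, "c"}}, {{5, "b"}}, {{7, "d"}}} *)
```
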

POSTED BY: David Keith
Posted 9 years ago

It is also worth noting that if the pairs consist of approximate numbers, rounded values can be used to gather pairs whose first elements are close, to within a chosen rounding step:

In[4]:= realPairs = RandomReal[{1, 10}, {30, 2}]

Out[4]= {{5.21667, 6.47609}, {9.31783, 1.36685}, {1.64424, 
  5.90676}, {1.63877, 4.69312}, {8.20514, 9.13797}, {1.43469, 
  7.53501}, {7.32208, 7.86721}, {9.79302, 3.65095}, {3.37067, 
  1.09812}, {4.44002, 8.87933}, {2.05523, 8.75433}, {1.15523, 
  3.88057}, {1.87054, 2.36079}, {7.08747, 7.347}, {1.53206, 
  2.1205}, {6.06558, 6.74602}, {9.85321, 1.59023}, {8.23373, 
  8.98786}, {2.92257, 5.85836}, {4.70867, 9.13353}, {9.7615, 
  7.22106}, {2.21636, 5.98876}, {9.25288, 9.94328}, {8.56543, 
  6.00081}, {9.32182, 2.52546}, {2.91193, 3.88105}, {2.05716, 
  9.22982}, {2.02069, 6.72177}, {5.95095, 1.592}, {9.22202, 3.78433}}

In[6]:= GatherBy[realPairs, Round[#[[1]], .5] &]

Out[6]= {{{5.21667, 6.47609}}, {{9.31783, 1.36685}, {9.25288, 
   9.94328}, {9.32182, 2.52546}}, {{1.64424, 5.90676}, {1.63877, 
   4.69312}, {1.43469, 7.53501}, {1.53206, 2.1205}}, {{8.20514, 
   9.13797}, {8.23373, 8.98786}}, {{7.32208, 7.86721}}, {{9.79302, 
   3.65095}, {9.85321, 1.59023}, {9.7615, 7.22106}}, {{3.37067, 
   1.09812}}, {{4.44002, 8.87933}, {4.70867, 9.13353}}, {{2.05523, 
   8.75433}, {1.87054, 2.36079}, {2.21636, 5.98876}, {2.05716, 
   9.22982}, {2.02069, 6.72177}}, {{1.15523, 3.88057}}, {{7.08747, 
   7.347}}, {{6.06558, 6.74602}, {5.95095, 1.592}}, {{2.92257, 
   5.85836}, {2.91193, 3.88105}}, {{8.56543, 6.00081}}, {{9.22202, 
   3.78433}}}
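
Round[x, .5] returns the multiple of 0.5 nearest to x, so this groups first elements lying within about 0.25 of the same multiple of 0.5. Values that are close but straddle a boundary still end up in different groups: Round[1.24, .5] is 1. while Round[1.26, .5] is 1.5. Floor[x, .5] instead gives bins [k/2, (k + 1)/2) aligned at multiples of 0.5. A small comparison on made-up values:

```mathematica
(* made-up pairs; only the first elements matter here *)
closePairs = {{1.1, 0}, {1.3, 1}, {1.6, 2}};

(* Round[#, .5] sends 1.1, 1.3, 1.6 to 1., 1.5, 1.5 *)
GatherBy[closePairs, Round[#[[1]], .5] &]
(* {{{1.1, 0}}, {{1.3, 1}, {1.6, 2}}} *)

(* Floor[#, .5] sends them to 1., 1., 1.5 *)
GatherBy[closePairs, Floor[#[[1]], .5] &]
(* {{{1.1, 0}, {1.3, 1}}, {{1.6, 2}}} *)
```
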
POSTED BY: David Keith

Thank you for your answer; this is what I was looking for.

POSTED BY: Virgile Andreani
Posted 9 years ago

You could use GatherBy:

(The pure function just presents the first part of each pair to GatherBy.)

In[2]:= pairs = RandomInteger[{1, 10}, {30, 2}]

Out[2]= {{8, 2}, {9, 6}, {7, 10}, {2, 4}, {9, 9}, {10, 2}, {8, 
  10}, {3, 1}, {10, 2}, {2, 4}, {8, 5}, {1, 1}, {2, 8}, {10, 9}, {10, 
  10}, {5, 7}, {4, 6}, {3, 6}, {1, 6}, {6, 10}, {2, 5}, {3, 4}, {8, 
  10}, {2, 10}, {10, 8}, {4, 8}, {1, 7}, {5, 4}, {6, 4}, {8, 10}}

In[3]:= GatherBy[pairs, #[[1]] &]

Out[3]= {{{8, 2}, {8, 10}, {8, 5}, {8, 10}, {8, 10}}, {{9, 6}, {9, 
   9}}, {{7, 10}}, {{2, 4}, {2, 4}, {2, 8}, {2, 5}, {2, 10}}, {{10, 
   2}, {10, 2}, {10, 9}, {10, 10}, {10, 8}}, {{3, 1}, {3, 6}, {3, 
   4}}, {{1, 1}, {1, 6}, {1, 7}}, {{5, 7}, {5, 4}}, {{4, 6}, {4, 
   8}}, {{6, 10}, {6, 4}}}
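
GatherBy lists the groups in the order in which their keys first appear, as in the output above. In Mathematica 10 and later, GroupBy gives the same grouping as an Association keyed by the first element, and KeySort puts the keys in order. A made-up example:

```mathematica
smallPairs = {{2, "a"}, {1, "b"}, {2, "c"}};

(* passing First directly is equivalent to #[[1]] & *)
GatherBy[smallPairs, First]
(* {{{2, "a"}, {2, "c"}}, {{1, "b"}}} *)

(* same groups, keyed by the first element and sorted by key *)
KeySort[GroupBy[smallPairs, First]]
(* <|1 -> {{1, "b"}}, 2 -> {{2, "a"}, {2, "c"}}|> *)
```
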
POSTED BY: David Keith