Note that I changed the code a bit on May 8th, 2017. Now the function is always mapped--independent of the dimensions of the input data.
A common problem is that you want to bin data by (e.g.) the x coordinate, while maintaining some associated data y with it. We can define a new function BinListsBy to bin n-dimensional data in m dimensions:
ClearAll[BinListsBy]
BinListsBy[data_List,binspecs__List]:=Module[{fs,idata,len,out},
len=Length[data];
fs={binspecs};
If[AllTrue[fs, MatchQ[#, {_, _?NumericQ, _?NumericQ} | {_, _?NumericQ, _?NumericQ, _?NumericQ}] &],
idata=Table[Map[f,data],{f,fs[[All,1]]}];
AppendTo[idata,Range[len]];
out=BinLists[Transpose[idata],Sequence@@(Rest/@fs),{0,len+1,len+1}];
out=Part[out,Sequence@@ConstantArray[All,Length[fs]],1,All,-1];
Map[data[[#]] &, out, {Length[fs]}]
,
Print["Your specification is incorrect\[Ellipsis]"];
]
]
BinListsBy::usage=Evaluate[Uncompress@"1:eJztlctqwkAUhn2DvsLpPoUYobelha66qkt1McmcmFPMTHDGSxBfuE/RXKiSRCSOgyi6mTAXPv6f85+cR19+D34fOp0+iS9SWvXT4Xr0/vny0S3Wt2J9Hcx9Fcwo0X25GpYXq+LjOlBuy+fuuHJ6JMOzwOg1GJ7rPW8cWIctgVQFxiRMZNUpbFWhbMbgk1BGFqlCAhKg5kGAStECs53GCc5Kuo96iSis+AYmuBXr4DOFHKQAHSEs2HSOCjiGJLJTP4WYJQmJCYSgJSALIpBh/jQeuV3vHtOzxtQBTnajWqCygnKCJXEd3TN6fRltBVMJBhZ8NTFNa7ksMOZVIzQG8cQpRqFICjbN8yrySmf1tdAA21j9B6gVMzzIbKus6bwGCuWszPLWfz3N54tzeoixcYyEGPVEeohhKKTZWMcL6dWEXMEAsZEeC+G5wSl0IX27GyEX0r8nCrLfxztBtz5sjQryY6EgTUZRiL29mOm8xEGe/Q1ykZxptmeo/wEBvbZm"];
?BinListsBy
Basic Examples
Bin 2D data in only 1 dimension:
data={{29,48},{76,70},{37,78},{60,63},{83,0},{42,44},{26,77},{59,91},{93,46},{38,24},{30,97},{76,60},{98,50},{35,2},{22,17},{90,90},{90,67},{34,22},{97,26},{78,85},{70,55},{59,92},{15,66},{30,84},{45,48},{91,13},{69,94},{10,100},{97,40},{91,87},{74,24},{66,91},{9,93},{67,2},{44,22},{54,96},{67,13},{23,12},{88,2},{35,82},{76,64},{84,32},{62,5},{56,84},{88,68},{25,99},{41,13},{23,4},{40,14},{89,90}};
out=BinListsBy[data,{First,0,100,10}]
out//Grid
The data can be of any type:
data={{1.2,"This"},{5.1,"is"},{4.5,"just"},{3.1,"some"},{2.4,"test"},{7.1,"data"},{6.2,"from"},{3.14,"Sander"},{-0.7,"ok?"}};
BinListsBy[data,{First,-3,10}]
Column[%, Frame -> All]
will output:
{{},{},{{-0.7,ok?}},{},{{1.2,This}},{{2.4,test}},{{3.1,some},{3.14,Sander}},{{4.5,just}},{{5.1,is}},{{6.2,from}},{{7.1,data}},{},{}}
such that it is binned using the first element, and the associate data is kept.
Bin based on the length of a string:
data={"This","is","just","some","test","data","from","Sander","ok?"};
BinListsBy[data,{StringLength,1,10}]
giving:
{{},{is},{ok?},{This,just,some,test,data,from},{},{Sander},{},{},{}}
Of course we can also bin 2D data in a single dimension based on the length of a string:
data={{1.2,"This"},{5.1,"is"},{4.5,"just"},{3.1,"some"},{2.4,"test"},{7.1,"data"},{6.2,"from"},{3.14,"Sander"},{-0.7,"ok?"}};
BinListsBy[data,{StringLength@*Last,1,10}]
Column[%,Frame->All]
giving:
{{},{{5.1,is}},{{-0.7,ok?}},{{1.2,This},{4.5,just},{3.1,some},{2.4,test},{7.1,data},{6.2,from}},{},{{3.14,Sander}},{},{},{}}
where now strings of equal length are binned using a {1,10} specification.
Scope
We can also do higher number of dimensions:
data={{1.2,2.3,"This"},{5.1,1.2 ,"is"},{4.5,3.1,"just"},{3.1,1.87,"some"},{2.4,1.6,"test"},{7.1,1.4,"data"},{6.2,7.3,"from"},{3.14,9.1,"Sander"},{-0.7,0.8,"ok?"}};
BinListsBy[data,{First,-2,8},{#[[2]]&,0,10},{StringLength@*Last,1,7,1}];
Grid[Map[Column,%,{2}],Frame->All]
which nicely outputs a 3D list of binned data.
Using Composition and RightComposition or pure functions can make elaborate functionality. Here we bin based on number of unique characters:
data={"This","is","just","some","test","data","from","Sander","ok?"};
BinListsBy[data,{Length@*DeleteDuplicates@*Characters@*ToLowerCase,1,10}]
giving:
{{},{is},{test,data,ok?},{This,just,some,from},{},{Sander},{},{},{}}
Note that the data can have different lengths and types or structures:
data={{1.3,{1,1}},{2.3,{1,1,1}},{4.14,1,2,"test"},{2.5,{2,1},2},{4.3}};
BinListsBy[data,{First,0,6}];
Column[%]
Or by a criteria that works based on aggregate functions (Total, Mean):
data={{1,2},{4,5},{-2,5},{3,6},{2,3},{1,1}}
BinListsBy[data,{Total,1,10}];
Column[%]
Properties & Relations
If the data is purely numeric, real, and not ragged, one can specify very large bins for BinLists to capture all the data in the other dimension:
data={{29,48},{76,70},{37,78},{60,63},{83,0},{42,44},{26,77},{59,91},{93,46},{38,24},{30,97},{76,60},{98,50},{35,2},{22,17},{90,90},{90,67},{34,22},{97,26},{78,85},{70,55},{59,92},{15,66},{30,84},{45,48},{91,13},{69,94},{10,100},{97,40},{91,87},{74,24},{66,91},{9,93},{67,2},{44,22},{54,96},{67,13},{23,12},{88,2},{35,82},{76,64},{84,32},{62,5},{56,84},{88,68},{25,99},{41,13},{23,4},{40,14},{89,90}};
BinLists[data,{0,100,10},{-10000,10000,20000}][[All,1]]===BinListsBy[data,{First,0,100,10}]
Neat Examples
One can bin 1D data in 2D:
data={"This","is","just","some","test","data","from","Sander","ok?"};
BinListsBy[data,{Length@*DeleteDuplicates@*Characters@*ToLowerCase,1,10},{StringLength,0,10}]//Grid
So we first bin by unique characters, and then by string length:
Final thoughts
I've been advocating for this functionality for a while, it is basically BinLists but with an extra By like GatherBy and SortBy such that nD data can be binned in mD (which is currently not possible!). It is indeed very useful for data processing where it is very common to bin by a certain value, while wanting to maintain some auxiliary data (for example an ID of the data, or time-stamp or ...).