3
|
4492 Views
|
5 Replies
|
8 Total Likes
View groups...
Share
GROUPS:

# Binning high-dimensional data into every sector of the hypercube

Posted 12 years ago
 I'm trying to find an elegant way to bin data into every bisector of a hypercube that is generalizable to any number of dimensions. For example with just 3 dims,In[75]:= exdat = Tuples[{0.5, 1.5}, 3]Out[75]= {{0.5, 0.5, 0.5}, {0.5, 0.5, 1.5}, {0.5, 1.5, 0.5}, {0.5, 1.5, 1.5}, {1.5, 0.5, 0.5}, {1.5, 0.5, 1.5},   {1.5, 1.5, 0.5}, {1.5, 1.5, 1.5}} the correct answer would be{{{0.5, 0.5, 0.5}}, {{0.5, 0.5, 1.5}}, {{0.5, 1.5, 0.5}}, {{0.5, 1.5, 1.5}},  {{1.5, 0.5, 0.5}}, {{1.5, 0.5, 1.5}}, {{1.5, 1.5, 0.5}}, {{1.5, 1.5, 1.5}}}(one list per sector, splitting each dimension at 1.0)Is there a form of BinLists that does this? *Computational expense note: Immediately, I'm working in 8 dimensions and have ~100k elements to bin. It may be smart to not store empty lists and instead label the lists that are nonempty.
5 Replies
Sort By:
Posted 12 years ago
 Hello Zach,I'm not sure if this is what you would call elegant. It doesn't do one list per sector (i.e., empty lists don't appear as one of the bins), but it shouldn't be hard to figure out which elements are missing.exdat = Tuples[{0.5, 1.5}, 3]Gather[exdat, (Floor /@ #1 == Floor /@ #2) &]moredat = RandomReal[{0, 2}, {100, 4}];tmp = Gather[moredat, (Floor /@ #1 == Floor /@ #2) &];Length /@ tmp(* for my random seed*){9, 9, 4, 10, 2, 5, 9, 6, 8, 5, 7, 7, 3, 6, 4, 6}Perhaps you already guessed this one, but dismissed it because it is going to be computationally expensive to do those 8 dimensional list SameQ's. Craig
Posted 12 years ago
 Given the computational expense note, and assuming there is no form of BinLists that does this, I think the following does what you want, labeling the elements by bin.altBin3D[list_] := SortBy[Floor[#[[1]]] -> #&/@ GatherBy[list, Floor], First];In[6]:= altBin3D[Tuples[{0.5, 1.5}, 3]] // AbsoluteTimingOut[6]= {0.000074, {{0, 0, 0} -> {{0.5, 0.5, 0.5}}, {0, 0, 1} -> {{0.5, 0.5, 1.5}}, {0, 1, 0} -> {{0.5, 1.5, 0.5}}, {0, 1, 1} -> {{0.5, 1.5, 1.5}}, {1, 0, 0} -> {{1.5, 0.5, 0.5}}, {1, 0, 1} -> {{1.5, 0.5, 1.5}}, {1, 1, 0} -> {{1.5, 1.5, 0.5}}, {1, 1, 1} -> {{1.5, 1.5, 1.5}}}}A solution which fills in blank bins and doesn't label them, by constructing the list of bins and then using the labeled bins to replace them, isbin3D[list_] :=  Replace[Flatten[   Table[{i, j, k},     Evaluate[     Sequence @@       Transpose[       Prepend[Through[{Floor[Min /@#] &, Floor[Max /@#] &}[Transpose[list]]], {i, j, k}]]]], 2],   Append[Floor[#[[1]]] -> #&/@ GatherBy[list, Floor], _ -> {}], {1}]In[9]:= bin3D[Tuples[{0.5, 1.5}, 3]] // AbsoluteTimingOut[9]= {0.000189, {{{0.5, 0.5, 0.5}}, {{0.5, 0.5, 1.5}}, {{0.5, 1.5, 0.5}}, {{0.5, 1.5, 1.5}}, {{1.5, 0.5, 0.5}}, {{1.5, 0.5, 1.5}}, {{1.5, 1.5, 0.5}}, {{1.5, 1.5, 1.5}}}}but it looks to be as much as 1000 times slower (~.3 seconds vs ~.005 seconds for RandomReal[{-10, 10}, {1000, 3}]).
Posted 12 years ago
 Oops; that's obviously a factor of 100, not 1000. And over 10 of that (down to ~.01s) is recovered if you wrap the Append[...] in Dispatch.
Posted 12 years ago
 Similar to Jeremy's method, but slightly more general in terms of how to do the split:bin[vals_, center_: 0] :=  Sort[Map[#[[1, 2]] -> #[[All, 1]] &,   GatherBy[Transpose[{vals, UnitStep[vals - center]}], Last]]]vals = RandomReal[{-10, 10}, {100000, 8}];Example:Timing[sv = bin[vals];] Out[47]= {0.452403, Null}
Posted 12 years ago
 It occurss to me one can also cook up something using a NearestFunction computed from those box centers. This will be slower but might generalize better for certain types of problem e.g. trisecting the dimensions.