Message Boards Message Boards

3
|
4223 Views
|
5 Replies
|
8 Total Likes
View groups...
Share
Share this post:

Binning high-dimensional data into every sector of the hypercube

Posted 11 years ago
I'm trying to find an elegant way to bin data into every bisector of a hypercube that is generalizable to any number of dimensions.
For example with just 3 dims,
In[75]:= exdat = Tuples[{0.5, 1.5}, 3]
Out[75]= {{0.5, 0.5, 0.5}, {0.5, 0.5, 1.5}, {0.5, 1.5, 0.5}, {0.5, 1.5, 1.5}, {1.5, 0.5, 0.5}, {1.5, 0.5, 1.5}, 
  {1.5, 1.5, 0.5}, {1.5, 1.5, 1.5}} 

the correct answer would be
{{{0.5, 0.5, 0.5}}, {{0.5, 0.5, 1.5}}, {{0.5, 1.5, 0.5}}, {{0.5, 1.5, 1.5}},  {{1.5, 0.5, 0.5}}, {{1.5, 0.5, 1.5}}, 
{{1.5, 1.5, 0.5}}, {{1.5, 1.5, 1.5}}}
(one list per sector, splitting each dimension at 1.0)

Is there a form of BinLists that does this?

*Computational expense note: Immediately, I'm working in 8 dimensions
and have ~100k elements to bin. It may be smart to not store empty lists
and instead label the lists that are nonempty.
POSTED BY: Zach Bjornson
5 Replies
Hello Zach,
I'm not sure if this is what you would call elegant. It doesn't do one list per sector (i.e., empty lists don't appear as one of the bins), but it shouldn't be hard to figure out which elements are missing.
exdat = Tuples[{0.5, 1.5}, 3]
Gather[exdat, (Floor /@ #1 == Floor /@ #2) &]

moredat = RandomReal[{0, 2}, {100, 4}];
tmp = Gather[moredat, (Floor /@ #1 == Floor /@ #2) &];
Length /@ tmp
(* for my random seed*)
{9, 9, 4, 10, 2, 5, 9, 6, 8, 5, 7, 7, 3, 6, 4, 6}
Perhaps you already guessed this one, but dismissed it because it is going to be computationally expensive to do those 8 dimensional list SameQ's. 

Craig
POSTED BY: W. Craig Carter
Given the computational expense note, and assuming there is no form of BinLists that does this, I think the following does what you want, labeling the elements by bin.
altBin3D[list_] := SortBy[Floor[#[[1]]] -> #&/@ GatherBy[list, Floor], First];
In[6]:= altBin3D[Tuples[{0.5, 1.5}, 3]] // AbsoluteTiming
Out[6]= {0.000074, {{0, 0, 0} -> {{0.5, 0.5, 0.5}}, {0, 0, 1} -> {{0.5, 0.5, 1.5}}, {0, 1, 0} -> {{0.5, 1.5, 0.5}}, {0, 1, 1} -> {{0.5, 1.5, 1.5}}, {1, 0, 0} -> {{1.5, 0.5, 0.5}}, {1, 0, 1} -> {{1.5, 0.5, 1.5}}, {1, 1, 0} -> {{1.5, 1.5, 0.5}}, {1, 1, 1} -> {{1.5, 1.5, 1.5}}}}
A solution which fills in blank bins and doesn't label them, by constructing the list of bins and then using the labeled bins to replace them, is
bin3D[list_] := 
 Replace[Flatten[
   Table[{i, j, k}, 
    Evaluate[
     Sequence @@ 
      Transpose[
       Prepend[Through[{Floor[Min /@#] &, Floor[Max /@#] &}[Transpose[list]]], {i, j, k}]]]], 2], 
  Append[Floor[#[[1]]] -> #&/@ GatherBy[list, Floor], _ -> {}], {1}]
In[9]:= bin3D[Tuples[{0.5, 1.5}, 3]] // AbsoluteTiming
Out[9]= {0.000189, {{{0.5, 0.5, 0.5}}, {{0.5, 0.5, 1.5}}, {{0.5, 1.5, 0.5}}, {{0.5, 1.5, 1.5}}, {{1.5, 0.5, 0.5}}, {{1.5, 0.5, 1.5}}, {{1.5, 1.5, 0.5}}, {{1.5, 1.5, 1.5}}}}

but it looks to be as much as 1000 times slower (~.3 seconds vs ~.005 seconds for RandomReal[{-10, 10}, {1000, 3}]).
POSTED BY: Jeremy Michelson
Oops; that's obviously a factor of 100, not 1000. And over 10 of that (down to ~.01s) is recovered if you wrap the Append[...] in Dispatch.
POSTED BY: Jeremy Michelson
Similar to Jeremy's method, but slightly more general in terms of how to do the split:
bin[vals_, center_: 0] :=
  Sort[Map[#[[1, 2]] -> #[[All, 1]] &,
   GatherBy[Transpose[{vals, UnitStep[vals - center]}], Last]]]


vals = RandomReal[{-10, 10}, {100000, 8}];

Example:
Timing[sv = bin[vals];] 
Out[47]= {0.452403, Null}
POSTED BY: Daniel Lichtblau
It occurss to me one can also cook up something using a NearestFunction computed from those box centers. This will be slower but might generalize better for certain types of problem e.g. trisecting the dimensions.
POSTED BY: Daniel Lichtblau
Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.
Reply Preview
Attachments
Remove
or Discard

Group Abstract Group Abstract