Group Abstract Group Abstract

Message Boards Message Boards

6
|
4.8K Views
|
11 Replies
|
39 Total Likes
View groups...
Share
Share this post:

Can Wolfram's new compiler technology speed up this AABB tree code?

Posted 1 year ago

An axes aligned bounding box (AABB) tree, a type of bounding volume hierarchy, is a spatial data structure designed to efficiently find overlapping bounding boxes. It is often used to find broad-level collisions in applications such as collision detection and ray tracing.

Question

I have an implementation in WL that uses some Compile and DataStructure, but is mostly top-level WL code. I'm looking to increase performance, while staying in the WL ecosystem.

Are there any tips to speed up my code? In particular any tips to port this over to compiled code using FunctionCompile & friends? Side note, would be nice if this was a builtin DataStructure :)

I've posted the code below. Feel free to ask any questions about how it works, etc.

Examples

Basic example

Bounding boxes are of the form $\{\{x_{min}, x_{max}\}, \{y_{min}, y_{max}\}, \{z_{min}, z_{max}\}\}$.

Let's create the AABB tree over 3 bounding boxes and make 1 query bounding box:

bboxes = {
  {{0, 1}, {0, 1}, {0, 1}},
  {{0.5, 1.5}, {0.25, 1.5}, {0.3, 0.7}}, 
  {{0.7, 0.8}, {0.8, 2}, {0.2, 0.8}}
};
testbbox = {{0.6, 0.9}, {1.1, 1.4}, {0.6, 0.9}};

Create the data structure:

aabbtree = AABBTree[bboxes];

Find the indices of the bounding boxes that overlap testbbox:

overlaps = OverlappingBBoxes[aabbtree, testbbox]
(* {2, 3} *)

Visualize (query box in blue, overlapping in red, disjoint in green):

Graphics3D[{
  {FaceForm[Green], Cuboid @@@ Transpose[Delete[bboxes, Partition[overlaps, 1]], {1, 3, 2}]},
  {FaceForm[Red], Cuboid @@@ Transpose[bboxes[[overlaps]], {1, 3, 2}]},
  {Blue, Cuboid @@ Transpose[testbbox]}
 }]

4000 bounding boxes, 1 query

randomBBox[n_] :=
  With[{llcorners = RandomReal[{-10, 10}, {n, 3}]},
    Transpose[{llcorners, llcorners + RandomReal[{1, 1}, {n, 3}]}, {3, 1, 2}]
  ]

SeedRandom[1];
bboxes = randomBBox[4000];
testbbox = randomBBox[1][[1]];

aabbtree = AABBTree[bboxes]; // AbsoluteTiming
(* {0.018875, Null} *)

(overlaps = OverlappingBBoxes[aabbtree, testbbox]) // AbsoluteTiming
(* {0.000692, {1817, 3935, 878, 87}} *)

Graphics3D[Cuboid @@@ Transpose[bboxes, {1, 3, 2}]]

Graphics3D[{
  {FaceForm[Green], Cuboid @@@ Transpose[Delete[bboxes, Partition[overlaps, 1]], {1, 3, 2}]},
  {FaceForm[Red], Cuboid @@@ Transpose[bboxes[[overlaps]], {1, 3, 2}]},
  {Blue, Cuboid @@ Transpose[testbbox]}},
 PlotRange -> testbbox,
 PlotRangePadding -> Scaled[0.25]
]

100000 bounding boxes, 100000 queries

SeedRandom[1];
bboxes = randomBBox[100000];
testbboxes = randomBBox[100000];

aabbtree = AABBTree[bboxes]; // AbsoluteTiming
(* {0.235465, Null} *)

OverlappingBBoxes[aabbtree, testbboxes[[1]]]; // AbsoluteTiming
(* {0.001988, Null} *)

I suspect this can be sped up substantially:

Scan[OverlappingBBoxes[aabbtree, #]&, testbboxes]; // AbsoluteTiming
(* {44.8445, Null} *)

Code

AABBTree[bboxes_] :=
    Block[{$RecursionLimit = ∞},
       iAABBTree[1, MapIndexed[Join, Flatten /@ bboxes][[Ordering[bboxes[[All, 1, 1]]]]]]
    ]

iAABBTree[d_, bboxestagged_] :=
    Block[{len, n, x, Sleft, Scent, Sright, node},
       len = Length[bboxestagged];
       If[len <= 16, Return[centerNode3D[d, Null, bboxestagged]]];

       n = Ordering[bboxestagged[[All, 2d-1]], {Quotient[len+1, 2]}][[1]];
       x = Mean[bboxestagged[[n, 2d-1 ;; 2d]]];

       {Sleft, Scent, Sright} = partitionIntervals3D[d, bboxestagged, x];

       node = centerNode3D[d, x, Scent];
       If[Length[Sleft] > 0, node["SetLeft", iAABBTree[Mod[d+1, 3, 1], Sleft]];];
       If[Length[Sright] > 0, node["SetRight", iAABBTree[Mod[d+1, 3, 1], Sright]];];

       node
    ]

centerNode3D[d_, x_, {}] := CreateDataStructure["BinaryTree", {x, {}, {}, {}, {}, {}, {}, {}, -$MaxMachineNumber, $MaxMachineNumber, d}]

(* {center ordinate, xmins, ymins, zmins, xmaxs, ymaxs, zmaxs, inds, dmin, dmax, d} *)
(* xmins are the xmin values of all bboxes in this node, ymins, etc follow suit *)
(* inds are the respective indices of the bboxes in this node *)
(* dmin & dmax are the bounds of all bounding boxes in dimension d (the splitting dimension) *)
centerNode3D[d_, x_, intstagged_] := 
    CreateDataStructure["BinaryTree", 
       {
         x, 
         #1, #2, 
         #3, #4, 
         #5, #6, 
         Developer`ToPackedArray[#7, Integer], 
         Switch[d, 1, #1[[1]], 2, Min[#3], 3, Min[#5]], Max[Switch[d, 1, #2, 2, #4, 3, #6]], 
         d
       }& @@ Transpose[intstagged]
    ]

partitionIntervals3D[d_, intstagged_, x_] :=
    Block[{data, spec},
       data = Partition[partitionIntervals3DC[d, intstagged, x], 7];
       spec = Round[data[[-1]]];

       {data[[1 ;; spec[[1]]]], data[[spec[[1]]+1 ;; spec[[2]]]], data[[spec[[2]]+1 ;; -2]]}
    ]

partitionIntervals3DC = Compile[{{d, _Integer}, {intstagged, _Real, 2}, {x, _Real}},
    Block[{mnp, mxp, cright = 0., ccenter = 0., bagright = Internal`Bag[Most[{0.}]], bagcenter = Internal`Bag[Most[{0.}]], bagleft = Internal`Bag[Most[{0.}]]},
       mnp = 2d-1;
       mxp = mnp+1;
       Do[
         If[Compile`GetElement[i, mxp] < x,
          Internal`StuffBag[bagright, i, 1];
          cright++,
          If[Compile`GetElement[i, mnp] > x,
              Internal`StuffBag[bagleft, i, 1],
              Internal`StuffBag[bagcenter, i, 1];
              ccenter++
          ]
         ],
         {i, intstagged}
       ];

       Join[Internal`BagPart[bagright, All], Internal`BagPart[bagcenter, All], Internal`BagPart[bagleft, All], {cright, cright + ccenter, 0., 0., 0., 0., 0.}]
    ],
    CompilationTarget -> "C",
    RuntimeOptions -> "Speed"
];

OverlappingBBoxes[tree_?DataStructureQ, bbox:{{x1_, x2_}, {y1_, y2_}, {z1_, z2_}}] :=
    Block[{bag, stack, node, a, b, d1, d2, d3, d4, d5, d6, d7, d8, d9, d10, d11},
       bag = Internal`Bag[];
       stack = CreateDataStructure["Stack"];
       stack["Push", tree];

       While[!stack["EmptyQ"],
         node = stack["Pop"];
         {d1, d2, d3, d4, d5, d6, d7, d8, d9, d10, d11} = node["Data"];
         {a, b} = bbox[[d11]];
         If[Length[d2] > 0 && (d9 <= a <= d10 || d9 <= b <= d10 || a <= d9 <= d10 <= b),
          Internal`StuffBag[bag, Pick[d8, bboxOverlaps[d2, d3, d4, d5, d6, d7, x1, x2, y1, y2, z1, z2], 1], 1];
         ];

         If[d1 =!= Null, 
          If[a < d1 && !node["LeftNullQ"], stack["Push", node["Left"]]];
          If[d1 < b && !node["RightNullQ"], stack["Push", node["Right"]]];
         ];
       ];

       Internal`BagPart[bag, All]
    ]

bboxOverlaps = Compile[{{epx1, _Real}, {epx2, _Real}, {epy1, _Real}, {epy2, _Real}, {epz1, _Real}, {epz2, _Real}, {x1, _Real}, {x2, _Real}, {y1, _Real}, {y2, _Real}, {z1, _Real}, {z2, _Real}}, 
    Boole @ And[
       epx1 <= x1 <= epx2 || epx1 <= x2 <= epx2 || (x1 <= epx1 <= epx2 <= x2),
       epy1 <= y1 <= epy2 || epy1 <= y2 <= epy2 || (y1 <= epy1 <= epy2 <= y2),
       epz1 <= z1 <= epz2 || epz1 <= z2 <= epz2 || (z1 <= epz1 <= epz2 <= z2)
    ],
    CompilationTarget -> "C",
    Parallelization -> True,
    RuntimeOptions -> "Speed",
    RuntimeAttributes -> {Listable}
];
POSTED BY: Greg Hurst
11 Replies
Posted 1 year ago

[ This will probably be my final update to the code, it's gotten quite fast! ]

With a few more improvements, the call to OverlappingBBoxes mapped over the 100000 testbboxes went from 44.85 seconds now down to 0.68 seconds (a 65x speedup).

This means:

  • In this particular example, it takes my machine 6.8 μs per query box to find intersecting bboxes out of 100000 possibilities.
  • In this example, the 100000 test bboxes have 9269710 intersections all together, so that's 74 nanoseconds per intersection.
  • My machine runs at 3.5 GHz, so back of the envelope calculation gives ~260 CPU instructions per intersection over this dataset.

My final improvements:

  • Use "BinaryTree"::["PackedArray"::["Real64", 1]] instead of "BinaryTree"::["InertExpression"] to avoid some casting.
  • Avoid pushing to the stack when at least one binary tree child is Null.
  • Ditch "DynamicArray" all together in favor of a simple implementation.
    • I don't know why the built in "DynamicArray" is much slower. Maybe a bunch of checks that can somehow be turned off?
  • Ditch "Stack" and instead use "LinkedList" and treat it as a stack instead.
    • Don't know why the built in "Stack" is so slow compared to "LinkedList".
  • Discovered Native`SetArrayElementUnary and Native`GetPartBinary, but I don't know if that gave me any performance boosts.

Take aways from this endeavor:

  • The new compiler is amazing and I will be using it often from here on out.
  • I love how TargetSystem -> All lets you compile code for different operating systems.
  • Documentation is extremely limited and I was very frustrated learning how to use FunctionCompile
    • I feel I have a good understanding now, but I did so inspecting the source code in the Components folder and so
      • Who knows what I incorrectly learned
      • Who knows what I overlooked
      • Who knows what undocumented things I discovered will break in future versions
  • I still think this code can be sped up more, given that ditching "DynamicArray" and "Stack" sped things up. Perhaps a simple implementation of "LinkedList" and "BinaryTree" would also show a performance boost.
  • Multi-threaded operations look possible but I didn't look into them.

POSTED BY: Greg Hurst
Posted 1 year ago

I was able to refactor the code to get OverlappingBBoxes mapped over the 100000 testbboxes to go from 44.85 seconds down to 1.3 seconds (a 34.5x speedup). I suspect there still room for improvement.

Code attached this time rather than being pasted in the post.

Improvements:

  • I changed the flow of my code a bit:
    • Place bboxes that overlap with the splitting plane into the left node (as opposed to keeping separate).
    • For bboxes in a node, sort them by their splittingdim+1 dimension. So now in OverlappingBBoxes we can binary search and quickly eliminate ~50% bboxes to manually check.

In lieu of proper documentation I found what I believe to be the source code at

FileNameJoin[{$InstallationDirectory, "SystemFiles", "Components", "Compile"}]

Based on what I saw:

  • I think Native`UncheckedBlock is similar to "RuntimeOptions" -> "Speed" in Compile.
    • With help from a 2019 WTC presentation I was able to inspect the IR with Compile`CompileToIR and noticed tons of check were omitted when this was used.
    • Not all checks are omitted with this though. NullNode exceptions in binary trees are still checked for example.
    • Is there any other way to turn off more checks?
  • Based off the MSE answer here we have some control over what gets inlined.
    • Based on FunctionInlineInformation.m in the source code, we can use the option "InlinePolicy" -> "MaximumCallCount" -> 1 to inline a function that is only called in one spot.
    • I don't know if functions are inlined at a later step, so perhaps this is superfluous.
  • list[[i]] seems to avoid bounds checks with Native`UncheckedBlock, but multi arg Part does not.
    • Looks like with UncheckedBlock, unary Part gets converted to

      Native`Unchecked[Compile`GetArrayElement]
      

but multi arg Part does not. * I refactored my AABB node to store a flat list rather than a matrix.

Attachments:
POSTED BY: Greg Hurst
Posted 1 year ago
POSTED BY: Greg Hurst

That seems like a valid use of TypeDeclaration. However, I've myself not worked yet with the "Product" type in any significant examples so I can't comment if it is a suitable use for that type.

I'm not sure why it's that bit slower either, but the returned value you get I think is valid. What's failed I think is the attempt to get its full form. This is the same behavior you commented separately with the other questions (point no. 2). I tried to answer it there too

Posted 1 year ago
POSTED BY: Greg Hurst
Posted 1 year ago

Here are some questions I gathered while working through compiling this code. If these warrant a new post, please let me know!

  • Are DataStructures in compiled code documented? I saw zero mention / examples of it other than a mention of Typed on each data structure ref page: Typed[x,"BinaryTree"] give x the type "BinaryTree".
  • Is it possible to return a binary tree that is not "BinaryTree"::["InertExpression"]? For example I tried returning "BinaryTree"::["PackedArray"["Real64", 2]] and the return value was of the form DataStructure[some internal symbol, $Failed]. Code compiled fine though.
  • Are not all data structure methods supported in the compiler? For the binary tree I tried node["SetLeft", expr] and the compiler told me "SetLeft" is an unknown method. Luckily in this case BinaryTree`SetLeft[node, expr] works just fine.
  • When would I use ExtensibleVector v.s. DynamicArray? The functionality of DynamicArray seems to be a proper subset of ExtensibleVector, so why would I ever use DynamicArray? Performance perhaps?
  • If ExtensibleVector / DynamicArray are storing integers only, will the footprint be as efficient as having a packed array? Seems I can treat the "Elements" as "ListVector" or "PackedArray" seamlessly, but I don't know if I'm copying data when doing so.
  • Is it possible to return a list of elements of different types? "ListVector" can return a ragged list as long as all elements are the same type.
  • Documentation of the compiler seems quite sparse.
    • Took me awhile to discover "Rule" is a compilable type, I didn't see it mentioned anywhere in the docs, other than a couple examples.
    • CompilerOptions ref page has little to no information in it. I found an MSE post where people found some info on it though.
    • All examples seem to be toy examples illustrating a certain feature. Those are great, but some 'real world' examples would be super helpful
POSTED BY: Greg Hurst
Posted 1 year ago

With the great nudge from @José Antonio Fernández I was able to compile the construction of the AABBTree and refined the OverlappingBBoxes function a bit more. Code posted below.

Example

Initialization:

randomBBox[n_] :=
  With[{llcorners = RandomReal[{-10, 10}, {n, 3}]},
    Transpose[{llcorners, llcorners + RandomReal[{1, 1}, {n, 3}]}, {3, 1, 2}]
  ]

SeedRandom[1];
bboxes = randomBBox[100000];
testbboxes = randomBBox[100000];

Construction used to take 0.235465 seconds (so a 2x speedup):

tree = AABBTree[bboxes]; // AbsoluteTiming
(* {0.107196, Null} *)

Testing all 100000 bboxes used to take 44.8445 seconds (so a 6.5x speedup):

Scan[OverlappingBBoxes[tree, #] &, testbboxes]; // AbsoluteTiming
(* {6.9374, Null} *)

Code

decl = FunctionDeclaration[
    centerNode3D,
    Typed[{"MachineInteger", "PackedArray"::["Real64", 2], "Real64"} -> "BinaryTree"] @ Function[{d, intstagged, x},
       Module[{preamble},
         preamble = ConstantArray[0., 7];
         preamble[[1]] = x;
         preamble[[2]] = Min[intstagged[[All, 2d-1]]];
         preamble[[3]] = Max[intstagged[[All, 2d]]];
         preamble[[4]] = Cast[d, "Real64"];
         CreateDataStructure["BinaryTree", Cast[Append[intstagged, preamble], "InertExpression"]]
       ]
    ]
];

pureAABBTree = Function[{Typed[bboxes, "PackedArray"::["Real64", 3]]},
    Module[{iAABBTree, bboxestagged},
       iAABBTree = Function[{Typed[d, "MachineInteger"], Typed[bboxestagged, "PackedArray"::["Real64", 2]]},
         Module[{len, n, x, mnp, mxp, loc, r=0, l=0, c=0, Sleft, Scent, Sright, node},
          len = Length[bboxestagged];

          n = Ordering[bboxestagged[[All, 2d-1]]][[Quotient[len+1, 2]]];
          x = bboxestagged[[n, 2d-1]];

          If[len <= 100, Return[centerNode3D[d, bboxestagged, x]]];

          mnp = 2d-1;
          mxp = mnp+1;

          loc = ConstantArray[0, Length[bboxestagged]];

          Do[
              If[bboxestagged[[i, mxp]] < x,
                 l++;
                 loc[[i]] = 1,
                 If[bboxestagged[[i, mnp]] <= x,
                   c++;
                   loc[[i]] = 2,
                   r++
                 ]
              ],
              {i, Length[bboxestagged]}
          ];

          Sleft = ConstantArray[0., {l, 7}];
          Scent = ConstantArray[0., {c, 7}];
          Sright = ConstantArray[0., {r, 7}];

          r = l = c = 1;
          Do[
              If[loc[[i]] === 0,
                 Sright[[r++]] = bboxestagged[[i]];,
                 If[loc[[i]] === 1,
                   Sleft[[l++]] = bboxestagged[[i]],
                   Scent[[c++]] = bboxestagged[[i]];
                 ]
              ],
              {i, Length[bboxestagged]}
          ];

          node = centerNode3D[d, Scent, x];

          If[l > 1, BinaryTree`SetLeft[node, iAABBTree[Mod[d+1, 3, 1], Sleft]]];
          If[r > 1, BinaryTree`SetRight[node, iAABBTree[Mod[d+1, 3, 1], Sright]]];

          node
         ]
       ];

       bboxestagged = Table[Append[Flatten[bboxes[[i]]], Cast[i, "Real64"]], {i, Length[bboxes]}][[Ordering[bboxes[[All, 1, 1]]]]];

       iAABBTree[1, bboxestagged]
    ]
];

declbbox = FunctionDeclaration[
    bboxOverlaps, 
    Typed[ConstantArray["Real64", 12] -> "Boolean"] @ Function[
       {epx1, epx2, epy1, epy2, epz1, epz2, x1, x2, y1, y2, z1, z2}, 
       And[
         epx1 <= x1 <= epx2 || epx1 <= x2 <= epx2 || (x1 <= epx1 <= epx2 <= x2),
         epy1 <= y1 <= epy2 || epy1 <= y2 <= epy2 || (y1 <= epy1 <= epy2 <= y2),
         epz1 <= z1 <= epz2 || epz1 <= z2 <= epz2 || (z1 <= epz1 <= epz2 <= z2)
       ]
    ]
];

pureOverlappingBBoxes = Function[
    {
       Typed[tree, "BinaryTree"], 
       Typed[bbox, "PackedArray"::["Real64", 2]]
    },
    Block[{inds, stack, x1, x2, y1, y2, z1, z2, node, a, b, data, x, min, max, d},
       inds = TypeHint[CreateDataStructure["ExtensibleVector"], "ExtensibleVector"::["MachineInteger"]];
       stack = TypeHint[CreateDataStructure["Stack"], "Stack"::["BinaryTree"]];
       stack["Push", tree];

       x1 = bbox[[1, 1]]; x2 = bbox[[1, 2]];
       y1 = bbox[[2, 1]]; y2 = bbox[[2, 2]];
       z1 = bbox[[3, 1]]; z2 = bbox[[3, 2]];

       While[!stack["EmptyQ"],
         node = stack["Pop"];
         data = Cast[node["Data"], "PackedArray"::["Real64", 2]];
         If[Length[data] <= 1, Continue[]];
         x = data[[-1, 1]];
         min = data[[-1, 2]];
         max = data[[-1, 3]];
         d = Round[data[[-1, 4]]];
         a = bbox[[d, 1]];
         b = bbox[[d, 2]];
         If[min <= a <= max || min <= b <= max || (a <= min && max <= b),
          Do[
              If[bboxOverlaps[data[[i, 1]], data[[i, 2]], data[[i, 3]], data[[i, 4]], data[[i, 5]], data[[i, 6]], x1, x2, y1, y2, z1, z2],
                 inds["Append", Round[data[[i, 7]]]];
              ],
              {i, Length[data]-1}
          ]
         ];

         If[a < x && !node["LeftNullQ"], stack["Push", node["Left"]]];
         If[x < b && !node["RightNullQ"], stack["Push", node["Right"]]];
       ];

       inds["Elements"]
    ]
];

AABBTree = 
  FunctionCompile[decl, pureAABBTree, 
   CompilerOptions -> {"AbortHandling" -> False, 
     "LLVMOptimization" -> "ClangOptimization"[3], 
     "OptimizationLevel" -> 3}];

OverlappingBBoxes = 
  FunctionCompile[declbbox, pureOverlappingBBoxes, 
   CompilerOptions -> {"AbortHandling" -> False, 
     "LLVMOptimization" -> "ClangOptimization"[3], 
     "OptimizationLevel" -> 3}];
POSTED BY: Greg Hurst
Posted 1 year ago
POSTED BY: Greg Hurst
Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.
Reply Preview
Attachments
Remove
or Discard