Message Boards Message Boards

6
|
2084 Views
|
11 Replies
|
39 Total Likes
View groups...
Share
Share this post:

Can Wolfram's new compiler technology speed up this AABB tree code?

Posted 7 months ago

An axes aligned bounding box (AABB) tree, a type of bounding volume hierarchy, is a spatial data structure designed to efficiently find overlapping bounding boxes. It is often used to find broad-level collisions in applications such as collision detection and ray tracing.

Question

I have an implementation in WL that uses some Compile and DataStructure, but is mostly top-level WL code. I'm looking to increase performance, while staying in the WL ecosystem.

Are there any tips to speed up my code? In particular any tips to port this over to compiled code using FunctionCompile & friends? Side note, would be nice if this was a builtin DataStructure :)

I've posted the code below. Feel free to ask any questions about how it works, etc.

Examples

Basic example

Bounding boxes are of the form $\{\{x_{min}, x_{max}\}, \{y_{min}, y_{max}\}, \{z_{min}, z_{max}\}\}$.

Let's create the AABB tree over 3 bounding boxes and make 1 query bounding box:

bboxes = {
  {{0, 1}, {0, 1}, {0, 1}},
  {{0.5, 1.5}, {0.25, 1.5}, {0.3, 0.7}}, 
  {{0.7, 0.8}, {0.8, 2}, {0.2, 0.8}}
};
testbbox = {{0.6, 0.9}, {1.1, 1.4}, {0.6, 0.9}};

Create the data structure:

aabbtree = AABBTree[bboxes];

Find the indices of the bounding boxes that overlap testbbox:

overlaps = OverlappingBBoxes[aabbtree, testbbox]
(* {2, 3} *)

Visualize (query box in blue, overlapping in red, disjoint in green):

Graphics3D[{
  {FaceForm[Green], Cuboid @@@ Transpose[Delete[bboxes, Partition[overlaps, 1]], {1, 3, 2}]},
  {FaceForm[Red], Cuboid @@@ Transpose[bboxes[[overlaps]], {1, 3, 2}]},
  {Blue, Cuboid @@ Transpose[testbbox]}
 }]

4000 bounding boxes, 1 query

randomBBox[n_] :=
  With[{llcorners = RandomReal[{-10, 10}, {n, 3}]},
    Transpose[{llcorners, llcorners + RandomReal[{1, 1}, {n, 3}]}, {3, 1, 2}]
  ]

SeedRandom[1];
bboxes = randomBBox[4000];
testbbox = randomBBox[1][[1]];

aabbtree = AABBTree[bboxes]; // AbsoluteTiming
(* {0.018875, Null} *)

(overlaps = OverlappingBBoxes[aabbtree, testbbox]) // AbsoluteTiming
(* {0.000692, {1817, 3935, 878, 87}} *)

Graphics3D[Cuboid @@@ Transpose[bboxes, {1, 3, 2}]]

Graphics3D[{
  {FaceForm[Green], Cuboid @@@ Transpose[Delete[bboxes, Partition[overlaps, 1]], {1, 3, 2}]},
  {FaceForm[Red], Cuboid @@@ Transpose[bboxes[[overlaps]], {1, 3, 2}]},
  {Blue, Cuboid @@ Transpose[testbbox]}},
 PlotRange -> testbbox,
 PlotRangePadding -> Scaled[0.25]
]

100000 bounding boxes, 100000 queries

SeedRandom[1];
bboxes = randomBBox[100000];
testbboxes = randomBBox[100000];

aabbtree = AABBTree[bboxes]; // AbsoluteTiming
(* {0.235465, Null} *)

OverlappingBBoxes[aabbtree, testbboxes[[1]]]; // AbsoluteTiming
(* {0.001988, Null} *)

I suspect this can be sped up substantially:

Scan[OverlappingBBoxes[aabbtree, #]&, testbboxes]; // AbsoluteTiming
(* {44.8445, Null} *)

Code

AABBTree[bboxes_] :=
    Block[{$RecursionLimit = ∞},
       iAABBTree[1, MapIndexed[Join, Flatten /@ bboxes][[Ordering[bboxes[[All, 1, 1]]]]]]
    ]

iAABBTree[d_, bboxestagged_] :=
    Block[{len, n, x, Sleft, Scent, Sright, node},
       len = Length[bboxestagged];
       If[len <= 16, Return[centerNode3D[d, Null, bboxestagged]]];

       n = Ordering[bboxestagged[[All, 2d-1]], {Quotient[len+1, 2]}][[1]];
       x = Mean[bboxestagged[[n, 2d-1 ;; 2d]]];

       {Sleft, Scent, Sright} = partitionIntervals3D[d, bboxestagged, x];

       node = centerNode3D[d, x, Scent];
       If[Length[Sleft] > 0, node["SetLeft", iAABBTree[Mod[d+1, 3, 1], Sleft]];];
       If[Length[Sright] > 0, node["SetRight", iAABBTree[Mod[d+1, 3, 1], Sright]];];

       node
    ]

centerNode3D[d_, x_, {}] := CreateDataStructure["BinaryTree", {x, {}, {}, {}, {}, {}, {}, {}, -$MaxMachineNumber, $MaxMachineNumber, d}]

(* {center ordinate, xmins, ymins, zmins, xmaxs, ymaxs, zmaxs, inds, dmin, dmax, d} *)
(* xmins are the xmin values of all bboxes in this node, ymins, etc follow suit *)
(* inds are the respective indices of the bboxes in this node *)
(* dmin & dmax are the bounds of all bounding boxes in dimension d (the splitting dimension) *)
centerNode3D[d_, x_, intstagged_] := 
    CreateDataStructure["BinaryTree", 
       {
         x, 
         #1, #2, 
         #3, #4, 
         #5, #6, 
         Developer`ToPackedArray[#7, Integer], 
         Switch[d, 1, #1[[1]], 2, Min[#3], 3, Min[#5]], Max[Switch[d, 1, #2, 2, #4, 3, #6]], 
         d
       }& @@ Transpose[intstagged]
    ]

partitionIntervals3D[d_, intstagged_, x_] :=
    Block[{data, spec},
       data = Partition[partitionIntervals3DC[d, intstagged, x], 7];
       spec = Round[data[[-1]]];

       {data[[1 ;; spec[[1]]]], data[[spec[[1]]+1 ;; spec[[2]]]], data[[spec[[2]]+1 ;; -2]]}
    ]

partitionIntervals3DC = Compile[{{d, _Integer}, {intstagged, _Real, 2}, {x, _Real}},
    Block[{mnp, mxp, cright = 0., ccenter = 0., bagright = Internal`Bag[Most[{0.}]], bagcenter = Internal`Bag[Most[{0.}]], bagleft = Internal`Bag[Most[{0.}]]},
       mnp = 2d-1;
       mxp = mnp+1;
       Do[
         If[Compile`GetElement[i, mxp] < x,
          Internal`StuffBag[bagright, i, 1];
          cright++,
          If[Compile`GetElement[i, mnp] > x,
              Internal`StuffBag[bagleft, i, 1],
              Internal`StuffBag[bagcenter, i, 1];
              ccenter++
          ]
         ],
         {i, intstagged}
       ];

       Join[Internal`BagPart[bagright, All], Internal`BagPart[bagcenter, All], Internal`BagPart[bagleft, All], {cright, cright + ccenter, 0., 0., 0., 0., 0.}]
    ],
    CompilationTarget -> "C",
    RuntimeOptions -> "Speed"
];

OverlappingBBoxes[tree_?DataStructureQ, bbox:{{x1_, x2_}, {y1_, y2_}, {z1_, z2_}}] :=
    Block[{bag, stack, node, a, b, d1, d2, d3, d4, d5, d6, d7, d8, d9, d10, d11},
       bag = Internal`Bag[];
       stack = CreateDataStructure["Stack"];
       stack["Push", tree];

       While[!stack["EmptyQ"],
         node = stack["Pop"];
         {d1, d2, d3, d4, d5, d6, d7, d8, d9, d10, d11} = node["Data"];
         {a, b} = bbox[[d11]];
         If[Length[d2] > 0 && (d9 <= a <= d10 || d9 <= b <= d10 || a <= d9 <= d10 <= b),
          Internal`StuffBag[bag, Pick[d8, bboxOverlaps[d2, d3, d4, d5, d6, d7, x1, x2, y1, y2, z1, z2], 1], 1];
         ];

         If[d1 =!= Null, 
          If[a < d1 && !node["LeftNullQ"], stack["Push", node["Left"]]];
          If[d1 < b && !node["RightNullQ"], stack["Push", node["Right"]]];
         ];
       ];

       Internal`BagPart[bag, All]
    ]

bboxOverlaps = Compile[{{epx1, _Real}, {epx2, _Real}, {epy1, _Real}, {epy2, _Real}, {epz1, _Real}, {epz2, _Real}, {x1, _Real}, {x2, _Real}, {y1, _Real}, {y2, _Real}, {z1, _Real}, {z2, _Real}}, 
    Boole @ And[
       epx1 <= x1 <= epx2 || epx1 <= x2 <= epx2 || (x1 <= epx1 <= epx2 <= x2),
       epy1 <= y1 <= epy2 || epy1 <= y2 <= epy2 || (y1 <= epy1 <= epy2 <= y2),
       epz1 <= z1 <= epz2 || epz1 <= z2 <= epz2 || (z1 <= epz1 <= epz2 <= z2)
    ],
    CompilationTarget -> "C",
    Parallelization -> True,
    RuntimeOptions -> "Speed",
    RuntimeAttributes -> {Listable}
];
POSTED BY: Greg Hurst
11 Replies
Posted 7 months ago

[ This will probably be my final update to the code, it's gotten quite fast! ]

With a few more improvements, the call to OverlappingBBoxes mapped over the 100000 testbboxes went from 44.85 seconds now down to 0.68 seconds (a 65x speedup).

This means:

  • In this particular example, it takes my machine 6.8 μs per query box to find intersecting bboxes out of 100000 possibilities.
  • In this example, the 100000 test bboxes have 9269710 intersections all together, so that's 74 nanoseconds per intersection.
  • My machine runs at 3.5 GHz, so back of the envelope calculation gives ~260 CPU instructions per intersection over this dataset.

My final improvements:

  • Use "BinaryTree"::["PackedArray"::["Real64", 1]] instead of "BinaryTree"::["InertExpression"] to avoid some casting.
  • Avoid pushing to the stack when at least one binary tree child is Null.
  • Ditch "DynamicArray" all together in favor of a simple implementation.
    • I don't know why the built in "DynamicArray" is much slower. Maybe a bunch of checks that can somehow be turned off?
  • Ditch "Stack" and instead use "LinkedList" and treat it as a stack instead.
    • Don't know why the built in "Stack" is so slow compared to "LinkedList".
  • Discovered Native`SetArrayElementUnary and Native`GetPartBinary, but I don't know if that gave me any performance boosts.

Take aways from this endeavor:

  • The new compiler is amazing and I will be using it often from here on out.
  • I love how TargetSystem -> All lets you compile code for different operating systems.
  • Documentation is extremely limited and I was very frustrated learning how to use FunctionCompile
    • I feel I have a good understanding now, but I did so inspecting the source code in the Components folder and so
      • Who knows what I incorrectly learned
      • Who knows what I overlooked
      • Who knows what undocumented things I discovered will break in future versions
  • I still think this code can be sped up more, given that ditching "DynamicArray" and "Stack" sped things up. Perhaps a simple implementation of "LinkedList" and "BinaryTree" would also show a performance boost.
  • Multi-threaded operations look possible but I didn't look into them.

POSTED BY: Greg Hurst
Posted 7 months ago

I was able to refactor the code to get OverlappingBBoxes mapped over the 100000 testbboxes to go from 44.85 seconds down to 1.3 seconds (a 34.5x speedup). I suspect there still room for improvement.

Code attached this time rather than being pasted in the post.

Improvements:

  • I changed the flow of my code a bit:
    • Place bboxes that overlap with the splitting plane into the left node (as opposed to keeping separate).
    • For bboxes in a node, sort them by their splittingdim+1 dimension. So now in OverlappingBBoxes we can binary search and quickly eliminate ~50% bboxes to manually check.

In lieu of proper documentation I found what I believe to be the source code at

FileNameJoin[{$InstallationDirectory, "SystemFiles", "Components", "Compile"}]

Based on what I saw:

  • I think Native`UncheckedBlock is similar to "RuntimeOptions" -> "Speed" in Compile.
    • With help from a 2019 WTC presentation I was able to inspect the IR with Compile`CompileToIR and noticed tons of check were omitted when this was used.
    • Not all checks are omitted with this though. NullNode exceptions in binary trees are still checked for example.
    • Is there any other way to turn off more checks?
  • Based off the MSE answer here we have some control over what gets inlined.
    • Based on FunctionInlineInformation.m in the source code, we can use the option "InlinePolicy" -> "MaximumCallCount" -> 1 to inline a function that is only called in one spot.
    • I don't know if functions are inlined at a later step, so perhaps this is superfluous.
  • list[[i]] seems to avoid bounds checks with Native`UncheckedBlock, but multi arg Part does not.
    • Looks like with UncheckedBlock, unary Part gets converted to

      Native`Unchecked[Compile`GetArrayElement]
      

but multi arg Part does not. * I refactored my AABB node to store a flat list rather than a matrix.

Attachments:
POSTED BY: Greg Hurst
Posted 7 months ago

I tried to create a 'more natural' AABB tree node using TypeDeclaration. This ended up giving a 2x slowdown during construction -- not sure if I used TypeDeclaration properly.

Basically it makes more sense to tag the data in the node, e.g.

<|"splitloc" -> "Real64", "splitdim" -> "MachineInteger", "min" -> "Real64", 
  "max" -> "Real64", "indices" -> "PackedArray"::["MachineInteger", 1]|>

The return value is also invalid (I think):

tree = AABBTree[bboxes]; // AbsoluteTiming
(* {0.237855, Null} *)

tree // FullForm
(* DataStructure["BinaryTree`AABBNode",$Failed] *)

Code

nodedec = TypeDeclaration["Product", "AABBNode", <|"splitloc" -> "Real64", "splitdim" -> "MachineInteger", "min" -> "Real64", "max" -> "Real64", "indices" -> "PackedArray"::["MachineInteger", 1]|>];

$btype = "BinaryTree"::["AABBNode"];

With[{$btype = $btype, $max = $MaxMachineNumber, $min = -$MaxMachineNumber},
decl2 = FunctionDeclaration[
    centerNode3D,
    Typed[{"MachineInteger", "PackedArray"::["Real64", 3], "PackedArray"::["MachineInteger", 1], "Real64"} -> $btype] @ Function[{d, bboxes, inds, x},
       Module[{min, max, data},
         min = If[Length[inds] > 0, Min[bboxes[[inds, d, 1]]], $max];
         max = If[Length[inds] > 0, Max[bboxes[[inds, d, 2]]], $min];
         data = CreateTypeInstance["AABBNode", <|"splitloc" -> x, "splitdim" -> d, "min" -> min, "max" -> max, "indices" -> inds|>];
         CreateDataStructure["BinaryTree", data]
       ]
    ]
]];

With[{$btype = $btype, $intvec = "PackedArray"::["MachineInteger", 1]},
pureAABBTree = Function[{Typed[bboxes, "PackedArray"::["Real64", 3]]},
    Module[{iAABBTree, inds},
       iAABBTree = Function[{Typed[d, "MachineInteger"], Typed[inds, $intvec]},
         Module[{len, n, x, loc, r=0, l=0, c=0, j, Sleft, Scent, Sright, node},
          len = Length[inds];

          n = Ordering[bboxes[[inds, d, 1]]][[Quotient[len+1, 2]]];
          x = bboxes[[inds[[n]], d, 1]];

          If[len <= 100, Return[TypeHint[centerNode3D[d, bboxes, inds, x], $btype]]];

          loc = ConstantArray[0, len];

          Do[
              j = inds[[i]];
              If[bboxes[[j, d, 1]] < x,
                 l++;
                 loc[[i]] = 1,
                 If[bboxes[[j, d, 2]] > x,
                   r++;,
                   c++;
                   loc[[i]] = 2
                 ]
              ],
              {i, len}
          ];

          Sleft = ConstantArray[0, l];
          Scent = ConstantArray[0, c];
          Sright = ConstantArray[0, r];

          r = l = c = 1;
          Do[
              j = inds[[i]];
              If[loc[[i]] === 0,
                 Sright[[r++]] = j;,
                 If[loc[[i]] === 1,
                   Sleft[[l++]] = j,
                   Scent[[c++]] = j;
                 ]
              ],
              {i, len}
          ];

          node = TypeHint[centerNode3D[d, bboxes, Scent, x], $btype];

          If[l > 1, BinaryTree`SetLeft[node, iAABBTree[Mod[d+1, 3, 1], Sleft]]];
          If[r > 1, BinaryTree`SetRight[node, iAABBTree[Mod[d+1, 3, 1], Sright]]];

          TypeHint[node, $btype]
         ]
       ];

       inds = Typed[KernelFunction[InversePermutation], {$intvec} -> $intvec][Ordering[bboxes[[All, 1, 1]]]];

       TypeHint[iAABBTree[1, inds], $btype]
    ]
]];

AABBTree = 
 FunctionCompile[{nodedec, decl2}, pureAABBTree, 
  CompilerOptions -> {"AbortHandling" -> False, 
    "LLVMOptimization" -> "ClangOptimization"[3], 
    "OptimizationLevel" -> 3}];
POSTED BY: Greg Hurst

That seems like a valid use of TypeDeclaration. However, I've myself not worked yet with the "Product" type in any significant examples so I can't comment if it is a suitable use for that type.

I'm not sure why it's that bit slower either, but the returned value you get I think is valid. What's failed I think is the attempt to get its full form. This is the same behavior you commented separately with the other questions (point no. 2). I tried to answer it there too

Posted 7 months ago

Here are some questions I gathered while working through compiling this code. If these warrant a new post, please let me know!

  • Are DataStructures in compiled code documented? I saw zero mention / examples of it other than a mention of Typed on each data structure ref page: Typed[x,"BinaryTree"] give x the type "BinaryTree".
  • Is it possible to return a binary tree that is not "BinaryTree"::["InertExpression"]? For example I tried returning "BinaryTree"::["PackedArray"["Real64", 2]] and the return value was of the form DataStructure[some internal symbol, $Failed]. Code compiled fine though.
  • Are not all data structure methods supported in the compiler? For the binary tree I tried node["SetLeft", expr] and the compiler told me "SetLeft" is an unknown method. Luckily in this case BinaryTree`SetLeft[node, expr] works just fine.
  • When would I use ExtensibleVector v.s. DynamicArray? The functionality of DynamicArray seems to be a proper subset of ExtensibleVector, so why would I ever use DynamicArray? Performance perhaps?
  • If ExtensibleVector / DynamicArray are storing integers only, will the footprint be as efficient as having a packed array? Seems I can treat the "Elements" as "ListVector" or "PackedArray" seamlessly, but I don't know if I'm copying data when doing so.
  • Is it possible to return a list of elements of different types? "ListVector" can return a ragged list as long as all elements are the same type.
  • Documentation of the compiler seems quite sparse.
    • Took me awhile to discover "Rule" is a compilable type, I didn't see it mentioned anywhere in the docs, other than a couple examples.
    • CompilerOptions ref page has little to no information in it. I found an MSE post where people found some info on it though.
    • All examples seem to be toy examples illustrating a certain feature. Those are great, but some 'real world' examples would be super helpful
POSTED BY: Greg Hurst

At the moment I only have an answer for two of these questions

  • Point no. 2: You can definitely return data structures of any compiled type, from compiled code (as the evaluator doesn't know about compiled types, only about expressions). The front end renders them differently on screen, and nothing seems to work with them outside compiled code. I think the expression you mention with a $Failed is just a failure when attempting to get the full form of the object, but it's perfectly valid for its purpose. I wonder if the failure to show the full form is meaningful and on purpose. Even if it's compiled stuff, you can for example successfully show the full form of a compiled function. I'll try to find out more

  • Point no. 6: I think the way to achieve that is with the type "InertExpression". It allows you to put anything inside. On the counterpart, some functions won't work if you apply them to elements of the type "InertExpression", so I understand this is one of the reasons why you wouldn't just make "ListVector"::["InertExpression"] all the time if you know you have uniform types. (There may be more reasons that I'd also like to know)

Maybe someone else could give an answer for the rest of points or I can try to find out more and ask

Posted 7 months ago

Thanks for the info.

For point 4 I think the answer is use DynamicArray when you are only appending, for performance reasons.

I found the source code for data structures in

FileNameJoin[{$InstallationDirectory, "SystemFiles", "Components", 
  "Compile", "TypeSystem", "Declarations", "SpecificTypes", 
  "DataStructures"}]

and by the looks of it, it just is performance reasons.

And for point 5, I think the data will be treated like packed, based on looking through the code.

POSTED BY: Greg Hurst
Posted 7 months ago

With the great nudge from @José Antonio Fernández I was able to compile the construction of the AABBTree and refined the OverlappingBBoxes function a bit more. Code posted below.

Example

Initialization:

randomBBox[n_] :=
  With[{llcorners = RandomReal[{-10, 10}, {n, 3}]},
    Transpose[{llcorners, llcorners + RandomReal[{1, 1}, {n, 3}]}, {3, 1, 2}]
  ]

SeedRandom[1];
bboxes = randomBBox[100000];
testbboxes = randomBBox[100000];

Construction used to take 0.235465 seconds (so a 2x speedup):

tree = AABBTree[bboxes]; // AbsoluteTiming
(* {0.107196, Null} *)

Testing all 100000 bboxes used to take 44.8445 seconds (so a 6.5x speedup):

Scan[OverlappingBBoxes[tree, #] &, testbboxes]; // AbsoluteTiming
(* {6.9374, Null} *)

Code

decl = FunctionDeclaration[
    centerNode3D,
    Typed[{"MachineInteger", "PackedArray"::["Real64", 2], "Real64"} -> "BinaryTree"] @ Function[{d, intstagged, x},
       Module[{preamble},
         preamble = ConstantArray[0., 7];
         preamble[[1]] = x;
         preamble[[2]] = Min[intstagged[[All, 2d-1]]];
         preamble[[3]] = Max[intstagged[[All, 2d]]];
         preamble[[4]] = Cast[d, "Real64"];
         CreateDataStructure["BinaryTree", Cast[Append[intstagged, preamble], "InertExpression"]]
       ]
    ]
];

pureAABBTree = Function[{Typed[bboxes, "PackedArray"::["Real64", 3]]},
    Module[{iAABBTree, bboxestagged},
       iAABBTree = Function[{Typed[d, "MachineInteger"], Typed[bboxestagged, "PackedArray"::["Real64", 2]]},
         Module[{len, n, x, mnp, mxp, loc, r=0, l=0, c=0, Sleft, Scent, Sright, node},
          len = Length[bboxestagged];

          n = Ordering[bboxestagged[[All, 2d-1]]][[Quotient[len+1, 2]]];
          x = bboxestagged[[n, 2d-1]];

          If[len <= 100, Return[centerNode3D[d, bboxestagged, x]]];

          mnp = 2d-1;
          mxp = mnp+1;

          loc = ConstantArray[0, Length[bboxestagged]];

          Do[
              If[bboxestagged[[i, mxp]] < x,
                 l++;
                 loc[[i]] = 1,
                 If[bboxestagged[[i, mnp]] <= x,
                   c++;
                   loc[[i]] = 2,
                   r++
                 ]
              ],
              {i, Length[bboxestagged]}
          ];

          Sleft = ConstantArray[0., {l, 7}];
          Scent = ConstantArray[0., {c, 7}];
          Sright = ConstantArray[0., {r, 7}];

          r = l = c = 1;
          Do[
              If[loc[[i]] === 0,
                 Sright[[r++]] = bboxestagged[[i]];,
                 If[loc[[i]] === 1,
                   Sleft[[l++]] = bboxestagged[[i]],
                   Scent[[c++]] = bboxestagged[[i]];
                 ]
              ],
              {i, Length[bboxestagged]}
          ];

          node = centerNode3D[d, Scent, x];

          If[l > 1, BinaryTree`SetLeft[node, iAABBTree[Mod[d+1, 3, 1], Sleft]]];
          If[r > 1, BinaryTree`SetRight[node, iAABBTree[Mod[d+1, 3, 1], Sright]]];

          node
         ]
       ];

       bboxestagged = Table[Append[Flatten[bboxes[[i]]], Cast[i, "Real64"]], {i, Length[bboxes]}][[Ordering[bboxes[[All, 1, 1]]]]];

       iAABBTree[1, bboxestagged]
    ]
];

declbbox = FunctionDeclaration[
    bboxOverlaps, 
    Typed[ConstantArray["Real64", 12] -> "Boolean"] @ Function[
       {epx1, epx2, epy1, epy2, epz1, epz2, x1, x2, y1, y2, z1, z2}, 
       And[
         epx1 <= x1 <= epx2 || epx1 <= x2 <= epx2 || (x1 <= epx1 <= epx2 <= x2),
         epy1 <= y1 <= epy2 || epy1 <= y2 <= epy2 || (y1 <= epy1 <= epy2 <= y2),
         epz1 <= z1 <= epz2 || epz1 <= z2 <= epz2 || (z1 <= epz1 <= epz2 <= z2)
       ]
    ]
];

pureOverlappingBBoxes = Function[
    {
       Typed[tree, "BinaryTree"], 
       Typed[bbox, "PackedArray"::["Real64", 2]]
    },
    Block[{inds, stack, x1, x2, y1, y2, z1, z2, node, a, b, data, x, min, max, d},
       inds = TypeHint[CreateDataStructure["ExtensibleVector"], "ExtensibleVector"::["MachineInteger"]];
       stack = TypeHint[CreateDataStructure["Stack"], "Stack"::["BinaryTree"]];
       stack["Push", tree];

       x1 = bbox[[1, 1]]; x2 = bbox[[1, 2]];
       y1 = bbox[[2, 1]]; y2 = bbox[[2, 2]];
       z1 = bbox[[3, 1]]; z2 = bbox[[3, 2]];

       While[!stack["EmptyQ"],
         node = stack["Pop"];
         data = Cast[node["Data"], "PackedArray"::["Real64", 2]];
         If[Length[data] <= 1, Continue[]];
         x = data[[-1, 1]];
         min = data[[-1, 2]];
         max = data[[-1, 3]];
         d = Round[data[[-1, 4]]];
         a = bbox[[d, 1]];
         b = bbox[[d, 2]];
         If[min <= a <= max || min <= b <= max || (a <= min && max <= b),
          Do[
              If[bboxOverlaps[data[[i, 1]], data[[i, 2]], data[[i, 3]], data[[i, 4]], data[[i, 5]], data[[i, 6]], x1, x2, y1, y2, z1, z2],
                 inds["Append", Round[data[[i, 7]]]];
              ],
              {i, Length[data]-1}
          ]
         ];

         If[a < x && !node["LeftNullQ"], stack["Push", node["Left"]]];
         If[x < b && !node["RightNullQ"], stack["Push", node["Right"]]];
       ];

       inds["Elements"]
    ]
];

AABBTree = 
  FunctionCompile[decl, pureAABBTree, 
   CompilerOptions -> {"AbortHandling" -> False, 
     "LLVMOptimization" -> "ClangOptimization"[3], 
     "OptimizationLevel" -> 3}];

OverlappingBBoxes = 
  FunctionCompile[declbbox, pureOverlappingBBoxes, 
   CompilerOptions -> {"AbortHandling" -> False, 
     "LLVMOptimization" -> "ClangOptimization"[3], 
     "OptimizationLevel" -> 3}];
POSTED BY: Greg Hurst

Dear Greg, I can make some comments on how I would approach it. I have started learning how to use the compiler not long ago, so please take them with some caution...

Comments:

  • I would focus on compiling the whole function "OverlappingBBoxes". You left out from the compilation the "While" loop, which should benefit the most

  • I would indeed start using "FunctionCompile" and the new functions. The difference would be that you need to put the body of your function defined as a pure function using "Function", and wrap with "Typed" all of the arguments. From these types, the inferencer will try to work out the types of everything else, but sometimes you may also need to type variables or expressions in the body in order to guide it

  • Check out the docs for the "Compiled Types" to see what you need to put in "Typed"

  • I would define the function "bboxOverlaps" separately, using "FunctionDeclaration", and include this declaration as an argument for "FunctionCompile" in "OverlappingBBoxes". This way you'll have the symbol "bboxOverlaps" available to use there

  • To my knowledge, FunctionCompile does not take the options "Parallelization", "RuntimeOptions", or "RuntimeAttributes", so I would remove them. In order to use the function "bboxOverlaps" in parallel, I'd map it using OpenMP`ParallelMap (which works pretty much like ParallelMap). This differs with respect to the approach of making the function Listable as you did. Now the function "bboxOverlaps" won't thread automatically over these lists and you'd have to use "Thread" yourself on the arguments. (I'm not sure if there's an alternative way to make it Listable and avoid this step)

Some additional comments:

  • There's some new ways I think to achieve what RuntimeOptions->"Speed" was doing, but I wouldn't focus too much on that

  • Is there a reason why you use Internal Bag? I am not familiar with them at all, but I briefly checked them out and to me it seems the new DynamicArray built-in data structure should do the job. The new data structures work very well with FunctionCompile, and I'm not sure if the type of Internal`Bag could be problematic (maybe not)

I hope this helps you in some way. I started working out a little bit your code but not enough to get to the end and give you some code. You can reply to my comment if you think anything is not clear or if I can help you in some other way

Posted 7 months ago

Hey thanks, I was able to get it to work! I haven't touched anything with parallelization yet, but will look into that soon.

Here's the compiled code:

decl = FunctionDeclaration[
    bboxOverlaps, 
    Typed[ConstantArray["Real64", 12] -> "Boolean"] @ Function[
       {epx1, epx2, epy1, epy2, epz1, epz2, x1, x2, y1, y2, z1, z2}, 
       And[
         epx1 <= x1 <= epx2 || epx1 <= x2 <= epx2 || (x1 <= epx1 <= epx2 <= x2),
         epy1 <= y1 <= epy2 || epy1 <= y2 <= epy2 || (y1 <= epy1 <= epy2 <= y2),
         epz1 <= z1 <= epz2 || epz1 <= z2 <= epz2 || (z1 <= epz1 <= epz2 <= z2)
       ]
    ]
];

pureOverlappingBBoxes = Function[
    {
       Typed[tree, "BinaryTree"], 
       Typed[bbox, "PackedArray"::["Real64", 2]]
    },
    Block[{inds, stack, x1, x2, y1, y2, z1, z2, node, a, b, data, d1, d2, d3, d4, d5, d6, d7, d8, d9, d10, d11},
       inds = TypeHint[CreateDataStructure["ExtensibleVector"], "ExtensibleVector"::["MachineInteger"]];
       stack = TypeHint[CreateDataStructure["Stack"], "Stack"::["BinaryTree"]];
       stack["Push", tree];

       x1 = bbox[[1, 1]]; x2 = bbox[[1, 2]];
       y1 = bbox[[2, 1]]; y2 = bbox[[2, 2]];
       z1 = bbox[[3, 1]]; z2 = bbox[[3, 2]];

       While[!stack["EmptyQ"],
         node = stack["Pop"];
         data = node["Data"];
         d1 = Cast[data[[1]], "Real64"];
         d2 = Cast[data[[2]], "PackedArray"::["Real64", 1]];
         d3 = Cast[data[[3]], "PackedArray"::["Real64", 1]];
         d4 = Cast[data[[4]], "PackedArray"::["Real64", 1]];
         d5 = Cast[data[[5]], "PackedArray"::["Real64", 1]];
         d6 = Cast[data[[6]], "PackedArray"::["Real64", 1]];
         d7 = Cast[data[[7]], "PackedArray"::["Real64", 1]];
         d8 = Cast[data[[8]], "PackedArray"::["MachineInteger", 1]];
         d9 = Cast[data[[9]], "Real64"];
         d10 = Cast[data[[10]], "Real64"];
         d11 = Cast[data[[11]], "MachineInteger"];
         a = bbox[[d11, 1]];
         b = bbox[[d11, 2]];
         If[Length[d2] > 0 && (d9 <= a <= d10 || d9 <= b <= d10 || a <= d9 <= d10 <= b),
          Do[
              If[bboxOverlaps[d2[[i]], d3[[i]], d4[[i]], d5[[i]], d6[[i]], d7[[i]], x1, x2, y1, y2, z1, z2],
                 inds["Append", d8[[i]]];
              ],
              {i, Length[d8]}
          ]
         ];

         If[a < d1 && !node["LeftNullQ"], stack["Push", node["Left"]]];
         If[d1 < b && !node["RightNullQ"], stack["Push", node["Right"]]];
       ];

       inds["Elements"]
    ]
];

(* takes 44 seconds to compile... *)
OverlappingBBoxes = FunctionCompile[decl, pureOverlappingBBoxes];

I also had to modify iAABBTree to make the data in the binary tree a bit more consistent:

iAABBTree[d_, bboxestagged_] :=
    Block[{len, n, x, Sleft, Scent, Sright, node},
       len = Length[bboxestagged];

       n = Ordering[bboxestagged[[All, 2d-1]], {Quotient[len+1, 2]}][[1]];
       x = Mean[bboxestagged[[n, 2d-1 ;; 2d]]];

       If[len <= 16, Return[centerNode3D[d, x, bboxestagged]]];

       {Sleft, Scent, Sright} = partitionIntervals3D[d, bboxestagged, x];

       node = centerNode3D[d, x, Scent];
       If[Length[Sleft] > 0, node["SetLeft", iAABBTree[Mod[d+1, 3, 1], Sleft]];];
       If[Length[Sright] > 0, node["SetRight", iAABBTree[Mod[d+1, 3, 1], Sright]];];

       node
    ]

And now the call that took 44 seconds now takes 11:

SeedRandom[1];
bboxes = randomBBox[100000];
testbboxes = randomBBox[100000];

aabbtree = AABBTree[bboxes]; // AbsoluteTiming
(* {0.262679, Null} *)

Scan[OverlappingBBoxes[aabbtree, #] &, testbboxes]; // AbsoluteTiming
(* {11.5295, Null} *)

Question

Notice how I took all the elements of the binary tree's data and cast them to their appropriate types. I wonder if casting from InertExpression to each type is adding a slowdown. Is there a way to instead convey this information in Typed[tree, "BinaryTree"]?

e.g. something like Typed[tree, "BinaryTree"::[{"Real64", "PackedArray"::["Real64", 1], ...}]]

POSTED BY: Greg Hurst

Thank you too for posting the results!

You are completely right that you can directly give the tree the appropriate specific type, here "BinaryTree"::["PackedArray"["Real64", 2]]. But there is a subtlety. You are passing to the function “OverlappingBBoxes” a tree that was created in top-level code. Data structures that are created in top-level code are made to work with expressions, so that they can work outside of compiled code (because that’s what the evaluator knows about).

This means that you’re passing this tree that has type "BinaryTree"::["InertExpression”]. If, at the same time, you declare the function to receive a tree of type "BinaryTree"::["PackedArray"["Real64", 2]], the system will not find a matching declaration for the argument you passed and will raise an inconsistency error.

One solution can be to construct your tree with compiled code, and directly use the specific compiled type "BinaryTree"::["PackedArray"["Real64", 2]] at construction time. (With compiled code you have the option to use the compiled types, in addition to the more general “InertExpression" that works in compiled and uncompiled code.) So for example, you can declare the function "centerNode3D" with type:

Typed[{"MachineInteger", "PackedArray"::["Real64", 2], "Real64"} -> "BinaryTree"::["PackedArray"["Real64", 2]]]

and type the "tree" argument in "pureOverlappingBBoxes" as

Typed[tree, "BinaryTree"::["PackedArray"["Real64", 2]]]

and this will allow you to get rid of the casts.

As an alternative, this second type annotation can be replaced by typing the value given to "data" in "pureOverlappingBBoxes":

data = TypeHint[node["Data"], "PackedArray"::["Real64", 2]];

This is a good example of one of the points I commented about annotating values in the body of the function to guide the inferencer. You have the option to type somewhere and/or somewhere else, as long as it's sufficient and consistent. (I precisely used this alternative to have the inferencer work out and inform me of the type of the tree, because I didn't know its format!)

That said, in my computer the code with and without the casts is running pretty much with the same timing. Let me know if you see any differences

Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.
Reply Preview
Attachments
Remove
or Discard

Group Abstract Group Abstract