Message Boards Message Boards

2
|
8175 Views
|
0 Replies
|
2 Total Likes
View groups...
Share
Share this post:

Quad tree index in 10 lines with recursive queries and nested Associations

Posted 8 years ago

I'd like to discuss some technical aspects and implications of this answer I posted to Stackexchange to a new user's question "How to use classical data structure in Mathematica".

The index is both space and time efficient, and a preliminary timing study over 6 orders of magnitude shows n*log(n) time complexity.

No point copying the entire answer here, but wanted to summarize the significance as it uses features introduced in V10 (2014) that many people are not familiar with. The index is implemented in just a few lines of code using (1) nested Association data structure, and (2) recursive Query as the key processing engine.

(1) Nested Associations, or tree-structured data, are hierarchies of Key-Value associations in which the values are themselves associations.

• Relational tables are special cases of 2-level nested associations with symmetrical branches - ie in which every row contains the same Keys (which is space-inefficient but very usable in practice anyway).

• Further, Mathematica allows keys to be very general, not just strings, but also numbers, symbols (including truth-values), composite structures like lists, associations, basically any expression except Mathematica patterns.

• In this application, GroupBy is used to group points into quadrants. There are keys corresponding to non-empty quadrants as output by GroupBy, initially as a pair of truth-values, eg {True,False}. In a subsequent step these keys are mapped to the container Rectangle.

• Keys in a nested association, whether strings or composite, can be used - as they are here to implement Trie (aka prefix-tree) data structures in which the index is distributed across the hierarchy. Here the prefix is a Rectangle that (geometrically) contains all suffix Rectangles deeper in the nested association.

(2) The semantics of recursive Queries are well suited to trie indexing. In this application, this query is groupByQuad (note it appears both on the LHS and RHS of the definition).

groupByQuad[bucketSize_][box_Rectangle, data_List] := 
  If[Length[data] > bucketSize, 
   data // GroupBy[
      Thread[#pos > boxMean[box] ] & /* boxClassify[box]] // 
    associationKeyValueMap[groupByQuad[bucketSize]], data];

• Of course we imagine that Map and other functional operators are doing the heavy lifting but this functionality is decoupled from the user.

• Here bucketSize determines how finely the points are separated and ultimate depth of the trie. The remaining parameters in the subvalues (box, data) appear as a sequence because of how nested key-values are accessed, via the helper associationKeyValueMap.

• The conditional queries the data at any given level: if there are more points at a given level than the bucketSize, then these are recursively grouped into quads using the index-specific helper functions boxMean and boxClassify, otherwise, the data is left inert - this separation contributes to the efficiency.

This is the second recursive query that I've defined - this one strictly for demonstration purposes, but the first one, keyTrie, I use routinely to map entire filesystems of data in memory so that the keys are mapped to the folder names, hence the data files can be accessed in the same sequence and ease as you would on a command line.

In sum, recursive queries coupled with nested associations, and the ability of the latter to employ structured keys (as opposed to just strings) gives programmers a radically simple, and efficient toolkit to implement classical data structures in the area of data analysis (the data payload rides along with the the other values, eg the point coordinates here).

Note I plan to include this description in my forthcoming, tutorial-style book Functional Data Analysis.

Al

POSTED BY: Alan Calvitti
Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.
Reply Preview
Attachments
Remove
or Discard

Group Abstract Group Abstract