Message Boards Message Boards

GROUPS:

[WSS18] Min-Heap Implementation

Posted 2 months ago
505 Views
|
9 Replies
|
8 Total Likes
|

Here is a Min-Heap package I wrote for my WSS project, Optimal Packing of Bidisperse Circles . There were 2 main challenges in this implementation: dealing with in-place modification of function arguments, and implementing the equivalent to a pop() operation, extracting in-place the last element of the list without copying the whole list.

I stored each heap (balanced binary tree) in a list, where each node at position i has children nodes at positions $2i$ and $2i+1$, and parent node at Floor[ $\frac{i}{2}$]. A Min-Heap satisfies the Min-Heap property: the value at each node is strictly less than or equal to each value at its children nodes.

I wrote 6 functions for this package (and another 6 for the Project-specific version): MinHeapQ, HeapifyUp, HeapifyDown, HInsert, HExtractMin and HBuild. Here are their descriptions:

MinHeapQ

This function tests if the list argument is a MinHeap or not, in $O(n)$ tests.

MinHeapQ[heap_List]:=And@@Table[heap[[Floor[i/2]]]<= heap[[i]], {i, 2, heap//Length}];

HeapifyUp

This function corrects a single error at index $i$ in the Min-Heap, recursively swapping it with its parent node until the Min-Heap property is satisfied. The HeapifyUp operation is asymptotic $O(\log n)$ complexity, given that it in the worst case traverses a whole branch of the tree, from a leaf upwards.

HeapifyUp:=Function[{heap, i}, Which[
(*at root*)    
i==1,
    heap,

(*satisfies minHeap prop, return*)
heap[[i]]>= heap[[Floor[i/2]]],
    heap,

(*if parent greater, swap with parent, and recursively call at new position*)    
heap[[i]]<= heap[[Floor[i/2]]],
    {heap[[i]], heap[[Floor[i/2]]]}={heap[[Floor[i/2]]], heap[[i]]};
    HeapifyUp[heap, Floor[i/2]];
],
HoldFirst
]

To deal with the in-place modification issue, I wrapped the Which cases in Funtion[, HoldFirst], allowing the heap list to be passed Unevaluated.

HeapifyDown

In an analogous manner, HeapifyDown corrects a violation at position $i$, comparing with child nodes and swapping with the smaller of the 2 (or 1, or not swapping) if its value is less than $i$'s value. The HeapifyDown operation is also asymptotic $O(\log n)$ complexity, through the same explanation above.

HeapifyDown:=Function[{heap, i}, With[{n = heap//Length}, Which[ 
    (*at leaf*)
    n< 2i,
       heap,

    (*only left child exists, swap*)       
    n<=2i && heap[[i]]>= heap[[2i]], 
       {heap[[i]], heap[[2i]]}={heap[[2i]], heap[[i]]};
       HeapifyDown[heap, 2i];,

    (*only left child exists, property satisfied*)     
    n<=2i && heap[[i]]<=  heap[[2i]], 
       heap,

    (*property satisfied*)     
    heap[[i]]<= heap[[2i]] && heap[[i]]<= heap[[2i+1]], 
       heap,

    (*swap with left child*)       
    heap[[i]]>=Min[heap[[2i]], heap[[2i+1]]] && heap[[2i]]<=  heap[[2i+1]],
       {heap[[i]], heap[[2i]]}={heap[[2i]], heap[[i]]};
       HeapifyDown[heap, 2i];,

    (*swap with right child*)
    heap[[i]]>=  Min[heap[[2i]], heap[[2i+1]]] && heap[[2i]]>heap[[2i+1]],
       {heap[[i]], heap[[2i+1]]}={heap[[2i+1]], heap[[i]]};
       HeapifyDown[heap, 2i+1];
 ] ],
 HoldFirst
]

HInsert

This function inserts an element into the heap, by appending it to the end of the heap and applying HeapifyUp into the element is at its correct position in the heap (and overall, the heap satisfies the Min-Heap property again). The insert operation has $O(\log n)$ asymptotic complexity.

HInsert:=Function[{heap, element},
AppendTo[heap, element];
HeapifyUp[heap, heap//Length];,
HoldFirst
]

HBuild

HBuild builds a heap out of a list in an in place manner, by applying HeapifyDown to all non-leaf nodes. It can be shown that the complexity of this operation is $O(n)$, linear in input size, not $O(n \log n)$, as follows: At height $k$, the number of nodes is given by $2^k$. We can thus write the total complexity as the sum $$ T = \sum_{k=0}^h (h-k)2^k $$ where $h = Log_2 n$ is the height of the tree. We can evaluate this sum in Mathematica:

In[2]:= Sum[(Log2[n] - k) 2^k, {k , 0, Log2[n]}]

Out[2]= (-2 Log[2] + 2 n Log[2] - Log[n])/Log[2]

Which is clearly $O(n)$. As for the implementation, here it is:

BuildH:=Function[{list},
Map[HeapifyDown[list, #]&, Range[Floor[(list//Length)/2], 1, -1]];,
HoldFirst
]

HExtractMin

HExtractMin was by far the hardest part of this package, and definitely one of the hardest in my project. This function should return the minimum element in the heap, a.k.a heap[[1]], delete it from the heap, while maintaining the Min-Heap property. The traditional algorithm for this procedure is as follows:

  • Swap the first element and the last element
  • Pop the last element
  • HeapifyDown the first element to correct Min-Heap Property

However, there is no pop operation in the WL. There are functions like Drop, Take and Span, but they all copy the list, so have $O(n)$ complexity and thus are too slow. The Drop operation on a $10^6$ element list runs $10$ times slower than the HeapifyDown operation! I addressed this issue with a simple amortization algorithm:

The algorithm will require a global counter dcount which counts the number of deletion operations (initialized internally at 0). The procedure:

  • Increment dcount.
  • Store the first element (minimum), then swap it with the last element.
  • Set the last element to infinity
  • (Amortization Step) If dcount more than half the length of the heap,
  • Reset dcount to $0$, Select out all Infinity elements

  • Rebuild the heap with HBuild
  • HeapifyDown the first element

Every time extractMin is performed, the counter is incremented and a null element [Infinity] is left at the bottom (maintaining heap size) . After $n$ such operations, the asymptotic cost of a Select operation is $O(n)$, making the amortized cost of the "pop" $O(1)$. In a $10^6$ element list, the Select operation takes about 1 second, and happens every $5. 10^5$ deletes. Clearly, the amortized cost is efficient as is, but can still be tuned in the package as needed. I arbitrarily choose half the size to correspond to (roughly) the size of a next layer of leaves, all infinity (but, of course, the infinities aren't all at the leaves).

Anyway, here is the implementation:

HExtractMin:=Function[{heap}, Block[ {n = heap//Length, x = heap[[1]]},

(*increment delete counter*)
dcount++;

(*swap last element*)
{heap[[1]], heap[[n]]}={heap[[n]], heap[[1]]};

(*set infinity at the end*)
heap[[n]] = Infinity;

(*Amortization case*)
If[dcount > n / 2, 
    dcount = 0; 
    heap = Select[heap, #!= Infinity&];
    BuildH[heap];
];

(*Corrects swapped position*)
HeapifyDown[1];

(*output minimum*)
x
],
HoldFirst
]

HeapifyDown Test

To conclude this post, I ran a couple tests on HeapifyDown up to where my memory could hold:

HeapifyDown test

And, as we can see, the evaluation time is quite linear, and pretty fast.

If any optimizations or better ExtractMin algorithms are found, send me them! Also, checkout this implementation of a Augmented Min-Heap I wrote for my whole project (in the notebook at the end).

Attachments:
9 Replies

A language for binary tree based programming is Lisp, which is based upon solely that (yet complete programs can be written using it).

Mathematica has infinite List type and evaluation of List elements as functions or data, in a way mm can do the job of Lisp albeit a little differently; not so rigidly. you have to study Lisp to know what I mean.

guide/DiscreteMathematics shows ways to create, use, traverse binary and also complex trees

Pretty cool!

there is a problem with the Big O notation in "school text books": they never asked Intel

for example: the old textbooks count sorting a[i] as a single index and a[i][j] as double (killing the O, making it a slow sort)

but Intel chips do the [i][j] in a single clock now (a feature to help bad code written by big companies?): so the O is wrong in the textbooks

mathematica as you now know is a great tool for testing computer science, algorithms, more. some have the theory an OS could be run by it - though no one has show this "actually tried" yet

It's nice to see Mathematica used to learn/teach computer science concepts.

I just have a suggestion for wrapping up your functions into a package. Do take a look at the standard packages structure:

Issues with your package:

  • It can't be loaded with Needs["MinHeap`"] because the package filename does not match the context name
  • It creates symbols like n (and other private symbols) in the public MinHeap` context. Never export unnecessary symbols or there will be conflicts. I'm sure that trying to add some of these common symbols (n, x, etc.) to a heap will now cause trouble.
  • Being["`Private`"] must have a backtick at the start too—you missed it.

Some questions:

  • Why are you restricting functions like HInsert to work with assigned symbols only, and not with lists? E.g. why is it necessary to do HInsert[someSymbol, element] instead of HInset[{...}, element]?
  • I didn't fully understand the role of the dcount global variable. It seems to be used for all heaps. Or is the package able to work only with a single heap?

Thanks for the suggestions & issues. I'll fix them at the next opportunity.

To address your questions,

  • I didn't know how to modify in-place arguments of functions, without using HoldAll/HoldFirst or Unevaluated. With HoldAll, the heap argument can't be unassigned symbols (or at least I couldn't figure out how).
  • My mistake. I essentially took the packing project code and put it in a package, where I only used a single heap. I'll think of a solution next opportunity I have as well.

Did you do large-scale time tests? The function HInsert looks like it has an O(n) component due to the AppendTo.

I would recommend starting with a list of fixed length maxLength and perhaps initialized e.g. ConstantArray[0,maxLength], a currentLength counter, and a function that makes a new list of length say 1.4*prior length whenever currentLength hits the current maxLength (which of course also gets modified at that point).

[Not sure if I should show this, but there already exist spelunking tools that one could use to find it.] Here is some internal code we have used for heaps. First step is to force autoload of some integer linear programming code (because I know already there is some heap code in there).

Reduce[{x - 3 y == 7, x >= -10, y <= 20}, {x, y}, Integers];

Now find out what available functionality is related to heaps.

?? *`*Heap*

(* CalculateUnits`UnitCommonSymbols`
HeapedBushels   MetricHeapedBushels

 Combinatorica`
Heapify

 NIntegrate`
Heap    HeapElements   HeapInsert    HeapMerge
HeapDelete  HeapEmptyQ   HeapLookup    HeapTopElement

 System`IntegerLinearDump`
HeapAdd HeapGet *)

So we have some stuff in the context "System``IntegerLinearDump``" (apologies for the doubled back-ticks, I don't know how to format them to get proper single ones). We'll put that on the context path and then have a look at Information.

AppendTo[$ContextPath, "System`IntegerLinearDump`"];

?? HeapAdd

(* System`IntegerLinearDump`HeapAdd
Attributes[HeapAdd]={HoldFirst}

HeapAdd[h_,a_,k_]:=Block[
  {hole=k+1,parent,val=a[[1]]},
  parent=Floor[hole/2];
  While[parent!=0&&h[[parent,1]]>val,
    h[[hole]]=h[[parent]];
    hole=parent;
    parent=Floor[hole/2]];
  h[[hole]]=a] *)

?? HeapGet

(* System`IntegerLinearDump`HeapGet
Attributes[HeapGet]={HoldFirst} 

HeapGet[h_,k_]:=Block[
  {ans=h[[1]],hole=1,c1=2,c2,val=h[[k,1]]},
  While[c1<k,
  c2=c1+1;
  If[c2<k,If[h[[c1,1]]<h[[c2,1]],If[h[[c1,1]]<val,
    h[[hole]]=h[[c1]];hole=c1;c1=2 hole,c1=k],
    If[h[[c2,1]]<val,h[[hole]]=h[[c2]];hole=c2;c1=2 hole,c1=k]],
    If[h[[c1,1]]<val,h[[hole]]=h[[c1]];hole=c1;c1=2 hole,c1=k]]];
   h[[hole]]=h[[k]];ans] *)

The callers of HeapAdd are responsible for checking that the length requirement is met. When exceeded, they double the heap by using Join to a List of same size, with all zeros for elements. In typical usage this will be a packed array if elements are in the machine integer range.

Hey, I only large scale tested the functions I ended up using on the project. I didn't know AppendTo was so slow. Many thanks for the suggestions though!

enter image description here - Congratulations! This post is now a Staff Pick as distinguished by a badge on your profile! Thank you, keep it coming!

Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.
Reply Preview
Attachments
Remove
or Discard

Group Abstract Group Abstract