LTemplate: a package for faster LibraryLink development

Posted 10 years ago
POSTED BY: Szabolcs Horvát
10 Replies

Version 0.5 is now released.

The highlights of this release are:

  • Expanded documentation, many usage examples to aid in learning
  • Greatly expanded SparseArray support
  • Experimental support for RawArray and Image

Please go to GitHub to see a more detailed changelog.

As always, any feedback is welcome.


Update: LTemplate 0.5.1 is now available. Hopefully it fixes most 0.5 problems. The documentation and examples have been further expanded.

POSTED BY: Szabolcs Horvát

Congratulations! This post is now a Staff Pick, as distinguished by a badge on your profile! Thank you, keep it coming!

POSTED BY: EDITORIAL BOARD

I haven't tested this (it's a distilled version of my actual code), but it shouldn't be far off:

https://gist.github.com/taliesinb/a3385002601421b3e8e2

For RawArrays, I actually found a nice notebook from Piotr about it, but I don't want to share it without his permission. I've drawn his attention to this thread.

POSTED BY: Taliesin Beynon

Thank you! I'm looking forward to it as I am quite curious about what RawArrays can be used for.

I have never used RawArrays before, partly because they are undocumented and partly because they did not seem all that useful for pure-Mathematica programming. Today I spelunked a bit: I looked at the RawArray functions in the Developer` context and noticed that Normal and equality comparison (==, ===) work on them.

I am hoping to be able to use them to represent special data structures directly, but also memory-efficiently, as (immutable) Mathematica expressions, and thus integrate them much better into Mathematica.

Currently LTemplate is very much focused on working with opaque and mutable objects. The memory is allocated and managed completely on the C side, and Mathematica only holds a reference to the C-side objects (as an integer, using managed library expressions). These mutable objects are not a good fit for pure-Mathematica programming, so when I used LTemplate in a published package, I completely hid them from the end user.
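
To illustrate what this looks like at the LibraryLink level, here is a bare-bones sketch of the managed library expression mechanism itself (not of LTemplate; the MyObject class and names are made up):

#include "WolframLibrary.h"
#include <memory>
#include <unordered_map>

// Hypothetical C++-side object whose lifetime is tied to a Mathematica expression.
struct MyObject {
    // mutable state lives here, fully managed on the C side
};

static std::unordered_map<mint, std::unique_ptr<MyObject>> instances;

// Called by the kernel when a managed expression is created (mode == 0)
// or released / garbage collected (mode != 0).
static void manage_MyObject(WolframLibraryData libData, mbool mode, mint id) {
    if (mode == 0)
        instances[id] = std::make_unique<MyObject>();
    else
        instances.erase(id);
}

extern "C" DLLEXPORT int WolframLibrary_initialize(WolframLibraryData libData) {
    // Register the manager; Mathematica-side code only ever sees the integer ID.
    return libData->registerLibraryExpressionManager("MyObject", manage_MyObject);
}

On the Mathematica side, CreateManagedLibraryExpression["MyObject", sym] then returns an expression of the form sym[id], and the manager is called again to free the C-side object when that expression is garbage collected.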

What can we do with RawArrays in Mathematica other than create them, convert them to a list, or compare them?

I imagine applications such as representing the state of a fast random number generator and integrating it into Mathematica's RNG framework. I once wanted to do this with LTemplate and used the managed library expression ID as the Mathematica-side "state". But it turned out that Mathematica's RNG framework requires that states be comparable with == (this doesn't seem to be documented, but if they are not equality-comparable, things break), so it won't work this way.

It would be necessary to represent the state as a Mathematica list instead. But that is messy and more work than I'd want to do, because Mathematica integers (mint) may not map to what a particular RNG implementation might use internally, and the mint size also differs between platforms. I don't think 32-bit platforms will go away just yet: the Raspberry Pi is 32-bit. With a RawArray whose internal type is known and fixed, this would be easier. I guess that in principle a RawArray["Byte", ...] could store an arbitrary C++ POD type. (Of course there are also applications like storing images, sounds, compressed byte streams, etc.)

I will be away for ~10 days starting today, so I'll only be able to check responses afterwards.

POSTED BY: Szabolcs Horvát

LibraryLink has picked up support for RawArrays, though that's not documented since RawArrays are themselves not documented. Let me know if you are interested in that and I'll share how to do that.

Yes, I would be quite interested in this if you are willing to share, and debugf as well :-) It is a good idea to make it possible to turn on/off such output.

POSTED BY: Szabolcs Horvát

Thanks for doing the benchmark, that's useful to know.

I wasn't trying to suggest that JSON makes sense for numeric data, or as a replacement for MathLink. I think the criteria for when to use JSON as a protocol are: 1) performance is not the bottleneck, but rather developer time; 2) the data is not numeric (e.g. not one or many floating point values that need to round-trip accurately); 3) the data is structurally fairly complicated, e.g. it involves associations in some natural way, or multiple fields.

And if you are in a situation where the JSON is also naturally emitted by a third-party library, parsing it on the Mathematica side directly is obviously preferable to translating the JSON into MathLink calls on the C++ side.

In our particular application, we use JSON here and there for transmitting 'metadata' back to Mathematica. Actual numeric tensors are communicated using the RawArray interface.

POSTED BY: Taliesin Beynon

Hi Taliesin,

Thank you for the comments and tips! When I read your post, what immediately struck me was: why are you not using MathLink for this instead of JSON? MathLink must surely be faster than serializing to a textual representation. Or is it?

So I tried it out:

  • I generate a list of integer arrays of random lengths between 0 and 10 (I keep them short to eliminate any possible packed array advantage MathLink might or might not have). Integers can be up to 1000000000.
  • Then I send this to Mathematica using either MathLink or JSON (using RapidJSON, which claims to be very fast).

And indeed, the JSON version is faster ...

This generates a list of $2^{21}$ tiny integer lists:

In[36]:= obj@"generate"[2^21]

Transfer using MathLink:

In[38]:= expr = obj@"getML"[]; // AbsoluteTiming    
Out[38]= {1.94122, Null}

Transfer using JSON:

In[40]:= AbsoluteTiming[
 expr2 = Developer`ReadRawJSONString[obj@"getJSON"[]];
 obj@"releaseJSONBuffer"[];
 ]

Out[40]= {1.33406, Null}

In[41]:= expr == expr2    
Out[41]= True

The JSON version is indeed faster.

But how is that possible? Doesn't MathLink use a binary representation for this, and shouldn't that take up less space and be faster?

Effectively, this is how I transferred the data using MathLink:

std::vector<std::vector<int>> list;
...
// put the outer List head, then each inner list as a packed 32-bit integer list
MLPutFunction(link, "List", list.size());
for (const auto &vec : list)
    MLPutInteger32List(link, vec.data(), vec.size());
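
For reference, the JSON side was roughly along these lines (a simplified sketch using RapidJSON's SAX-style Writer, not the exact code; getJSON just hands the resulting string back to Mathematica):

#include <rapidjson/stringbuffer.h>
#include <rapidjson/writer.h>
#include <string>
#include <vector>

// Serialize the same list of integer lists into one JSON string.
std::string toJSON(const std::vector<std::vector<int>> &list) {
    rapidjson::StringBuffer buffer;
    rapidjson::Writer<rapidjson::StringBuffer> writer(buffer);
    writer.StartArray();
    for (const auto &vec : list) {
        writer.StartArray();
        for (int x : vec)
            writer.Int(x);
        writer.EndArray();
    }
    writer.EndArray();
    return buffer.GetString();
}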

I did notice that the result takes up a lot more space in Mathematica than in the JSON serialization:

In[31]:= Developer`WriteRawJSONString[expr] // ByteCount
Out[31]= 147593352

In[32]:= ByteCount[expr]
Out[32]= 352419368

That is understandable: in JSON a 32-bit integer is at most 10 digits, i.e. at most 10 bytes, while in Mathematica each (non-packed-array-member) integer is 8 bytes plus some metadata, totalling 16 bytes according to ByteCount.

But MathLink should be more efficient than that: given that I use MLPutInteger32List and am not putting each integer one by one, it should in principle be able to transfer them in some "packed" format; furthermore, it should only need 32 bits (not 64) for each integer until they are read by the kernel.

Does this mean that MathLink is due for an update? Or does it have some inherent limitation which prevents it from being more efficient than it already is? Or perhaps we are just seeing function call overhead compared to a header-only (and thus fully inlinable) JSON library? It should definitely be possible to make a binary format that is faster than a text-based JSON (maybe Cap'n Proto, which you mentioned before, or something similar).

If I generate random-length lists in the range 0..100 instead of 0..10, then the performance advantage of JSON goes away:

In[43]:= obj@"generate"[2^18]

In[44]:= expr = obj@"getML"[]; // AbsoluteTiming    
Out[44]= {1.47904, Null}

In[45]:= AbsoluteTiming[
 expr2 = Developer`ReadRawJSONString[obj@"getJSON"[]];
 obj@"releaseJSONBuffer"[];
 ]
Out[45]= {1.78779, Null}

Another question: if you use JSON transfer for a machine learning application, isn't it a problem that converting from binary to decimal and back does not leave floating point numbers intact? There may be a very small rounding error.

POSTED BY: Szabolcs Horvát

This is pretty cool, Szabolcs, thanks for sharing this with the community.

One thing you might consider adding in the future is something we've been taking advantage of for the neural network implementation we're working on for version 11: using the new, fast RawJSON import/export facility to handle things like multiple return values, associations, lists of strings, lists of booleans, tuples of mixed types, and other such things that don't have a native LibraryLink representation.

You can use Developer`ReadRawJSONString and Developer`WriteRawJSONString on the Mathematica side to efficiently serialize/deserialize everything but large numeric tensors to/from UTF8 JSON strings, which can then be sent to and received from the C++ side. On the C++ side you can use a header-only JSON parsing library to parse these and, with a few simple functions, turn them into things like std::vectors or hashmaps of strings and so on. It's even possible to use templates to make JSON (de)serializers that handle arbitrary kinds of nested vectors, tuples, hashmaps, and so on without having to write any boilerplate.
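
To give a flavour of the template approach, here is a minimal sketch with RapidJSON (the fromJSON/parseJSON names are just for illustration, and it assumes the input string comes from Developer`WriteRawJSONString):

#include <rapidjson/document.h>
#include <string>
#include <vector>

// Leaf overloads for the basic JSON types.
inline void fromJSON(const rapidjson::Value &v, int &out)         { out = v.GetInt(); }
inline void fromJSON(const rapidjson::Value &v, double &out)      { out = v.GetDouble(); }
inline void fromJSON(const rapidjson::Value &v, bool &out)        { out = v.GetBool(); }
inline void fromJSON(const rapidjson::Value &v, std::string &out) { out = v.GetString(); }

// One template handles arbitrarily nested vectors by recursing into the overloads above.
template <typename T>
void fromJSON(const rapidjson::Value &v, std::vector<T> &out) {
    out.clear();
    for (const auto &item : v.GetArray()) {
        T elem;
        fromJSON(item, elem);
        out.push_back(std::move(elem));
    }
}

// Parse a JSON string (e.g. from Developer`WriteRawJSONString) into the requested C++ type.
template <typename T>
T parseJSON(const std::string &json) {
    rapidjson::Document doc;
    doc.Parse(json.c_str());   // real code should check doc.HasParseError()
    T result;
    fromJSON(doc, result);
    return result;
}

Adding overloads for std::map, std::tuple, and so on extends this to most nested shapes without per-type boilerplate.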

As long as serialization isn't the bottleneck, this technique makes it much easier to work with weird data types. Large tensors are of course better suited to going through the normal LibraryLink mechanism, and returning multiple such tensors is better suited to your current approach.
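
For such tensors the usual LibraryLink pattern applies, something like this generic sketch (the getTensor name and the fill loop are placeholders):

#include "WolframLibrary.h"

// Build and return an integer MTensor through the standard LibraryLink mechanism.
extern "C" DLLEXPORT int getTensor(WolframLibraryData libData, mint argc, MArgument *args, MArgument res) {
    mint dims[2] = {1000, 1000};
    MTensor t;
    int err = libData->MTensor_new(MType_Integer, 2, dims, &t);
    if (err) return err;
    mint *data = libData->MTensor_getIntegerData(t);
    for (mint i = 0; i < dims[0]*dims[1]; i++)
        data[i] = i;   // fill with whatever numeric payload you actually have
    MArgument_setMTensor(res, t);
    return LIBRARY_NO_ERROR;
}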

Alternatively, you can use the ExpressionJSON variants of the above Developer functions to send arbitrary symbolic expressions over, so that your C++ program can process and emit things that aren't just lists and associations of the basic types, but could be e.g. polynomials or Entities or Quantities or what-have-you.

LibraryLink has picked up support for RawArrays, though that's not documented since RawArrays are themselves not documented. Let me know if you are interested in that and I'll share how to do that.

I also have a snippet of code that lets you write debugf("fmtstring", arg1, arg2, ...) from the C/C++ side and have it print immediately to your notebook on the Mathematica side, which is invaluable when debugging. This can be turned on and off via an EnableDebugPrint[] function, so you can leave it in production code and turn it on when you need it.
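
The idea is roughly the following (a minimal sketch rather than the exact snippet: it formats the message with vsnprintf and evaluates Print through the library's MathLink connection, with a plain boolean flag standing in for the EnableDebugPrint mechanism, and with the WolframLibraryData pointer passed explicitly):

#include "WolframLibrary.h"
#include "mathlink.h"
#include <cstdarg>
#include <cstdio>

static bool debugEnabled = true;   // toggled by an EnableDebugPrint-style library function

// Print a formatted message in the notebook immediately.
// Must be called from inside a library function, while libData is valid.
void debugf(WolframLibraryData libData, const char *fmt, ...) {
    if (!debugEnabled) return;

    char buf[1024];
    va_list args;
    va_start(args, fmt);
    vsnprintf(buf, sizeof(buf), fmt, args);
    va_end(args);

    // Ask the kernel to evaluate Print["..."] over the library's MathLink connection.
    MLINK link = libData->getMathLink(libData);
    MLPutFunction(link, "EvaluatePacket", 1);
    MLPutFunction(link, "Print", 1);
    MLPutString(link, buf);
    libData->processMathLink(link);
    int pkt = MLNextPacket(link);
    if (pkt == RETURNPKT)
        MLNewPacket(link);
}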

POSTED BY: Taliesin Beynon