LTemplate: a package for faster LibraryLink development

Posted 10 years ago
POSTED BY: Szabolcs Horvát

Version 0.5 is now released.

The highlights of this release are:

  • Expanded documentation, many usage examples to aid in learning
  • Greatly expanded SparseArray support
  • Experimental support for RawArray and Image

Please go to GitHub to see a more detailed changelog.

As always, any feedback is welcome.


Update: LTemplate 0.5.1 is now available. Hopefully it fixes most 0.5 problems. The documentation and examples have been further expanded.

POSTED BY: Szabolcs Horvát

This is pretty cool, Szabolcs, thanks for sharing this with the community.

One thing you might consider adding in the future is something we've been taking advantage of for the neural network implementation we're working on for 11: using the new, fast RawJSON import/export facility to handle things like multiple return values, associations, lists of strings, lists of booleans, tuples of mixed types, and other such things that don't have a native LibraryLink representation.

You can use Developer`ReadRawJSONString and Developer`WriteRawJSONString on the Mathematica side to efficiently serialize/deserialize everything but large numeric tensors to/from UTF-8 JSON strings, which can then be exchanged with the C++ side. On the C++ side you can use a header-only JSON parsing library to parse these and, with a few simple functions, turn them into things like std::vectors or hash maps of strings and so on. You can even use templates to build JSON (de)serializers that handle arbitrary kinds of nested vectors, tuples, hash maps and so on without having to write any boilerplate.

As long as serialization isn't the bottleneck, this technique makes it much easier to work with weird data types. Large tensors, of course, are better suited to going through the normal LibraryLink mechanism, and returning multiple such tensors is better suited to your current approach.

Alternatively, you can use the ExpressionJSON variants of the above Developer functions to send arbitrary symbolic expressions over, so that your C++ program can process and emit things that aren't just lists and associations of the basic types, but could be e.g. polynomials or Entities or Quantities or what-have-you.

LibraryLink has picked up support for RawArrays, though that's not documented since RawArrays are themselves not documented. Let me know if you are interested in that and I'll share how to do that.

I also have a snippet of code that lets you write debugf("fmtstring", arg1, arg2...) from the C/C++ side and have that print immediately to your notebook on the Mathematica side, which is invaluable when debugging. This can be turned on and off via an EnableDebugPrint[] function, so you can leave it in production code and turn it off when you don't need it.

POSTED BY: Taliesin Beynon

Version 0.3 of LTemplate is available now: https://github.com/szhorvat/LTemplate

Major changes since 0.2:

  • mlstream.h auxiliary header that makes it easier to handle function arguments and return values with MathLink-based passing
  • preliminary sparse array support
  • expanded documentation
  • a skeleton project is now included to make it quick and easy to set up a complex multiplatform LTemplate-based application
  • many fixes
POSTED BY: Szabolcs Horvát

Congratulations! This post is now a Staff Pick as distinguished by a badge on your profile! Thank you, keep it coming!

POSTED BY: EDITORIAL BOARD

I haven't tested this, it's a distilled version of my actual code, but it shouldn't be far off:

https://gist.github.com/taliesinb/a3385002601421b3e8e2

For RawArrays, I actually found a nice notebook from Piotr about it, but I don't want to share it without his permission. I've drawn his attention to this thread.

POSTED BY: Taliesin Beynon

LibraryLink has picked up support for RawArrays, though that's not documented since RawArrays are themselves not documented. Let me know if you are interested in that and I'll share how to do that.

Yes, I would be quite interested in this if you are willing to share, and debugf as well :-) It is a good idea to make it possible to turn on/off such output.

POSTED BY: Szabolcs Horvát

Thanks for doing the benchmark, that's useful to know.

I wasn't trying to suggest that JSON makes sense for numeric data, or as a replacement for MathLink. I think the criteria for when to use JSON as a protocol are: 1) performance is not the bottleneck, but rather developer time; 2) the data is not numeric, e.g. not floating-point values that need to round-trip accurately; 3) the structure is fairly complicated, e.g. it involves associations in some natural way or has multiple fields.

And if you are in a situation where JSON is also naturally emitted by a third-party library, parsing it on the Mathematica side directly is obviously preferable to translating the JSON to MathLink calls on the C++ side.

In our particular application, we use JSON here and there for transmitting 'metadata' back to Mathematica. Actual numeric tensors are communicated using the RawArray interface.

POSTED BY: Taliesin Beynon

Hi Taliesin,

Thank you for the comments and tips! When I read your post, what immediately struck me was: why are you not using MathLink for this instead of JSON? MathLink must surely be faster than serializing to a textual representation. Or is it?

So I tried it out:

  • I generate a list of integer arrays of random lengths between 0 and 10 (I keep them short to eliminate any possible packed-array advantage MathLink might or might not have). Integers can be up to 1,000,000,000.
  • Then I send this to Mathematica using either MathLink or JSON (using RapidJSON, which claims to be very fast).

And indeed, the JSON version is faster ...

This generates a list of $2^{21}$ tiny integer lists:

In[36]:= obj@"generate"[2^21]

Transfer using MathLink:

In[38]:= expr = obj@"getML"[]; // AbsoluteTiming    
Out[38]= {1.94122, Null}

Transfer using JSON:

In[40]:= AbsoluteTiming[
 expr2 = Developer`ReadRawJSONString[obj@"getJSON"[]];
 obj@"releaseJSONBuffer"[];
 ]

Out[40]= {1.33406, Null}

In[41]:= expr == expr2    
Out[41]= True

The JSON version is indeed faster.

But how is that possible? Doesn't MathLink use a binary representation for this, and shouldn't that take up less space and be faster?

Effectively, this is how I transferred the data using MathLink:

<!-- language: lang-c -->

    std::vector<std::vector<int>> list;
    ...
    MLPutFunction(link, "List", list.size());
    for (const auto &vec : list)
        MLPutInteger32List(link, vec.data(), vec.size());

I did notice that the result does take up a lot more space in Mathematica than in JSON serialization:

In[31]:= Developer`WriteRawJSONString[expr] // ByteCount
Out[31]= 147593352

In[32]:= ByteCount[expr]
Out[32]= 352419368

That is understandable: in JSON, a 32-bit integer is at most 10 digits, i.e. at most 10 bytes. In Mathematica, each (non-packed-array-member) integer is 8 bytes plus some meta-information, totalling 16 bytes according to ByteCount.

But MathLink should be more efficient than that: given that I use MLPutInteger32List and am not putting each integer one by one, it should in principle be able to transfer them in some "packed" format; furthermore, it should only use 32 bits (not 64) for each, until they are read by the kernel.

Does this mean that MathLink is due for an update? Or does it have some inherent limitation which prevents it from being more efficient than it already is? Or perhaps we are seeing function call overhead compared to a header-only (thus fully inlineable) JSON library? It should definitely be possible to make a binary format faster than a text-based JSON (maybe Cap'n Proto, which you mentioned before, or similar).

If I generated random-length lists in the length range 0..100 instead of 0..10, then the performance advantage of JSON goes away.

In[43]:= obj@"generate"[2^18]

In[44]:= expr = obj@"getML"[]; // AbsoluteTiming    
Out[44]= {1.47904, Null}

In[45]:= AbsoluteTiming[
 expr2 = Developer`ReadRawJSONString[obj@"getJSON"[]];
 obj@"releaseJSONBuffer"[];
 ]
Out[45]= {1.78779, Null}

Another question: if you use JSON transfer for a machine learning application, isn't it a problem that converting from binary to decimal and back doesn't leave floating-point numbers intact? There may be a very small rounding error.

POSTED BY: Szabolcs Horvát