Thoughts on a Python interface, and why ExternalEvaluate is just not enough

Posted 1 year ago
4184 Views | 22 Replies | 56 Total Likes

ExternalEvaluate, introduced in M11.2, is a nice initiative. It enables limited communication with multiple languages, including Python, and appears to be designed to be relatively easily extensible (see ExternalEvaluate`AddHeuristic if you want to investigate, though I wouldn't invest in this until it becomes documented).

My great fear, however, is that with ExternalEvaluate Wolfram will consider the question of a Python interface settled.

This would be a big mistake. A general framework, like ExternalEvaluate, that aims to work with any language and relies on passing code (contained in a string) to an evaluator and getting JSON back, will never be fast enough or flexible enough for practical scientific computing.

Consider a task as simple as computing the inverse of a $100\times100$ Mathematica matrix using Python (using numpy.linalg.inv).

I challenge people to implement this with ExternalEvaluate. It's not possible to do it in a practically useful way. The matrix has to be sent as code, and piecing together code from strings just can't replace structured communication. The result will need to be received as something encodable to JSON. This has terrible performance due to multiple conversions, and even risks losing numerical precision.

Just sending and receiving a tiny list of 10000 integers takes half a second (!)

In[6]:= ExternalEvaluate[py, "range(10000)"]; // AbsoluteTiming
Out[6]= {0.52292, Null}
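
To make the conversion cost concrete, here is a minimal Python sketch that sends a $100\times100$ matrix of doubles through a JSON text round trip and through a raw binary round trip. It uses only the standard library and does not reproduce what ExternalEvaluate does internally; the matrix contents and variable names are made up for illustration.

```python
import json
import random
from array import array

random.seed(0)
n = 100
# Stand-in for a 100x100 real matrix coming from Mathematica.
matrix = [[random.random() for _ in range(n)] for _ in range(n)]

# Text route: every number is printed as decimal digits and parsed again.
json_text = json.dumps(matrix)
from_json = json.loads(json_text)

# Binary route: 8 bytes per double, no printing or parsing of digits.
flat = array("d", (x for row in matrix for x in row))
raw = flat.tobytes()
restored = array("d")
restored.frombytes(raw)
from_binary = [list(restored[i * n:(i + 1) * n]) for i in range(n)]

print("JSON characters:", len(json_text))
print("binary bytes:   ", len(raw))
```

Python's own json module happens to round-trip doubles exactly, so in this sketch the text route costs size and parsing time rather than accuracy. Precision is at risk as soon as either side prints fewer digits than a double needs.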

Since I am primarily interested in scientific and numerical computing (as I believe most M users are), I simply won't use ExternalEvaluate much, as it's not suitable for this purpose. What if we need to do a mesh transformation that Mathematica can't currently handle, but there's a Python package for it? It's exactly the kind of problem I am looking to apply Python for. I have in fact done mesh transformations using MATLAB toolboxes directly from within Mathematica, using MATLink, while doing the rest of the processing in Mathematica. But I couldn't do this with ExternalEvaluate/Python in a reasonable way.

In 2017, any scientific computing system needs to have a Python interface to be taken seriously. MATLAB has one, and it is practically usable for numerical/scientific problems.


A Python interface

I envision a Python interface which works like this:

  • The MathLink/WSTP API is exposed to Python, and serves as the basis of the system. MathLink is good at transferring large numerical arrays efficiently.
  • Fundamental data types (lists, dictionaries, bignums, etc.) as well as datatypes critical for numerical computing (numpy arrays) can be transferred efficiently and bidirectionally. Numpy arrays in particular must translate to/from packed arrays in Mathematica with the lowest possible overhead.
  • Python functions can be set up to be called from within Mathematica, with automatic argument translation and return type translation. E.g.,

    PyFun["myfun"][ (* myfun is a function defined in Python *)
        {1,2,3} (* a list *), 
        PyNum[{1,2,3}] (* cast to numpy array, since the interpretation of {1,2,3} is ambiguous *), 
        PySet[{1,2,3}] (* cast to a set *)
    ]
    
  • The system should be user-extensible to add translations for new datatypes, e.g. a Python class that is needed frequently for some application.

  • The primary mode of operation should be that Python is run as a slave (subprocess) of Mathematica. But there should be a second mode of operation where both Mathematica and Python are being used interactively, and they are able to send/receive structured data to/from each other on demand.
  • As a bonus: Python can also call back to Mathematica, so e.g. we can use a numerical optimizer available in Python to find the minimum of a function defined in Mathematica
  • An interface whose primary purpose is to call Mathematica from Python is a different topic, but can be built on the same data translation framework described above.
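
The user-extensible translation idea from the list above can be sketched in Python with `functools.singledispatch`. This is only an illustration under stated assumptions: `to_expr`, the `(head, [arguments])` representation and the `Interval` class are hypothetical names, not part of any existing Wolfram or Python library.

```python
from dataclasses import dataclass
from fractions import Fraction
from functools import singledispatch

# An expression is modelled as a (head, [arguments]) pair; atoms pass through.
# This representation is a hypothetical stand-in for whatever the link sends.


@singledispatch
def to_expr(obj):
    raise TypeError(f"no translation registered for {type(obj).__name__}")


@to_expr.register(bool)
@to_expr.register(int)
@to_expr.register(float)
@to_expr.register(str)
def _atom(obj):
    return obj


@to_expr.register(Fraction)
def _fraction(obj):
    return ("Rational", [obj.numerator, obj.denominator])


@to_expr.register(list)
@to_expr.register(tuple)
def _sequence(obj):
    return ("List", [to_expr(item) for item in obj])


@to_expr.register(dict)
def _mapping(obj):
    rules = [("Rule", [to_expr(k), to_expr(v)]) for k, v in obj.items()]
    return ("Association", rules)


# A user adds a translation for an application-specific class.
@dataclass
class Interval:
    low: float
    high: float


@to_expr.register(Interval)
def _interval(obj):
    return ("Interval", [to_expr([obj.low, obj.high])])


print(to_expr({"range": Interval(1, 2), "step": Fraction(1, 3)}))
```

The point of the design is that data stays structured all the way: nothing is pieced together as source code text, and adding a new type is one registered function.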

The development of such an interface should be driven by real use cases. Ideally, Wolfram should talk to users who use Mathematica for more than fun and games, and do scientific computing as part of their daily work, with multiple tools (not just M). Start with a number of realistic problems, and make sure the interface can help in solving them. As a non-trivial test case for the datatype-extension framework, make sure people can set up auto-translation for SymPy objects, or a Pandas dataframe, or a networkx graph. Run FindMinimum on a Python function and make sure it performs well. (In a practical scenario this could be a function implementing a physics simulation rather than a simple formula.) As a performance stress test, run Plot3D (which triggers a very high number of evaluations) on a Python function. Performance and usability problems will be exposed by such testing early, and then the interface can be designed in such a way as to make these problems at least solvable (if not immediately solved in the first version). I do not believe that they are solvable with the ExternalEvaluate design.
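
Why a plotting stress test is so demanding can be shown with a small Python sketch that counts how often a function is called when it is sampled on a grid. The fixed grid is only a stand-in for Plot3D (which samples adaptively), and the 1 ms round-trip cost is an assumed figure chosen for illustration, not a measurement.

```python
class CountingProxy:
    """Wraps a callable and counts how often it is invoked."""

    def __init__(self, func):
        self.func = func
        self.calls = 0

    def __call__(self, *args):
        self.calls += 1
        return self.func(*args)


def sample_grid(func, points_per_axis):
    """Evaluate func on a regular grid over the unit square."""
    step = 1.0 / (points_per_axis - 1)
    return [
        [func(i * step, j * step) for j in range(points_per_axis)]
        for i in range(points_per_axis)
    ]


# Stand-in for a function living on the other side of the link.
f = CountingProxy(lambda x, y: (x - 0.5) ** 2 + (y - 0.5) ** 2)
values = sample_grid(f, 50)

assumed_round_trip = 0.001  # seconds per call; hypothetical
estimated_overhead = f.calls * assumed_round_trip
print(f.calls, "calls,", estimated_overhead, "seconds of pure link overhead")
```

Even a modest grid multiplies the per-call cost by thousands, which is why per-call latency matters as much as bulk transfer speed.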

Of course, this is not the only possible design for an interface. J/Link works differently: it has handles to Java-side objects. But it also has a different goal. Based on my experience with MATLink and RLink, I believe that for practical scientific/numerical computing, the right approach is what I outlined above, and that the performance of data structure translation is critical.


ExternalEvaluate

Don't get me wrong, I do think that the ExternalEvaluate framework is a very useful initiative, and it has its place. I am saying this because I looked at its source code and it appears to be easily extensible. R has zeromq and JSON capabilities, and it looks like one could set it up to work with ExternalEvaluate in a day or so. So does Perl, anyone want to give it a try? ExternalEvaluate is great because it is simple to use and works (or can be made to work) with just about any interpreted language that speaks JSON and zeromq. But it is also, in essence, a quick and dirty hack (that's extensible in a quick and dirty way), and won't be able to scale to the types of problems I mentioned above.


MathLink/WSTP

Let me finally say a few words about why MathLink/WSTP are critical for Mathematica, and what should be improved about them.

I believe that any serious interface should be built on top of MathLink. Since Mathematica already has a good interface capable of inter-process communication, that is designed to work well with Mathematica, and designed to handle numerical and symbolic data efficiently, use it!!

Two things are missing:

  • Better documentation and example programs, so more people will learn MathLink

  • If the MathLink library (not Mathematica!) were open source, people would be able to use it to link to libraries which are licensed under the GPL. Even a separate open source implementation that only supports shared memory passing would be sufficient; there is no need to publish the currently used code in full. Many scientific libraries are licensed under the GPL, often without their authors even realizing that they are practically preventing them from being used from closed source systems like Mathematica (due to the need to link to the MathLink libraries). To be precise, GPL licensed code can be linked with Mathematica, but the result cannot be shared with anyone. I have personally requested the author of a certain library to grant an exception for linking to Mathematica, and they did not grant it. Even worse, I am not sure they understood the issue. The authors of other libraries cannot grant such a permission because they themselves are using yet other GPL'd libraries.

    MathLink already has a more permissive license than Mathematica. Why not go all the way and publish an open source implementation?

I am hoping that Wolfram will fix these two problems, and encourage people to create MathLink-based interfaces to other systems. (However, I also hope that Wolfram will create a high-quality Python link themselves instead of relying on the community.)

I have talked about the potential of Mathematica as a glue-language at some Wolfram events in France, and I believe that the capability to interface external libraries/systems easily is critical for Mathematica's future, and so is a healthy third-party package ecosystem.

22 Replies

Dear Szabolcs,

as usual your post is very helpful and instructive. Thank you for your posts both here and on StackExchange. They make my life much easier.

Thanks,

Marco

As you've already mentioned: Mathematica does have facility to work with other languages.

Also are you running Mathematica Online? Access your docs remotely on iPhone and etc? Can you expect Notebooks to work properly remotely if they require zombie python drivers and interfacing linked to a particular Desktop code base / kernel?

Your main argument doesn't make sense. Mathematica does not force a choice that external programs use text (more than Mathematica itself does) and does not force you to use JSON: if you do so it's due to choices you made not Mathematica's limitations.

I don't see an issue in that Mathematica already has a few ways to work with external programs (by linking, by text mode, other) and also can run multiple Mathematica Evaluate processes. Why can't you run an interface? The Mathematica documentation seems to indicate it can drive one, and it also has built-in helpers to use the existing front end to customize a small but very connected and powerful interface. I'm not sure you're right either. Mathematica owning its side of the MathLink is because it spawns it (same as any software would), and doesn't mean you've been prevented from using your computer or software as you need.

Personally I don't want Mathematica to have GPL "drivers" inside it: I run an Apple and Mathematica specifically so I am no hostage to whatever hack anyone in the GPL has "a gpg key" to force on to me (i.e., unix compatibility changes, continual security leaks, etc). I doubt my wish is true. Are you saying Mathematica should have GPL drivers linked in such a way that when shipped a notebook could spawn a zombie? So as to take control over the Mathematica product and monitor people, if there were even a single "bug" in the driver? I'm sure you're not :) But you can see where seamless integration with GPL "gpg only uploads" software leads: control by anonymous uploaders.

Python itself is a continual issue: users who compile it are forced into upgrading their compilers, which forces upgrades of their desktops. The language changes in such a way that old code no longer runs (it's not safe to work with: you write code today, and 1 yr from now one's efforts may be ruined or need extensive rewriting). I complain that Mathematica shouldn't make incompatible language changes, because professors work their hearts out to write notebooks they hope others can use (intact). They do ignore me at times, but mostly are good about it: today's GPL is not.

Not everyone is "into GPL" because GPL is not the smaller, tighter (college and government watched) community it used to be. RFC used to mean something, IETF used to mean something: today they are both ignored. GPL is very political these days (who gets keys, how used) and has drawn lines in the sand about what country will make the machines it runs on "correctly" (i.e., ARM v. Intel). I see no end to that in the near future: too many fixes and every fix would be hotly debated.

That being said I enjoy the older GPL world very much and still use it for awk(1), for Mathematica 4, and things like that.

I hate to be so long in response. But I don't think simply saying "Mathematica should link to GPL code the way GPL code would prefer it", is a good idea for everything involved which is quite too much.

I cannot tell you more at the moment, but would like to assure you that this is the beginning of the story, not the end. There are development initiatives in that general direction and I encourage you to stay patient and look forward to more exciting things coming.

Posted 1 year ago

Hello, I would like to request the Wolfram developers consider doing a good Python interface to Mathematica. The C interface seems to be discouraged and I looked over the J/Link interface and just thought..."man, do I really want to program in Java".

Personally, I feel having a Python interface would be fun. Also, academia is currently bursting out of the seams with Python programmers, so it would make business sense for Wolfram to tie Mathematica into Python.

Wolframscript is OK, but Python has classes which allow for easier storage and organization. Python is much easier to read and I would venture to say too much Wolframscript becomes "write once, read never," or, at least, it becomes pretty dense and obscure.

I think it would be great (and much easier) to be able to call the immense power of Mathematica from Python when writing command line scripts.

Thanks for the consideration!

Clarification: I had posted this as a separate topic, but I guess the moderators of the forum decided it fit better with this topic and moved my post here. Because of this, I haven't yet read the rest of the thread, but will do so now.

> The C interface seems to be discouraged

I really do not think this is the case.

> and I looked over the J/Link interface and just thought..."man, do I really want to program in Java"

> Python is much easier to read and I would venture to say too much Wolframscript becomes "write once, read never," or, at least, it becomes pretty dense and obscure.

You seem to be looking to use a different language than Mathematica (Wolfram Language) because you are not satisfied with it. This is not the purpose of these links.

These links exist to make functionality that is not available in Mathematica accessible without leaving Mathematica.

For example, to simulate a physical system, you would need a fast low-level language such as C++, C or Fortran. No high-level language like Mathematica or Python will ever be able to compete in this area. But then you may want to run an optimization on one of the parameters of the simulation, map the parameter space in an efficient manner (e.g. using adaptive sampling), or just do a quick visualization to see what your simulation is doing. This is much easier and quicker to do in Mathematica.

Every tool has a purpose: you may be able to hammer a nail with a screwdriver, but it's not going to be very effective. These linking technologies make it possible to use the right tool for the right purpose. Mathematica is particularly good at interactive work/exploration. In my experience, it is great to use it as the centre of my workflow, and control other tools from it.

As you said, Python is increasingly popular for scientific work. There are Python libraries implementing functionality that Mathematica does not have at this moment. I would find it extremely useful to be able to access some of this functionality while staying in the same system, using the same familiar plotting and data wrangling functions, etc. Currently, if I need a specific library that only has a Python interface, I am forced to use not only this library but everything else from Python as well (e.g. plotting) because there is no efficient communication between the systems. I am confined within a single system, and can't pick the best tool for the task.

@Szabolcs

Very clear!

Hopefully, LLVM compilation will allow for a much faster Wolfram Language execution (eagerly waiting to hear the latest news on the WTC: how much does it cover? still the 99.99% of the language? How automatic is it getting? Can we imagine full automation/transparent to the user? What is the current optimization level / % of pure C?). Put together with an eventual evolution of the parallelization technology, a project that I'm still waiting to discover of its (eventual) existence..., and we will need to link with other languages on much fewer occasions... focusing less time on optimization, and more time on the main purpose of the algorithms.

Posted 1 year ago

Hi Szabolcs, I put a note on my original post...I made a separate thread requesting that an interface be developed allowing Python to access Mathematica and I think the moderators decided to move my post here. So, now I'm here. Although we are talking different things...you want to access Python through Mathematica, I want to access Mathematica through Python, it is similar.

I agree with you that Mathematica's great strength and Python's great weakness is plotting. Also, Jupyter notebooks just can't compare with Mathematica notebooks. And, although a lot of the same functionality of Mathematica can be found in libraries such as SymPy, SciPy, and friends, it makes a big difference to have everything integrated and documented under one roof.

J/Link is a two way thing and you can use J/Link to call Mathematica from Java. That was the purpose of my original post, which I thought was to be a separate thread...to ask that a similar interface be developed to allow Python to call Mathematica functionality more transparently.

I am in total agreement with what you said here:

> ... Python is increasingly popular for scientific work. There are Python libraries implementing functionality that Mathematica does not have at this moment. I would find it extremely useful to be able to access some of this functionality while staying in the same system, using the same familiar plotting and data wrangling functions, etc. Currently, if I need a specific library that only has a Python interface, I am forced to use not only this library but everything else from Python as well (e.g. plotting) because there is no efficient communication between the systems. I am confined within a single system, and can't pick the best tool for the task.

and would just add that I would like to access Mathematica from Python for much the same purposes.

Hi Stephan,

> I made a separate thread requesting that an interface be developed allowing Python to access Mathematica and I think the moderators decided to move my post here. So, now I'm here.

I completely missed that, and I think I misunderstood what you meant. With that context now it makes sense.

Posted 1 year ago

Hi Szabolcs and thanks for the reply. Actually, I was a bit embarrassed about what happened, since I don't make it a habit of posting unrelated thoughts in other people's threads :-)

Back to the topic of this discussion, I would like to emphasize I agree about the need for a greater integration of Mathematica with Python. Thinking about it, I've decided that calling Python from Mathematica would be just as helpful as calling Mathematica from Python. Totally agree that the speed of data transfer is critical for things like computational geometry, as you mentioned, or other stuff like machine learning, and also about the fundamental importance of numpy arrays. And about how much an obstacle context switching is...although you can hack together anything, once you start dealing with multiple frameworks the focus becomes more on the programming and less on the problem at hand. And then, just the amount of libraries out there in Python that focus on scientific computing, data analysis and machine learning that would be great to just plug into Mathematica.

I didn't quite follow your argument on why the MathLink/WSTP API is a better model than J/Link for a Python interface.

For myself, I find it much easier to put together simple looping constructs...procedural programming...in Python than with Mathematica. Although I like functional programming and see its power, I find that it can get kind of dense. Also, I like having classes around and doing object oriented programming. For example, I really like doing a list of items like so:

class Item:
    def __init__(self, i, parent):
        self.number = i
        self.parent = parent

class List:
    def __init__(self, n):
        self.list = []
        for i in range(n):
            self.list.append(Item(i, self))

Using OO programming (sparingly, of course, mostly just to hold data) can really help to reason through what you are doing and I find Mathematica lacks such facilities, or, at least, I have not learned enough of Mathematica to not miss classes. Using the above structure, it is easy for me to put metadata type functions in the List class and item-specific functions in the Item class and loop through the list to examine the items, etc. But what's missing is calling Mathematica straight from Python (or vice versa)!

Thanks again

Posted 9 months ago

I would also like to add my vote and support for Szabolcs' request to WRI to implement a low-level and fast interface to Python. My motivation is to get access to Python's superior machine learning libraries.

B

Posted 8 months ago

Just tried out the updated External-related functions in v11.3 this afternoon, and I feel really excited about writing Python code and getting results directly within the notebook! numpy-related objects now get much better support from Mathematica and can be read in directly without any conversion. In general, I really like this update and am very happy to see the power of both systems, Mathematica and Python, combined in a synergistic way.

I'm Riccardo Di Virgilio, currently working at WRI and one of the contributors to the Python implementation for ExternalEvaluate. Data transfer in the first implementation was not efficient, but we have been working to improve it, and with M12 we will ship a much more efficient data transfer thanks to the WXF binary format. We are able to serialize a lot of built-in data types including integer, float, decimals, datetime, time, complex, fractions, list, tuples, associations, etc.

My current setup using a 2015 2.5 GHz Intel Core i7 macbook pro provides a 500% performance boost over the example that was posted at the beginning of this thread.

In[2]:= ExternalEvaluate[session, "range(10000)"]; // AbsoluteTiming
Out[2]= {0.093617, Null}

We also developed an efficient conversion from numpy arrays to NumericArray.

In[7]:= ExternalEvaluate[session, "import numpy; numpy.ndarray(10000).reshape(4, 2500)"]; // AbsoluteTiming
Out[7]= {0.001233, Null}

Another very nice feature is an inspectable traceback, which provides a very nice interface to debug your Python code and which is just not possible from the command line.

Attaching a screenshot. [Image: Python traceback]

Thank you for showing this @Riccardo ! I do have the M12 prerelease, but I did not know about these improvements.

Performance was one of the deal-breakers for me. The other one was that there was no way to send structured data to Python. Converting expressions to Python code, which is then parsed and run by Python, is not a good way to do this. Was this fixed in M12? If yes, could you give an example please? If no, are there plans to fix it?

The simple use case I've been suggesting: take a matrix m = RandomReal[1, {100,100}]. How do I compute its eigenvalues (and perhaps eigenvectors) in Python? This includes all the basics one might expect from a language interface: send structured data, call functions, receive structured data.

About WXF:

What your post implies, but you did not spell it out, is that WXF is an openly documented format for storing and transferring Mathematica expressions. I was not aware of this, but from your post it sounded like you must have a decoder for it on the Python side. Will you make this decoder available to the rest of us, perhaps even open source? What about decoders for other languages?

I am also wondering about how this relates to MathLink. I always imagined MathLink to be based on a very similar binary format.

IMO a reasonable way to implement an interface to another language would be to first expose the MathLink API to it. Then we would have a means to transfer expressions back and forth. Why did you choose ZeroMQ and WXF instead of MathLink? Is it simply because Python already speaks ZMQ, or is the performance better with WXF? I remember once I found that transferring certain expressions as JSON, which isn't even binary, was faster than using MathLink, quite shocking!

Finally, I always thought that open sourcing MathLink would be beneficial because it would work around interfacing with GPL'd (or other copyleft) libraries. Is the semi-open WXF a step in this direction?

As far as I know, we are planning to have a way to convert Mathematica expressions automatically in Python cells. I think the future syntax might look something like print(Range[10]).

We are working on a Python client library, which we plan to release on GitHub in the future, that allows you to import/export WXF data files from Python and to interact with a kernel using MathLink.

You can take a look at our code by evaluating "import wolframclient; str(wolframclient)" in a Python cell. The library is still under active development, and we are in the process of writing documentation for it, which is why the code is not released yet.

Unfortunately I'm not in a position to reply to your other questions about MathLink or why we chose ZMQ over MathLink, but we might change the implementation.

Sincerely. Riccardo.

print(Range[10]) – do you mean a mix of Python and Mathematica syntax?

What really matters to me is to be able to use this interface to solve practical problems that come up day-to-day in my work. The current ExternalEvaluate is not capable of this. I'd like to suggest to let real user needs drive development, and also to set priorities based on such needs.

Let me give examples:

  • Call some minor scientifically oriented function that exists in some Python library but not Mathematica. E.g. compute a spherical Voronoi tessellation. This should be a no-fuss at most three-lines-of-code task.

  • A concrete application I have in mind is using the networkx library. Example task: compute a minimum weight cycle basis (not available in Mma). This won't be a three-line task because the involved data structures are more complex. But it should be not too hard to set up a framework for transferring graphs back and forth between the two systems, and once that is done, it should be easy to call any functions (like the cycle basis computation).
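
A minimal sketch of the graph-transfer idea in the list above, assuming an undirected simple graph whose weighted edges arrive in Python as plain `(u, v, weight)` triples with comparable vertex labels. The function names are hypothetical and the example uses only the standard library.

```python
def edges_to_adjacency(weighted_edges):
    """Build a symmetric adjacency mapping from (u, v, weight) triples."""
    adjacency = {}
    for u, v, w in weighted_edges:
        adjacency.setdefault(u, {})[v] = w
        adjacency.setdefault(v, {})[u] = w
    return adjacency


def adjacency_to_edges(adjacency):
    """Flatten the mapping back to sorted triples, one per undirected edge."""
    edges = []
    for u in sorted(adjacency):
        for v in sorted(adjacency[u]):
            if u < v:
                edges.append((u, v, adjacency[u][v]))
    return edges


# Triples as they might arrive from the Mathematica side.
incoming = [(1, 2, 0.5), (2, 3, 1.5), (1, 3, 2.0)]
graph = edges_to_adjacency(incoming)
outgoing = adjacency_to_edges(graph)
print(outgoing)
```

The same triples are what networkx's `Graph.add_weighted_edges_from` accepts, so once an interface delivers them as structured data, calling something like `networkx.minimum_cycle_basis` is a short step.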

Python can open up a gateway to a huge number of useful libraries many of which are completely unavailable to Mathematica at this moment. Thus a Python interface should be taken very seriously, and preferably optimized specifically for Python (instead of making it generic and work with any language, like JavaScript).

Example: One task I had to solve recently was to call certain ITK functions from Mathematica (image processing). Like any high-profile scientific library, ITK has a Python interface. In fact, it has two, one of them being specifically optimized for scripting languages. For now I had to use LibraryLink to make it work with good performance. It took more than a day to set up a framework for it, and even after that was done, each new function I need to access takes a 5-10 minute setup process. The kind of Python interface I wish for would make this task easy and seamless: directly transfer image data as a NumericArray/RawArray with negligible overhead, call the function, retrieve the result.

To sum up:

Please let real-world applications drive the development of this functionality. Ask users what they imagine doing with a Python interface. Ask those users who use Mathematica daily to get real work done.

We are more than happy to listen to customers about this functionality, which is why we are going to release this library as open source code when it is ready.

As I mentioned, this library allows you to import/export arbitrary Mathematica expressions using WXF. The format is also optimized for transferring PackedArray and NumericArray data, and for most types the conversion is automatic.

You can create a dump in mathematica and export it, or you can programmatically start a kernel from python and retrieve the result of an arbitrary computation.

Automatic data conversion from WL to Python is not done yet because the WXF importer in Python was implemented very recently, so it takes time to add this functionality (you might expect automatic conversion of DateObject to Python datetime out of the box, which right now works only in the opposite direction).

Take a look at the source code in wolframclient.serializers and wolframclient.deserializers, keep in mind that everything is subject to changes right now, but at least you might get an idea of what can be done.

One nice thing about WXF is that it supports associations. Another (from a user perspective) is that it is simple and well documented. Consider e.g. the simple problem of detecting a packed array on a MathLink link. It took a bit of experimentation and guesswork with MathLink. With the open WXF specification, it is immediately clear what is possible, what is not possible, and what is the best way to do something.

I wonder if it is meant as a replacement for MathLink when developing links to other systems. I also wonder if it performs better. At least the decoding could probably be made faster as we do not depend on a closed library now, but can implement our own decoder. Of course, it only handles encoding/decoding, and does not give us a ready-to-use means of data transfer like MathLink does.

Even with LibraryLink, now it might be easier to transfer complex expressions encoded as WXF and transferred as a byte-type RawArray.
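
To illustrate how little is needed once the format is documented, here is a minimal Python sketch that encodes a list of small integers as WXF by hand. The token bytes (`8:` header, `f` for a function, `s` for a symbol, `C` for an 8-bit integer) and the variable-length integer scheme are as I read them in the WXF format description, so check them against the specification before relying on this. The function names are made up.

```python
def varint(n):
    """Encode a non-negative integer in 7-bit groups, least significant first.

    Every byte except the last has its high bit set as a continuation flag.
    """
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_small_int_list(values):
    """Encode List[v1, v2, ...] where every value fits in a signed byte."""
    data = bytearray(b"8:")               # header: format version, no compression
    data += b"f" + varint(len(values))    # function token and argument count
    data += b"s" + varint(4) + b"List"    # head: the symbol List
    for v in values:
        if not -128 <= v <= 127:
            raise ValueError("this sketch only handles 8-bit integers")
        data += b"C" + v.to_bytes(1, "little", signed=True)
    return bytes(data)


print(encode_small_int_list([1, 2, 3]))
```

On the Mathematica side the result should be readable with BinaryDeserialize, and comparing it with the output of BinarySerialize[{1, 2, 3}] is a quick way to validate the assumptions above.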

Posted 2 months ago

Hi Szabolcs, Where can I find the WXF specification? I can only find the Import Export possibilities but isn't that only disk based? Or can it also be used in streams? Happy to learn more thx

Hi! It's here: http://reference.wolfram.com/language/tutorial/WXFFormatDescription.html As far as I can tell, it's a complete specification.

This was not really advertised when it came out and I completely missed its significance. In particular, I missed the fact that it's fully documented, which is precisely what makes it useful.

Posted 2 months ago

Here's a teaser for something I've been working on for a bit. I've now gotten things working so I can run Python and Mathematica concurrently in their respective notebook interfaces:

[Image: screenshot]

Posted 2 months ago

This looks really exciting! Do you have any documentation on this? Or is there a related public project people can join?

Posted 2 months ago

Give this a look: http://community.wolfram.com/groups/-/m/t/1468475

If you want to contribute to the repo be my guest.

There are also some nice ideas from @Szabolcs Horvát here that I think really should be pursued in terms of making this more extensible.
