WOLFRAM COMMUNITY

31.8K Views

25 Replies

70 Total Likes

Thoughts on a Python interface, and why ExternalEvaluate is just not enough

Szabolcs Horvát

Posted 8 years ago

`ExternalEvaluate`, introduced in M11.2, is a nice initiative. It enables limited communication with multiple languages, including Python, and appears to be designed to be relatively easily extensible (see ``ExternalEvaluate`AddHeuristic`` if you want to investigate, though I wouldn't invest in this until it becomes documented).

My great fear, however, is that with `ExternalEvaluate` Wolfram will consider the question of a Python interface settled. This would be a big mistake.

A general framework, like `ExternalEvaluate`, that aims to work with any language and relies on passing code (contained in a string) to an evaluator and getting JSON back, will never be fast enough or flexible enough for practical scientific computing.

Consider a task as simple as computing the inverse of a $100\times100$ Mathematica matrix using Python (using `numpy.linalg.inv`). I challenge people to implement this with `ExternalEvaluate`. It's not possible to do it in a practically useful way. The matrix has to be sent as code, and piecing together code from strings just can't replace structured communication. The result will need to be received as something encodable to JSON. This has terrible performance due to multiple conversions, and even risks losing numerical precision. Just sending and receiving a tiny list of 10000 integers takes half a second (!)

    In[6]:= ExternalEvaluate[py, "range(10000)"]; // AbsoluteTiming
    Out[6]= {0.52292, Null}

Since I am primarily interested in scientific and numerical computing (as I believe most M users are), I simply won't use `ExternalEvaluate` much, as it's not suitable for this purpose.

What if we need to do a mesh transformation that Mathematica can't currently handle, but there's a Python package for it? It's exactly the kind of problem I am looking to apply Python for. I have in fact done mesh transformations using MATLAB toolboxes directly from within Mathematica, using MATLink, while doing the rest of the processing in Mathematica.
But I couldn't do this with ExternalEvaluate/Python in a reasonable way.

In 2017, any scientific computing system needs to have a Python interface to be taken seriously. MATLAB has one, and it is practically usable for numerical/scientific problems.

A Python interface

I envision a Python interface which works like this:

- The MathLink/WSTP API is exposed to Python, and serves as the basis of the system. MathLink is good at transferring large numerical arrays efficiently.
- Fundamental data types (lists, dictionaries, bignums, etc.) as well as datatypes critical for numerical computing (numpy arrays) can be transferred efficiently and bidirectionally. Numpy arrays in particular must translate to/from packed arrays in Mathematica with the lowest possible overhead.
- Python functions can be set up to be called from within Mathematica, with automatic argument translation and return type translation. E.g.,

      PyFun["myfun"][ (* myfun is a function defined in Python *)
        {1,2,3} (* a list *),
        PyNum[{1,2,3}] (* cast to numpy array, since the interpretation of {1,2,3} is ambiguous *),
        PySet[{1,2,3}] (* cast to a set *)
      ]

- The system should be user-extensible to add translations for new datatypes, e.g. a Python class that is needed frequently for some application.
- The primary mode of operation should be that Python is run as a slave (subprocess) of Mathematica. But there should be a second mode of operation where both Mathematica and Python are being used interactively, and they are able to send/receive structured data to/from each other on demand.
- As a bonus: Python can also call back to Mathematica, so e.g. we can use a numerical optimizer available in Python to find the minimum of a function defined in Mathematica.

An interface whose primary purpose is to call Mathematica from Python is a different topic, but can be built on the same data translation framework described above.

The development of such an interface should be driven by real use cases.
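
The user-extensible datatype translation described above can be sketched on the Python side with the standard library's `functools.singledispatch`. This is only an illustration under assumptions: the function name `to_expression`, the nested-tuple representation of an expression, and the `Interval` class are hypothetical and are not part of any existing Wolfram or Python interface.

```python
from fractions import Fraction
from functools import singledispatch


@singledispatch
def to_expression(value):
    """Translate a Python value into a nested (head, arg, ...) tuple.

    Hypothetical stand-in for whatever structured form a real link would send.
    """
    raise TypeError(f"no translation registered for {type(value).__name__}")


@to_expression.register(int)
@to_expression.register(float)
@to_expression.register(str)
def _(value):
    # Atoms are passed through unchanged in this sketch.
    return value


@to_expression.register(list)
@to_expression.register(tuple)
def _(value):
    return ("List", *(to_expression(item) for item in value))


@to_expression.register(dict)
def _(value):
    rules = (("Rule", to_expression(k), to_expression(v)) for k, v in value.items())
    return ("Association", *rules)


@to_expression.register(Fraction)
def _(value):
    return ("Rational", value.numerator, value.denominator)


class Interval:
    """A user-defined class (hypothetical) that needs its own translation."""

    def __init__(self, lo, hi):
        self.lo = lo
        self.hi = hi


@to_expression.register(Interval)
def _(value):
    # Registering one function is all a user needs to do to extend the system.
    return ("Interval", ("List", to_expression(value.lo), to_expression(value.hi)))


print(to_expression([1, Fraction(1, 2), {"a": 2.5}]))
print(to_expression(Interval(0, 1)))
```

The point of the design is that new types are added by registration rather than by editing a central conversion routine, which is one way the "user-extensible" requirement could be met.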
Ideally, Wolfram should talk to users who use Mathematica for more than fun and games, and do scientific computing as part of their daily work, with multiple tools (not just M). Start with a number of realistic problems, and make sure the interface can help in solving them.

- As a non-trivial test case for the datatype-extension framework, make sure people can set up auto-translation for SymPy objects, or a Pandas dataframe, or a networkx graph.
- Run `FindMinimum` on a Python function and make sure it performs well. (In a practical scenario this could be a function implementing a physics simulation rather than a simple formula.)
- As a performance stress test, run `Plot3D` (which triggers a very high number of evaluations) on a Python function.

Performance and usability problems will be exposed by such testing early, and then the interface can be designed in such a way as to make these problems at least solvable (if not immediately solved in the first version). I do not believe that they are solvable with the `ExternalEvaluate` design.

Of course, this is not the only possible design for an interface. J/Link works differently: it has handles to Java-side objects. But it also has a different goal. Based on my experience with MATLink and RLink, I believe that for practical scientific/numerical computing, the right approach is what I outlined above, and that the performance of data structure translation is critical.

ExternalEvaluate

Don't get me wrong, I do think that the `ExternalEvaluate` framework is a very useful initiative, and it has its place. I am saying this because I looked at its source code and it appears to be easily extensible. R has zeromq and JSON capabilities, and it looks like one could set it up to work with `ExternalEvaluate` in a day or so. So does Perl; anyone want to give it a try?

`ExternalEvaluate` is great because it is simple to use and works (or can be made to work) with just about any interpreted language that speaks JSON and zeromq.
But it is also, in essence, a quick and dirty hack (that's extensible in a quick and dirty way), and won't be able to scale to the types of problems I mentioned above.

MathLink/WSTP

Let me finally say a few words about why MathLink/WSTP are critical for Mathematica, and what should be improved about them.

I believe that any serious interface should be built on top of MathLink. Since Mathematica already has a good interface capable of inter-process communication, that is designed to work well with Mathematica, and designed to handle numerical and symbolic data efficiently, use it!!

Two things are missing:

- Better documentation and example programs, so more people will learn MathLink.
- If the MathLink library (not Mathematica!) were open source, people would be able to use it to link to libraries which are licensed under the GPL. Even a separate open source implementation that only supports shared memory passing would be sufficient; there is no need to publish the currently used code in full.

Many scientific libraries are licensed under the GPL, often without their authors even realizing that they are practically preventing them from being used from closed source systems like Mathematica (due to the need to link to the MathLink libraries). To be precise, GPL licensed code can be linked with Mathematica, but the result cannot be shared with anyone.

I have personally requested the author of a certain library to grant an exception for linking to Mathematica, and they did not grant it. Even worse, I am not sure they understood the issue. The authors of other libraries cannot grant such a permission because they themselves are using yet other GPL'd libraries.

MathLink already has a more permissive license than Mathematica. Why not go all the way and publish an open source implementation?

I am hoping that Wolfram will fix these two problems, and encourage people to create MathLink-based interfaces to other systems.
(However, I also hope that Wolfram will create a high-quality Python link themselves instead of relying on the community.) I have talked about the potential of Mathematica as a glue-language at some Wolfram events in France, and I believe that the capability to interface external libraries/systems easily is critical for Mathematica's future, and so is a healthy third-party package ecosystem.
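
To make the earlier argument about text-based versus structured transfer concrete, here is a small Python sketch using only the standard library. It does not involve Mathematica or `ExternalEvaluate` at all; it only compares sending a $100\times100$ matrix of doubles as JSON text with sending it as raw 8-byte values. The six-digit encoder is an assumption chosen for illustration, not a description of what any real interface does.

```python
import array
import json
import random

random.seed(0)  # fixed seed so the run is reproducible
n = 100
matrix = [[random.random() for _ in range(n)] for _ in range(n)]

# Text route: every number becomes decimal text and must be parsed again.
json_text = json.dumps(matrix)
from_json = json.loads(json_text)

# Binary route: 8 bytes per double, no digit parsing on either side.
flat = array.array("d", (x for row in matrix for x in row))
payload = flat.tobytes()
restored = array.array("d")
restored.frombytes(payload)

# A text encoder that keeps only 6 significant digits (illustrative
# assumption) silently changes the values.
truncated = [[float(f"{x:.6g}") for x in row] for row in matrix]

print(len(json_text), "characters of JSON vs", len(payload), "bytes of binary")
print("JSON round trip exact:", from_json == matrix)
print("binary round trip exact:", restored == flat)
print("6-digit text exact:", truncated == matrix)
```

Python's own `json` module happens to write enough digits to round-trip doubles exactly, so the precision risk depends on the encoder at each end; the size and parsing overhead of the text route is there regardless.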

POSTED BY: Szabolcs Horvát

Arnoud Buzing

Arnoud Buzing, Wolfram Research

Posted 4 months ago

Szabolcs, we have made massive improvements to the ExternalEvaluate framework over the last years, especially w.r.t. virtual environments. Wondering what your current thoughts are on this topic?

POSTED BY: Arnoud Buzing

Max Coplan

Posted 6 years ago

Thoughts on additions in 12.0?

POSTED BY: Max Coplan

Szabolcs Horvát

Posted 6 years ago

@Max Coplan

> Thoughts on additions in 12.0?

I don't have time to write a detailed response so I'll just say that the improvements in 12.0 are large, and it's heading in the right direction. But there is still some way to go.

I am already making use of it: https://mathematica.stackexchange.com/questions/195380/how-can-i-use-the-python-library-networkx-from-mathematica

The data transfer from Python -> Mathematica is now structured, fast and customizable through the Wolfram Client for Python. My biggest wish is that this be implemented for the Mathematica -> Python direction as well for M12.1.

POSTED BY: Szabolcs Horvát

b3m2a1  

Posted 7 years ago

Here's a teaser for something I've been working on for a bit. I've now gotten things working so I can run Python and Mathematica concurrently in their respective notebook interfaces:

POSTED BY: b3m2a1  

Ting Sun

Posted 7 years ago

This looks really exciting! Do you have documentation on this? Or is there any related public project people can join?

POSTED BY: Ting Sun

b3m2a1  

Posted 7 years ago

Give this a look: http://community.wolfram.com/groups/-/m/t/1468475 If you want to contribute to the repo be my guest. There are also some nice ideas from @Szabolcs Horvát here that I think really should be pursued in terms of making this more extensible.

POSTED BY: b3m2a1  

Szabolcs Horvát

Posted 7 years ago

One nice thing about WXF is that it supports associations. Another (from a user perspective) is that it is simple and well documented. Consider e.g. the simple problem of detecting a packed array on a MathLink link. It took a bit of experimentation and guesswork with MathLink. With the open WXF specification, it is immediately clear what is possible, what is not possible, and what is the best way to do something.

I wonder if it is meant as a replacement for MathLink when developing links to other systems. I also wonder if it performs better. At least the decoding could probably be made faster as we do not depend on a closed library now, but can implement our own decoder. Of course, it only handles encoding/decoding, and does not give us a ready-to-use means of data transfer like MathLink does.

Even with LibraryLink, now it might be easier to transfer complex expressions encoded as WXF and transferred as a byte-type RawArray.

POSTED BY: Szabolcs Horvát

l van Veen

l van Veen, Hewlett-Packard Enterprise

Posted 7 years ago

Hi Szabolcs, Where can I find the WXF specification? I can only find the Import Export possibilities but isn't that only disk based? Or can it also be used in streams? Happy to learn more thx

POSTED BY: l van Veen

Szabolcs Horvát

Posted 7 years ago

Hi! It's here: http://reference.wolfram.com/language/tutorial/WXFFormatDescription.html As far as I can tell, it's a complete specification. This was not really advertised when it came out and I completely missed its significance. In particular, I missed the fact that it's fully documented, which is precisely what makes it useful.
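
As a rough illustration of why a fully documented format matters, here is a minimal Python sketch that serializes a small list by hand. The `8:` header, the token bytes (`f`, `s`, `S`, `C`, `j`, `i`, `L`, `r`), the little-endian byte order and the varint length encoding are taken from a reading of the format description linked above and should be treated as assumptions to verify against it. The helper names `varint`, `encode` and `to_wxf` are hypothetical, and only a tiny subset of expression types is handled.

```python
import struct


def varint(n):
    """Encode a non-negative integer as a base-128 varint, low bits first."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode(expr):
    """Encode ints, floats, strings and lists (subset only, no header)."""
    if isinstance(expr, bool):
        raise TypeError("booleans are not handled in this sketch")
    if isinstance(expr, int):
        # Pick the smallest machine integer token that fits.
        for token, fmt, bits in ((b"C", "<b", 8), (b"j", "<h", 16),
                                 (b"i", "<i", 32), (b"L", "<q", 64)):
            if -(2 ** (bits - 1)) <= expr < 2 ** (bits - 1):
                return token + struct.pack(fmt, expr)
        raise OverflowError("big integers are not handled in this sketch")
    if isinstance(expr, float):
        return b"r" + struct.pack("<d", expr)
    if isinstance(expr, str):
        data = expr.encode("utf-8")
        return b"S" + varint(len(data)) + data
    if isinstance(expr, list):
        head = b"s" + varint(len(b"List")) + b"List"
        return b"f" + varint(len(expr)) + head + b"".join(encode(x) for x in expr)
    raise TypeError(f"unsupported type: {type(expr).__name__}")


def to_wxf(expr):
    """Prepend the uncompressed version-8 header."""
    return b"8:" + encode(expr)


print(to_wxf([1, 2, 3]))
```

If the assumptions hold, the resulting bytes could be read on the Mathematica side with `BinaryDeserialize`; that step has not been checked here.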

POSTED BY: Szabolcs Horvát

Riccardo Di Virgilio

Riccardo Di Virgilio, Wolfram

Posted 7 years ago

We are more than happy to listen to customers for this functionality, which is why we are going to release this library as open source code when ready. As I mentioned, this library allows you to import/export arbitrary Mathematica expressions using WXF. This format is also optimized for transferring PackedArray and NumericArray, and for most types the conversion is automatic. You can create a dump in Mathematica and export it, or you can programmatically start a kernel from Python and retrieve the result of an arbitrary computation.

Automatic data conversion from WL to Python is not done yet because the WXF importer in Python was implemented very recently, so it takes time to add this functionality (you might expect automatic conversion of DateObject to Python datetime out of the box, which right now works only in the opposite direction).

Take a look at the source code in wolframclient.serializers and wolframclient.deserializers. Keep in mind that everything is subject to change right now, but at least you might get an idea of what can be done.

POSTED BY: Riccardo Di Virgilio

Szabolcs Horvát

Posted 7 years ago

> `print(Range[10])`

Do you mean a mix of Python and Mathematica syntax?

What really matters to me is to be able to use this interface to solve practical problems that come up day-to-day in my work. The current ExternalEvaluate is not capable of this. I'd like to suggest letting real user needs drive development, and also setting priorities based on such needs. Let me give examples:

- Call some minor scientifically oriented function that exists in some Python library but not Mathematica. E.g. compute a spherical Voronoi tessellation. This should be a no-fuss task of at most three lines of code.
- A concrete application I have in mind is using the networkx library. Example task: compute a minimum weight cycle basis (not available in Mma). This won't be a three-line task because the involved data structures are more complex. But it should not be too hard to set up a framework for transferring graphs back and forth between the two systems, and once that is done, it should be easy to call any functions (like the cycle basis computation).

Python can open up a gateway to a huge number of useful libraries many of which are completely unavailable to Mathematica at this moment. Thus a Python interface should be taken very seriously, and preferably optimized specifically for Python (instead of making it generic and work with any language, like JavaScript).

Example: One task I had to solve recently was to call certain ITK functions from Mathematica (image processing). Like any high profile scientific library, ITK has a Python interface. In fact, it has two, one of them being specifically optimized for scripting languages. Right now I had to use LibraryLink to make it work with good performance. It took more than a day to set up a framework for it, and even after that was done, each new function I need to access takes a 5-10 minute setup process.
The kind of Python interface I wish for would make this task easy and seamless: directly transfer image data as a NumericArray/RawArray with negligible overhead, call the function, retrieve the result. To sum up: Please let real-world applications drive the development of this functionality. Ask users what they imagine doing with a Python interface. Ask those users who use Mathematica daily to get real work done.
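
The graph-transfer idea in the networkx example can be sketched with a plain edge list as the interchange form, which is roughly what Mathematica could send as a list of vertex pairs. The code below is an assumption-laden illustration using only the standard library: it computes a fundamental cycle basis from a breadth-first spanning forest, which is not the minimum weight cycle basis asked for in the post (with networkx installed, `networkx.minimum_cycle_basis` would be the function to call instead). The function name and the assumption of a simple undirected graph, with no self-loops or repeated edges, are mine.

```python
from collections import deque


def fundamental_cycle_basis(edges):
    """Return one cycle (as a vertex list) per non-tree edge of a BFS forest.

    Assumes a simple undirected graph given as (u, v) pairs.
    """
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    # Build a breadth-first spanning forest with parent pointers and depths.
    parent, depth = {}, {}
    for root in adjacency:
        if root in parent:
            continue
        parent[root], depth[root] = None, 0
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in parent:
                    parent[v], depth[v] = u, depth[u] + 1
                    queue.append(v)

    cycles = []
    for u, v in edges:
        if parent[u] == v or parent[v] == u:
            continue  # tree edge, closes no cycle
        # Walk both endpoints up the tree until they meet.
        left, right = [u], [v]
        a, b = u, v
        while a != b:
            if depth[a] >= depth[b]:
                a = parent[a]
                left.append(a)
            else:
                b = parent[b]
                right.append(b)
        # Both lists end at the common ancestor; keep it only once.
        cycles.append(left + list(reversed(right[:-1])))
    return cycles


# A triangle 1-2-3 with a pendant vertex 4: exactly one independent cycle.
print(fundamental_cycle_basis([(1, 2), (2, 3), (3, 1), (3, 4)]))
```

The result is again plain nested lists of vertices, so sending it back is the same problem as sending the edge list in the first place; the hard part the post points at is making that transfer fast and automatic.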

POSTED BY: Szabolcs Horvát

Riccardo Di Virgilio

Riccardo Di Virgilio, Wolfram

Posted 7 years ago

POSTED BY: Riccardo Di Virgilio

Szabolcs Horvát

Posted 7 years ago

Thank you for showing this @Riccardo! I do have the M12 prerelease, but I did not know about these improvements.

Performance was one of the deal-breakers for me. The other one was that there was no way to send structured data to Python. Converting expressions to Python code, which is then parsed and run by Python, is not a good way to do this. Was this fixed in M12? If yes, could you give an example please? If no, are there plans to fix it?

The simple use case I've been suggesting: take a matrix `m = RandomReal[1, {100,100}]`. How do I compute its eigenvalues (and perhaps eigenvectors) in Python? This includes all the basics one might expect from a language interface: send structured data, call functions, receive structured data.

About WXF: What your post implies, but you did not spell it out, is that WXF is an openly documented format for storing and transferring Mathematica expressions. I was not aware of this, but from your post it sounded like you must have a decoder for it on the Python side. Will you make this decoder available to the rest of us, perhaps even open source? What about decoders for other languages?

I am also wondering about how this relates to MathLink. I always imagined MathLink to be based on a very similar binary format. IMO a reasonable way to implement an interface to another language would be to first expose the MathLink API to it. Then we would have a means to transfer expressions back and forth. Why did you choose ZeroMQ and WXF instead of MathLink? Is it simply because Python already speaks ZMQ, or is the performance better with WXF? I remember once I found that transferring certain expressions as JSON, which isn't even binary, was faster than using MathLink, quite shocking!

Finally, I always thought that open sourcing MathLink would be beneficial because it would work around interfacing with GPL'd (or other copyleft) libraries. Is the semi-open WXF a step in this direction?

POSTED BY: Szabolcs Horvát

Riccardo Di Virgilio

Riccardo Di Virgilio, Wolfram

Posted 7 years ago

I'm Riccardo Di Virgilio, currently working at WRI and one of the contributors to the Python implementation for ExternalEvaluate.

Data transfer in the first implementation was not efficient, but we have been working to improve it, and with M12 we will ship a much more efficient data transfer thanks to the WXF binary format. We are able to serialize a lot of built-in data types including integers, floats, decimals, datetime, time, complex numbers, fractions, lists, tuples, associations, etc.

My current setup using a 2015 2.5 GHz Intel Core i7 MacBook Pro provides a 500% performance boost over the example that was posted at the beginning of this thread.

    In[2]:= ExternalEvaluate[session, "range(10000)"]; // AbsoluteTiming
    Out[2]= {0.093617, Null}

We also developed an efficient conversion from numpy arrays to NumericArray.

    In[7]:= ExternalEvaluate[session, "import numpy; numpy.ndarray(10000).reshape(4, 2500)"]; // AbsoluteTiming
    Out[7]= {0.001233, Null}

Another very nice feature is an inspectable traceback, which provides a very nice interface to debug your Python code that is just not possible from the command line. Attaching a screenshot.

POSTED BY: Riccardo Di Virgilio

Ting Sun

Posted 8 years ago

Just tried out the updated `External`-related functions in v11.3 this afternoon, and I feel really excited about writing Python code and getting results directly within the notebook! `numpy`-related objects now get much better support from Mathematica and can be read in directly without any conversion. In general, I really like this update and am very happy to see the power of both systems, Mathematica and Python, combined in a synergistic way.

POSTED BY: Ting Sun

Bernard Gress

Posted 8 years ago

I would also like to add my vote and support for Szabolcs' request to WRI to implement a low-level and fast interface to Python. My motivation is to get access to Python's superior machine learning libraries. B

POSTED BY: Bernard Gress

Stephan Foley

Posted 8 years ago

Hi Szabolcs and thanks for the reply. Actually, I was a bit embarrassed about what happened, since I don't make it a habit of posting unrelated thoughts in other people's threads :-)

Back to the topic of this discussion, I would like to emphasize I agree about the need for a greater integration of Mathematica with Python. Thinking about it, I've decided that calling Python from Mathematica would be just as helpful as calling Mathematica from Python. Totally agree that the speed of data transfer is critical for things like computational geometry, as you mentioned, or other stuff like machine learning, and also about the fundamental importance of numpy arrays. And about how much of an obstacle context switching is... although you can hack together anything, once you start dealing with multiple frameworks the focus becomes more on the programming and less on the problem at hand. And then, just the amount of libraries out there in Python that focus on scientific computing, data analysis and machine learning that would be great to just plug into Mathematica.

I didn't quite follow your argument on why the MathLink/WSTP API is a better model than J/Link for a Python interface.

For myself, I find it much easier to put together simple looping constructs... procedural programming... in Python than with Mathematica. Although I like functional programming and see its power, I find that it can get kind of dense. Also, I like having classes around and doing object oriented programming. For example, I really like doing a list of items like so:

    class Item:
        def __init__(self, i, parent):
            self.number = i
            self.parent = parent

    class List:
        def __init__(self, n):
            self.list = []
            for i in range(n):
                self.list.append(Item(i, self))

Using OO programming (sparingly, of course, mostly just to hold data) can really help to reason through what you are doing and I find Mathematica lacks such facilities, or, at least, I have not learned enough of Mathematica to not miss classes.
Using the above structure, it is easy for me to put metadata type functions in the List class and item-specific functions in the Item class and loop through the list to examine the items, etc. But, what's missing is calling Mathematica straight from Python (or vice versa)! Thanks again

POSTED BY: Stephan Foley

Stephan Foley

Posted 8 years ago

POSTED BY: Stephan Foley

Szabolcs Horvát

Posted 8 years ago

Hi Stephan,

> I made a separate thread requesting that an interface be developed allowing Python to access Mathematica and I think the moderators decided to move my post here. So, now I'm here.

I completely missed that, and I think I misunderstood what you meant. With that context now it makes sense.

POSTED BY: Szabolcs Horvát

Pedro Fonseca

Pedro Fonseca, SUEZ Treatment Solutions

Posted 8 years ago

@Szabolcs Very clear! Hopefully, LLVM compilation will allow for a much faster Wolfram Language execution (eagerly waiting to hear the latest news at the WTC: how much does it cover? still the 99.99% of the language? How automatic is it getting? Can we imagine full automation, transparent to the user? What is the current optimization level / % of pure C?). Put together with an eventual evolution of the parallelization technology, a project whose (eventual) existence I'm still waiting to discover, we will need to link with other languages on far fewer occasions, spending less time on optimization and more time on the main purpose of the algorithms.

POSTED BY: Pedro Fonseca

Stephan Foley

Posted 8 years ago

Hello, I would like to request that the Wolfram developers consider doing a good Python interface to Mathematica. The C interface seems to be discouraged and I looked over the J/Link interface and just thought..."man, do I really want to program in Java". Personally, I feel having a Python interface would be fun. Also, academia is currently bursting at the seams with Python programmers, so it would make business sense for Wolfram to tie Mathematica into Python.

Wolframscript is OK, but Python has classes which allow for easier storage and organization. Python is much easier to read and I would venture to say too much Wolframscript becomes "write once, read never," or, at least, it becomes pretty dense and obscure. I think it would be great (and much easier) to be able to call the immense power of Mathematica from Python when writing command line scripts. Thanks for the consideration!

Clarification: I had posted this as a separate topic, but I guess the moderators of the forum decided it fit better with this topic and moved my post here. Because of this, I haven't yet read the rest of the thread, but will do so now.

POSTED BY: Stephan Foley

Szabolcs Horvát

Posted 8 years ago

> The C interface seems to be discouraged

I really do not think this is the case.

> and I looked over the J/Link interface and just thought..."man, do I really want to program in Java"

> Python is much easier to read and I would venture to say too much Wolframscript becomes "write once, read never," or, at least, it becomes pretty dense and obscure.

You seem to be looking to use a different language than Mathematica (Wolfram Language) because you are not satisfied with it. This is not the purpose of these links. These links exist to make functionality that is not available in Mathematica accessible without leaving Mathematica.

For example, to simulate a physical system, you would need a fast low-level language such as C++, C or Fortran. No high-level language like Mathematica or Python will ever be able to compete in this area. But then you may want to run an optimization on one of the parameters of the simulation, map the parameter space in an efficient manner (e.g. using adaptive sampling), or just do a quick visualization to see what your simulation is doing. This is much easier and quicker to do in Mathematica.

Every tool has a purpose: you may be able to hammer a nail with a screwdriver, but it's not going to be very effective. These linking technologies make it possible to use the right tool for the right purpose. Mathematica is particularly good at interactive work/exploration. In my experience, it is great to use it as the centre of my workflow, and control other tools from it.

As you said, Python is increasingly popular for scientific work. There are Python libraries implementing functionality that Mathematica does not have at this moment. I would find it extremely useful to be able to access some of this functionality while staying in the same system, using the same familiar plotting and data wrangling functions, etc.
Currently, if I need a specific library that only has a Python interface, I am forced to use not only this library but everything else from Python as well (e.g. plotting) because there is no efficient communication between the systems. I am confined within a single system, and can't pick the best tool for the task.

POSTED BY: Szabolcs Horvát

Vitaliy Kaurov

Vitaliy Kaurov, WOLFRAM Research

Posted 8 years ago

I cannot tell you more at the moment, but would like to assure you that this is the beginning of the story, not the end. There are development initiatives in that general direction and I encourage you to stay patient and look forward to more exciting things coming.

POSTED BY: Vitaliy Kaurov

Anonymous User

Posted 8 years ago

POSTED BY: Anonymous User

Marco Thiel

Marco Thiel, University of Aberdeen - Dept. of Physics/Mathematics

Posted 8 years ago

Dear Szabolcs, as usual your post is very helpful and instructive. Thank you for your posts both here and on StackExchange. They make my life much easier. Thanks, Marco

POSTED BY: Marco Thiel
