ExternalEvaluate, introduced in M11.2, is a nice initiative. It enables limited communication with multiple languages, including Python, and appears to be designed to be relatively easily extensible (see ExternalEvaluate`AddHeuristic if you want to investigate, though I wouldn't invest in this until it becomes documented).
My great fear, however, is that with ExternalEvaluate Wolfram will consider the question of a Python interface settled.
This would be a big mistake. A general framework, like ExternalEvaluate, that aims to work with any language and relies on passing code (contained in a string) to an evaluator and getting JSON back, will never be fast enough or flexible enough for practical scientific computing.
Consider a task as simple as computing the inverse of a $100\times100$ Mathematica matrix using Python (using numpy.linalg.inv).
I challenge people to implement this with ExternalEvaluate. It's not possible to do it in a practically useful way. The matrix has to be sent as code, and piecing together code from strings just can't replace structured communication. The result will need to be received as something encodable to JSON. This has terrible performance due to multiple conversions, and even risks losing numerical precision.
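To make the "multiple conversions" concrete, here is a minimal sketch in plain Python. It is a toy stand-in, not how ExternalEvaluate is actually implemented: `toy_evaluator` and `transpose` are hypothetical names, and `transpose` takes the place of numpy.linalg.inv so that the example needs nothing beyond the standard library. It only illustrates the general pattern of code going in as a string and JSON coming back.

```python
import json

def transpose(rows):
    """The 'library call' we want to reach; stands in for numpy.linalg.inv."""
    return [list(col) for col in zip(*rows)]

def toy_evaluator(code):
    """Hypothetical evaluator: takes source code as a string, returns JSON text."""
    value = eval(code, {"__builtins__": {}}, {"transpose": transpose})
    return json.dumps(value)

n = 100
matrix = [[(i * n + j) / 7.0 for j in range(n)] for i in range(n)]

# Conversion 1: numbers -> source text, pasted into a code string.
code = "transpose(" + repr(matrix) + ")"
# Conversions 2 and 3 happen inside the evaluator: text -> objects -> JSON text.
reply = toy_evaluator(code)
# Conversion 4: JSON text -> numbers on the caller's side.
result = json.loads(reply)

raw_bytes = n * n * 8  # size of the same data as a packed array of doubles
print(len(code), len(reply), raw_bytes)
```

Every matrix entry is formatted and parsed twice in each direction, and both text forms are larger than the 80000 bytes a packed array of doubles would need. A structured link could hand over that buffer directly.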
Just sending and receiving a tiny list of 10000 integers takes half a second (!):
In[6]:= ExternalEvaluate[py, "range(10000)"]; // AbsoluteTiming
Out[6]= {0.52292, Null}
Since I am primarily interested in scientific and numerical computing (as I believe most M users are), I simply won't use ExternalEvaluate much, as it's not suitable for this purpose. What if we need to do a mesh transformation that Mathematica can't currently handle, but there's a Python package for it? It's exactly the kind of problem I am looking to apply Python for. I have in fact done mesh transformations using MATLAB toolboxes directly from within Mathematica, using MATLink, while doing the rest of the processing in Mathematica. But I couldn't do this with ExternalEvaluate/Python in a reasonable way.
In 2017, any scientific computing system needs to have a Python interface to be taken seriously. MATLAB has one, and it is practically usable for numerical/scientific problems.
A Python interface
I envision a Python interface which works like this:
- The MathLink/WSTP API is exposed to Python, and serves as the basis of the system. MathLink is good at transferring large numerical arrays efficiently.
- Fundamental data types (lists, dictionaries, bignums, etc.) as well as datatypes critical for numerical computing (numpy arrays) can be transferred efficiently and bidirectionally. Numpy arrays in particular must translate to/from packed arrays in Mathematica with the lowest possible overhead.
- Python functions can be set up to be called from within Mathematica, with automatic argument translation and return type translation. E.g.,
PyFun["myfun"][ (* myfun is a function defined in Python *)
{1,2,3} (* a list *),
PyNum[{1,2,3}] (* cast to numpy array, since the interpretation of {1,2,3} is ambiguous *),
PySet[{1,2,3}] (* cast to a set *)
]
- The system should be user-extensible to add translations for new datatypes, e.g. a Python class that is needed frequently for some application.
- The primary mode of operation should be that Python is run as a slave (subprocess) of Mathematica. But there should be a second mode of operation where both Mathematica and Python are being used interactively, and they are able to send/receive structured data to/from each other on demand.
- As a bonus: Python can also call back to Mathematica, so e.g. we can use a numerical optimizer available in Python to find the minimum of a function defined in Mathematica.
- An interface whose primary purpose is to call Mathematica from Python is a different topic, but can be built on the same data translation framework described above.
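The user-extensible translation idea from the list above could look roughly like the following on the Python side. This is only a sketch of one possible design: `register_translator`, `to_wire`, `from_wire` and the tags are all hypothetical names that do not belong to any existing package, and a real link would send the tagged payload over MathLink rather than keep it in memory.

```python
# Hypothetical registries; none of these names belong to an existing package.
_encoders = {}   # Python type -> (tag, function producing plain wire data)
_decoders = {}   # tag -> function rebuilding the Python object

def register_translator(py_type, tag, encode, decode):
    """Teach the link how to send and receive one more Python type."""
    _encoders[py_type] = (tag, encode)
    _decoders[tag] = decode

def to_wire(obj):
    """Wrap an object as (tag, payload); unregistered types pass through as Raw."""
    for py_type, (tag, encode) in _encoders.items():
        if isinstance(obj, py_type):
            return (tag, encode(obj))
    return ("Raw", obj)

def from_wire(message):
    """Rebuild the original object from a (tag, payload) pair."""
    tag, payload = message
    if tag == "Raw":
        return payload
    return _decoders[tag](payload)

# Two user-supplied translations: a set, and a complex number.
register_translator(set, "PySet", sorted, set)
register_translator(complex, "PyComplex",
                    lambda z: [z.real, z.imag],
                    lambda p: complex(p[0], p[1]))

print(to_wire({3, 1, 2}))
print(from_wire(to_wire(2 + 3j)))
```

The point of the tag is that the receiving side never has to guess what a plain list means, which is the same ambiguity the PyNum and PySet wrappers resolve in the other direction.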
The development of such an interface should be driven by real use cases. Ideally, Wolfram should talk to users who use Mathematica for more than fun and games, and do scientific computing as part of their daily work, with multiple tools (not just M). Start with a number of realistic problems, and make sure the interface can help in solving them. As a non-trivial test case for the datatype-extension framework, make sure people can set up auto-translation for SymPy objects, or a Pandas dataframe, or a networkx graph. Run FindMinimum on a Python function and make sure it performs well. (In a practical scenario this could be a function implementing a physics simulation rather than a simple formula.) As a performance stress test, run Plot3D (which triggers a very high number of evaluations) on a Python function. Performance and usability problems will be exposed by such testing early, and then the interface can be designed in such a way as to make these problems at least solvable (if not immediately solved in the first version). I do not believe that they are solvable with the ExternalEvaluate design.
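A rough sketch of why a Plot3D-style stress test is revealing: when every function evaluation is its own round trip, the fixed per-call cost is paid thousands of times. The `CountingLink` class below is a hypothetical stand-in that serialises each request and reply as JSON and counts round trips. It does not measure any real interface; it only contrasts the two calling patterns.

```python
import json

class CountingLink:
    """Hypothetical link that serialises every request and reply as JSON."""

    def __init__(self, func):
        self.func = func
        self.round_trips = 0

    def call(self, args):
        """Evaluate the function at one point: one round trip per point."""
        self.round_trips += 1
        received = json.loads(json.dumps(args))
        return json.loads(json.dumps(self.func(*received)))

    def call_many(self, arg_list):
        """Evaluate the function at many points in a single round trip."""
        self.round_trips += 1
        received = json.loads(json.dumps(arg_list))
        return json.loads(json.dumps([self.func(*a) for a in received]))

def bowl(x, y):
    return x * x + y * y

# A 50 x 50 sampling grid, similar in spirit to what a surface plot needs.
grid = [(i / 10.0, j / 10.0) for i in range(50) for j in range(50)]

per_point = CountingLink(bowl)
values_a = [per_point.call(p) for p in grid]

batched = CountingLink(bowl)
values_b = batched.call_many(grid)

print(per_point.round_trips, batched.round_trips)
```

Both patterns return the same values, but an adaptive plotter or an optimizer usually cannot batch, because the next sample point depends on the previous result. That is why the per-call overhead of the link itself matters so much.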
Of course, this is not the only possible design for an interface. J/Link works differently: it has handles to Java-side objects. But it also has a different goal. Based on my experience with MATLink and RLink, I believe that for practical scientific/numerical computing, the right approach is what I outlined above, and that the performance of data structure translation is critical.
ExternalEvaluate
Don't get me wrong, I do think that the ExternalEvaluate framework is a very useful initiative, and it has its place. I am saying this because I looked at its source code and it appears to be easily extensible. R has zeromq and JSON capabilities, and it looks like one could set it up to work with ExternalEvaluate in a day or so. So does Perl; anyone want to give it a try? ExternalEvaluate is great because it is simple to use and works (or can be made to work) with just about any interpreted language that speaks JSON and zeromq. But it is also, in essence, a quick and dirty hack (that's extensible in a quick and dirty way), and won't be able to scale to the types of problems I mentioned above.
MathLink/WSTP
Let me finally say a few words about why MathLink/WSTP are critical for Mathematica, and what should be improved about them.
I believe that any serious interface should be built on top of MathLink. Since Mathematica already has a good interface capable of inter-process communication, that is designed to work well with Mathematica, and designed to handle numerical and symbolic data efficiently, use it!!
Two things are missing:
- Better documentation and example programs, so more people will learn MathLink.
- If the MathLink library (not Mathematica!) were open source, people would be able to use it to link to libraries which are licensed under the GPL. Even a separate open source implementation that only supports shared memory passing would be sufficient; there is no need to publish the currently used code in full. Many scientific libraries are licensed under the GPL, often without their authors even realizing that they are practically preventing them from being used from closed source systems like Mathematica (due to the need to link to the MathLink libraries). To be precise, GPL licensed code can be linked with Mathematica, but the result cannot be shared with anyone. I have personally asked the author of a certain library to grant an exception for linking to Mathematica, and they did not grant it. Even worse, I am not sure they understood the issue. The authors of other libraries cannot grant such a permission because they themselves are using yet other GPL-licensed libraries.
MathLink already has a more permissive license than Mathematica. Why not go all the way and publish an open source implementation?
I am hoping that Wolfram will fix these two problems, and encourage people to create MathLink-based interfaces to other systems. (However, I also hope that Wolfram will create a high-quality Python link themselves instead of relying on the community.)
I have talked about the potential of Mathematica as a glue-language at some Wolfram events in France, and I believe that the capability to interface external libraries/systems easily is critical for Mathematica's future, and so is a healthy third-party package ecosystem.