
Thoughts on a Python interface, and why ExternalEvaluate is just not enough


ExternalEvaluate, introduced in M11.2, is a nice initiative. It enables limited communication with multiple languages, including Python, and appears to be designed to be relatively easily extensible (see ExternalEvaluate`AddHeuristic if you want to investigate, though I wouldn't invest in this until it becomes documented).

My great fear, however, is that with ExternalEvaluate Wolfram will consider the question of a Python interface settled.

This would be a big mistake. A general framework, like ExternalEvaluate, that aims to work with any language and relies on passing code (contained in a string) to an evaluator and getting JSON back, will never be fast enough or flexible enough for practical scientific computing.

Consider a task as simple as computing the inverse of a $100\times100$ Mathematica matrix using Python (using numpy.linalg.inv).

I challenge people to implement this with ExternalEvaluate. It's not possible to do it in a practically useful way. The matrix has to be sent as code, and piecing together code from strings just can't replace structured communication. The result will need to be received as something encodable to JSON. This has terrible performance due to multiple conversions, and even risks losing numerical precision.

Just sending and receiving a tiny list of 10000 integers takes half a second (!)

In[6]:= ExternalEvaluate[py, "range(10000)"]; // AbsoluteTiming
Out[6]= {0.52292, Null}
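To make the precision concern concrete, here is a minimal Python sketch (standard library only; the function names are made up for illustration). It ships the same doubles across a process boundary in three ways. The first is source text formatted for display with six significant digits, which is lossy. The second is strict JSON, which round-trips finite doubles but has no encoding for NaN or Infinity. The third is raw IEEE 754 bytes, which are exact at a fixed 8 bytes per value. Which of these a real bridge ends up doing depends on how its sender formats numbers, so treat this as an illustration, not a description of ExternalEvaluate's internals.

```python
import json
import math
import struct


def via_display_text(xs):
    """Send numbers as text formatted with 6 significant digits (lossy)."""
    return [float("%g" % x) for x in xs]


def via_strict_json(xs):
    """Send numbers as standards-compliant JSON; raises ValueError on NaN/Infinity."""
    return json.loads(json.dumps(xs, allow_nan=False))


def via_binary(xs):
    """Send numbers as little-endian IEEE 754 doubles: bit-exact, 8 bytes each."""
    payload = struct.pack(f"<{len(xs)}d", *xs)
    return list(struct.unpack(f"<{len(xs)}d", payload))


finite = [1 / 3, math.pi, math.sqrt(2)]
print(via_display_text(finite) == finite)  # False: 1/3 arrives as 0.333333
print(via_strict_json(finite) == finite)   # True
print(via_binary(finite) == finite)        # True

try:
    via_strict_json([float("nan")])
except ValueError:
    print("strict JSON cannot carry NaN")
print(math.isnan(via_binary([float("nan")])[0]))  # True
```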

Since I am primarily interested in scientific and numerical computing (as I believe most M users are), I simply won't use ExternalEvaluate much, as it's not suitable for this purpose. What if we need to do a mesh transformation that Mathematica can't currently handle, but there's a Python package for it? It's exactly the kind of problem I am looking to apply Python for. I have in fact done mesh transformations using MATLAB toolboxes directly from within Mathematica, using MATLink, while doing the rest of the processing in Mathematica. But I couldn't do this with ExternalEvaluate/Python in a reasonable way.

In 2017, any scientific computing system needs to have a Python interface to be taken seriously. MATLAB has one, and it is practically usable for numerical/scientific problems.


A Python interface

I envision a Python interface which works like this:

  • The MathLink/WSTP API is exposed to Python, and serves as the basis of the system. MathLink is good at transferring large numerical arrays efficiently.
  • Fundamental data types (lists, dictionaries, bignums, etc.) as well as datatypes critical for numerical computing (numpy arrays) can be transferred efficiently and bidirectionally. Numpy arrays in particular must translate to/from packed arrays in Mathematica with the lowest possible overhead.
  • Python functions can be set up to be called from within Mathematica, with automatic argument translation and return type translation. E.g.,

    PyFun["myfun"][ (* myfun is a function defined in Python *)
        {1,2,3} (* a list *), 
        PyNum[{1,2,3}] (* cast to numpy array, since the interpretation of {1,2,3} is ambiguous *), 
        PySet[{1,2,3}] (* cast to a set *)
    ]
    
  • The system should be user-extensible to add translations for new datatypes, e.g. a Python class that is needed frequently for some application.

  • The primary mode of operation should be that Python is run as a slave (subprocess) of Mathematica. But there should be a second mode of operation where both Mathematica and Python are being used interactively, and they are able to send/receive structured data to/from each other on demand.
  • As a bonus: Python can also call back to Mathematica, so e.g. we can use a numerical optimizer available in Python to find the minimum of a function defined in Mathematica.
  • An interface whose primary purpose is to call Mathematica from Python is a different topic, but can be built on the same data translation framework described above.
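The user-extensible translation idea can be sketched in a few lines of Python. This is a hypothetical design, not an existing Wolfram API. Each Python type maps to a function that produces a symbolic tree of `(head, *args)` tuples. A MathLink/WSTP-based writer could then emit that tree directly as an expression, with no source-code strings involved. `register`, `to_expr`, and the use of the `PySet` head (borrowed from the example above) are all names made up for illustration. A NumPy rule could map `ndarray` to a packed array in the same way.

```python
from fractions import Fraction

# Hypothetical registry: Python type -> function producing an expression tree.
_translators = {}


def register(py_type, translator):
    """Add or override the translation rule for a Python type."""
    _translators[py_type] = translator


def to_expr(value):
    """Translate a Python value into nested (head, *args) tuples."""
    # Walk the MRO so subclasses fall back to their parent's rule;
    # bool gets its own rule because it is a subclass of int.
    for cls in type(value).__mro__:
        if cls in _translators:
            return _translators[cls](value)
    raise TypeError(f"no translation registered for {type(value).__name__}")


# Atoms pass through unchanged (Python ints are already bignums).
for atom_type in (int, float, str):
    register(atom_type, lambda v: v)
register(bool, lambda b: ("Symbol", "True" if b else "False"))
register(type(None), lambda _: ("Symbol", "Null"))
register(list, lambda xs: ("List", *map(to_expr, xs)))
register(tuple, lambda xs: ("List", *map(to_expr, xs)))
register(set, lambda s: ("PySet", ("List", *map(to_expr, s))))
register(dict, lambda d: ("Association",
                          *(("Rule", to_expr(k), to_expr(v)) for k, v in d.items())))

# A user-supplied rule for a type the core does not know about.
register(Fraction, lambda q: ("Rational", q.numerator, q.denominator))

print(to_expr({"a": [1, Fraction(1, 3)], "b": True}))
# ('Association', ('Rule', 'a', ('List', 1, ('Rational', 1, 3))), ('Rule', 'b', ('Symbol', 'True')))
```

The reverse direction (expression tree back to Python objects) would presumably be a second table keyed by head rather than by type.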

The development of such an interface should be driven by real use cases. Ideally, Wolfram should talk to users who use Mathematica for more than fun and games, and do scientific computing as part of their daily work, with multiple tools (not just M). Start with a number of realistic problems, and make sure the interface can help in solving them. As a non-trivial test case for the datatype-extension framework, make sure people can set up auto-translation for SymPy objects, or a Pandas dataframe, or a networkx graph. Run FindMinimum on a Python function and make sure it performs well. (In a practical scenario this could be a function implementing a physics simulation rather than a simple formula.) As a performance stress test, run Plot3D (which triggers a very high number of evaluations) on a Python function. Performance and usability problems will be exposed by such testing early, and then the interface can be designed in such a way as to make these problems at least solvable (if not immediately solved in the first version). I do not believe that they are solvable with the ExternalEvaluate design.
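The per-call overhead point can be estimated with a rough Python sketch. Every call to a Python function is pushed through a JSON encode/decode of its arguments and result, as a stand-in for a text-based bridge. This is an assumption: a real bridge adds process switching and I/O on top. A decorator counts how many calls an n-by-n grid evaluation makes. A real Plot3D samples adaptively, so its call count will differ. The takeaway is only that total cost scales as the number of calls times the per-call overhead. `through_text_bridge`, `surface`, and `sample_grid` are illustrative names.

```python
import json
import time
from functools import wraps


def through_text_bridge(fn):
    """Count calls and round-trip arguments and result through JSON text,
    a crude stand-in for a string-based inter-process bridge."""
    @wraps(fn)
    def wrapper(*args):
        wrapper.calls += 1
        sent_args = json.loads(json.dumps(args))      # "send" the arguments
        result = fn(*sent_args)
        return json.loads(json.dumps(result))         # "receive" the result
    wrapper.calls = 0
    return wrapper


@through_text_bridge
def surface(x, y):
    # Placeholder for an expensive Python-side computation (e.g. a simulation).
    return x * x - y * y


def sample_grid(f, n, lo=-1.0, hi=1.0):
    """Evaluate f(x, y) on an n-by-n grid, roughly as a plotting routine would."""
    step = (hi - lo) / (n - 1)
    return [[f(lo + i * step, lo + j * step) for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    start = time.perf_counter()
    sample_grid(surface, 100)
    elapsed = time.perf_counter() - start
    print(f"{surface.calls} calls, {1e6 * elapsed / surface.calls:.1f} microseconds per call")
```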

Of course, this is not the only possible design for an interface. J/Link works differently: it has handles to Java-side objects. But it also has a different goal. Based on my experience with MATLink and RLink, I believe that for practical scientific/numerical computing, the right approach is what I outlined above, and that the performance of data structure translation is critical.
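Here is a minimal Python sketch of what structured transfer of a numerical array can look like: a self-describing binary frame for a rank-2 array, with a small header holding the rank and dimensions, followed by the raw little-endian doubles. The `PKAR` tag and the layout are invented for this example and are not the MathLink/WSTP wire format. The point is that the receiver rebuilds the matrix (or could hand the buffer to something like `numpy.frombuffer`) without parsing any text. With this layout, a $100\times100$ matrix costs a fixed 16 + 80000 bytes.

```python
import struct

MAGIC = b"PKAR"                   # hypothetical 4-byte tag for "packed array"
HEADER = struct.Struct("<4sIII")  # tag, rank, rows, cols (16 bytes, no padding)


def pack_matrix(rows):
    """Encode a rectangular list of lists of floats as header + raw doubles."""
    n_rows, n_cols = len(rows), len(rows[0])
    if any(len(row) != n_cols for row in rows):
        raise ValueError("ragged rows cannot form a packed array")
    flat = [float(x) for row in rows for x in row]
    return HEADER.pack(MAGIC, 2, n_rows, n_cols) + struct.pack(f"<{len(flat)}d", *flat)


def unpack_matrix(frame):
    """Decode a frame produced by pack_matrix back into a list of lists."""
    tag, rank, n_rows, n_cols = HEADER.unpack_from(frame, 0)
    if tag != MAGIC or rank != 2:
        raise ValueError("not a rank-2 packed-array frame")
    flat = struct.unpack_from(f"<{n_rows * n_cols}d", frame, HEADER.size)
    return [list(flat[i * n_cols:(i + 1) * n_cols]) for i in range(n_rows)]


identity = [[1.0 if i == j else 0.0 for j in range(100)] for i in range(100)]
frame = pack_matrix(identity)
print(len(frame))                        # 80016 = 16-byte header + 100*100*8
print(unpack_matrix(frame) == identity)  # True
```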


ExternalEvaluate

Don't get me wrong, I do think that the ExternalEvaluate framework is a very useful initiative, and it has its place. I say this having looked at its source code: it appears to be easily extensible. R has zeromq and JSON capabilities, and it looks like one could set it up to work with ExternalEvaluate in a day or so. So does Perl; does anyone want to give it a try? ExternalEvaluate is great because it is simple to use and works (or can be made to work) with just about any interpreted language that speaks JSON and zeromq. But it is also, in essence, a quick and dirty hack (one that is extensible in a quick and dirty way), and it won't be able to scale to the types of problems I mentioned above.
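For comparison, the string-in/JSON-out model can be sketched in a few lines of Python. The message format (`{"input": ...}` in, `{"output": ...}` or `{"error": ...}` out) and the `handle_request` function are hypothetical. The zeromq transport is left out, and this is not ExternalEvaluate's actual protocol. It does show both halves of the limitation: data can only arrive embedded in code, and anything without a JSON encoding (a set, a complex number) cannot come back as a structured result.

```python
import json


def handle_request(message, namespace):
    """Evaluate the code string in a JSON request; reply with a JSON result.

    Uses eval on the received string, so it is only suitable for trusted input.
    """
    request = json.loads(message)
    try:
        result = eval(request["input"], namespace)   # code arrives as a string
    except Exception as err:
        return json.dumps({"error": f"{type(err).__name__}: {err}"})
    try:
        return json.dumps({"output": result})        # result must be JSON-encodable
    except TypeError:
        return json.dumps({"error": f"cannot encode {type(result).__name__} as JSON"})


session = {}  # persistent namespace shared across requests
print(handle_request('{"input": "[x * x for x in range(5)]"}', session))
# {"output": [0, 1, 4, 9, 16]}
print(handle_request('{"input": "{1, 2, 3}"}', session))
# {"error": "cannot encode set as JSON"}
print(handle_request('{"input": "2 ** 0.5 + 1j"}', session))
# {"error": "cannot encode complex as JSON"}
```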


MathLink/WSTP

Let me finally say a few words about why MathLink/WSTP are critical for Mathematica, and what should be improved about them.

I believe that any serious interface should be built on top of MathLink. Since Mathematica already has a good interface capable of inter-process communication, that is designed to work well with Mathematica, and designed to handle numerical and symbolic data efficiently, use it!!

Two things are missing:

  • Better documentation and example programs, so more people will learn MathLink

  • If the MathLink library (not Mathematica!) were open source, people would be able to use it to link to libraries which are licensed under the GPL. Even a separate open source implementation that only supports shared memory passing would be sufficient; there is no need to publish the currently used code in full. Many scientific libraries are licensed under the GPL, often without their authors even realizing that this practically prevents their libraries from being used from closed source systems like Mathematica (due to the need to link to the MathLink libraries). To be precise, GPL-licensed code can be linked with Mathematica, but the result cannot be shared with anyone. I have personally asked the author of a certain library to grant an exception for linking to Mathematica, and they did not grant it. Even worse, I am not sure they understood the issue. The authors of other libraries cannot grant such a permission because they themselves are using yet other GPL-licensed libraries.

    MathLink already has a more permissive license than Mathematica. Why not go all the way and publish an open source implementation?

I am hoping that Wolfram will fix these two problems, and encourage people to create MathLink-based interfaces to other systems. (However, I also hope that Wolfram will create a high-quality Python link themselves instead of relying on the community.)

I have talked about the potential of Mathematica as a glue-language at some Wolfram events in France, and I believe that the capability to interface external libraries/systems easily is critical for Mathematica's future, and so is a healthy third-party package ecosystem.

POSTED BY: Szabolcs Horvát
1 month ago

Dear Szabolcs,

as usual, your post is very helpful and instructive. Thank you for your posts, both here and on StackExchange. They make my life much easier.

Thanks,

Marco

POSTED BY: Marco Thiel
1 month ago

As you've already mentioned, Mathematica does have facilities to work with other languages.

Also, are you running Mathematica Online, or accessing your documents remotely on an iPhone, etc.? Can you expect notebooks to work properly remotely if they require zombie Python drivers and interfacing linked to a particular desktop code base / kernel?

Your main argument doesn't make sense. Mathematica does not force external programs to use text (any more than Mathematica itself does) and does not force you to use JSON: if you do so, it's due to choices you made, not Mathematica's limitations.

I don't see an issue, in that Mathematica already has a few ways to work with external programs (by linking, by text mode, other) and can also run multiple Mathematica evaluation processes. Why can't you run an interface? The Mathematica documentation seems to indicate it can drive one, and it also has built-in helpers to use the existing front end to build a small but very connected, powerful interface. I'm not sure you're right either. Mathematica owning its side of the MathLink connection is because it spawns it (as any software would), and that doesn't mean you've been prevented from using your computer or software as you need.

Personally, I don't want Mathematica to have GPL "drivers" inside it: I run an Apple and Mathematica specifically so that I am not hostage to whatever hack anyone in the GPL world with "a gpg key" wants to force on me (i.e., Unix compatibility changes, continual security leaks, etc.). I doubt my wish will be granted. Are you saying Mathematica should have GPL drivers linked in such a way that, when shipped, a notebook could spawn a zombie process? So that, if there were even a single "bug" in the driver, it could take control of the Mathematica product and monitor people? I'm sure you're not :) But you can see where seamless integration with GPL "gpg only uploads" software leads: control by anonymous uploaders.

Python itself is a continual issue: users who compile it are forced to upgrade their compilers, which forces upgrades of their desktops, and the language changes in such a way that old code no longer runs (it's not safe to work with: you write code today, and a year from now your efforts may be ruined or need extensive rewriting). I complain that Mathematica shouldn't make incompatible language changes, because professors work their hearts out to write notebooks they hope others can use intact. They do ignore me at times, but mostly they are good about it; today's GPL is not.

Not everyone is "into GPL", because GPL is no longer the smaller, tighter (college- and government-watched) community it used to be. RFCs used to mean something, the IETF used to mean something: today they are both ignored. GPL is very political these days (who gets keys, how they are used) and has drawn lines in the sand about which country will make the machines it runs on "correctly" (i.e., ARM vs. Intel). I see no end to that in the near future: there are too many fixes, and every fix would be hotly debated.

That being said I enjoy the older GPL world very much and still use it for awk(1), for Mathematica 4, and things like that.

I'm sorry for such a long response. But I don't think simply saying "Mathematica should link to GPL code the way GPL code would prefer" is a good idea, given everything that would be involved, which is far too much.

POSTED BY: John Hendrickson
1 month ago

I cannot tell you more at the moment, but would like to assure you that this is the beginning of the story, not the end. There are development initiatives in that general direction and I encourage you to stay patient and look forward to more exciting things coming.

POSTED BY: Vitaliy Kaurov
1 month ago

Hello, I would like to request the Wolfram developers consider doing a good Python interface to Mathematica. The C interface seems to be discouraged and I looked over the J/Link interface and just thought..."man, do I really want to program in Java".

Personally, I feel having a Python interface would be fun. Also, academia is currently bursting out of the seams with Python programmers, so it would make business sense for Wolfram to tie Mathematica into Python.

Wolframscript is OK, but Python has classes which allow for easier storage and organization. Python is much easier to read and I would venture to say too much Wolframscript becomes "write once, read never," or, at least, it becomes pretty dense and obscure.

I think it would be great (and much easier) to be able to call the immense power of Mathematica from Python when writing command line scripts.

Thanks for the consideration!

Clarification: I had posted this as a separate topic, but I guess the moderators of the forum decided it fit better with this topic and moved my post here. Because of this, I haven't yet read the rest of the thread, but will do so now.

POSTED BY: Stephan Foley
25 days ago

The C interface seems to be discouraged

I really do not think this is the case.

and I looked over the J/Link interface and just thought..."man, do I really want to program in Java"


Python is much easier to read and I would venture to say too much Wolframscript becomes "write once, read never," or, at least, it becomes pretty dense and obscure.

You seem to be looking to use a different language than Mathematica (Wolfram Language) because you are not satisfied with it. This is not the purpose of these links.

These links exist to make functionality that is not available in Mathematica accessible without leaving Mathematica.

For example, to simulate a physical system, you would need a fast low-level language such as C++, C or Fortran. No high-level language like Mathematica or Python will ever be able to compete in this area. But then you may want to run an optimization on one of the parameters of the simulation, map the parameter space in an efficient manner (e.g. using adaptive sampling), or just do a quick visualization to see what your simulation is doing. This is much easier and quicker to do in Mathematica.

Every tool has a purpose: you may be able to hammer a nail with a screwdriver, but it's not going to be very effective. These linking technologies make it possible to use the right tool for the right purpose. Mathematica is particularly good at interactive work/exploration. In my experience, it is great to use it as the centre of my workflow, and control other tools from it.

As you said, Python is increasingly popular for scientific work. There are Python libraries implementing functionality that Mathematica does not have at this moment. I would find it extremely useful to be able to access some of this functionality while staying in the same system, using the same familiar plotting and data wrangling functions, etc. Currently, if I need a specific library that only has a Python interface, I am forced to use not only this library but everything else from Python as well (e.g. plotting) because there is no efficient communication between the systems. I am confined within a single system, and can't pick the best tool for the task.

POSTED BY: Szabolcs Horvát
25 days ago

@Szabolcs

Very clear!

Hopefully, LLVM compilation will allow much faster Wolfram Language execution (I'm eagerly waiting to hear the latest news from the WTC: How much does it cover? Still 99.99% of the language? How automatic is it getting? Can we imagine full automation, transparent to the user? What is the current optimization level, as a percentage of pure C?). Together with an eventual evolution of the parallelization technology (a project whose existence I am still waiting to discover...), we will need to link with other languages much less often, spending less time on optimization and more time on the main purpose of the algorithms.

POSTED BY: Pedro Fonseca
24 days ago

Hi Szabolcs, I put a note on my original post... I made a separate thread requesting that an interface be developed allowing Python to access Mathematica, and I think the moderators decided to move my post here. So, now I'm here. Although we are talking about different things (you want to access Python from Mathematica, I want to access Mathematica from Python), they are similar.

I agree with you that Mathematica's great strength and Python's great weakness is plotting. Also, Jupyter notebooks just can't compare with Mathematica notebooks. And, although a lot of the same functionality of Mathematica can be found in libraries such as SymPy, SciPy, and friends, it makes a big difference to have everything integrated and documented under one roof.

J/Link is a two way thing and you can use J/Link to call Mathematica from Java. That was the purpose of my original post, which I thought was to be a separate thread...to ask that a similar interface be developed to allow Python to call Mathematica functionality more transparently.

I am in total agreement with what you said here:

... Python is increasingly popular for scientific work. There are Python libraries implementing functionality that Mathematica does not have at this moment. I would find it extremely useful to be able to access some of this functionality while staying in the same system, using the same familiar plotting and data wrangling functions, etc. Currently, if I need a specific library that only has a Python interface, I am forced to use not only this library but everything else from Python as well (e.g. plotting) because there is no efficient communication between the systems. I am confined within a single system, and can't pick the best tool for the task.

and would just add that I would like to access Mathematica from Python for much the same purposes.

POSTED BY: Stephan Foley
24 days ago

Hi Stephan,

I made a separate thread requesting that an interface be developed allowing Python to access Mathematica and I think the moderators decided to move my post here. So, now I'm here.

I completely missed that, and I think I misunderstood what you meant. With that context now it makes sense.

POSTED BY: Szabolcs Horvát
20 days ago

Hi Szabolcs and thanks for the reply. Actually, I was a bit embarrassed about what happened, since I don't make it a habit of posting unrelated thoughts in other people's threads :-)

Back to the topic of this discussion: I would like to emphasize that I agree about the need for greater integration of Mathematica with Python. Thinking about it, I've decided that calling Python from Mathematica would be just as helpful as calling Mathematica from Python. I totally agree that the speed of data transfer is critical for things like computational geometry, as you mentioned, or other areas like machine learning, and also about the fundamental importance of numpy arrays. I also agree about how much of an obstacle context switching is: although you can hack together anything, once you start dealing with multiple frameworks the focus shifts toward the programming and away from the problem at hand. And then there is the sheer number of Python libraries focused on scientific computing, data analysis and machine learning that it would be great to just plug into Mathematica.

I didn't quite follow your argument on why the MathLink/WSTP API is a better model than J/Link for a Python interface.

For myself, I find it much easier to put together simple looping constructs...procedural programming...in Python than with Mathematica. Although I like functional programming and see its power, I find that it can get kind of dense. Also, I like having classes around and doing object oriented programming. For example, I really like doing a list of items like so:

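class Item:
    # One element; keeps a reference back to the List that owns it.
    def __init__(self, i, parent):
        self.number = i
        self.parent = parent

class List:
    # Container that creates n Items, each pointing back to this List.
    def __init__(self, n):
        self.list = []
        for i in range(n):
            self.list.append(Item(i, self))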

Using OO programming (sparingly, of course, mostly just to hold data) can really help to reason through what you are doing, and I find Mathematica lacks such facilities, or, at least, I have not learned enough of Mathematica to not miss classes. Using the above structure, it is easy for me to put metadata-type functions in the List class and item-specific functions in the Item class, and loop through the list to examine the items, etc. But what's missing is calling Mathematica straight from Python (or vice versa)!

Thanks again

POSTED BY: Stephan Foley
19 days ago
