Thoughts on a Python interface, and why ExternalEvaluate is just not enough

Posted 8 years ago
POSTED BY: Szabolcs Horvát
25 Replies

I cannot tell you more at the moment, but would like to assure you that this is the beginning of the story, not the end. There are development initiatives in that general direction and I encourage you to stay patient and look forward to more exciting things coming.

POSTED BY: Vitaliy Kaurov
Posted 7 years ago

Here's a teaser for something I've been working on for a bit. I've now gotten things working so that I can run Python and Mathematica concurrently in their respective notebook interfaces:

[Image: screenshot]

POSTED BY: b3m2a1

Hi! It's here: http://reference.wolfram.com/language/tutorial/WXFFormatDescription.html As far as I can tell, it's a complete specification.

This was not really advertised when it came out and I completely missed its significance. In particular, I missed the fact that it's fully documented, which is precisely what makes it useful.

POSTED BY: Szabolcs Horvát

print(Range[10]) – do you mean a mix of Python and Mathematica syntax?

What really matters to me is being able to use this interface to solve practical problems that come up day-to-day in my work. The current ExternalEvaluate is not capable of this. I'd like to suggest letting real user needs drive development, and setting priorities based on such needs.

Let me give examples:

  • Call some minor scientifically oriented function that exists in some Python library but not in Mathematica, e.g. compute a spherical Voronoi tessellation. This should be a no-fuss task of at most three lines of code.

  • A concrete application I have in mind is using the networkx library. Example task: compute a minimum weight cycle basis (not available in Mma). This won't be a three-line task because the involved data structures are more complex. But it should be not too hard to set up a framework for transferring graphs back and forth between the two systems, and once that is done, it should be easy to call any functions (like the cycle basis computation).
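As a rough illustration of the graph-transfer framework described in the second bullet, here is a minimal Python sketch. It assumes (the post does not specify this) that a simple weighted graph crosses the boundary as a plain list of `(u, v, weight)` triples with comparable vertex labels. The helper names are hypothetical, and the circuit rank computed here is only a stand-in for a real library call such as a minimum weight cycle basis.

```python
from collections import defaultdict

def edges_to_adjacency(edges):
    """Build an undirected weighted adjacency map from (u, v, w) triples."""
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = w
        adj[v][u] = w
    return dict(adj)

def adjacency_to_edges(adj):
    """Flatten the adjacency map back into sorted (u, v, w) triples, one per edge."""
    return sorted(
        (u, v, w) for u, nbrs in adj.items() for v, w in nbrs.items() if u < v
    )

def circuit_rank(adj):
    """Number of independent cycles: E - V + (number of connected components)."""
    seen, components = set(), 0
    for start in adj:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:  # depth-first walk over one component
            node = stack.pop()
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
    n_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    return n_edges - len(adj) + components

# Edge list as it might arrive from the other system (assumed layout).
edges = [(1, 2, 1.0), (2, 3, 2.5), (3, 1, 0.5), (3, 4, 1.0)]
adj = edges_to_adjacency(edges)
```

Once the two conversion helpers exist, any graph routine on the Python side can be wrapped the same way: convert on the way in, call the function, convert on the way out.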

Python can open up a gateway to a huge number of useful libraries, many of which are completely unavailable to Mathematica at this moment. Thus a Python interface should be taken very seriously, and preferably optimized specifically for Python (instead of making it generic enough to work with any language, such as JavaScript).

Example: One task I had to solve recently was to call certain ITK functions from Mathematica (image processing). Like any high-profile scientific library, ITK has a Python interface. In fact, it has two, one of them being specifically optimized for scripting languages. In the end I had to use LibraryLink to make it work with good performance. It took more than a day to set up a framework for it, and even after that was done, each new function I need to access takes a 5-10 minute setup process. The kind of Python interface I wish for would make this task easy and seamless: directly transfer image data as a NumericArray/RawArray with negligible overhead, call the function, retrieve the result.

To sum up:

Please let real-world applications drive the development of this functionality. Ask users what they imagine doing with a Python interface. Ask those users who use Mathematica daily to get real work done.

POSTED BY: Szabolcs Horvát

The C interface seems to be discouraged

I really do not think this is the case.

and I looked over the J/Link interface and just thought..."man, do I really want to program in Java"

Python is much easier to read and I would venture to say too much Wolframscript becomes "write once, read never," or, at least, it becomes pretty dense and obscure.

You seem to be looking to use a different language than Mathematica (Wolfram Language) because you are not satisfied with it. This is not the purpose of these links.

These links exist to make functionality that is not available in Mathematica accessible without leaving Mathematica.

For example, to simulate a physical system, you would need a fast low-level language such as C++, C or Fortran. No high-level language like Mathematica or Python will ever be able to compete in this area. But then you may want to run an optimization on one of the parameters of the simulation, map the parameter space in an efficient manner (e.g. using adaptive sampling), or just do a quick visualization to see what your simulation is doing. This is much easier and quicker to do in Mathematica.

Every tool has a purpose: you may be able to hammer a nail with a screwdriver, but it's not going to be very effective. These linking technologies make it possible to use the right tool for the right purpose. Mathematica is particularly good at interactive work/exploration. In my experience, it is great to use it as the centre of my workflow, and control other tools from it.

As you said, Python is increasingly popular for scientific work. There are Python libraries implementing functionality that Mathematica does not have at this moment. I would find it extremely useful to be able to access some of this functionality while staying in the same system, using the same familiar plotting and data wrangling functions, etc. Currently, if I need a specific library that only has a Python interface, I am forced to use not only this library but everything else from Python as well (e.g. plotting) because there is no efficient communication between the systems. I am confined within a single system, and can't pick the best tool for the task.

POSTED BY: Szabolcs Horvát

We are more than happy to listen to customers about this functionality, which is why we are going to release this library as open source code when it is ready.

As I mentioned, this library allows you to import and export arbitrary Mathematica expressions using WXF. The format is also optimized for transferring PackedArray and NumericArray, and for most types the conversion is automatic.
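To illustrate why a binary format helps for packed numeric data, the sketch below compares a JSON text encoding of a list of doubles with a raw packed encoding. This is not the WXF layout (the format documentation is the place to look for that); it only uses Python's standard `array` and `json` modules to show the general idea.

```python
import json
from array import array

values = [i / 7 for i in range(1000)]  # 1000 machine-precision reals

# Text route: every number is written as decimal digits and parsed again.
as_text = json.dumps(values).encode("utf-8")

# Packed route: 8 bytes per double, no digit parsing on the way back.
as_packed = array("d", values).tobytes()

# Decoding the packed bytes restores the exact same floats.
decoded = array("d")
decoded.frombytes(as_packed)

print(len(as_text), len(as_packed))
```

The packed form has a fixed size per element and can be read back without parsing, which is the property that matters when large arrays move between two systems.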

You can create a dump in Mathematica and export it, or you can programmatically start a kernel from Python and retrieve the result of an arbitrary computation.

Automatic data conversion from WL to Python is not done yet because the WXF importer in Python was implemented very recently, so it will take time to add this functionality. For example, you might expect automatic conversion of DateObject to a Python datetime out of the box, but right now this works only in the opposite direction.
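Until that direction is automatic, the conversion can be approximated by hand. The sketch below assumes the date has been turned into a `DateList`-style list `{year, month, day, hour, minute, second}` on the Wolfram Language side before being sent. The function name `datelist_to_datetime` is hypothetical and is not part of `wolframclient`, and time zones are ignored.

```python
from datetime import datetime, timedelta

def datelist_to_datetime(datelist):
    """Convert [year, month, day, hour, minute, second] to a naive datetime.

    The seconds entry may be fractional, since DateList returns a real
    number in that position; timedelta takes care of the fraction.
    """
    year, month, day, hour, minute, second = datelist
    base = datetime(int(year), int(month), int(day), int(hour), int(minute))
    return base + timedelta(seconds=float(second))

# Example input in the assumed DateList layout.
stamp = datelist_to_datetime([2019, 4, 16, 12, 30, 15.5])
```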

Take a look at the source code in wolframclient.serializers and wolframclient.deserializers. Keep in mind that everything is subject to change right now, but you might at least get an idea of what can be done.

Thank you for showing this @Riccardo ! I do have the M12 prerelease, but I did not know about these improvements.

Performance was one of the deal-breakers for me. The other one was that there was no way to send structured data to Python. Converting expressions to Python code, which is then parsed and run by Python, is not a good way to do this. Was this fixed in M12? If yes, could you give an example please? If no, are there plans to fix it?

The simple use case I've been suggesting: take a matrix m = RandomReal[1, {100,100}]. How do I compute its eigenvalues (and perhaps eigenvectors) in Python? This includes all the basics one might expect from a language interface: send structured data, call functions, receive structured data.
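The three steps named here (send structured data, call a function, receive structured data) can be sketched without any interface at all, by treating a nested Python list as what arrives from Mathematica. To stay within the standard library, the example uses power iteration to estimate only the dominant eigenvalue and its eigenvector. That is a deliberate stand-in: a real session would more likely call a NumPy routine for the full spectrum. For a matrix with strictly positive entries the dominant eigenvalue is real and positive (Perron-Frobenius), so power iteration applies. The function name is hypothetical.

```python
import math

def dominant_eigenpair(matrix, iterations=500):
    """Estimate the largest-magnitude eigenvalue and its eigenvector.

    `matrix` is a square nested list, i.e. what a matrix of reals would
    look like after transfer. Returns (eigenvalue, eigenvector) as plain
    Python objects so they can be sent straight back.
    """
    n = len(matrix)
    vec = [1.0 / math.sqrt(n)] * n  # unit-length starting vector
    value = 0.0
    for _ in range(iterations):
        # product = matrix . vec
        product = [sum(a * x for a, x in zip(row, vec)) for row in matrix]
        # Rayleigh quotient vec . (matrix . vec); vec has unit length
        value = sum(p * x for p, x in zip(product, vec))
        norm = math.sqrt(sum(p * p for p in product))
        if norm == 0.0:
            break  # vec was mapped to zero; nothing left to refine
        vec = [p / norm for p in product]
    return value, vec

# Small symmetric test matrix; its eigenvalues are (5 +/- sqrt(5)) / 2.
m = [[2.0, 1.0], [1.0, 3.0]]
value, vector = dominant_eigenpair(m)
```

The point of the sketch is the shape of the exchange: a nested list goes in, and a float plus a flat list come out, with no Python source code generated in between.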

About WXF:

What your post implies, but you did not spell it out, is that WXF is an openly documented format for storing and transferring Mathematica expressions. I was not aware of this, but from your post it sounded like you must have a decoder for it on the Python side. Will you make this decoder available to the rest of us, perhaps even open source? What about decoders for other languages?

I am also wondering about how this relates to MathLink. I always imagined MathLink to be based on a very similar binary format.

IMO a reasonable way to implement an interface to another language would be to first expose the MathLink API to it. Then we would have a means to transfer expressions back and forth. Why did you choose ZeroMQ and WXF instead of MathLink? Is it simply because Python already speaks ZMQ, or is the performance better with WXF? I remember once I found that transferring certain expressions as JSON, which isn't even binary, was faster than using MathLink, quite shocking!

Finally, I always thought that open sourcing MathLink would be beneficial because it would work around interfacing with GPL'd (or other copyleft) libraries. Is the semi-open WXF a step in this direction?

POSTED BY: Szabolcs Horvát

I'm Riccardo Di Virgilio, currently working at WRI and one of the contributors to the Python implementation of ExternalEvaluate. Data transfer in the first implementation was not efficient, but we have been working to improve it, and with M12 we will ship a much more efficient data transfer thanks to the WXF binary format. We are able to serialize many built-in data types, including integers, floats, decimals, datetime, time, complex numbers, fractions, lists, tuples, and associations.

My current setup, a 2015 MacBook Pro with a 2.5 GHz Intel Core i7, shows a 500% performance boost over the example that was posted at the beginning of this thread.

In[2]:= ExternalEvaluate[session, "range(10000)"]; // AbsoluteTiming
Out[2]= {0.093617, Null}

We also developed an efficient conversion from numpy arrays to NumericArray.

In[7]:= ExternalEvaluate[session, "import numpy; numpy.ndarray(10000).reshape(4, 2500)"]; //
AbsoluteTiming
Out[7]= {0.001233, Null}

Another very nice feature is an inspectable traceback, which provides a convenient interface for debugging your Python code, something that is just not possible from the command line.

Attaching a screenshot. [Image: Python traceback]

Szabolcs, we have made massive improvements to the ExternalEvaluate framework over the last few years, especially w.r.t. virtual environments. What are your current thoughts on this topic?

POSTED BY: Arnoud Buzing

@Max Coplan

Thoughts on additions in 12.0?

I don't have time to write a detailed response so I'll just say that the improvements in 12.0 are large, and it's heading in the right direction. But there is still some way to go.

I am already making use of it:

https://mathematica.stackexchange.com/questions/195380/how-can-i-use-the-python-library-networkx-from-mathematica

The data transfer from Python -> Mathematica is now structured, fast and customizable through the Wolfram Client for Python. My biggest wish is that this be implemented for the Mathematica -> Python direction as well for M12.1.

POSTED BY: Szabolcs Horvát
Posted 7 years ago

Give this a look: http://community.wolfram.com/groups/-/m/t/1468475

If you want to contribute to the repo be my guest.

There are also some nice ideas from @Szabolcs Horvát here that I think really should be pursued in terms of making this more extensible.

POSTED BY: b3m2a1

Hi Szabolcs, where can I find the WXF specification? I can only find the Import/Export functionality, but isn't that only disk based? Or can it also be used with streams? Happy to learn more, thanks.

POSTED BY: l van Veen

@Szabolcs

Very clear!

Hopefully, LLVM compilation will allow for much faster Wolfram Language execution. I am eagerly waiting to hear the latest news from the WTC: How much of the language does it cover? Still 99.99%? How automatic is it getting? Can we imagine full automation that is transparent to the user? What is the current optimization level, as a percentage of pure C? Put that together with an eventual evolution of the parallelization technology (a project whose existence I am still waiting to discover), and we will need to link with other languages on far fewer occasions, spending less time on optimization and more on the main purpose of the algorithms.

POSTED BY: Pedro Fonseca
Posted 8 years ago
POSTED BY: Stephan Foley
Posted 6 years ago

Thoughts on additions in 12.0?

POSTED BY: Max Coplan
Posted 7 years ago

This looks really exciting! Do you have any documentation on this, or a related public project that people can join?

POSTED BY: Ting Sun
Posted 7 years ago

Just tried out the updated External-related functions in v11.3 this afternoon, and I feel really excited about writing Python code and getting results directly within the notebook! numpy-related objects now get much better support from Mathematica and can be read in directly without any conversion. In general, I really like this update and am very happy to see the power of both systems, Mathematica and Python, combined in a synergistic way.

POSTED BY: Ting Sun
Posted 7 years ago

I would also like to add my vote and support for Szabolcs' request to WRI to implement a low-level and fast interface to Python. My motivation is to get access to Python's superior machine learning libraries.

B

POSTED BY: Bernard Gress
Posted 8 years ago

Hi Szabolcs and thanks for the reply. Actually, I was a bit embarrassed about what happened, since I don't make it a habit of posting unrelated thoughts in other people's threads :-)

Back to the topic of this discussion, I would like to emphasize I agree about the need for a greater integration of Mathematica with Python. Thinking about it, I've decided that calling Python from Mathematica would be just as helpful as calling Mathematica from Python. Totally agree that the speed of data transfer is critical for things like computational geometry, as you mentioned, or other stuff like machine learning, and also about the fundamental importance of numpy arrays. And about how much an obstacle context switching is...although you can hack together anything, once you start dealing with multiple frameworks the focus becomes more on the programming and less on the problem at hand. And then, just the amount of libraries out there in Python that focus on scientific computing, data analysis and machine learning that would be great to just plug into Mathematica.

I didn't quite follow your argument on why the MathLink/WSTP API is a better model than J/Link for a Python interface.

For myself, I find it much easier to put together simple looping constructs...procedural programming...in Python than with Mathematica. Although I like functional programming and see its power, I find that it can get kind of dense. Also, I like having classes around and doing object oriented programming. For example, I really like doing a list of items like so:

class Item:
    def __init__(self, i, parent):
        self.number = i
        self.parent = parent

class List:
    def __init__(self, n):
        self.list = []
        for i in range(n):
            self.list.append(Item(i, self))

Using OO programming (sparingly, of course, mostly just to hold data) can really help to reason through what you are doing, and I find Mathematica lacks such facilities, or, at least, I have not learned enough of Mathematica to not miss classes. Using the above structure, it is easy for me to put metadata-type functions in the List class and item-specific functions in the Item class, and to loop through the list to examine the items, etc. But what's missing is calling Mathematica straight from Python (or vice versa)!
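A small extension of the classes above illustrates the pattern described: a list-level method on `List`, item-specific methods on `Item`, and a loop over the items. The method names `count`, `is_even`, and `position_from_end` are hypothetical, chosen only for illustration.

```python
class Item:
    def __init__(self, i, parent):
        self.number = i
        self.parent = parent  # back-reference to the owning List

    def is_even(self):
        """Item-specific check."""
        return self.number % 2 == 0

    def position_from_end(self):
        """Uses the parent back-reference to ask the container about itself."""
        return self.parent.count() - 1 - self.number


class List:
    def __init__(self, n):
        self.list = []
        for i in range(n):
            self.list.append(Item(i, self))

    def count(self):
        """Metadata-style method on the container."""
        return len(self.list)


items = List(5)
# Loop through the list and examine each item.
even_numbers = [item.number for item in items.list if item.is_even()]
```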

Thanks again

POSTED BY: Stephan Foley

Hi Stephan,

I made a separate thread requesting that an interface be developed allowing Python to access Mathematica and I think the moderators decided to move my post here. So, now I'm here.

I completely missed that, and I think I misunderstood what you meant. With that context now it makes sense.

POSTED BY: Szabolcs Horvát
Posted 8 years ago

Hi Szabolcs, I put a note on my original post...I made a separate thread requesting that an interface be developed allowing Python to access Mathematica and I think the moderators decided to move my post here. So, now I'm here. Although we are talking about different things...you want to access Python through Mathematica, I want to access Mathematica through Python...it is similar.

I agree with you that Mathematica's great strength and Python's great weakness is plotting. Also, Jupyter notebooks just can't compare with Mathematica notebooks. And, although a lot of the same functionality of Mathematica can be found in libraries such as SymPy, SciPy, and friends, it makes a big difference to have everything integrated and documented under one roof.

J/Link is a two way thing and you can use J/Link to call Mathematica from Java. That was the purpose of my original post, which I thought was to be a separate thread...to ask that a similar interface be developed to allow Python to call Mathematica functionality more transparently.

I am in total agreement with what you said here:

... Python is increasingly popular for scientific work. There are Python libraries implementing functionality that Mathematica does not have at this moment. I would find it extremely useful to be able to access some of this functionality while staying in the same system, using the same familiar plotting and data wrangling functions, etc. Currently, if I need a specific library that only has a Python interface, I am forced to use not only this library but everything else from Python as well (e.g. plotting) because there is no efficient communication between the systems. I am confined within a single system, and can't pick the best tool for the task.

and would just add that I would like to access Mathematica from Python for much the same purposes.

POSTED BY: Stephan Foley
Anonymous User
Posted 8 years ago

As you've already mentioned: Mathematica does have facility to work with other languages.

Also, are you running Mathematica Online? Do you access your docs remotely on an iPhone, etc.? Can you expect notebooks to work properly remotely if they require zombie Python drivers and interfacing linked to a particular desktop code base / kernel?

Your main argument doesn't make sense. Mathematica does not force a choice that external programs use text (more than Mathematica itself does) and does not force you to use JSON: if you do so it's due to choices you made not Mathematica's limitations.

I don't see an issue, in that Mathematica already has a few ways to work with external programs (by linking, by text mode, other) and can also run multiple Mathematica Evaluate processes. Why can't you run an interface? The Mathematica documentation seems to indicate it can drive one, and it also has built-in helpers for using the existing front end to customize a small but very connected and powerful interface. I'm not sure you're right either. Mathematica owns its side of the MathLink connection because it spawns it (same as any software would), and that doesn't mean you've been prevented from using your computer or software as you need.

Personally I don't want Mathematica to have GPL "drivers" inside it: I run an Apple and Mathematica specifically so that I am no hostage to whatever hack anyone in the GPL world has "a gpg key" to force on me (i.e., Unix compatibility changes, continual security leaks, etc.). I doubt my wish is true. Are you saying Mathematica should have GPL drivers linked in such a way that, when shipped, a notebook could spawn a zombie? So as to take control over the Mathematica product and monitor people, if there were even a single "bug" in the driver? I'm sure you're not :) But you can see where seamless integration with GPL "gpg only uploads" software leads: control by anonymous uploaders.

Python itself is a continual issue: users who compile it are forced into upgrading their compilers, which forces upgrades of their desktops, and the language changes in such a way that old code no longer runs (it's not safe to work with: you write code today, and one year from now your efforts may be ruined or need extensive rewriting). I complain that Mathematica shouldn't make incompatible language changes, because professors work their hearts out to write notebooks they hope others can use (intact). They do ignore me at times, but mostly they are good about it; today's GPL is not.

Not everyone is "into GPL", because GPL is not the smaller, tighter community (college and government watched) that it used to be. RFC used to mean something, IETF used to mean something: today they are both ignored. GPL is very political these days (who gets keys, how they are used) and has drawn lines in the sand about which country will make the machines it runs on "correctly" (i.e., ARM vs. Intel). I see no end to that in the near future: too many fixes, and every fix would be hotly debated.

That being said I enjoy the older GPL world very much and still use it for awk(1), for Mathematica 4, and things like that.

I hate to be so long in my response. But I don't think simply saying "Mathematica should link to GPL code the way GPL code would prefer it" is a good idea for everything involved, which is quite a lot.

POSTED BY: Anonymous User

Dear Szabolcs,

as usual your post is very helpful and instructive. Thank you for your posts both here and on StackExchange. They make my life much easier.

Thanks,

Marco

POSTED BY: Marco Thiel