PJLink: Hooking up Mathematica and Python

b3m2a1  

Posted 7 years ago

Here's a cross-post of something I originally wrote here about how to get python and Mathematica to work together like JLink. I thought this might have broad appeal here.

# PJLink: Hooking up Mathematica and Python

Mathematica is an incredibly powerful platform with a fun and intellectually pleasing language, but it is incredibly expensive and closed source. Python is a convenient, pretty powerful language with a lot of support from the developer community. For as long as the two have existed people have been trying to tie them together, but very little has been done at the native level to give efficient, convenient exchange between the two. That's why over the past few weeks I took the time to build a clean, convenient link between the two.

This post will go into how the link was built and some of its features, but first I think a little demo is appropriate.

## A Quick Demo

### Installing PJLink

The link is based off of the J/Link interface built into Mathematica for hooking up Java and Mathematica. So I called it PJ/Link. It lives on my paclet server as well as GitHub, so we can easily install it from there:

    PacletInstall["PJLink", "Site"->"http://www.wolframcloud.com/objects/b3m2a1.paclets/PacletServer"]

### Loading PJLink in Jupyter

For this demo we'll need the path to this thing as well (note that the version might change in the future):

    %["Location"]

    (Out:) "~/Library/Mathematica/Paclets/Repository/PJLink-1.0.0"

Now we'll leave Mathematica and open up a Jupyter notebook.

Next we'll get that path available so we can actually make use of the package.
Then we'll load things from the subsidiary `SubprocessKernel` package which is included in the paclet and makes use of PJLink:

    import os, sys
    pjlink_path = "~/Library/Mathematica/Paclets/Repository/PJLink-1.0.0"  # this is whatever path was extracted before
    sys.path.insert(0, os.path.expanduser(pjlink_path))

    from SubprocessKernel import SubprocessKernel
    from SubprocessKernel import MathematicaBlock, LinkEnvironment  # these are helpers I'll use in the demo

### Bidirectional Communication

Once we have this we can start a subprocess kernel which will open a Mathematica front-end to interact with. We'll also start an evaluator Mathematica can use to call back into python. You may see a long string of output from your C compiler as the setup.py file builds out the native library that PJLink uses. Don't worry, this should only happen once. If it fails, raise an issue on GitHub so I can deal with it.

Once Mathematica has loaded, we'll use the `MathematicaBlock` context manager so we can write something that looks a lot like Mathematica code and use the `MEval` function we'll define to run the code. The code for all this looks like:

    ker = SubprocessKernel()

    def MEval(expr, wait=True, kernel=ker):
        """MEval evaluates a Mathematica expression in the Mathematica kernel"""
        kernel.drain()  # just to make sure things are clean
        return kernel.evaluate(expr, wait=wait)

    ker.start()
    ker.start_evaluator()

After that we can simply call into Mathematica:

    with MathematicaBlock():
        res = MEval(Set(M.hi, "Hello from python!"))
    res

We can see the string `"Hello from python!"` was set to the symbol `hi` on the Mathematica side and was returned by `MEval`. Symbols that aren't in the "System`" context need to be prefaced by an `M.` as that's a special class that can resolve symbol names like that.

We can also get efficient data transfer of arrays from either side. Here we'll take some Mathematica data and get it back out on the python side.
The first thing we need to do is go to the Mathematica notebook that opened and load the "PJLink`" context. Then we'll install the python runtime that the `SubprocessKernel` object configured. This looks like:

    <<PJLink`
    InstallPython[
      LinkObject->SubprocessKernel`$PyEvaluateLink,
      ProcessObject->None
      ];

Once it's installed, we'll use it directly via `PyEvaluate`:

    With[{arr=RandomReal[{-1, 1}, {50, 50, 50}]},
      PyEvaluate[dat=arr]
      ]

Calls into python are done in an environment held only by the link, so to access that we need to wrap the evaluator we started (`ker.evaluator`) in a `LinkEnvironment` context manager:

    with LinkEnvironment(ker.evaluator):
        res = dat.shape
    res

Arrays are held as NumPy arrays by default on the python side, although this may be disabled. If disabled, they're held as a data type called `BufferedNDArray` which holds the data as a single C-contiguous array and allows slicing and viewing into it (although no efficient math or manipulation of any sort).

Finally, to close out the demo, we'll plot something on the Mathematica side and watch it come back on the python side. The code for this should be pretty self-explanatory by this point, but there is one cute feature to note:

    with MathematicaBlock():
        res = MEval(
            Rasterize(
                Plot(Sin(M.x), List(M.x, 0, Times(2, Pi)),
                     ImageSize=[250, 250],
                     PlotLabel="sin(x) as plotted in Mathematica"
                     )
                )
            )
    res

Unfortunately it really does matter that we pass a `List` expression instead of a python list for the second argument to `Plot`, as otherwise the system hangs for reasons that aren't totally clear. On the other hand, we can see how nice options passing is in the interface. We make use of the python `**kwargs` setup, so `ImageSize=...` and `PlotLabel=...` both get automatically converted into rules (albeit with a `String` key instead of a `Symbol`).
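
To make the keyword-to-rule idea concrete, here is a minimal standalone sketch of how python keyword arguments can be turned into rule-like pairs with string keys. It is not PJLink's actual code: `kwargs_to_rules`, `call`, and the nested-tuple expression format are hypothetical names and conventions chosen only for illustration.

```python
from typing import Any, List, Tuple

# Hypothetical stand-in for a Mathematica Rule[key, value]; not a PJLink type.
Rule = Tuple[str, str, Any]


def kwargs_to_rules(**kwargs: Any) -> List[Rule]:
    """Turn keyword arguments into ("Rule", key, value) triples, keeping call order."""
    return [("Rule", name, value) for name, value in kwargs.items()]


def call(head: str, *args: Any, **kwargs: Any) -> Tuple[Any, ...]:
    """Build a nested-tuple expression: (head, positional args..., option rules...)."""
    return (head, *args, *kwargs_to_rules(**kwargs))


# The option names arrive as plain python strings, which mirrors the
# String-key (rather than Symbol-key) behaviour described above.
expr = call("Plot", "Sin[x]", ImageSize=[250, 250], PlotLabel="sin(x)")
print(expr)
```

Because python preserves keyword order, the options come out in the order they were written in the call.
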
The `Rasterize` is, sadly, similarly necessary as there is currently no logic in the package to automatically convert `Graphics` expressions into their rasterized forms.

I think we'll close out the demo here, though, and move on to a description of how this works.

## PJLink Native Library

The heart of PJLink is the C library that connects a python runtime to MathLink. The source for this can be found here. This library, once compiled by the setup.py file packaged with it, implements the basic MathLink calls in a way that python can use and attempts to do so with efficient memory usage and data transfer.

### Data Sharing in the Native Library

The heart of the native library is the set of `PutArray` and `GetArray` functions it implements. More than anything else, it is the fast transfer of large arrays of data that makes a C-level connection so appealing. The way we handle this on the python side is via the python buffer protocol. We enforce the condition that all data sent and received on the python side must be handled by an object that can work with a C-contiguous buffer of data. By default this is done with NumPy if it is installed, but if not there is a custom object called `BufferedNDArray` in the HelperClasses package that deals with this.

### Threading in the Native Library

Python has something called the Global Interpreter Lock (GIL) which is a method for synchronizing python state. Unfortunately for us, the presence of the GIL means that standard C calls of the kind we'll be using hold the lock while they run, so a blocking call stalls every other python thread. To get around this, every call into the MathLink library in the native library is wrapped in the `MLTHREADED` macro which handles the releasing and reacquiring of the lock. This allows our threads to work once more. Any extensions to the library should keep this in mind.

## Class Structures

PJLink provides a glut of classes that handle the details of communication, so we will quickly detail what the important ones do.
More information is always available upon request.

### The \*Link classes

PJLink is based off of JLink and so it makes use of the same kind of class structure that JLink does. This means that it has a `MathLink` class that provides a template for the kind of link we'll work with and a `KernelLink` class that works specifically with Mathematica kernels. In general, we will only really work with a subclass of `KernelLink` called `WrappedKernelLink`, which implements the `KernelLink` interface by calling into a `NativeLink`, the only class that actually touches the native library at all. If one is controlling a Mathematica kernel from python, it will be handled by a `WrappedKernelLink`.

### Reader class

The `Reader` class handles the other half of the communication. It waits for calls from Mathematica and processes them via the `KernelLink._handlePacket` function. Most commonly these calls in turn call `KernelLink.__callPython` which builds a python call from the symbolic python packet that `PyEvaluate` sends over the link. A `Reader` does its best not to completely prevent its link from passing data to Mathematica, but in general it is best not to depend on this as the `NativeLink` interface allows only a single thread to access the library at once for reasons of safety and stability.

### MathLinkEnvironment and MathLinkException

The `MathLinkEnvironment` is a standalone class that handles all of the various flags and state that the links need. It centralizes all information about what a given token or flag from MathLink means and provides utility functions for working with this. `MathLinkException` is a subclass of the standard python `Exception` class that handles the MathLink-specific exceptions that are returned. It in turn calls into `MathLinkEnvironment` to learn what various exceptions mean.

### MPackage, MLSym, and MLExpr

The HelperClasses package provides a large number of (generally) smaller classes that serve to make code cleaner in its implementation.
A big part of this is done by the `MPackage`, `MLSym`, and `MLExpr` classes, which allow for a way to create packets with a syntax that looks more like standard Mathematica code. `MLSym` and `MLExpr` are types that a `KernelLink` knows how to put onto a link and `MPackage` provides utilities and a custom `__getattr__` so that the packet code can look like Mathematica code.

### MathematicaBlock and LinkEnvironment

Both `MathematicaBlock` and `LinkEnvironment` are also in the HelperClasses. They both edit the current evaluation state as context managers so that explicit references to `MPackage` can be dropped and variables held by a given link can be easily accessed. Being context managers, they are both used via `with` statements and change the execution environment of the enclosed block.

### BufferedNDArray, ImageData, and SparseArrayData

These are all data classes that allow for more efficient and convenient data transfer. The `ImageData` and `SparseArrayData` classes hold data coming in from Mathematica as put using ``PJLink`Package`AddTypeHints``. They have methods to efficiently transform to more standard formats like `PIL.Image` and `scipy.sparse.csr_matrix`. As more data types are handled by `AddTypeHints` it can be assumed that more classes like these will be written.

## Mathematica-side Package

That was all to do with the python side of things, which is where most of the development work had to go. On the other hand, the Mathematica side of the equation still requires some explanation. The package itself is really quite simple, so please feel free to peruse the source.

### InstallPython

Notably, all it really requires is a `LinkObject`, so you can pass one directly via the `LinkObject` option. It will also by default try to make a python `ProcessObject` but you can pass that via the `ProcessObject` option or you can pass `None` in which case it won't attach to a Mathematica controlled process.
### ClosePython

### PyEvaluate / PyEvaluateString

This is the heart of the package. It takes Mathematica-esque code, converts it into a structure that can be processed by `KernelLink.__callPython()`, sends it over, and waits for a response. The conversion is handled by ``PJLink`SymbolicPython`ToSymbolicPython`` which was originally written for the PyTools package. This is the best way to move data to python as things like `Image` objects, packable arrays, and `SparseArray` objects will be moved over intelligently.

`PyEvaluateString` is like `PyEvaluate`, but with the recognition that `ToSymbolicPython` will always be a little bit lacking. It allows one to simply call a string of python code on the link and get the results back.

### PyWrite / PyWriteString / PyRead / PyReadErr

These are all functions that make use of the fact that when the `Reader` object started it allowed an interactive session to keep running and reading / writing on stdin, stdout, and stderr. The read functions read from stdout and stderr and the write functions write to stdin. `PyWrite` takes Mathematica code and auto-converts it into a string, while `PyWriteString` simply passes in the given string.

## Future Work

PJLink 1.0.0, beefy as it already is, should only really be seen as the beginning. My hope is that much more can be done to allow for more native data type transfer and for intelligent communication between the two systems. In my demo I tried to show some of the things that make the interoperation of the two so nice, but I obviously don't have the breadth of knowledge to know all of the many applications this can be put to. Applications built off of PJLink are always welcome and I'm happy to provide any requisite information and extensions to PJLink to get them built.

Alongside that, I think better integration on the Mathematica side is necessary.
There is a partial interface for allowing a `PythonObject` structure to hide the details of `PyEvaluate` on the Mathematica side, but this needs work from both ends, first hooking up the ``Language`MutationHandler`` interface and then extending the same on the python side. After that, a `JavaBlock`-like setup that allows a link to be opened, used, and cleaned up would be highly useful for sandboxing.

Finally, I'm sure there are numerous bugs hiding in the package as it stands. Please find them and let me know about them so they can be worked out.

In the meantime, I hope you enjoy PJLink and being able to use my two favorite languages symbiotically.
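
As a rough illustration of the `JavaBlock`-like idea mentioned under Future Work, here is a minimal python sketch of a context manager that opens a link, hands it to the enclosed block, and always cleans it up afterwards. `FakeLink` and `link_block` are hypothetical names invented for this example; neither is part of PJLink, and a real version would wrap an actual link object.

```python
from contextlib import contextmanager


class FakeLink:
    """Hypothetical stand-in for a link object; it only tracks open/closed state."""

    def __init__(self):
        self.is_open = False

    def open(self):
        self.is_open = True

    def close(self):
        self.is_open = False


@contextmanager
def link_block(link):
    """Open the link, yield it to the with-block, and close it even on error."""
    link.open()
    try:
        yield link
    finally:
        link.close()


link = FakeLink()
with link_block(link) as active:
    opened_inside = active.is_open  # the link is open while the block runs
closed_after = not link.is_open     # and closed again once the block exits
```

The `try`/`finally` is the part that matters for sandboxing: the cleanup runs whether the block finishes normally or raises.
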

5 Replies

Martin Zand, University of Rochester

Posted 7 years ago

Thank you! This has been a much needed addition for the last several years. As you point out, the structure of the Wolfram language is intellectually a delight, but one still must be part of a community. The silo approach leaves many of us feeling isolated. This is especially true now with the requirements for releasing code for rigor and reproducibility in scientific manuscripts. Hopefully Wolfram will wake up and provide robust support for your python interface, which is critical for participating meaningfully in the rapidly advancing field of ML applications.

b3m2a1  

Posted 7 years ago

Sorry, I should have been more clear about some things in the post, I think.

As you likely know, Jupyter is just a front-end for python. I am personally annoyed by/opposed to Anaconda and so only ever start Jupyter from the command line (i.e. with `jupyter notebook`), but starting Jupyter from Anaconda should be fine. You can also use PJLink without Jupyter, as I generally do. Jupyter is just ersatz Mathematica, so I thought it'd be fun to run real and fake Mathematica at the same time. If you want that demo notebook I can provide it, but all the code is in the post.

PJLink is a dual Mathematica-python package, so the way you start the Mathematica kernel is by launching it via python code. See this block in particular:

    ker = SubprocessKernel()

    def MEval(expr, wait=True, kernel=ker):
        """MEval evaluates a Mathematica expression in the Mathematica kernel"""
        kernel.drain()  # just to make sure things are clean
        return kernel.evaluate(expr, wait=wait)

    ker.start()
    ker.start_evaluator()

`ker.start()` starts the Mathematica kernel and `ker.start_evaluator()` starts a python-side evaluator that Mathematica can call into. Those two lines are the real magic here. Beyond that I'm just calling into python.

b3m2a1  

Posted 7 years ago

For anyone interested in contributing to the project, there's a need for people with non-Mac machines to compile the lib and contribute it to a machine/architecture-specific directory so it can be used easily by people without a C compiler. There's also a need for people to add custom encoders/decoders for efficient data type transfer from Mathematica to convenient python forms. Currently things like `PackedArray`, `RawArray`, `Image`, `SparseArray`, and `HashTable` are supported. Each of these uses an encoder on the Mathematica side and a decoder on the python side. Submissions of either are highly appreciated. All ideas and other contributions are welcome, too.
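
For anyone wondering what a python-side decoder might look like in spirit, here is a minimal sketch of a decoder registry keyed by type name. Everything in it is an assumption made for illustration: `DECODERS`, `register_decoder`, `decode`, and the `positions`/`values` payload layout are hypothetical and do not describe PJLink's real interface or wire format.

```python
from typing import Any, Callable, Dict

# Hypothetical registry mapping a type-hint name to a python-side decoder.
DECODERS: Dict[str, Callable[[Any], Any]] = {}


def register_decoder(type_name: str):
    """Decorator that files a decoder function under the given type name."""
    def wrap(func: Callable[[Any], Any]) -> Callable[[Any], Any]:
        DECODERS[type_name] = func
        return func
    return wrap


@register_decoder("SparseArray")
def decode_sparse(payload: Dict[str, Any]) -> Dict[tuple, Any]:
    """Turn parallel position/value lists into a {position: value} mapping."""
    return {tuple(pos): val
            for pos, val in zip(payload["positions"], payload["values"])}


def decode(type_name: str, payload: Any) -> Any:
    """Apply the registered decoder, or return the payload unchanged if none exists."""
    return DECODERS.get(type_name, lambda p: p)(payload)


# Assumed payload layout, invented for this example only.
sparse = decode("SparseArray", {"positions": [[0, 1], [1, 0]], "values": [3.0, 4.0]})
```

A registry like this keeps each new data type self-contained: adding support means writing one function and registering it under a name, with unknown types passing through untouched.
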

Murray Eisenberg, University of Massachusetts Amherst

Posted 7 years ago

EDITORIAL BOARD, WOLFRAM

Posted 7 years ago

Congratulations! This post is now a Staff Pick as distinguished by a badge on your profile! This is wonderful work, thanks for sharing and keep it coming! Note we also placed it in the Add-Ons Group, which is reserved for showcasing great work in the extended functionality domain.
