
PJLink: Hooking up Mathematica and Python

Posted 6 years ago

Here's a cross-post of something I originally wrote here about how to get python and Mathematica to work together like JLink.

I thought this might have broad appeal here.

PJLink: Hooking up Mathematica and Python

Mathematica is an incredibly powerful platform with a fun and intellectually pleasing language, but is incredibly expensive and closed source. Python is a convenient, pretty powerful language with a lot of support from the developer community. For as long as the two have existed people have been trying to tie them together, but very little has been done to do so at the native level with efficient, convenient exchange between the two. That's why over the past few weeks I took the time to build a clean, convenient link between the two. This post will go into how the link was built and some of its features, but first I think a little demo is appropriate.

A Quick Demo

Installing PJLink

The link is based off of the J/Link interface built into Mathematica for hooking up Java and Mathematica. Accordingly, I called it PJ/Link. It lives on my paclet server as well as GitHub, so we can easily install it from there:

PacletInstall["PJLink", "Site"->"http://www.wolframcloud.com/objects/b3m2a1.paclets/PacletServer"]

(*Out:*)

[image: the paclet object returned by PacletInstall]

Loading PJLink in Jupyter

For this demo we'll need the path to this thing as well (note that the version might change in the future):

%["Location"]

(*Out:*)

"~/Library/Mathematica/Paclets/Repository/PJLink-1.0.0"

Now we'll leave Mathematica and open up a Jupyter notebook:


Next we'll get that path available so we can actually make use of the package. Then we'll load things from the subsidiary SubprocessKernel package which is included in the paclet and makes use of PJLink:

import os, sys
pjlink_path = "~/Library/Mathematica/Paclets/Repository/PJLink-1.0.0"  # this is whatever path was extracted before
sys.path.insert(0, os.path.expanduser(pjlink_path))

from SubprocessKernel import SubprocessKernel
from SubprocessKernel import MathematicaBlock, LinkEnvironment
# these are helpers I'll use in the demo
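Since the version in that path may change, here is a minimal sketch of one way to avoid hard-coding it. The helper name find_paclet_dir is my own illustration and not part of PJLink, and the repository location is just the macOS default shown in the output above:

```python
import os
import re
from glob import glob


def find_paclet_dir(repository, name="PJLink"):
    """Return the highest-versioned '<name>-x.y.z' directory under repository, or None.

    Hypothetical helper for illustration; PJLink does not ship this function.
    """
    pattern = os.path.join(os.path.expanduser(repository), name + "-*")
    candidates = []
    for path in glob(pattern):
        match = re.fullmatch(re.escape(name) + r"-(\d+(?:\.\d+)*)", os.path.basename(path))
        if match and os.path.isdir(path):
            # Compare versions numerically so that 1.10.0 sorts after 1.2.0.
            version = tuple(int(part) for part in match.group(1).split("."))
            candidates.append((version, path))
    return max(candidates)[1] if candidates else None


# Assumed location: the macOS paclet repository from the output above.
pjlink_path = find_paclet_dir("~/Library/Mathematica/Paclets/Repository")
print(pjlink_path)  # None if no PJLink paclet is installed there
```

If a path comes back, it can be passed to sys.path.insert exactly as in the snippet above.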


Bidirectional Communication

Once we have this we can start a subprocess kernel which will open a Mathematica front-end to interact with. We'll also start an evaluator Mathematica can use to call back into python.

You may see a long string of output from your C compiler as the setup.py file builds out the native library that PJLink uses. Don't worry, this should only happen once. If it fails, raise an issue on GitHub so I can deal with it.

Once Mathematica has loaded, we'll use the MathematicaBlock context manager so we can write something that looks a lot like Mathematica code and use the MEval function we'll define to run the code. The code for all this looks like:

ker = SubprocessKernel()

def MEval(expr, wait=True, kernel=ker):
    """MEval evaluates a Mathematica expression in the Mathematica kernel."""
    kernel.drain()  # just to make sure things are clean
    return kernel.evaluate(expr, wait=wait)

ker.start()
ker.start_evaluator()

After that we can simply call into Mathematica:

with MathematicaBlock():
    res = MEval(Set(M.hi, "Hello from python!"))
res


We can see that the string "Hello from python!" was set to the symbol hi on the Mathematica side and was returned back by MEval. Symbols that aren't in the "System`" context need to be prefaced by an M., as that's a special class that can resolve symbol names like that.

We can also get efficient data transfer of arrays from either side. Here we'll take some Mathematica data and get it back out on the python side. The first thing we need to do is go to the Mathematica notebook that opened and load the "PJLink`" context. Then we'll install the python runtime that the SubprocessKernel object configured. This looks like:

<<PJLink`
InstallPython[ LinkObject->SubprocessKernel`$PyEvaluateLink, ProcessObject->None];

Once it's installed, we'll use it directly via PyEvaluate:

With[{arr= RandomReal[{-1, 1}, {50, 50, 50}]},
  PyEvaluate[dat=arr]
  ]

Calls into python are done in an environment held only by the link, so to access that we need to wrap the evaluator we started (ker.evaluator) in a LinkEnvironment context manager:

with LinkEnvironment(ker.evaluator):
    res = dat.shape
res


Arrays are held as NumPy arrays by default on the python side, although this may be disabled. If disabled, they're held as a data type called BufferedNDArray which holds the data as a single C-contiguous array and allows slicing and viewing into it (although no efficient math or manipulation of any sort).

Finally, to close out the demo, we'll plot something on the Mathematica side and watch it come back on the python side. The code for this should be pretty self-explanatory by this point, but there is one cute feature to note:

with MathematicaBlock():
    res = MEval(
        Rasterize(
            Plot(Sin(M.x), List(M.x, 0, Times(2, Pi)),
                 ImageSize=[250, 250],
                 PlotLabel="sin(x) as plotted in Mathematica"
                 )
            )
        )
res

Unfortunately it really does matter that we pass a List expression instead of a python list for the second argument to Plot, as otherwise the system hangs for reasons that aren't totally clear. On the other hand, we can see how nice options passing is in the interface. We make use of the python **kwargs setup, so ImageSize=... and PlotLabel=... both get automatically converted into rules (albeit with a String key instead of a Symbol). The Rasterize is, sadly, similarly necessary, as there is currently no logic in the package to automatically convert Graphics expressions into their rasterized forms.
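To make the options passing concrete, here is a rough sketch of how keyword arguments could be turned into rules with String keys. The function names wl_literal and options_to_rules are hypothetical and the text rendering is only for illustration. PJLink puts expressions on the link rather than building strings:

```python
def wl_literal(value):
    """Render a few basic python values in Mathematica syntax (illustrative only)."""
    if isinstance(value, str):
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
    if isinstance(value, (list, tuple)):
        return "{" + ", ".join(wl_literal(item) for item in value) + "}"
    return str(value)


def options_to_rules(**options):
    """Turn keyword arguments into 'key -> value' rules with String keys."""
    return ["{} -> {}".format(wl_literal(key), wl_literal(val))
            for key, val in options.items()]


rules = options_to_rules(ImageSize=[250, 250], PlotLabel="sin(x)")
print(rules)  # ['"ImageSize" -> {250, 250}', '"PlotLabel" -> "sin(x)"']
```

The String keys fall out naturally here because python hands the keyword names over as plain strings.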


I think we'll close out the demo here, though, and move on to a description of how this works.

PJLink Native Library

The heart of PJLink is the C library that connects a python runtime to MathLink. The source for this can be found here. This library, once compiled by the setup.py file packaged with it, implements the basic MathLink calls in a way that python can use them and attempts to do so with efficient memory usage and data transfer.

Data Sharing in the Native Library

The heart of the native library is the set of PutArray and GetArray functions it implements. Beyond anything else, it is the fast transfer of large arrays of data that makes a C-level connection so appealing. The way we handle this on the python side is via the python buffer protocol. We enforce the condition that all data sent and received on the python side must be handled by an object that can work with a C-contiguous buffer of data. By default this is done with NumPy if it is installed, but if not there is a custom object called BufferedNDArray in the HelperClasses package that deals with this.
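For anyone who hasn't met the buffer protocol, here is a small standard-library sketch of the idea: one flat C-contiguous block of doubles, viewed as a 2x3 array without copying. This is only a stand-in to show the concept and is not how BufferedNDArray itself is implemented:

```python
from array import array

# One flat, C-contiguous block of doubles, as a native library would hand it over.
flat = array("d", [0.0, 1.0, 2.0, 3.0, 4.0, 5.0])

# memoryview speaks the buffer protocol. Casting through bytes lets us attach a shape.
view = memoryview(flat).cast("B").cast("d", shape=[2, 3])

print(view.shape, view.c_contiguous)  # (2, 3) True
print(view.tolist())                  # [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]

# The view shares memory with the original buffer, so no data was copied.
flat[5] = 50.0
print(view[1, 2])                     # 50.0
```

NumPy arrays expose the same protocol, which is why they can be dropped in as the default container.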

Threading in the Native Library

Python has something called the Global Interpreter Lock (GIL), which is a mechanism for synchronizing python state: only the thread holding it may run python code. Unfortunately for us, this means that a blocking C call of the kind we'll be using stalls every other python thread unless it releases the GIL first. To get around this, every call into the MathLink library in the native library is wrapped in the MLTHREADED macro, which handles the releasing and reacquiring of the lock. This lets other threads keep working while a call waits on the link. Any extensions to the library should keep this in mind.
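The effect of releasing the GIL around a blocking call can be seen from pure python. In this sketch time.sleep stands in for a wrapped MathLink call, since it also releases the GIL while it waits. The function names are made up for the illustration:

```python
import threading
import time


def blocking_call(seconds):
    # time.sleep releases the GIL while waiting, much as a wrapped native call would.
    time.sleep(seconds)


def run_concurrently(n_threads, seconds):
    """Run n_threads blocking calls at once and return the wall-clock time taken."""
    threads = [threading.Thread(target=blocking_call, args=(seconds,))
               for _ in range(n_threads)]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.perf_counter() - start


elapsed = run_concurrently(4, 0.2)
print("4 threads x 0.2 s took {:.2f} s".format(elapsed))
```

Because the lock is released during each wait, the four calls overlap and the total stays close to 0.2 s. A call that held the GIL for its whole duration would take roughly four times as long.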

Class Structures

PJLink provides a glut of classes that handle the details of communication, so we will quickly detail what the important ones do. More information is always available upon request.

The *Link classes

PJLink is based off of JLink and so it makes use of the same kind of class structure that JLink does. This means that it has a MathLink class that provides a template for the kind of link we'll work with and a KernelLink class that works specifically with Mathematica kernels. In general, we will only really work with a subclass of a KernelLink called a WrappedKernelLink that implements the KernelLink interface by calling into a NativeLink which is the only class which actually touches the native library at all.

If one is controlling a Mathematica kernel from python, it will be handled by a WrappedKernelLink.

Reader class

The Reader class handles the other half of the communication. It waits for calls from Mathematica and processes them via the KernelLink._handlePacket function. Most commonly these calls in turn call KernelLink.__callPython which builds a python call from the symbolic python packet that PyEvaluate sends over the link. A Reader does its best not to completely prevent its link from passing data to Mathematica, but in general it is best not to depend on this as the NativeLink interface allows only a single thread to access the library at once for reasons of safety and stability.
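The general pattern here, a background thread that waits for packets and hands each one to a handler while a lock keeps link access to a single thread, can be sketched with the standard library. Everything below (ReaderSketch, call_python, the tuple packet format) is hypothetical and much simpler than the real Reader:

```python
import queue
import threading


class ReaderSketch(threading.Thread):
    """Wait for packets on a queue and dispatch each one to a handler."""

    def __init__(self, incoming, handler, link_lock):
        super().__init__(daemon=True)
        self.incoming = incoming
        self.handler = handler
        self.link_lock = link_lock  # only one thread may use the "link" at a time
        self.results = queue.Queue()

    def run(self):
        while True:
            packet = self.incoming.get()
            if packet is None:  # sentinel: shut the reader down
                break
            with self.link_lock:
                self.results.put(self.handler(packet))


def call_python(packet):
    """Build a python call from a (function name, arguments) packet."""
    func_name, args = packet
    allowed = {"len": len, "sum": sum, "max": max}
    return allowed[func_name](*args)


incoming = queue.Queue()
reader = ReaderSketch(incoming, call_python, threading.Lock())
reader.start()
incoming.put(("sum", ([1, 2, 3],)))
incoming.put(("len", ("hello",)))
incoming.put(None)
reader.join()
answers = [reader.results.get(), reader.results.get()]
print(answers)  # [6, 5]
```

Any other thread that wanted to use the link would have to take the same lock, which is why it is best not to count on sending data while the reader is busy.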

MathLinkEnvironment and MathLinkException

The MathLinkEnvironment is a standalone class that handles all of the various flags and state that the links need. It centralizes all information about what a given token or flag from MathLink means and provides utility functions for working with this. MathLinkException is a subclass of the standard python Exception class that handles the MathLink-specific exceptions that are returned. It in turn calls into MathLinkEnvironment to learn what various exceptions mean.

MPackage, MLSym, and MLExpr

The HelperClasses package provides a large number of (generally) smaller classes that serve to make code cleaner in its implementation. A big part of this is done by the MPackage, MLSym, and MLExpr classes, which allow for a way to create packets with a syntax that looks more like standard Mathematica code. MLSym and MLExpr are types that a KernelLink knows how to put onto a link and MPackage provides utilities and a custom __getattr__ so that the packet code can look like Mathematica code.
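To give a feel for the __getattr__ trick, here is a toy version of the idea. The classes Sym, Expr, and SymbolNamespace are stand-ins I made up for this sketch. They are not the actual MLSym, MLExpr, and MPackage, which put expressions on a link rather than printing them:

```python
class Sym:
    """A bare symbol such as hi or Plot."""

    def __init__(self, name):
        self.name = name

    def __call__(self, *args):
        # Calling a symbol builds the expression head[args...].
        return Expr(self, args)

    def __repr__(self):
        return self.name


class Expr:
    """An expression of the form head[arg1, arg2, ...]."""

    def __init__(self, head, args):
        self.head, self.args = head, tuple(args)

    def __repr__(self):
        return "{}[{}]".format(self.head, ", ".join(to_wl(arg) for arg in self.args))


def to_wl(value):
    """Render strings and lists in Mathematica syntax, anything else via repr."""
    if isinstance(value, str):
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
    if isinstance(value, (list, tuple)):
        return "{" + ", ".join(to_wl(item) for item in value) + "}"
    return repr(value)


class SymbolNamespace:
    """Any attribute lookup resolves to a symbol of that name."""

    def __getattr__(self, name):
        if name.startswith("__"):
            raise AttributeError(name)
        return Sym(name)


M = SymbolNamespace()
expr = M.Set(M.hi, "Hello from python!")
print(expr)  # Set[hi, "Hello from python!"]
```

The same objects nest without any extra work, so M.Plot(M.Sin(M.x), [M.x, 0, 1]) prints as Plot[Sin[x], {x, 0, 1}].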

MathematicaBlock and LinkEnvironment

Both MathematicaBlock and LinkEnvironment are also in the HelperClasses. They both edit the current evaluation state as context managers so that explicit references to MPackage can be dropped and variables held by a given link can be easily accessed. Being context managers, they are both used via with statements and change the execution environment of the enclosing block.
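As a rough illustration of the context-manager idea, this sketch temporarily injects names into a namespace dictionary and restores the old state on exit. The name temporary_names is hypothetical, and the real classes edit the evaluation state of the enclosing block rather than a plain dict:

```python
from contextlib import contextmanager


@contextmanager
def temporary_names(namespace, **names):
    """Make names available in namespace for the duration of a with block."""
    missing = object()
    saved = {key: namespace.get(key, missing) for key in names}
    namespace.update(names)
    try:
        yield namespace
    finally:
        # Restore the previous state, removing names that did not exist before.
        for key, old in saved.items():
            if old is missing:
                namespace.pop(key, None)
            else:
                namespace[key] = old


env = {"x": 1}
with temporary_names(env, x=10, Pi="Pi"):
    inside = dict(env)
after = dict(env)

print(inside)  # {'x': 10, 'Pi': 'Pi'}
print(after)   # {'x': 1}
```

The try/finally is the important part: the environment is put back even if the code inside the block raises.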

BufferedNDArray, ImageData, and SparseArrayData

These are all data classes that allow for more efficient and convenient data transfer. The ImageData and SparseArrayData classes hold data coming in from Mathematica as put using PJLink`Package`AddTypeHints. They have methods to efficiently transform to more standard formats like PIL.Image and scipy.sparse.csr_matrix. As more data types are handled by AddTypeHints it can be assumed that more classes like these will be written.
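One way to picture the type-hint mechanism is as a registry of decoders keyed by hint. The sketch below is purely illustrative: the registry, the "SparseArray" hint string, and the payload layout (a CSR-style tuple with 0-based indices) are my assumptions and not PJLink's actual wire format:

```python
DECODERS = {}


def decoder(type_hint):
    """Register a function as the decoder for payloads tagged with type_hint."""
    def register(func):
        DECODERS[type_hint] = func
        return func
    return register


@decoder("SparseArray")
def decode_sparse(payload):
    """Expand (shape, row_pointers, column_indices, values) into nested lists."""
    (n_rows, n_cols), row_pointers, column_indices, values = payload
    dense = [[0] * n_cols for _ in range(n_rows)]
    for row in range(n_rows):
        # Entries for this row live between consecutive row pointers.
        for k in range(row_pointers[row], row_pointers[row + 1]):
            dense[row][column_indices[k]] = values[k]
    return dense


def decode(type_hint, payload):
    """Decode a payload by hint, passing unknown hints through unchanged."""
    return DECODERS.get(type_hint, lambda raw: raw)(payload)


dense = decode("SparseArray", ((2, 3), [0, 1, 3], [2, 0, 1], [5, 7, 9]))
print(dense)  # [[0, 0, 5], [7, 9, 0]]
```

Supporting a new data type then only means registering one more function. In practice a decoder would return something like a scipy.sparse.csr_matrix instead of nested lists.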

Mathematica-side Package

That was all to do with the python side of things, which is where most of the development work had to go. On the other hand, the Mathematica side of the equation still requires some explanation. The package itself is really quite simple, so please feel free to peruse the source.

InstallPython

Notably, all it really requires is a LinkObject, so you can pass one directly via the LinkObject option. By default it will also try to make a python ProcessObject, but you can pass your own via the ProcessObject option, or you can pass None, in which case it won't attach to a Mathematica-controlled process.

ClosePython

PyEvaluate / PyEvaluateString

This is the heart of the package. It takes Mathematica-esque code, converts it into a structure that can be processed by KernelLink.__callPython(), sends it over, and waits for a response. The conversion is handled by PJLink`SymbolicPython`ToSymbolicPython, which was originally written for the PyTools package. This is the best way to move data to python as things like Image objects, packable arrays, and SparseArray objects will be moved over intelligently.

PyEvaluateString is like PyEvaluate, but with the recognition that ToSymbolicPython will always be a little bit lacking. It allows one to simply call a string of python code on the link and get the results back.

PyWrite / PyWriteString / PyRead / PyReadErr

These are all functions that make use of the fact that when the Reader object started, it allowed an interactive session to keep running and reading / writing on stdin, stdout, and stderr. The read functions (PyRead and PyReadErr) read from stdout and stderr and the write functions (PyWrite and PyWriteString) write to stdin. PyWrite takes Mathematica code and auto-converts it into a string. PyWriteString simply passes in the given string.
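The underlying mechanism is ordinary pipe traffic with a running python process. Here is a self-contained sketch, seen from python rather than Mathematica, in which the helper names write_line and read_line are invented for the example and the child is a tiny evaluate-and-print loop rather than a real interactive session:

```python
import subprocess
import sys

# A tiny child session: evaluate each line from stdin and print the result.
CHILD_CODE = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    print(eval(line), flush=True)\n"
)

child = subprocess.Popen(
    [sys.executable, "-c", CHILD_CODE],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    text=True,
)


def write_line(proc, code):
    """Send one line of code to the child's stdin."""
    proc.stdin.write(code + "\n")
    proc.stdin.flush()


def read_line(proc):
    """Read one line of output from the child's stdout."""
    return proc.stdout.readline().rstrip("\n")


write_line(child, "6 * 7")
answer = read_line(child)
print(answer)  # 42

# Closing stdin ends the child's loop, after which we can clean up.
child.stdin.close()
child.wait()
child.stdout.close()
child.stderr.close()
```

Note that everything crossing the pipe is text, so this route suits quick interactive use more than moving data.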

Future Work

PJLink 1.0.0, beefy as it already is, should only really be seen as the beginning. My hope is that much more can be done to allow for more native data type transfer and for intelligent communication between the two systems.

In my demo I tried to show some of the things that make the interoperation of the two so nice, but I obviously don't have the breadth of knowledge to know all of the many applications this can be put to. Applications built off of PJLink are always welcome and I'm happy to provide any requisite information and extensions to PJLink to get them built.

Alongside that, I think better integration on the Mathematica side is necessary. There is a partial interface for allowing a PythonObject structure to hide the details of PyEvaluate on the Mathematica side, but this needs work from both ends, first hooking up the Language`MutationHandler interface and then extending the same on the python side. After that, a JavaBlock-like setup that allows a link to be opened, used, and cleaned up would be highly useful for sandboxing.

Finally, I'm sure there are numerous bugs hiding in the package as it stands. Please find them and let me know about them so they can be worked out.

In the meantime, I hope you enjoy PJLink and being able to use my two favorite languages symbiotically.

POSTED BY: b3m2a1
5 Replies

Thank you! This has been a much needed addition for the last several years. As you point out, the structure of the Wolfram language is intellectually a delight, but one still must be part of a community. The silo approach leaves many of us feeling isolated. This is especially true now with the requirements for releasing code for rigor and reproducibility in scientific manuscripts.

Hopefully Wolfram will wake up and provide robust support for your python interface, which is critical for participating meaningfully in the rapidly advancing field of ML applications.

POSTED BY: Martin Zand
Posted 6 years ago

For anyone interested in contributing to the project, there's a need for people with non-Mac machines to compile the lib and contribute it to a machine/architecture-specific directory so it can be used easily by people without a C compiler.

There's also a need for people to add custom encoder/decoders for efficient data type transfer from Mathematica to convenient python forms. Currently things like PackedArray, RawArray, Image, SparseArray, and HashTable are supported. This uses an encoder on the Mathematica side and a decoder on the python side. Submissions of either are highly appreciated.

All ideas and other contributions are welcome, too.

POSTED BY: b3m2a1

Please explain the step "Loading PJLink in Jupyter". Namely:

  1. How do you start Jupyter? (I can start it from anaconda-Navigator.app or from the Terminal command anaconda-navigator. Is that what I should do?)

  2. Where do I obtain that Demo.ipynb file? (Or is it created using a New command?)

  3. How do I connect a Mathematica kernel to that Jupyter notebook using PJLink? (If I launch Jupyter from Anaconda Navigator and use the New command, I see no Mathematica choice for the kind, i.e., the kernel.)

These are the sort of details whose omission drives potential users to avoidance!

POSTED BY: Murray Eisenberg
Posted 6 years ago

Sorry, I should have been more clear about some things in the post, I think.

As you likely know, Jupyter is just a front end for python. I am personally annoyed by/opposed to Anaconda and so only ever start Jupyter from the command line (i.e. with jupyter notebook), but starting Jupyter from Anaconda should be fine. You can also use PJLink without Jupyter, as I generally do. Jupyter is just ersatz Mathematica, so I thought it'd be fun to run real and fake Mathematica at the same time.

If you want that demo notebook I can provide it, but all the code is in the post.

PJLink is a dual Mathematica-python package so the way you start the Mathematica kernel is by launching it via python code. See this block in particular:

ker = SubprocessKernel()

def MEval(expr, wait=True, kernel=ker):
    """MEval evaluates a Mathematica expression in the Mathematica kernel."""
    kernel.drain()  # just to make sure things are clean
    return kernel.evaluate(expr, wait=wait)

ker.start()
ker.start_evaluator()

ker.start() starts the Mathematica kernel and ker.start_evaluator() starts a python-side evaluator that Mathematica can call into. Those two lines are the real magic here. Beyond that I'm just calling into python.

POSTED BY: b3m2a1

Congratulations! This post is now a Staff Pick as distinguished by a badge on your profile! This is wonderful work, thanks for sharing and keep it coming! Note we also placed it in the Add-Ons Group, which is reserved for showcasing of great work in the extended functionality domain.

POSTED BY: EDITORIAL BOARD