

Pasting code from a PDF (created with MMA) into a Notebook?

Posted 1 month ago
4 Replies
1 Total Likes

I would like to copy a command from a Mathematica book in PDF format (undoubtedly created with MMA) and paste it into a Notebook so that it can be executed. When I select this text in the PDF file (typed in by hand here)

Table[RandomSample[Range[39], 7] // Sort, {10 ^ 6}];

and paste it into a Notebook, it appears like this:

Table@RandomSample@Range@39D, 7D êê Sort, 810 ^ 6<D;

BTW: the character code for '@' maps to '[' in the Mathematica1Mono font. Using Style[<cmd>, FontFamily -> Mathematica1Mono] doesn't help. This prevents me from experimenting and educating myself using some great online resources. Importing the PDF into MMA doesn't help either.
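
For a one-off fix of the pasted text, a plain character substitution may be enough. This is only a sketch: the mapping is inferred from the single example above, `fixPasted` is a hypothetical helper name, and any genuine @, D, 8 or < in the original code would be converted as well, so the result needs to be checked by eye.

```wolfram
(* The garbled string exactly as it appears after pasting from the PDF. *)
garbled = "Table@RandomSample@Range@39D, 7D êê Sort, 810 ^ 6<D;";

(* Hypothetical helper. The mapping is an assumption based on one example:
   "@" -> "[", "D" -> "]", "êê" -> "//", "8" -> "{", "<" -> "}".
   It is deliberately naive and will also rewrite real 8s, Ds, @s and <s. *)
fixPasted[s_String] := StringReplace[s,
  {"êê" -> "//", "@" -> "[", "D" -> "]", "8" -> "{", "<" -> "}"}]

fixed = fixPasted[garbled]

(* Check that the result at least parses before evaluating it. *)
SyntaxQ[fixed]
```

For this particular input, `fixed` should read the same as the original command. Code that contains the digit 8 or a capital D would need a smarter rule, for example one that uses the font information stored in the PDF.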

POSTED BY: James M Marks
Posted 11 days ago

I see Roland mentioned a snipping tool for Windows. Modern versions of macOS will recognize text in files (PDF and others) opened in Preview. If you open the PDF in Preview, you can select a text block, copy it, and paste it elsewhere.

There's also an indie Mac app called TextSniper that will allow you to quickly copy text from anywhere on your screen. It will even recognize text from a video -- something that still seems like magic to me. TextSniper is far more convenient than the Apple-supplied text scanner in Preview. The app is listed for eight bucks. It's a good tool from a good developer.

All of these methods will produce some conversion errors.

POSTED BY: Phil Earnhardt


As I explained in my posting about TextRecognize, PDF files saved from Mathematica contain the input text together with its rendered graphics. You can reimport the Notebook text with

Import["filepath\\file.pdf", {"PageFormattedText"}]
POSTED BY: Roland Franzius

Thank you, Roland, for your discussion. I tried your approach using a simple example from an online MMA book. As you can see from this small Notebook, it was not entirely successful. The problem is that TextRecognize does not understand the MMA symbol for Rule[] (and many other symbols which are unique to MMA).

When I look at the raw source of the PDF page, I see that it switches fonts hundreds of times, sometimes on a character by character basis. All of the fonts are actually stored in the PDF (to guarantee portability), including some MMA-private fonts.

I'm in the process of writing code to parse the raw source and substitute an appropriate ASCII string when I encounter a symbol which uses an MMA font.

I will update this discussion with the results of this project.

POSTED BY: James M Marks

Hi James,

In Windows, press Shift + Windows + S to open the Snipping Tool. Paste the captured picture as an image into

ToExpression@TextRecognize[picture, Language -> "English"]

You have to allow internet access, and the first call may take a few minutes while the repository is downloaded from the server.
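
Since recognized text can contain conversion errors, it may be safer to check it before evaluating. The following is a minimal sketch, not a tested recipe: `recognizeHeld` is a hypothetical helper name, and it assumes the snipped picture is available as an Image.

```wolfram
(* Hypothetical helper: recognize the text in an image and parse it
   without evaluating it. Returns $Failed when the text does not parse. *)
recognizeHeld[img_Image] := Module[{text},
  text = TextRecognize[img, Language -> "English"];
  If[StringQ[text] && SyntaxQ[text],
    (* Hold keeps the parsed expression unevaluated for inspection. *)
    ToExpression[text, InputForm, Hold],
    $Failed]]
```

Usage would be `held = recognizeHeld[picture]`, followed by `ReleaseHold[held]` once the held expression looks right.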


POSTED BY: Roland Franzius