How can I remove the formatting from imported RTF (Rich Text Format) files?

Posted 9 years ago

Hi everyone,

How can I strip away the formatting and leave only the text when I import an RTF?

I've imported it as

Import[myFile, "RTF"]


POSTED BY: Gregory Lypny


As stated in the help document, "Import and Export support RTF format Version 1.3." According to the Wikipedia article on RTF, version 1.3 is from 1993. A quick Google search for RTF examples turned up a site with an example file that contains a variety of elements, though no images.

Method 1.

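Needs["NETLink`"]; InstallNET[]; LoadNETAssembly["System.Windows.Forms"];
rtfz = NETNew["System.Windows.Forms.RichTextBox"];
rtfz@Rtf = URLFetch[""];
rtfz@Text (* plain text with the RTF markup removed *)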

Method 2.

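(* The opening of this expression was lost; the ExpressionML round trip is an assumed reconstruction *)
Cases[
 ImportString[ExportString[Import["", "RTF"], "ExpressionML"], "XML"],
 XMLElement["String", _, {mtext_}] -> mtext, Infinity]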
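
For anyone who wants to try the pattern without an RTF file on hand, here is a minimal sketch on a hand-built Notebook expression. The ExportString/ImportString round trip through "ExpressionML" is an assumption about how to obtain symbolic XML, and nbExpr is a made-up sample, not the output of a real import.

```mathematica
(* Hypothetical stand-in for what Import[file, "RTF"] might return *)
nbExpr = Notebook[{
    Cell[TextData[{"Plain and ",
       StyleBox["bold", FontWeight -> "Bold"], " text."}], "Text"]}];

(* Assumed route to symbolic XML: serialize as ExpressionML, then parse it *)
xml = ImportString[ExportString[nbExpr, "ExpressionML"], "XML"];

(* Each <String> element holds one string from the expression *)
strings = Cases[xml, XMLElement["String", _, {mtext_}] -> mtext, Infinity]
```

Because the match is on every string in the expression, style names and option values such as "Text" and "Bold" should come back along with the document text, so some filtering afterwards is to be expected.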

Method 3.

rtfrules = ToExpression[Import["path of saved attached file rtfrules.txt on your system"]];
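(* StringReplace and its first argument are assumed; the start of this line was lost *)
StringReplace[URLFetch[""], rtfrules, MetaCharacters -> Automatic]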

Here rtfrules is the contents of the attached file. At some point in 2004 I put together a starter set of replacement rules based on RTF 1.6 or 1.7. It is only a starting point: setting all of these RTF control tags to "" is not optimal.
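
To give a feel for the rule-based approach, here is a small sketch. The sample string and demoRules are made up for illustration and are not the contents of rtfrules.txt; a regular expression stands in for the long list of individual control words.

```mathematica
(* Hypothetical RTF sample; a real file could be read as raw text with Import[path, "Text"] *)
rtf = "{\\rtf1\\ansi\\deff0 {\\fonttbl{\\f0 Arial;}}\\f0\\fs24 Hello, \\b world\\b0 !\\par}";

(* Illustrative rules only *)
demoRules = {
   (* control words such as \fs24 or \par, plus one trailing space *)
   RegularExpression["\\\\[a-zA-Z]+-?[0-9]* ?"] -> "",
   (* group delimiters *)
   "{" | "}" -> ""
   };

StringReplace[rtf, demoRules]
```

On this sample the result is "Arial;Hello, world!". The font name from the font table survives, which shows why simply deleting control words is not enough and why groups such as the font table need rules of their own.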

Method 4.

If you are on Windows, install a Generic/Text printer driver whose output goes to a file, and set it as the default printer (before starting Mathematica).

nb = CreateDocument[
   Import["", "RTF"]];

The print dialog should pop up so you can save the *.prn file. Set the paper size to "US Std Fanfold" for 120 characters wide or "Letter" for 80 characters wide. The resulting .prn file should contain ASCII (ANSI) text; depending on the layout, some of it may be cut off. Open the saved .prn file in a text editor to see whether the output is acceptable.

Method 5.

Do something similar to the .NET method but using Java; it would need to be a Swing object. I could not test this: it is late and my Java is a bit rusty.

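Needs["JLink`"]; InstallJava[];
rtfx = JavaNew["javax.swing.text.rtf.RTFEditorKit"]; doc = rtfx@createDefaultDocument[];
rtfx@read[JavaNew["java.io.StringReader", URLFetch[""]], doc, 0];
doc@getText[0, doc@getLength[]] (* plain text of the parsed document *)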

All of these methods are starting points, and some would require memory management if applied repeatedly. The replacement rules would require the most work.

RTF is a somewhat dangerous format, as it allows external objects to be embedded.

POSTED BY: Hans Michel
Posted 9 years ago

Hey Hans,

Thanks a load for this. Method 2 works the best and is flexible. The only formatting it leaves behind is the occasional bit of font information wrapping a table here and there. That's easy to get rid of.

Kind regards,


POSTED BY: Gregory Lypny


Maybe try something like this:

nb = Last[
  Import["C:\\Users\\YourName\\Desktop\\This is an RTF file.rtf", "Rules"]]

That should open a new notebook that contains the text contents of the RTF file. From there, I think you should be able to programmatically do whatever you want with the text.
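
One way to go from there, as a rough sketch: NotebookGet[nb] should return the notebook as a Notebook expression, and the cell contents can then be picked out with pattern matching. The nbExpr sample and the cellText helper below are made up for illustration, and only plain-string and TextData cells are handled.

```mathematica
(* Hypothetical stand-in for NotebookGet[nb] *)
nbExpr = Notebook[{
    Cell["First paragraph.", "Text"],
    Cell[TextData[{"Some ",
       StyleBox["bold", FontWeight -> "Bold"], " text."}], "Text"]}];

(* cellText is an illustrative helper, not a built-in function *)
cellText[Cell[s_String, ___]] := s
cellText[Cell[TextData[parts_], ___]] :=
 StringJoin[
  Cases[Flatten[{parts}], s_String | StyleBox[s_String, ___] :> s]]

(* Collect the text of every cell whose content is a string or TextData *)
result = cellText /@ Cases[nbExpr, Cell[_String | _TextData, ___], Infinity]
```

For this sample the result is {"First paragraph.", "Some bold text."}. Styled runs are kept as text while their style options are dropped; tables and other boxes would need extra cases.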

POSTED BY: Tim Mayes
Posted 9 years ago

Thank you, Tim,

I'll look into the stuff on rules. Haven't used that until now.


POSTED BY: Gregory Lypny