Can Mathematica save me hours of copy and paste?

Posted 4 years ago

Full disclosure: I've worked at Wolfram in the design department for decades and know the power of the Wolfram Language, but I haven't broken through to using it for my own problems, even though I know there are tons of things in my daily life that the Wolfram Language could help me do better. In this case, after decades, I'm converting my personal websites to a mobile-friendly template. I'm looking at about 3,000 HTML documents that need certain elements removed and replaced with something else. I've seen coworkers use Mathematica for complex find-and-replace queries and thought: wow, I know WL can save me days and days of cut and paste. So here's my first post ever to our community.

Is there a way I can have Mathematica replace any width and height attributes found in an HTML document with my new class="responsive-image" attribute? So, for example, width="xxx" height="xxx" needs to be replaced with class="responsive-image" in 3,000 documents. The xxx can vary, which is why Dreamweaver's simple find-and-replace logic isn't enough. That's the info, so let's give this a try. Thanks.

POSTED BY: heidi kellner
3 Replies
Posted 4 years ago

You can use Mathematica's RegularExpression[] to find matches and replace them:

input =  "
 width=\"1234\"   height=\"3456\"
 width =\"1234\"   height=\"3456\"
 width =   \"1234\"   height    =\"3456\"
 width=\"1234\"              height=\"3456\"
 width=\"123s\"    height=\"34z6\"
 ";

StringReplace[input,
 RegularExpression[
   "width\\s*=\\s*\"\\s*\\d*\\s*\"\\s*height\\s*=\\s*\"\\s*\\d*\\s*\""] ->
  "class=\"responsive-image\""
]

This code returns the following string (note: the last line is unchanged because its values contain non-digit characters, so the pattern does not match):

class="responsive-image"
class="responsive-image"
class="responsive-image"
class="responsive-image"
width="123s"    height="34z6"

Breaking down the regular expression:

"width\\s*=\\s*\"\\s*\\d*\\s*\"\\s*height\\s*=\\s*\"\\s*\\d*\\s*\""

"width": the literal text "width"

"\\s*": zero or more whitespace characters (spaces, tabs, etc.)

"=": an equals sign

"\\s*": zero or more whitespace characters (spaces, tabs, etc.)

"\"": a quotation mark

"\\s*": zero or more whitespace characters (spaces, tabs, etc.)

"\\d*": zero or more digit characters (0, 1, 2, ... 9)

and so on.
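
The same pattern can also be written with Wolfram Language string patterns instead of a regular expression, which may be easier to read piece by piece. This is only a sketch: the helper names ws and attr and the sample tag are made up for illustration.

```wolfram
(* ws: zero or more whitespace characters, the counterpart of \s* *)
ws = RepeatedNull[WhitespaceCharacter];

(* attr is a hypothetical helper: name, "=", then a quoted run of digits *)
attr[name_String] :=
  name ~~ ws ~~ "=" ~~ ws ~~ "\"" ~~ ws ~~
   RepeatedNull[DigitCharacter] ~~ ws ~~ "\"";

(* Invented sample tag, not taken from the real site *)
sample = "<img src=\"a.jpg\" width = \"120\"   height=\"80\">";

result = StringReplace[sample,
  attr["width"] ~~ ws ~~ attr["height"] -> "class=\"responsive-image\""]
(* "<img src=\"a.jpg\" class=\"responsive-image\">" *)
```

Like the regular expression, this still expects width to come before height.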

A word of caution, though: if the height attribute is given first, this won't match. Also, this replaces every match it finds, and there may be parts of your HTML documents with width and height attributes that you don't want changed to responsive-image.
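
One possible way around both cautions is to look only inside img tags and strip the two attributes wherever they appear in the tag. A minimal sketch, assuming the size values are plain digits in double quotes and that the tags have no existing class attribute; sizeAttr, fixImgTag and fixImages are illustrative names, not built-in functions.

```wolfram
(* A width or height attribute with a digits-only value, in any position *)
sizeAttr = RegularExpression[
   "(?i)\\s+(?:width|height)\\s*=\\s*\"\\s*\\d*\\s*\""];

(* Strip the size attributes from one img tag; if anything was removed,
   add the class right after "<img". Otherwise return the tag untouched. *)
fixImgTag[tag_String] :=
  With[{stripped = StringReplace[tag, sizeAttr -> ""]},
   If[stripped === tag,
    tag,
    "<img class=\"responsive-image\"" <> StringDrop[stripped, 4]]];

(* "$0" stands for the whole matched tag, so only img tags are rewritten *)
fixImages[html_String] := StringReplace[html,
   RegularExpression["(?i)<img\\b[^>]*>"] :> fixImgTag["$0"]];

fixed = fixImages[
  "<td width=\"300\"><img height=\"80\" src=\"a.jpg\" width=\"120\"></td>"]
(* "<td width=\"300\"><img class=\"responsive-image\" src=\"a.jpg\"></td>" *)
```

In this invented example the width on the td is left alone, and the img tag is fixed even though height comes first.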

POSTED BY: Sam M

It would probably be easiest if the HTML documents were first imported as symbolic XML, then "fixed" with ReplaceAll and the appropriate set of replacement rules, and then exported back as HTML.
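
As a rough sketch of that idea, here is a replacement rule applied to a small hand-built piece of symbolic XML. The element and the rule name fixImg are invented for illustration. For a real file, the symbolic XML would presumably come from something like Import[file, {"HTML", "XMLObject"}], and how faithfully the result exports back to HTML would need checking on the actual documents.

```wolfram
(* A hand-built stand-in for imported symbolic XML *)
doc = XMLElement["p", {},
   {XMLElement["img",
     {"src" -> "cat.jpg", "width" -> "200", "height" -> "100"}, {}]}];

(* Drop width and height from every img element, whatever their order,
   and append the new class attribute *)
fixImg = XMLElement["img", attrs_, content_] :>
   XMLElement["img",
    Append[DeleteCases[attrs, ("width" | "height") -> _],
     "class" -> "responsive-image"],
    content];

fixed = doc /. fixImg

(* Serialize the modified tree back to markup *)
ExportString[fixed, "XML"]
```

Because the rule matches on the element name, width and height attributes on other elements such as td are left untouched.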

Could you please provide a couple of smallish files to try this out?

POSTED BY: Robert Nachbar

I've included five random HTML files that will need updating. I've targeted the most crucial and consistent part that needs updating (for starters; if this works, WL can take this much further). For now it would be amazing to point Mathematica to a directory and have it go to work, removing all the old td/tr HTML and replacing it with a new structure. I've outlined the first, easiest pass in the notebook. Can Wolfram Language Save Me?

POSTED BY: heidi kellner
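
For the "point Mathematica to a directory" part, a minimal sketch might look like the following. It only covers the width/height pass using Sam M's regular expression, not the td/tr restructuring. The name fixHTML is hypothetical, and the temporary folders and sample.html are stand-ins created just so the example runs on its own.

```wolfram
(* Sam M's pattern from earlier in the thread, wrapped as a function *)
fixHTML[s_String] := StringReplace[s,
   RegularExpression[
     "width\\s*=\\s*\"\\s*\\d*\\s*\"\\s*height\\s*=\\s*\"\\s*\\d*\\s*\""] ->
    "class=\"responsive-image\""];

(* Temporary folders stand in for the real site; use your own paths here *)
sourceDir = CreateDirectory[];
outDir = CreateDirectory[];
Export[FileNameJoin[{sourceDir, "sample.html"}],
  "<img src=\"a.jpg\" width=\"120\" height=\"80\">", "Text"];

(* Read each .html file as plain text, fix it, and write it to outDir *)
Do[
  Export[FileNameJoin[{outDir, FileNameTake[file]}],
   fixHTML[Import[file, "Text"]], "Text"],
  {file, FileNames["*.html", sourceDir]}];

(* Inspect one converted file *)
result = Import[FileNameJoin[{outDir, "sample.html"}], "Text"]
```

Writing to a separate output folder keeps the originals intact until the results have been checked. This sketch does not descend into subfolders and does not deal with character encodings, both of which may matter for a real site.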