Message Boards Message Boards


String Patterns have a surprising overhead to initialize.

Posted 9 years ago
0 Replies
0 Total Likes

The background is that I have been using DictionaryLookup[] to generate data for solving word puzzles. For example all words with doubled letters with length >6. Mathematica returns a list of 18000+ words at a rate of ~7 micro-seconds/word.

tt = Timing [
  words2x = DictionaryLookup[w : (a___ ~~ x_ ~~ x_ ~~ b___) /; StringLength[w] > 6];

{0.124801, 18102}

That performance was very good. But as the searching took more branches and more calls to DictionaryLookup[] the time became annoying. Investigation showed that each call to DictionaryLookup[] with a pattern took about 15 milliseconds to initialize. It turns out that passing mutiple fixed strings is much faster than passing one pattern. (Simplified example) Find all 5 letter words that start with "pa" and end with "ed".

  DictionaryLookup["pa" ~~ x_ ~~ "ed"]
{0.015600, {"paced", "paged", "paled", "paned", "pared", "paved", "pawed", "payed"}}

15 milliseconds for a very short list. Much more than 7 micro-seconds per word.

What about a do-it-yourself search with fixed strings?

 Map[DictionaryLookup["pa" ~~ # ~~ "ed"] &, CharacterRange["a", "z"]] // Flatten
{0., {"paced", "paged", "paled", "paned", "pared", "paved", "pawed", "payed"}}

Orders of magnitude faster. We're back in the micro-second arena. So I'm reserving patterns for the big lists.

POSTED BY: Douglas Kubler
Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.
Reply Preview
or Discard

Group Abstract Group Abstract