Message Boards Message Boards

0
|
3863 Views
|
0 Replies
|
0 Total Likes
View groups...
Share
Share this post:

String Patterns have a surprising overhead to initialize.

Posted 10 years ago

The background is that I have been using DictionaryLookup[] to generate data for solving word puzzles. For example all words with doubled letters with length >6. Mathematica returns a list of 18000+ words at a rate of ~7 micro-seconds/word.

tt = Timing [
  words2x = DictionaryLookup[w : (a___ ~~ x_ ~~ x_ ~~ b___) /; StringLength[w] > 6];
  Length[words2x]
  ]
tt[[1]]/tt[[2]]

{0.124801, 18102}
6.8943*10^-6

That performance was very good. But as the searching took more branches and more calls to DictionaryLookup[] the time became annoying. Investigation showed that each call to DictionaryLookup[] with a pattern took about 15 milliseconds to initialize. It turns out that passing mutiple fixed strings is much faster than passing one pattern. (Simplified example) Find all 5 letter words that start with "pa" and end with "ed".

Timing[
  DictionaryLookup["pa" ~~ x_ ~~ "ed"]
  ]
{0.015600, {"paced", "paged", "paled", "paned", "pared", "paved", "pawed", "payed"}}

15 milliseconds for a very short list. Much more than 7 micro-seconds per word.

What about a do-it-yourself search with fixed strings?

Timing[
 Map[DictionaryLookup["pa" ~~ # ~~ "ed"] &, CharacterRange["a", "z"]] // Flatten
 ]
{0., {"paced", "paged", "paled", "paned", "pared", "paved", "pawed", "payed"}}

Orders of magnitude faster. We're back in the micro-second arena. So I'm reserving patterns for the big lists.

POSTED BY: Douglas Kubler
Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.
Reply Preview
Attachments
Remove
or Discard

Group Abstract Group Abstract