Group Abstract

Message Boards

WOLFRAM COMMUNITY

11.6K Views

1 Reply

8 Total Likes

View groups...

Follow this post

Share this post:

GROUPS:

Staff Picks Data Science Education Recreation Curated Data Wolfram Language Computational Linguistics Natural Language Processing Computational Humanities

Computational mnemonics

Carlo Barbieri

Carlo Barbieri, Wolfram Research

Posted 7 years ago

Yesterday I was reading this on Stephen's blog and something caught my attention. So I decided to do some coding to figure out how many other sentences like that existed. The strategy is as follows, we're going to first find all the possible partitions of 14 (with permutations), then split the formula string and extract words that match the sub-string. Partitions We need to construct all possible partitions of 14 which is the length of the formula: `IntegerPartition` returns sorted partitions, so we need to call `Permutations`. Join @@ Permutations /@ IntegerPartitions[14] returns 8192 permuted partitions of 14. It is left as an exercise to the reader that there are always `2^(n-1)` permuted partitions of the integer `n`. From these we can compute all the possible spans. spans[n_] := Map[ MapAt[ Function[a, a + 1], Partition[Prepend[Accumulate[#], 0], 2, 1], {All, 1} ] &, Join @@ (Permutations /@ IntegerPartitions[n]) ] Finding words Now we need to find all the words that match the right piece of the pattern: Options[filteredWords] = {IgnoreCase -> True}; filteredWords[{a_, b_, c_}, opts: OptionsPattern[]][pos_] := filteredWords[{a, b, c}, opts][pos] = DictionaryLookup[ StringReplace[StringTake["DDDpqrDpDDprpr", pos], {"D" -> _,"p" -> a, "q" -> b , "r" -> c}], IgnoreCase -> OptionValue[filteredWords, {opts}, IgnoreCase] ] `filteredWords` takes three letters a, b, c and two indices and constructs a StringExpression to be fed to `DictionaryLookup`, by `StringTake`-ing the part of the formula we want to construct mnemonics from. For example: given input `{"u", "r", "e"}` and indices `{10, 14}` it constructs the pattern `_ ~~ "ueue"`, and it extracts `{"queue"}`. Note that we've used the fact that `StringReplace` returns a `StringExpression` when some of the rhs are not strings themselves. As an intermediate step we can construct all the words that will make up our mnemonics: words = DeleteCases[ Map[filteredWords[{"u", "r", "e"}], spans[14], {2}], {___, {}, ___} ] And just from these we can say that there are 192 945 possible sentences: Total[Times @@@ Map[Length, words, {2}]] We can easily construct them: sentences = Flatten[Apply[Outer[StringRiffle[{##}, " "] &, ##] &, words, {1}]] Let's look at them: RandomSample[sentences, 10] returns {"Oz cur Ed um queue", "ID Wu re I US queue", "kW surer UK queue", "gap Urey US queue", "Ms nu re duo queue", "XL cur Ed uh queue", "a I Wu re gut queue", "AM nu re cub queue", "I you re jun queue", "a slur eh up queue"} Do they all end in queue? Select[sentences, Not @* StringMatchQ[___ ~~ "queue"]] unfortunately yes. Now we can try to do some further exploration. A few facts: There are no sentences with `{"p", "q", "r"}` There are a staggering 14 690 288 sentences with `{"a", "b", "c"}` If we restrict the search to case sensitive there are 0 sentences with `{"a", "b", "c"}`, which makes me suspect that there are far too many acronyms with abc in our dictionary. There are no three consecutive letters that have case-sensitive valid sentences. What about case-insensitive? assoc = AssociationMap[ With[ {words = DeleteCases[ Map[filteredWords[#, IgnoreCase -> True], spans[14], {2}], {___, {}, ___} ]}, Total[Times @@@ Map[Length, words, {2}]] ] &, Partition[Alphabet[], 3, 1] ] DeleteCases[assoc, 0] returns <\|{"a", "b", "c"} -> 14690288, {"g", "h", "i"} -> 672097, {"t", "u", "v"} -> 76164\|> For `{"t", "u", "v"}`, they all end in `"TV TV"`, so that's not very interesting. `{"g", "h", "i"}` on the other hand seems to contain quite a few I, which is definitely interesting, if it wasn't for the fact that they all end with `"I"`, which is probably not going to work well with English grammar. I egg HIV go a gig I I big hip gum gig I go ugh is God gig I I gig his gig GI GI Maybe with some punctuation... I have to stop writing here, but I will come back with some statistics of the 26!/23! (15600) 3-letter combinations.

POSTED BY: Carlo Barbieri

1 Reply

Sort By:

EDITORIAL BOARD

EDITORIAL BOARD, WOLFRAM

Posted 7 years ago

- Congratulations! This post is now a Staff Pick as distinguished by a badge on your profile! Thank you, keep it coming, and consider contributing your work to the The Notebook Archive!

POSTED BY: EDITORIAL BOARD

Reply to this discussion

Reply Preview

Attachments

Remove Add a file to this post

Follow this discussion

or Discard

Feedback

Computational mnemonics

Partitions

Finding words