Yesterday I was reading this on Stephen's blog and something caught my attention.
So I decided to do some coding to figure out how many other sentences like that existed. The strategy is as follows, we're going to first find all the possible partitions of 14 (with permutations), then split the formula string and extract words that match the sub-string.
Partitions
We need to construct all possible partitions of 14 which is the length of the formula: IntegerPartition
returns sorted partitions, so we need to call Permutations
.
Join @@ Permutations /@ IntegerPartitions[14]
returns 8192 permuted partitions of 14. It is left as an exercise to the reader that there are always 2^(n-1)
permuted partitions of the integer n
.
From these we can compute all the possible spans.
spans[n_] := Map[
MapAt[
Function[a, a + 1],
Partition[Prepend[Accumulate[#], 0], 2, 1],
{All, 1}
] &,
Join @@ (Permutations /@ IntegerPartitions[n])
]
Finding words
Now we need to find all the words that match the right piece of the pattern:
Options[filteredWords] = {IgnoreCase -> True};
filteredWords[{a_, b_, c_}, opts: OptionsPattern[]][pos_] := filteredWords[{a, b, c}, opts][pos] =
DictionaryLookup[
StringReplace[StringTake["DDDpqrDpDDprpr", pos], {"D" -> _,"p" -> a, "q" -> b , "r" -> c}],
IgnoreCase -> OptionValue[filteredWords, {opts}, IgnoreCase]
]
filteredWords
takes three letters a, b, c and two indices and constructs a StringExpression to be fed to DictionaryLookup
, by StringTake
-ing the part of the formula we want to construct mnemonics from. For example: given input {"u", "r", "e"}
and indices {10, 14}
it constructs the pattern _ ~~ "ueue"
, and it extracts {"queue"}
. Note that we've used the fact that StringReplace
returns a StringExpression
when some of the rhs are not strings themselves.
As an intermediate step we can construct all the words that will make up our mnemonics:
words = DeleteCases[
Map[filteredWords[{"u", "r", "e"}], spans[14], {2}],
{___, {}, ___}
]
And just from these we can say that there are 192 945 possible sentences:
Total[Times @@@ Map[Length, words, {2}]]
We can easily construct them:
sentences = Flatten[Apply[Outer[StringRiffle[{##}, " "] &, ##] &, words, {1}]]
Let's look at them:
RandomSample[sentences, 10]
returns
{"Oz cur Ed um queue", "ID Wu re I US queue", "kW surer UK queue",
"gap Urey US queue", "Ms nu re duo queue", "XL cur Ed uh queue", "a I Wu re gut queue",
"AM nu re cub queue", "I you re jun queue", "a slur eh up queue"}
Do they all end in queue?
Select[sentences, Not @* StringMatchQ[___ ~~ "queue"]]
unfortunately yes.
Now we can try to do some further exploration. A few facts:
- There are no sentences with
{"p", "q", "r"}
- There are a staggering 14 690 288 sentences with
{"a", "b", "c"}
- If we restrict the search to case sensitive there are 0 sentences with
{"a", "b", "c"}
, which makes me suspect that there are far too many acronyms with abc in our dictionary.
- There are no three consecutive letters that have case-sensitive valid sentences.
What about case-insensitive?
assoc = AssociationMap[
With[
{words = DeleteCases[
Map[filteredWords[#, IgnoreCase -> True], spans[14], {2}],
{___, {}, ___}
]},
Total[Times @@@ Map[Length, words, {2}]]
] &,
Partition[Alphabet[], 3, 1]
]
DeleteCases[assoc, 0]
returns
<|{"a", "b", "c"} -> 14690288, {"g", "h", "i"} -> 672097, {"t", "u", "v"} -> 76164|>
For {"t", "u", "v"}
, they all end in "TV TV"
, so that's not very interesting.
{"g", "h", "i"}
on the other hand seems to contain quite a few I, which is definitely interesting, if it wasn't for the fact that they all end with "I"
, which is probably not going to work well with English grammar.
I egg HIV go a gig I
I big hip gum gig I
go ugh is God gig I
I gig his gig GI GI
Maybe with some punctuation...
I have to stop writing here, but I will come back with some statistics of the 26!/23! (15600) 3-letter combinations.