Blanagrams

GROUPS:
 Paul Wellin 12 Votes A blanagram is an anagram for another word except for the substitution of one letter. Think of Scrabble with a blank square (blank + anagram = blanagram). For example, phyla is a blanagram of glyph; replace the "g" with an "a" and find anagrams (see blog on Anagrams).We will prototype with a simple word, glyph. We'll start by getting a list of the individual characters in glyph. In[1]:= Characters["glyph"] Out[1]= {"g", "l", "y", "p", "h"} Start by replacing the first letter in glyph with an a and then finding all anagrams. The third argument to StringReplacePart is a list of beginning and ending positions for the replacement. In[2]:= StringReplacePart["glyph", "a", {1, 1}] Out[2]= "alyph" In[3]:= Anagrams[%] Out[3]= {"phyla", "haply"} Now do the same for each character position in the word. In[4]:= Map[StringReplacePart["glyph", "a", {#, #}] &, Range[StringLength["glyph"]]] Out[4]= {"alyph", "gayph", "glaph", "glyah", "glypa"} In[5]:= Flatten[Map[Anagrams, %]] Out[5]= {"phyla", "haply"} Having done this for the letter a, we now repeat for all other single characters. In[6]:= CharacterRange["a", "z"] Out[6]= {"a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"} In[7]:= blana = Table[ Map[StringReplacePart["glyph", ch, {#, #}] &, Range[StringLength["glyph"]]], {ch, CharacterRange["a", "z"]}] Out[7]= {{"alyph", "gayph", "glaph", "glyah", "glypa"}, {"blyph", "gbyph", "glbph", "glybh", "glypb"}, {"clyph", "gcyph", "glcph", "glych", "glypc"}, {"dlyph", "gdyph", "gldph", "glydh", "glypd"}, {"elyph", "geyph", "gleph", "glyeh", "glype"}, {"flyph", "gfyph", "glfph", "glyfh", "glypf"}, {"glyph", "ggyph", "glgph", "glygh", "glypg"}, {"hlyph", "ghyph", "glhph", "glyhh", "glyph"}, {"ilyph", "giyph", "gliph", "glyih", "glypi"}, {"jlyph", "gjyph", "gljph", "glyjh", "glypj"}, {"klyph", "gkyph", "glkph", "glykh", "glypk"}, {"llyph", "glyph", "gllph", "glylh", "glypl"}, {"mlyph", "gmyph", "glmph", "glymh", "glypm"}, {"nlyph", "gnyph", "glnph", "glynh", "glypn"}, {"olyph", "goyph", "gloph", "glyoh", "glypo"},{"plyph", "gpyph", "glpph", "glyph", "glypp"}, {"qlyph", "gqyph", "glqph", "glyqh", "glypq"}, {"rlyph", "gryph", "glrph", "glyrh", "glypr"}, {"slyph", "gsyph", "glsph", "glysh", "glyps"}, {"tlyph", "gtyph", "gltph", "glyth", "glypt"}, {"ulyph", "guyph", "gluph", "glyuh", "glypu"}, {"vlyph", "gvyph", "glvph", "glyvh", "glypv"}, {"wlyph", "gwyph", "glwph", "glywh", "glypw"}, {"xlyph", "gxyph", "glxph", "glyxh", "glypx"}, {"ylyph", "gyyph", "glyph", "glyyh", "glypy"}, {"zlyph", "gzyph", "glzph", "glyzh", "glypz"}} Because of the extra nesting of the output ( Table[Map[...]] ) we need to flatten the output at a deeper level. And delete duplicates. In[8]:= Flatten[Map[Anagrams, blana, {2}]] // DeleteDuplicates Out[8]= {"phyla", "haply", "glyph", "lymph", "sylph"} Finally, put all the pieces together to create a reusable function, Blanagrams. In[9]:= Blanagrams[word_String] := Module[{blana}, blana = Table[ Map[StringReplacePart[word, ch, {#, #}] &, Range[StringLength[word]]], {ch, CharacterRange["a", "z"]}]; DeleteDuplicates[Flatten[Map[Anagrams, blana, {2}]]] ] This turns out to be fairly quick for small words, but it bogs down for larger words. In[10]:= Blanagrams["glyph"] //Timing Out[10]= {1.25209, {"phyla", "haply", "glyph", "lymph", "sylph"}} In[11]:= Blanagrams["string"] //Timing Out[11]= {8.85717, {"taring", "rating", "gratin", "arsing", "grains", "sating", "giants", "grants", "gratis", "strain", "trains", "brings", "grinds", "sering", "singer", "signer", "resign", "reigns", "ingres", "signet", "tinges", "ingest", "tigers", "insert", "inters", "inerts", "string", "things", "nights", "rights", "girths", "tiring", "siring", "rising", "siting", "tigris", "glints", "roting", "soring", "signor", "groins", "ingots", "strong", "intros", "spring", "sprint", "prints", "stings", "turing", "truing", "strung", "grunts", "wrings", "trying", "stying", "stingy"}} Because we are doing similar computations for each of the letters of the alphabet, this is a good candidate for parallelizing. But which parts do we parallelize? One way to help determine where the computational bottlenecks are is to profile the code. Integrated development environments such as Wolfram Workbench have built-in profilers, but here we will create a simple set of profiling steps to determine where our bottlenecks are.Here is a small auxiliary function that wraps AbsoluteTiming around an expression and adds a tag to make it easy to identify the various parts of what is reported. In[12]:= timing[expr_, tag_] := Print[{NumberForm[First@AbsoluteTiming[expr], 10], tag}] In[13]:= SetAttributes[timing, HoldAll]; In[14]:= word = "string"; timing[ tmp = Table[Map[StringReplacePart[word, ch, {#, #}] &, Range[StringLength[word]]],{ch, CharacterRange["a", "z"]}];, "table" ]; timing[ Flatten[tmp2 = Map[Anagrams, tmp, {2}]];, "map Anagrams" ]; timing[ DeleteDuplicates[Flatten[tmp2]];, "flatten and delete duplicates" ] Out[15]= {0.000794,table} Out[16]= {8.986847,map Anagrams} Out[17]= {0.000050,flatten and delete duplicates} Not surprisingly, creating the many possible letter combinations is very quick. The greatest part of this computation is spent with mapping Anagrams across the many word combinations. So we can simply try to parallelize that using ParallelMap. In[17]:= BlanagramsParallel[word_String]:= Module[{blana}, blana = Table[Map[StringReplacePart[word,ch{#,#}]&, Range[StringLength[word]]],{ch,CharacterRange["a","z"]}]; DeleteDuplicates@Flatten[ParallelMap[Anagrams,blana,{2}]] ] Launch kernels on our current machine and compute. In[18]:= LaunchKernels[] Out[18]= {KernelObject[1,local],KernelObject[2,local]} In[19]:= DistributeDefinitions[Anagrams] Out[19]= {Anagrams,word} In[20]:= BlanagramsParallel["strings"]//AbsoluteTiming Out[20]= {37.792462, {ratings,gratins,staring,strains,resting,stinger,singers,signers, resigns,ingress,signets,ingests,tigress,inserts,strings,tirings,risings, sitings,sorting,storing,signors,tossing,springs,sprints,rusting,tryings,stringy}} For comparison, here is the computation done serially on one kernel. In[21]:= Blanagrams["strings"]//AbsoluteTiming Out[21]= {47.756149, {ratings,gratins,staring,strains,resting,stinger,singers,signers,resigns, ingress,signets,ingests,tigress,inserts,strings,tirings,risings,sitings,sorting, storing,signors,tossing,springs,sprints,rusting,tryings,stringy}} With the 2-kernel machine this was run on, we are getting a slight speedup. This particular computation parallelizes well and so on an 8-kernel machine, we would see substantially faster compute times. In[22]:= First[%]/First[%%] Out[22]= 1.2636422  NOTE: This article was originally posted on Paul Wellin's blog PROGRAMMING WITH MATHEMATICA.
1 year ago
2 Replies
 - you have earned "Featured Contributor" badge, congratulations !This is a great post and it has been selected for the curated Staff Picks group. Your profile is now distinguished by a "Featured Contributor" badge and displayed on the "Featured Contributor" board.
1 year ago
 Christopher Wolfram 3 Votes Nice post! I also like your blog post about anagrams.I think there are actually a few ways you could make this a bit quicker. First, for anagram finding, you can do something like this Select[ToLowerCase@DictionaryLookup[], Sort@Characters@# === Sort@Characters@"elvis" &] which returns {"elvis", "evils", "levis", "lives", "veils"} This is a bit more efficient than generating all of the permutations (as it looked like you were doing in the blog). For blanagrams, there is a somewhat analogous approach.This gets the same results as your Blanagrams["strings"], but runs in about 1.3 seconds on my computer. Select[ ToLowerCase@DictionaryLookup[], StringLength["strings"] === StringLength[#] && Total@Select[Merge[{CharacterCounts@"strings", -CharacterCounts@#}, Total], Positive] <= 1 & ] Let me see if I can explain this because it is a bit cryptic. Most of this code is defining what criteria need to be met for one word to be a blanagram of another. The first criteria is that they have to be the same length. The next criteria is a bit less clear. CharacterCounts gives an association with the counts for each character that appears in the string. The following line of code then finds the differences in the numbers of each character that appears either in # or "strings". Merge[{CharacterCounts@"strings", -CharacterCounts@#}, Total] For example, when # is set to be "ratings", this returns <|"s" -> 1, "g" -> 0, "n" -> 0, "i" -> 0, "r" -> 0, "t" -> 0, "a" -> -1|> This says that "strings" and "ratings" have the same number of Gs, Ns, Is, Rs, and Ts, but "strings" has 1 more "s" than "ratings", and "strings" has 1 fewer "a" than "ratings".We know that the total of this association has to be 0, because the string lengths are the same, and so have the same total number of characters. Therefore, if we just select the positive values, we will get the characters that "strings" contains but # doesn't, but we know that then # will contain an equal number of characters that "strings" doesn't contain. That means that the sum of the positive values of this association will tell us how many substitutions we would need. For your blanagrams, it seems that we want a single substitution or none at all, and so we set that this has to be less than or equal to 1.I think it would actually be equivalent to take the absolute value of the differences in character counts and then divide by 2, but that seems even more cryptic.One neat feature of this approach is that we can find double blanagrams (diblanagrams?), where 2 substitutions can be used, and this is no more computationally difficult! Select[ ToLowerCase@DictionaryLookup[], StringLength["strings"] === StringLength[#] && Total@Select[Merge[{CharacterCounts@"strings", -CharacterCounts@#}, Total], Positive] <= 2 & ] {"against", "ageists", "agustin", "airguns", "airings", "antsier", \ "arising", "artisan", "artists", "assigns", "austins", "basting", \ "besting", "bestirs", "bigness", "bistros", "bitings", "bossing", \ "brights", "britons", "busting", "carting", "casings", "casting", \ "christs", "cistern", "citrons", "consist", "cosigns", "costing", \ "crating", "cretins", "cringes", "cryings", "cursing", "cussing", \ "darting", "designs", "digests", "digress", "disgust", "dissent", \ "dissing", "dossing", "dotings", "dristan", "dusting", "easting", \ "egoists", "engross", "enlists", "ensigns", "entries", "erasing", \ "fasting", "fessing", "fingers", "firings", "fitness", "frights", \ "fringes", "frisson", "fussing", "gainers", "gaiters", "garnets", \ "garnish", "gassier", "gassing", "gastric", "genesis", "genesis", \ "gingers", "girting", "glisten", "glister", "gnostic", "gnostic", \ "goiters", "goriest", "granite", "grating", "gratins", "gringos", \ "gristle", "gristly", "grotius", "guitars", "gustier", "gusting", \ "gutsier", "hasting", "hinters", "hissing", "histing", "horsing", \ "hosting", "hurting", "ignites", "ignores", "incests", "infests", \ "ingests", "ingrate", "ingress", "insects", "inserts", "insight", \ "insists", "insteps", "insults", "insures", "integer", "interns", \ "inverts", "invests", "issuing", "jesting", "jurists", "kirsten", \ "kissing", "knights", "kristen", "kristin", "lasting", "lingers", \ "listens", "listing", "losings", "lusting", "margins", "marting", \ "martins", "massing", "matings", "messing", "migrant", "minster", \ "minters", "missing", "misters", "misters", "misting", "monists", \ "mossing", "musings", "mussing", "nastier", "negress", "nesting", \ "nighest", "nitrous", "nosiest", "nostril", "nudists", "nursing", \ "nutrias", "orating", "orients", "orients", "origins", "orisons", \ "ousting", "outings", "parings", "parsing", "parting", "passing", \ "pasting", "persist", "pistons", "porting", "posting", "prating", \ "priests", "prisons", "prosing", "purists", "pursing", "racists", \ "rafting", "raising", "raisins", "ranting", "rapists", "rasping", \ "ratings", "rations", "ratting", "ravings", "regains", "regents", \ "regions", "renting", "resents", "resigns", "resists", "resting", \ "retains", "retinas", "retsina", "retying", "reusing", "rifting", \ "ringers", "ringgit", "ringlet", "rinsing", "rioting", "risings", \ "risking", "rooting", "rosiest", "rossini", "rotting", "rousing", \ "routing", "rulings", "rushing", "russian", "rustics", "rusting", \ "rutting", "salting", "sangria", "sargent", "sarnies", "sarongs", \ "sassing", "satires", "savings", "sayings", "scaring", "scoring", \ "scripts", "searing", "seating", "seeings", "seiners", "seniors", \ "seniors", "sensing", "serving", "setting", "sharing", "shiners", \ "shintos", "shoring", "shrines", "shrinks", "sidings", "sifters", \ "sifting", "signals", "signers", "signets", "signora", "signore", \ "signori", "signors", "silents", "silting", "sinatra", "singers", \ "singles", "singlet", "sinkers", "sinners", "sirring", "sisters", \ "sistine", "sitings", "sitters", "sitting", "skating", "skiting", \ "slating", "slights", "smiting", "snaring", "snidest", "snifter", \ "snipers", "snoring", "soaring", "sorties", "sorting", "souring", \ "sousing", "sparing", "spigots", "spinets", "spirits", "spiting", \ "splints", "sporing", "sprains", "sprangs", "spriest", "springs", \ "springy", "sprints", "sprites", "squints", "squirts", "stagier", \ "staging", "stainer", "staking", "staling", "staring", "stating", \ "staving", "staying", "steiner", "stewing", "stigmas", "stiling", \ "stimson", "stinger", "stinker", "stogies", "stoking", "stonier", \ "stoning", "stories", "storing", "stowing", "strains", "straits", \ "strands", "strange", "strides", "strikes", "strings", "stringy", \ "stripes", "striven", "strives", "stygian", "styling", "suiting", \ "suitors", "sunkist", "sunrise", "surfing", "surging", "sussing", \ "sustain", "swinger", "syrians", "syringe", "takings", "tangier", \ "tangier", "tangies", "tarnish", "tarring", "tarting", "tasking", \ "tasting", "taxings", "tearing", "teasing", "tensing", "tensors", \ "terming", "terning", "testing", "thirsts", "throngs", "tidings", \ "tigress", "tilings", "timings", "tingles", "tinkers", "tinsels", \ "tirings", "tocsins", "tonsils", "torsion", "tossing", "touring", \ "tracing", "trading", "tragics", "transit", "treeing", "triking", \ "tristan", "trowing", "truisms", "tryings", "tsarism", "tsarist", \ "tunings", "turfing", "turning", "turnips", "tushing", "twiners", \ "twinges", "unrests", "urgings", "versing", "vesting", "virgins", \ "wasting", "westing", "wingers", "winters", "winters", "wisting", \ "witness", "wrights", "writing", "yessing", "zingers"}