Message Boards Message Boards

UK Place Name Generator

Posted 8 years ago

Every time I see another article pop up wherein somebody trains a neural net to generate names of something, I feel obligated to go back and run the same training sets through my dead-simple setup to do the same thing in the Wolfram Language. This pass at generating British place names seemed like a fun one, since the training set goes deep into small towns and villages across the UK. So I'll start with a small set of functions I've used before for this sort of thing — "decamel" is a utility function to clean up and split apart any incidental camelcased words that show up in predictions; "nameGenerator" does some minimal string processing on a provided list of Wolfram Language entities or raw strings, and produces a SequencePredictorFunction; "predictionList" produces a list of results of varying lengths using a predictor function:

decamel[str_] := 
 StringTrim[
  StringJoin[
   StringSplit[
    str, {RegularExpression["([a-z])([A-Z])"] -> "$1 $2", 
     RegularExpression["([0-9])([A-Z])"] -> "$1 $2", 
     RegularExpression["([a-z])([0-9])"] -> "$1 $2"}]]]

predictionList[func_, num_, min_, max_, decam_: True] := 
 If[decam == True, 
  decamel /@ 
   Table[StringTrim@
     StringReplace[
      func["|", "RandomNextElement" -> RandomInteger[{min, max}]], 
      "|" -> " "], num],
  Table[StringTrim@
    StringReplace[
     func["|", "RandomNextElement" -> RandomInteger[{min, max}]], 
     "|" -> " "], num]]

nameGenerator[entOrString_List, extractor_: "SegmentedWords"] :=
 Block[{names, list},
  With[{heads = DeleteDuplicates[Head /@ entOrString]},
   Which[
    heads === {Entity},
    names = CommonName[DeleteMissing[entOrString]];
    list = 
     StringRiffle[StringSplit["|" <> # <> "|"], "|"] & /@ names;
    SequencePredict[list, FeatureExtractor -> extractor],
    heads === {String},
    names = StringTrim /@ DeleteMissing[entOrString];
    list = 
     StringRiffle[StringSplit["|" <> # <> "|"], "|"] & /@ names;
    SequencePredict[list, FeatureExtractor -> extractor]]]]

So I'll start by importing the file used in the original article, and just grabbing place names out of it (it also includes some numerical IDs, and county names):

uknames = 
  Import["https://cdn.obrienmedia.co.uk/cdn/farfuture/5-\
1bFjgWmjONhWhk9sGAeYzlIzhwHRSBIF_Fzr55UYs/mtime:1425905283/sites/\
default/files/uk_towns_and_counties.csv"];

namelist = uknames[[All, 2]] // Rest // DeleteDuplicates;

In[84]:= Select[namelist, StringContainsQ["("]][[;; 10]]

Out[84]= {"Wdig (Goodwick)", "Vermuden's Drain (Forty Foot)", "Valley \
(Dyffryn)", "Usk (Brynbuga)", "Upper Largo (Kirkton of Largo)", \
"Uisage Dubh (Black Water)", "Tyddewi (St David's)", "Treorci \
(Treorchy)", "Treorchy (Treorci)", "Trent (Piddle)"}

I don't want to try to generate names with parenthetical transcriptions or alternate forms, so let's split those up and treat the parentheticals as distinct names for training purposes:

In[78]:= splitter[rec_] := 
 StringTrim[StringSplit[rec, "("], {" ", ")"}]

In[105]:= newlist = Flatten[splitter /@ namelist];

In[83]:= Length[newlist] 
Out[83]= 41245

Then all that's left to do is make the SequencePredictorFunction, and generate some names (removing predicted names that were already in the training set, or that end with words of fewer than 5 characters):

ukpl = nameGenerator[newlist, "SegmentedCharacters"];

Multicolumn[
 Complement[
  Select[predictionList[ukpl, 400, 8, 18], 
   StringLength[StringSplit[#, " "][[-1]]] > 4 &], newlist], 6]

enter image description here

Not all of these are gems (or even readable), but there's some good stuff in here — my personal favorites include:

  • Blackleaze Ferry
  • Bleburgh
  • Farmlingthorpe
  • Kirphook
  • Low of Gosbe
  • Roebucklecott
  • Stainton Doirkmill
  • Tattin Grime
  • Toberland Garker
  • Winstapleton Dalby
POSTED BY: Alan Joyce
2 Replies

enter image description here - Congratulations! This post is now a Staff Pick as distinguished on your profile! Thank you, keep it coming!

POSTED BY: EDITORIAL BOARD

Fantastic! Great stuff!

Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.
Reply Preview
Attachments
Remove
or Discard

Group Abstract Group Abstract