Message Boards Message Boards

GROUPS:

UK Place Name Generator

Posted 3 years ago
6469 Views
|
2 Replies
|
5 Total Likes
|

Every time I see another article pop up wherein somebody trains a neural net to generate names of something, I feel obligated to go back and run the same training sets through my dead-simple setup to do the same thing in the Wolfram Language. This pass at generating British place names seemed like a fun one, since the training set goes deep into small towns and villages across the UK. So I'll start with a small set of functions I've used before for this sort of thing — "decamel" is a utility function to clean up and split apart any incidental camelcased words that show up in predictions; "nameGenerator" does some minimal string processing on a provided list of Wolfram Language entities or raw strings, and produces a SequencePredictorFunction; "predictionList" produces a list of results of varying lengths using a predictor function:

decamel[str_] := 
 StringTrim[
  StringJoin[
   StringSplit[
    str, {RegularExpression["([a-z])([A-Z])"] -> "$1 $2", 
     RegularExpression["([0-9])([A-Z])"] -> "$1 $2", 
     RegularExpression["([a-z])([0-9])"] -> "$1 $2"}]]]

predictionList[func_, num_, min_, max_, decam_: True] := 
 If[decam == True, 
  decamel /@ 
   Table[StringTrim@
     StringReplace[
      func["|", "RandomNextElement" -> RandomInteger[{min, max}]], 
      "|" -> " "], num],
  Table[StringTrim@
    StringReplace[
     func["|", "RandomNextElement" -> RandomInteger[{min, max}]], 
     "|" -> " "], num]]

nameGenerator[entOrString_List, extractor_: "SegmentedWords"] :=
 Block[{names, list},
  With[{heads = DeleteDuplicates[Head /@ entOrString]},
   Which[
    heads === {Entity},
    names = CommonName[DeleteMissing[entOrString]];
    list = 
     StringRiffle[StringSplit["|" <> # <> "|"], "|"] & /@ names;
    SequencePredict[list, FeatureExtractor -> extractor],
    heads === {String},
    names = StringTrim /@ DeleteMissing[entOrString];
    list = 
     StringRiffle[StringSplit["|" <> # <> "|"], "|"] & /@ names;
    SequencePredict[list, FeatureExtractor -> extractor]]]]

So I'll start by importing the file used in the original article, and just grabbing place names out of it (it also includes some numerical IDs, and county names):

uknames = 
  Import["https://cdn.obrienmedia.co.uk/cdn/farfuture/5-\
1bFjgWmjONhWhk9sGAeYzlIzhwHRSBIF_Fzr55UYs/mtime:1425905283/sites/\
default/files/uk_towns_and_counties.csv"];

namelist = uknames[[All, 2]] // Rest // DeleteDuplicates;

In[84]:= Select[namelist, StringContainsQ["("]][[;; 10]]

Out[84]= {"Wdig (Goodwick)", "Vermuden's Drain (Forty Foot)", "Valley \
(Dyffryn)", "Usk (Brynbuga)", "Upper Largo (Kirkton of Largo)", \
"Uisage Dubh (Black Water)", "Tyddewi (St David's)", "Treorci \
(Treorchy)", "Treorchy (Treorci)", "Trent (Piddle)"}

I don't want to try to generate names with parenthetical transcriptions or alternate forms, so let's split those up and treat the parentheticals as distinct names for training purposes:

In[78]:= splitter[rec_] := 
 StringTrim[StringSplit[rec, "("], {" ", ")"}]

In[105]:= newlist = Flatten[splitter /@ namelist];

In[83]:= Length[newlist] 
Out[83]= 41245

Then all that's left to do is make the SequencePredictorFunction, and generate some names (removing predicted names that were already in the training set, or that end with words of fewer than 5 characters):

ukpl = nameGenerator[newlist, "SegmentedCharacters"];

Multicolumn[
 Complement[
  Select[predictionList[ukpl, 400, 8, 18], 
   StringLength[StringSplit[#, " "][[-1]]] > 4 &], newlist], 6]

enter image description here

Not all of these are gems (or even readable), but there's some good stuff in here — my personal favorites include:

  • Blackleaze Ferry
  • Bleburgh
  • Farmlingthorpe
  • Kirphook
  • Low of Gosbe
  • Roebucklecott
  • Stainton Doirkmill
  • Tattin Grime
  • Toberland Garker
  • Winstapleton Dalby
2 Replies

enter image description here - Congratulations! This post is now a Staff Pick as distinguished on your profile! Thank you, keep it coming!

Fantastic! Great stuff!

Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.
Reply Preview
Attachments
Remove
or Discard

Group Abstract Group Abstract