Thanks for sharing your tips, Vitaliy! These "cleaning methods" seem to work very nicely on the whole. I have nothing new to add (haven't played with WordCloud yet); I just wanted to say that there's obviously an error in WordData["species","BaseForm"]. Yes, "specie" is a word, but it has to do with coins and other forms of commodity money (i.e. no cats involved, as far as humans know). WordData doesn't even return "species" as a base form of "species"; I looked at the complete results. If we did get both base forms, though, we'd need a good technique to decide which one is more likely to be relevant to the topic, but that might actually be doable... (Sorry to be a downer; the "specie" error is actually quite funny...)