Recently we had a question about
Sorting synonyms. That was needed to shorten a phrase - in order to fit in 140 characters of a tweet.
Here is a stab at a more general problem of automation of text shortening. So what is more suitable for a Sunday evening than linguistic optimization in search of the global minimum of a text length? I first will show you the result and then explain how it works. Of course it is not perfect, but requires just a few manual adjustments. It all depends on shortening rules and they can be made more sophisticated to perfect automation. So here are a few examples:
A poem by classic Chinese poet, musician and painter Wang Wei who lived in 8th century. What is cool is that poetry translations can be quite arbitrary and you can consider this to be Wolfram Language meditations on translation of classic Chinese ;-)
text = "All alone in a foreign land, I am twice as homesick on this day, when brothers carry dogwood up the mountain, each of them a branch and my branch missing.";
text // StringLength
Out[]= 154
Shorten it:
RePhrase[text, ss, 5]
"all solo in a alien land, i am twice as homesick on this day, when pal carry cornel up the wad, each of them a branch and my branch missing."
In[]= % // StringLength
Out[]= 140
A quote by Einstein:
text = "The language of mathematics is even more inborn and universal than the language of music; a mathematical formula is crystal clear and independent of all sense organs.";
text // StringLength
Out[] = 166
Shorten it:
RePhrase[text, ss, 5]
the words of math is even more innate and general than the words of music; a numerical rule is quartz clear and main of all sense organs.
In[]= % // StringLength
Out[]= 137
A tweet based on
Encyclopedia Britannica definition of physics:
text = "I want to learn physics, a science that deals with the structure of matter and the interactions between the fundamental constituents of the observable universe!";
text // StringLength
Out[] = 160
Shorten it:
RePhrase[text, ss, 5]
i want to learn physics, a skill that deals with the structure of matter and the interaction 'tween the key part of the evident world!
In[]= % // StringLength
Out[]= 134
Of course, as I said, a few corrections needed that can be easily done manually but we did save a lot of space even for hash-tags ;-) Now, how it works. First of all we would need a set of characters based on which we will split a string into separate words, which later will be replaced by short synonyms. These characters are easy to get I also added to them black space.
ss = Complement[#, Cases[#, _?LetterQ]] &@CharacterRange["!", "~"]~Join~{" "}
Out[]= {"!", "@", "#", "%", "^", "&", "*", "(", ")", "_", "-", "+", "=", "~", "`", "{", "[", "}", "]", "|", "\\", "<", ">", ".", ",", ";", "\"", "?", "'", "/", ":", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "$", " "}
A better way to do this - as advised by Michael in the comments:
ss = Select[CharacterRange[" ", "~"], ! LetterQ@# &]~Join~{" "}
Let me know if you have other ideas how to get special character list. Now we define a function that returns a sorted by string length list of synonyms for a given word based on a few rules. Specifically, synonyms are returned only if (otherwise the original word itself is returned)
- Word is longer than n (say 5) characters
- Word is not ending on ing those are easy to confuse with a different part of speech verb versus adjective
- Word is not a verb those have too many special forms
- Word generally has at least one synonym
SynSort[w_, n_] :=
Sort[
With[
{r = WordData[w, "Synonyms", "List"]},
If[StringLength[w] < n || StringTake[w, {-3, -1}] == "ing" ||
MemberQ[WordData[w, "PartsOfSpeech"], "Verb"] || r == {} ||
Head[r] == WordData, {w}, {w}~Join~r, {w}~Join~r]
]
, StringLength[#1] < StringLength[#2] &]
Built-in knowledge of WordData was used to find synonyms and assess grammatical properties of words. Define a function that actually replaces longer words with shorter synonyms. We need to go ToLowerCase capitalized WL strings do not have synonyms. Also note Thread[ss -> ss] option to StringSplit that allows to keep original punctuation.
RePhrase[txt_, ss_, n_] := StringJoin[First[SynSort[#, n]] & /@ StringSplit[ToLowerCase@txt, Thread[ss -> ss]]]
And thats it! Do not forget that to tweet you do not even have to leave just
use this on Raspberry Pi:
SendMessage["Twitter", "...TEXT..."]
One thing to remember that some nice short synonyms can be reached only in a few steps of synonym network. We did not use it here but it could improve things. For example faux is not a direct synonym of "fictitious" but a cousin.
WordData["fictitious", "Synonyms", "List"]
Out[]= {"assumed", "fabricated", "false", "fancied", "fictional", "fictive", "pretended", "put on", "sham"}
Graph[Union[Sort /@ Flatten[Rest[NestList[Union[Flatten[Thread[# <->
WordData[#, "Synonyms", "List"]] & /@ Last /@ #]] &, {"" <-> "fictitious"}, 2]]]],
VertexLabels -> Placed["Name", Below],
VertexStyle -> {Opacity[.5], "fictitious" -> Red, "faux" -> Green},
BaseStyle -> EdgeForm[], EdgeStyle -> Opacity[.5],
VertexSize -> {"fictitious" -> {"Scaled", .03}, "faux" -> {"Scaled", .03}}]
Notebook is attached for convenience.
Attachments: