Message Boards Message Boards

10
|
10276 Views
|
5 Replies
|
21 Total Likes
View groups...
Share
Share this post:

Fit it and tweet it - optimized reflections on Einstein and Chinese poems

Posted 10 years ago
Recently we had a question about Sorting synonyms. That was needed to shorten a phrase - in order to fit in 140 characters of a tweet.

Here is a stab at a more general problem of automation of text shortening. So what is more suitable for a Sunday evening than linguistic optimization in search of the global minimum of a text length? I first will show you the result and then explain how it works. Of course it is not perfect, but requires just a few manual adjustments. It all depends on shortening rules and they can be made more sophisticated to perfect automation. So here are a few examples:


A poem by classic Chinese poet, musician and painter Wang Wei who lived in 8th century. What is cool is that poetry translations can be quite arbitrary and you can consider this to be Wolfram Language meditations on translation of classic Chinese ;-)
text = "All alone in a foreign land, I am twice as homesick on this day, when brothers carry dogwood up the mountain, each of them a branch and my branch missing.";

text // StringLength
Out[]= 154

Shorten it:
RePhrase[text, ss, 5]
"all solo in a alien land, i am twice as homesick on this day, when pal carry cornel up the wad, each of them a branch and my branch missing."
In[]= % // StringLength
Out[]= 140


A quote by Einstein:
text = "The language of mathematics is even more inborn and universal than the language of music; a mathematical formula is crystal clear and independent of all sense organs.";

text // StringLength
Out[] = 166

Shorten it:
RePhrase[text, ss, 5]
the words of math is even more innate and general than the words of music; a numerical rule is quartz clear and main of all sense organs.
In[]= % // StringLength
Out[]= 137


A tweet based on Encyclopedia Britannica definition of physics:
text = "I want to learn physics, a science that deals with the structure of matter and the interactions between the fundamental constituents of the observable universe!";

text // StringLength
Out[] = 160

Shorten it:
RePhrase[text, ss, 5]
i want to learn physics, a skill that deals with the structure of matter and the interaction 'tween the key part of the evident world!
In[]= % // StringLength
Out[]= 134

Of course, as I said, a few corrections needed that can be easily done manually – but we did save a lot of space – even for hash-tags ;-) Now, how it works. First of all we would need a set of characters based on which we will split a string into separate words, which later will be replaced by short synonyms. These characters are easy to get – I also added to them black space.
ss = Complement[#, Cases[#, _?LetterQ]] &@CharacterRange["!", "~"]~Join~{" "}

Out[]= {"!", "@", "#", "%", "^", "&", "*", "(", ")", "_", "-", "+", "=", "~", "`", "{", "[", "}", "]", "|", "\\", "<", ">", ".", ",", ";", "\"", "?", "'", "/", ":", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "$", " "}

A better way to do this - as advised by Michael in the comments:
ss = Select[CharacterRange[" ", "~"], ! LetterQ@# &]~Join~{" "}

Let me know if you have other ideas how to get special character list. Now we define a function that returns a sorted by string length list of synonyms for a given word based on a few rules. Specifically, synonyms are returned only if (otherwise the original word itself is returned)

  • Word is longer than n (say 5) characters
  • Word is not ending on –ing – those are easy to confuse with a different part of speech – verb versus adjective
  • Word is not a verb – those have too many special forms
  • Word generally has at least one synonym
 SynSort[w_, n_] :=
  Sort[
   With[
    {r = WordData[w, "Synonyms", "List"]},
    If[StringLength[w] < n || StringTake[w, {-3, -1}] == "ing" ||
      MemberQ[WordData[w, "PartsOfSpeech"], "Verb"] || r == {} ||
      Head[r] == WordData, {w}, {w}~Join~r, {w}~Join~r]
    ]
   , StringLength[#1] < StringLength[#2] &]

Built-in knowledge of WordData was used to find synonyms and assess grammatical properties of words. Define a function that actually replaces longer words with shorter synonyms. We need to go ToLowerCase – capitalized WL strings do not have synonyms. Also note Thread[ss -> ss] option to StringSplit that allows to keep original punctuation. 
RePhrase[txt_, ss_, n_] := StringJoin[First[SynSort[#, n]] & /@ StringSplit[ToLowerCase@txt, Thread[ss -> ss]]]

And that’s it! Do not forget that to tweet you do not even have to leave – just use this on Raspberry Pi:
SendMessage["Twitter", "...TEXT..."]

One thing to remember that some nice short synonyms can be reached only in a few steps of synonym network. We did not use it here but it could improve things. For example “faux” is not a direct synonym of "fictitious" but a cousin. 
 WordData["fictitious", "Synonyms", "List"]
 Out[]= {"assumed", "fabricated", "false", "fancied", "fictional", "fictive", "pretended", "put on", "sham"}
 
 Graph[Union[Sort /@ Flatten[Rest[NestList[Union[Flatten[Thread[# <->
  WordData[#, "Synonyms", "List"]] & /@ Last /@ #]] &, {"" <-> "fictitious"}, 2]]]],
  VertexLabels -> Placed["Name", Below],
  VertexStyle -> {Opacity[.5], "fictitious" -> Red, "faux" -> Green},
  BaseStyle -> EdgeForm[], EdgeStyle -> Opacity[.5],
  VertexSize -> {"fictitious" -> {"Scaled", .03}, "faux" -> {"Scaled", .03}}]



Notebook is attached for convenience. 
Attachments:
POSTED BY: Vitaliy Kaurov
5 Replies
Bravo Vitaliy:
EmitSound[
Sound /@ (SoundNote["Clap", #] & /@ RandomReal[{0.1, 0.7}, 20])]

I didn't know they used different channels....  this can be addictive!
instruments = {"Clap", "Sticks", "Shaker", "LowWoodblock", 
  "Castanets", "Maracas"};

EmitSound[
Sound /@ (SoundNote[#, RandomReal[{0.4, 0.6}]] & /@ 
    RandomChoice[instruments, 50])]
POSTED BY: W. Craig Carter
Craig, this is wonderful - the truly - or should I say
WordData["truly", "Synonyms", "List"]
Out[]= {"genuinely", "in truth", "really", "rightfully", "sincerely", "unfeignedly"}

- Obnoxious Phone Answering Program. The only thing that can improve it, I think, is ambient pentatonic scale music that can be played simultaneously because channels for Speak and EmitSound are different.
Speak["...text of obnoxious message..."];

EmitSound@Sound[SoundNote[#1, #2, "Flute"] & @@@
   Transpose[{RandomChoice[RandomInteger[{1, 5}, 9] ->
   {"A3", "B3", "D", "E", "G", "A", "B", "D5", "E5"}, 100],
     RandomChoice[{.4, .3, .2, .1} -> {.1, .2, .4, .8}, 100]}]]
POSTED BY: Vitaliy Kaurov
This is much better way to get special characters, Michael, thank you.
POSTED BY: Vitaliy Kaurov
The Obnoxious Phone Answering Program

Here is something that I played with a couple years ago that does the opposite...

My purpose here was to create the most obnoxious phone answering message.  The result is--in my opinion--poetically humerous.
 removePunctuation[sentence_] :=
  StringReplace[sentence, a_ /; (! LetterQ[a]) :> " "]
 
 
 mapToLowerCase[sentence_] :=
  StringReplace[sentence, a_ /; UpperCaseQ[a] :> ToLowerCase[a]]
 
 
 muchbroader[word_] := Module[
   {wordData , likeTypes},
   If[RandomReal[] < 0.8, Return[word]];
   wordData  = WordData[word, "Synonyms"];
   If[ Head[wordData] == WordData, Return[{word}]];
   (*Print[wordData];*)
   
   likeTypes = Flatten[Union@Flatten[Last /@ wordData]];
   If[Length[likeTypes] == 0 , Return[{word}]];
   likeTypes =
    Union@Join[likeTypes,
      Flatten@(Last /@
         Flatten[Map[WordData[#, "BroaderTerms"] &,
           Union@Flatten[Last /@ wordData]], 1])];
   If[Length[likeTypes == 0], Return[word]];
   RandomChoice[likeTypes, 1]
   ] /; StringQ[word]


broadRephrase[sentence_] :=
StringJoin@
   Riffle[Flatten[
     Join[muchbroader /@
       StringSplit[mapToLowerCase[removePunctuation[sentence]]]]],
    " "] <> ".\n"

Example:
StringJoin@
DeleteDuplicates[
  NestList[broadRephrase[#] &,
   "Hello, This is Maxwell Fields, please leave a message", 20]]

This is an example of an obnoxious answering machine message:
(Here you may wish to put on headphones or turn your volume down)
Speak@StringJoin@
  DeleteDuplicates[
   NestList[broadRephrase[#] &,
    "Hello, This is Maxwell Fields, please leave a message", 20]]

Improvements (or their opposite) welcome!
POSTED BY: W. Craig Carter
Posted 10 years ago
I ran your request for the definition of symbol ss through my proprietary, occasionally reliable, and currently unimplemented code shortener and got this:
Select[CharacterRange[" ", "~"], ! LetterQ@# &]
Interesting results. I actually have a notebook open from last night where I was messing around with WordData for part-of-speech tagging. Nothing shareable yet though.
POSTED BY: Michael Hale
Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.
Reply Preview
Attachments
Remove
or Discard

Group Abstract Group Abstract