Simple Spritz Reader on Wolfram Language

Posted 10 years ago
I was very impressed by Spritz's text-streaming reader technology, and tried to reproduce a simpler version in Wolfram Language.
Here is my code:
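    centerWord[str_String, maxString_] := Module[{centerLetter, midlePoint, formatedWord},
      centerLetter = Ceiling[StringLength[str]/2, 1]; midlePoint = Ceiling[maxString/2, 1]; (* middle letter, fixed column *)
      formatedWord = Table[" ", {i, 1, midlePoint - centerLetter}]~Join~MapAt[Style[#, Bold, Red] &, Style[#, Bold] & /@ (Characters[str]), centerLetter];
      Grid[{formatedWord}, Spacings -> {0.1, 0.5}]]
    spritzIt[text_String, wordsPerMinute_] := DynamicModule[{maxString, textList = StringSplit[text], n = 1},
      maxString = Max[StringLength /@ textList];
      textList = Flatten@MapAt[{#, " ", " "} &, textList, Position[textList, w_ /; StringQ[w] && StringMatchQ[w, "*."]]]; (* two blank frames after each sentence *)
      textList = centerWord[#, maxString] & /@ textList;
      Grid[{{Style["Simple Spritz on Wolfram Language", FontSize -> 20]}, {Dynamic[textList[[n]]~Magnify~2]}, {Animator[Dynamic[n], {1, Length@textList, 1}, wordsPerMinute/60]},
        {Style[Row[{"Words Per Minute: ", wordsPerMinute}], FontSize -> 20]}}, Alignment -> Left, ItemSize -> {Automatic, 3}, FrameStyle -> Gray, Frame -> All]]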

Now let's test it!

text="Here is a simple test using Spritz on Wolfram Language. Spritz is a Boston based startup focused on text streaming technology and its integration into modern communication. The founders are serial entrepreneurs with extensive experience in developing and commercializing innovative technologies. We have assembled an international team of experts in reading methodologies and software engineering. Spritz offers a variety of licensing options for the integration with operating systems, applications, wearables, and websites.";


Here is the result:

Now at a faster rate! 600 words per minute.
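In spritzIt the Animator is asked to advance wordsPerMinute/60 frames per second, and every word ending in a period is followed by two blank frames. A small sketch of the resulting timing, ignoring the Animator's own start-up and looping behavior; the helper names `frameSeconds`, `frameCount` and `totalSeconds` are hypothetical and not part of the original code:

```mathematica
(* Hypothetical helpers, not from the original post. *)
(* Seconds per frame when the Animator moves wordsPerMinute/60 frames per second. *)
frameSeconds[wordsPerMinute_] := 60/wordsPerMinute

(* One frame per word plus two blank frames after each word ending in ".",
   mirroring the padding step in spritzIt. *)
frameCount[text_String] := Module[{words = StringSplit[text]},
  Length[words] + 2 Count[words, w_String /; StringEndsQ[w, "."]]]

totalSeconds[text_String, wordsPerMinute_] :=
  frameCount[text] frameSeconds[wordsPerMinute]

frameSeconds[600]                          (* 1/10 *)
totalSeconds["One two. Three four.", 600]  (* 8 frames -> 4/5 *)
```

So at 600 words per minute each word stays on screen for about a tenth of a second, and each sentence end adds roughly a fifth of a second of blank pause.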

Feel free to improve it!
POSTED BY: Rodrigo Murta
Well, actually, perhaps the choice to highlight the first letter of the second syllable is not very useful. First of all, because words differ in length, they jump around a lot when displayed this way, which is rather distracting. It seems to be easier to read if the highlight is more or less in the middle of the word. However, research suggests that syllable frequency affects word recognition.

It seems to make sense that high- and low-frequency syllables (phonetic or orthographic) carry, on average, different amounts of information. A first step in that direction would be to shift the highlight slightly from the middle towards the most "informative" syllable. The following command finds all the different syllables in the dictionary and tells us how frequent they are:
syllablestats =
Reverse[SortBy[Tally[Flatten[WordData[#, "Hyphenation"] & /@ (WordData[])]], Last]]

The first two entries are the missing-data and "empty set" ones. Interestingly, hyphenation is not available for 80782 of the 149191 dictionary entries. Anyway, ListLogLogPlot gives us some insight into how frequent the syllables are:
ListLogLogPlot[syllablestats[[2 ;;, 2]]]

Perhaps it makes sense to shift the highlight to rather rare syllables such as:
syllablestats[[-20 ;; -3, 1]]
{"airt", "airs", "aired", "Aire", "Air", "aigne", "aide", "Agri", "aft", "aff", "adze", "adz", "Act", "abstrac", "absinth", "aat",
"Aar", "aa"}

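The idea of moving the highlight towards rarer syllables could be sketched like this. The helper `rareSyllableIndex` and the toy counts are hypothetical; real counts might come from `syllablestats`, for instance via `Association[Rule @@@ syllablestats]`, and treating syllables that are absent from the table as the rarest (count 0) is an assumption:

```mathematica
(* Hypothetical helper, not from the thread: character index of the first
   letter of the least frequent syllable.
   syllables: a hyphenation list, e.g. from WordData[w, "Hyphenation"].
   counts:    an Association syllable -> frequency; syllables not in it get 0,
              i.e. they are assumed to be the rarest. *)
rareSyllableIndex[syllables_List, counts_Association] :=
  Module[{freqs, k},
    freqs = Lookup[counts, syllables, 0];
    k = First[Ordering[freqs, 1]];  (* position of the rarest syllable *)
    1 + Total[StringLength /@ Take[syllables, k - 1]]]

(* Toy counts and a hand-written hyphenation, made up for illustration only. *)
counts = <|"par" -> 500, "al" -> 900, "lel" -> 40, "o" -> 1200, "gram" -> 300|>;
rareSyllableIndex[{"par", "al", "lel", "o", "gram"}, counts]  (* 6, the first "l" of "lel" *)
```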
Of course, syllable stress also influences word recognition.

There is also the effect that we can usually still read a text if the letters in the middle of each word (i.e. all letters except the first and the last) are shuffled. Perhaps that can also be used to find an optimal highlighting position.
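To experiment with the shuffled-middle effect, a minimal sketch (the name `scrambleMiddle` is hypothetical) that keeps the first and last character of each word and randomly permutes the rest. Punctuation is not treated specially, so a trailing period counts as the last character:

```mathematica
(* Hypothetical helper: keep the first and last character of a word and
   randomly permute the ones in between. Words of up to three characters
   are returned unchanged. *)
scrambleMiddle[word_String] :=
  If[StringLength[word] <= 3, word,
    StringJoin[StringTake[word, 1],
      RandomSample[Characters[StringTake[word, {2, -2}]]],
      StringTake[word, -1]]]

StringRiffle[scrambleMiddle /@ StringSplit["reading shuffled words is surprisingly easy"], " "]
```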

Perhaps I can find a linguist tomorrow who can suggest a better way of highlighting syllables/letters.

POSTED BY: Marco Thiel
Dear Rodrigo,

Absolutely brilliant post. I have just added a line to use the hyphenation rules. Of course that could be improved, but it shows that the dictionary lookup in Mathematica is fast enough.
 centerWord[str_String, maxString_] :=
  Module[{centerLetter, formatedWord, midlePoint, paddedWord},
   centerLetter =
    Which[WordData[str, "Hyphenation"] === Missing["NotAvailable"],
     Ceiling[StringLength[str]/2, 1],
     Length[WordData[str, "Hyphenation"]] == 1,
     Ceiling[(Characters[WordData[str, "Hyphenation"][[1]]] // Length)/
       2], True,
     Length[Characters[WordData[str, "Hyphenation"][[1]]]] + 1];
  midlePoint = Ceiling[maxString/2, 1];
  formatedWord =
   Table[" ", {i, 1, midlePoint - centerLetter}]~Join~
    MapAt[Style[#, Bold, Red] &,
     Style[#, Bold] & /@ (Characters[str]), centerLetter];
  Grid[{formatedWord}, Spacings -> {0.1, 0.5}]]

spritzIt[text_String, wordsPerMinute_] :=
DynamicModule[{maxString, textList, n = 1},
  maxString = StringLength /@ StringSplit[text] // Max;
  textList = StringSplit[text];
  textList =
   Flatten@MapAt[{#, " ", " "} &, textList,
     Position[textList, w_ /; StringQ[w] && StringMatchQ[w, "*."]]];
  textList = centerWord[#, maxString] & /@ textList;
  Grid[{{Style["Simple Spritz on Wolfram Language",
      FontSize -> 20]}, {Dynamic[textList[[n]]~Magnify~2]}, {Animator[
      Dynamic[n], {1, Length@textList, 1},
      wordsPerMinute/60]}, {Style[
      Row[{"Words Per Minute: ", wordsPerMinute}], FontSize -> 20]}},
   Alignment -> Left, ItemSize -> {Automatic, 3}, FrameStyle -> Gray,
   Frame -> All]]

Also, I wanted it to read something more book-like.
words = Import[""];
spritzIt[StringTake[words, {23029, 40000}], 400]

That loads Moby Dick and starts reading from the first chapter. It turns out that the hyphenation lookup fails for many words, which is why I need the Missing["NotAvailable"] check. I decided to highlight the first letter of the second syllable, if known. For one-syllable words, and for words whose hyphenation is unavailable, I just choose the middle letter.
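The rule inside the Which[...] of centerWord can be separated from the WordData call, so that it can be tested on its own or fed from preloaded data. A sketch with the hypothetical name `highlightIndex`; unlike the original, it treats anything other than a non-empty list of strings (a Missing[...] value, an empty list) as unknown, which is an assumption added for robustness:

```mathematica
(* Same rule as the Which[...] in centerWord, but the hyphenation
   (e.g. WordData[str, "Hyphenation"]) is passed in as an argument. *)
highlightIndex[str_String, hyphenation_] := Which[
  !MatchQ[hyphenation, {__String}], Ceiling[StringLength[str]/2, 1],  (* unknown: middle letter *)
  Length[hyphenation] == 1, Ceiling[StringLength[First[hyphenation]]/2, 1],  (* one syllable: middle letter *)
  True, StringLength[First[hyphenation]] + 1]  (* first letter of the second syllable *)

(* Hand-written hyphenations, for illustration only. *)
highlightIndex["parallelogram", {"par", "al", "lel", "o", "gram"}]  (* 4 *)
highlightIndex["whale", {"whale"}]                                 (* 3 *)
highlightIndex["Ishmael.", Missing["NotAvailable"]]                (* 4 *)
```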

Thanks a lot for your interesting post,
POSTED BY: Marco Thiel
Glad you liked it; I enjoyed your use of WordData!
POSTED BY: Rodrigo Murta
Posted 10 years ago
If the box for the text were substantially larger and the text centered in it, moving the distracting title and animated progress bar away from the point where we are supposed to be completely focused on the words, then it might be easier to concentrate on the stream of words. I tried to reduce the distractions by modifying the code and successfully deleted the title, but could not delete the progress bar and keep the animation working.
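One way to hide the progress bar: Animator has an AppearanceElements option, and setting it to None should remove the visible controls while the Animator keeps driving n. A hedged, self-contained sketch; the name `spritzQuiet` and the pane and font sizes are arbitrary choices, and it shows plain bold words instead of the centerWord grids:

```mathematica
(* Hypothetical variant of spritzIt: no title, a larger centered text area,
   and an Animator that still drives n but draws no controls. *)
spritzQuiet[text_String, wordsPerMinute_] :=
  DynamicModule[{words = StringSplit[text], n = 1},
    Column[{
      Pane[Dynamic[Style[words[[n]], Bold, 40]], {600, 150},
        Alignment -> {Center, Center}],
      Animator[Dynamic[n], {1, Length[words], 1}, wordsPerMinute/60,
        AppearanceElements -> None]  (* no slider or buttons *)
    }, Alignment -> Center]]

spritzQuiet["Here is a simple test using Spritz on Wolfram Language.", 300]
```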

I have spent some time practicing with the Sprint Reader extension, which runs in the Chrome browser and works on highlighted text in any web page.

Based on that, if it were possible to estimate the reading level or difficulty of the text and automatically increase or decrease the speed accordingly, that might be a nice enhancement. Light, fluffy entertainment reading could blow by at 1000 wpm while the Wolfram help pages would go somewhat slower. This might be fairly easy to do in your code.
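A crude way to vary the speed with difficulty, assuming mean word length is a usable proxy for reading level (it is only a rough one), is to scale the base speed by a reference length divided by the text's mean word length and clip the result. The name `adaptiveWPM`, the reference length of 5 and the 200 to 1000 wpm bounds are all hypothetical:

```mathematica
(* Hypothetical heuristic, not a real readability measure: mean word length
   (letters only) as a difficulty proxy. Longer average words -> slower. *)
adaptiveWPM[text_String, baseWPM_, refLength_: 5] :=
  Module[{lengths = StringLength /@ StringCases[text, LetterCharacter ..]},
    If[lengths === {}, baseWPM,
      Clip[Round[baseWPM refLength/Mean[lengths]], {200, 1000}]]]  (* clip to 200..1000 wpm *)

adaptiveWPM["The cat sat on the mat.", 400]              (* 706 *)
adaptiveWPM["Internationalization considerations", 400]  (* 118 before clipping -> 200 *)
```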

Even more important, and probably more difficult to implement: if it were possible to identify the key words or phrases in the text and pause longer on them, so that they make a bigger impression on the reader, I think that might be the next key enhancement. I've read large technical documents with this, and a key "2" in the text blows by just as rapidly as a multisyllable word, even when the "2" was substantially more important; it made absolutely no impression on me in that moment. But I think more than just word length or syllable count needs to be considered. I'm not sure how to put this into your code, but I would be very interested in seeing the results. And if it isn't too late, you might patent this idea.
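For the key-word idea, one simple starting point is a per-word weight that lengthens the display time of certain tokens. In this sketch the names `wordWeight` and `wordDurations` are hypothetical and the weights are arbitrary: tokens containing a digit stay three times as long, and words longer than eight characters half again as long:

```mathematica
(* Hypothetical weights, chosen only for illustration. *)
wordWeight[w_String] := Which[
  StringContainsQ[w, DigitCharacter], 3,   (* tokens with digits, like a key "2" *)
  StringLength[w] > 8, 3/2,                (* long words *)
  True, 1]

(* Seconds each word would be shown: base time 60/wordsPerMinute times its weight. *)
wordDurations[text_String, wordsPerMinute_] :=
  (60/wordsPerMinute) (wordWeight /@ StringSplit[text])

wordDurations["Use 2 threads carefully", 600]  (* {1/10, 3/10, 1/10, 3/20} *)
```

A fixed-rate Animator cannot use variable durations directly, so displaying them would need a different driver; the sketch only computes the schedule.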
POSTED BY: Bill Simpson
You can use WordData to find how the words break into syllables:

WordData["parallelogram", "Hyphenation"]

You might be able to highlight the beginning of the second syllable that way, although you'd probably have to pre-load the WordData hyphenation.
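Pre-loading could mean looking up the hyphenation once per distinct word before the animation starts. A sketch under these assumptions: the names `cleanWord` and `preloadHyphenation` are hypothetical, non-letter characters are trimmed from both ends of each token before lookup, and the lookup function is a parameter (defaulting to WordData) so it can be swapped for a stub when testing:

```mathematica
(* Hypothetical helpers, not from the thread. *)
(* Strip non-letters from both ends, so "Language." is looked up as "Language". *)
cleanWord[w_String] := StringTrim[w, Except[LetterCharacter] ..]

(* Look up every distinct cleaned word once and keep the results in an Association. *)
preloadHyphenation[text_String, lookup_: (WordData[#, "Hyphenation"] &)] :=
  AssociationMap[lookup,
    DeleteCases[DeleteDuplicates[cleanWord /@ StringSplit[text]], ""]]

(* Later, e.g. inside centerWord, read from the cache instead of calling WordData:
   Lookup[cache, cleanWord[str], Missing["NotAvailable"]] *)
```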
POSTED BY: Sean Clarke
Looks very interesting! When I found out about Spritz I was actually wondering whether it was really necessary to highlight the word's stress, or whether the middle of the word was enough. Looking at this, it is still possible to read, but not quite as easily as with Spritz. Anyway, it's a pretty cool application, good job!
POSTED BY: Daniel Pryjma
Very nice, Rodrigo, thanks for sharing! Does your choice of red letter coincide with Spritz's? Is it just the center of the word?
POSTED BY: Sam Carrettie
Thanks @Sam, it's just the center of the word (centerLetter = Ceiling[StringLength[str]/2, 1]). I believe Spritz uses much deeper technology for that, but I'm very happy with this simpler approach.
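This rule, together with the left padding that centerWord adds, keeps the red letter in the same column for every word. A minimal illustration; the function names `middleIndex` and `leftPadding` are hypothetical, but the formulas are the ones used in centerWord, assuming a longest word of 11 characters:

```mathematica
(* Middle letter, as in centerWord: Ceiling[StringLength[str]/2, 1]. *)
middleIndex[str_String] := Ceiling[StringLength[str]/2, 1]

(* Blanks added on the left so the highlighted letter lands in column Ceiling[maxString/2, 1]. *)
leftPadding[str_String, maxString_Integer] := Ceiling[maxString/2, 1] - middleIndex[str]

{#, middleIndex[#], leftPadding[#, 11]} & /@ {"Spritz", "is", "technology"}
(* {{"Spritz", 3, 3}, {"is", 1, 5}, {"technology", 5, 1}} *)
```

In each case padding plus index is 6, so the highlighted letter sits in column 6.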
POSTED BY: Rodrigo Murta