Idea-nets and uniqueness of US inaugural addresses

POSTED BY: Vitaliy Kaurov
10 Replies

I have published a relevant function KeywordsGraph:

https://resources.wolframcloud.com/FunctionRepository/resources/KeywordsGraph

POSTED BY: Vitaliy Kaurov

Hi Vitaliy,

I wrote a simple n-gram WordCloud function, shown below. The code quality can still be improved, but it works with the n-gram idea. Do you have any suggestions for improving the code quality or efficiency?

I have attached the notebook as well, since it is easier to read in that format.

nGramWords[text_String, n_Integer: 4, filterLevel_Integer: 2] := 
  Module[{
    words = DeleteStopwords[TextWords[ToLowerCase[text]]], 
    separate, nGramInitial, nGramTable, removeValue, newLine},
   (* split a k-gram into its leading and trailing (k-1)-grams *)
   separate[list_] := 
    With[{l = Length@list}, {Take[list, l - 1], Take[list, -(l - 1)]}];
   (* counts of i-grams for i = n, ..., 1, keeping those seen at least filterLevel times *)
   nGramInitial = 
    Normal@Table[
      Select[WordCounts[StringRiffle[words], 
        i], # >= filterLevel &], {i, n, 1, -1}];
   (* wrap the key of each 1-gram rule in a list *)
   nGramInitial = 
    Join[Drop[
      nGramInitial, -1], {({#[[1]]} -> #[[2]]) & /@ 
       Last@nGramInitial}];
   (* start from the longest n-grams *)
   nGramTable = {First@nGramInitial};
   Do[
    (* negative counts for the two sub-n-grams of every n-gram kept in the previous step *)
    removeValue = 
     Flatten@Table[
       Thread[Rule[separate[First@Last[nGramTable][[i]]], 
         Table[-Last@Last[nGramTable][[i]], {2}]]], {i, 1, 
        Length@Last[nGramTable]}];
    (* group by key, add the first two counts in each group, keep totals >= 1, sort descending *)
    newLine = 
     Sort[Select[
       Flatten@If[
           Length[#] > 
            1, #[[1]][[1]] -> (#[[1]][[2]] + #[[2]][[2]]) , #[[
            1]]] & /@ 
        GatherBy[Join[removeValue, nGramInitial[[j]]], 
         First], #[[2]] >= 1 &], #1[[2]] > #2[[2]] &];
    nGramTable = Append[nGramTable, newLine], {j, 2, n}];
   (* join the words of each n-gram into one string and sort by count *)
   Sort[{StringRiffle[#[[1]]], #[[2]]} & /@ 
     Select[Flatten[nGramTable], #[[2]] >= filterLevel &], #1[[
       2]] > #2[[2]] &]
   ];
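
For readers who want the core idea in isolation, here is a minimal sketch of plain n-gram counting. It is not the function above: the name `simpleNGramCounts` and its `minCount` parameter are hypothetical, and it does not subtract the counts of longer n-grams from the shorter ones they contain. It only slides a window of n words over the stopword-free text and tallies each run.

```wolfram
(* Hypothetical helper: count runs of n consecutive words after removing stopwords *)
simpleNGramCounts[text_String, n_Integer?Positive, minCount_Integer: 2] :=
  Module[{words, grams},
   (* lowercase, split into words, drop stopwords *)
   words = DeleteStopwords[TextWords[ToLowerCase[text]]];
   (* overlapping windows of n words, moving one word at a time *)
   grams = StringRiffle /@ Partition[words, n, 1];
   (* keep n-grams seen at least minCount times, most frequent first *)
   ReverseSort[Select[Counts[grams], # >= minCount &]]
   ]
```

The result is an association from n-gram strings to counts, so something like `WordCloud[simpleNGramCounts[text, 2]]` should draw it directly, where `text` stands for any string you have loaded.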

Compare the 1-gram WordCloud with the n-gram WordCloud, and compare common words with unique words.

POSTED BY: Frederick Wu
POSTED BY: Marco Thiel

Impressive compactness!

POSTED BY: Sander Huisman
POSTED BY: Jesse Friedman

Thanks for posting, @Frederick Wu. I will have to find some time to dig through this. I think that if you added some explanation of how your n-gram WordCloud code works, it could make a nice separate post, especially if it demonstrated the difference from the regular WordCloud in a few cases. Please consider this ;-)

POSTED BY: Vitaliy Kaurov
Posted 9 years ago

Hi Vitaliy,

I find the "Idea-network" very hard to understand. Maybe it can evaluate the complexity of a text, but it does not seem very useful.

However, the common and unique word clouds inspired me a lot. Maybe an intersection of all the texts, or of a large number of texts, would give a fairly acceptable set of common words (above a certain frequency level).
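
The intersection idea can be sketched roughly as follows. This is only an illustration under assumptions: the helper names `frequentWords`, `commonWords`, and `uniqueWords` are hypothetical, and "frequency level" is taken to mean a minimum raw count within each text.

```wolfram
(* Hypothetical helper: non-stopwords that occur at least minCount times in one text *)
frequentWords[text_String, minCount_Integer] :=
  Keys[Select[
    Counts[DeleteStopwords[TextWords[ToLowerCase[text]]]],
    # >= minCount &]];

(* words that are frequent in every text *)
commonWords[texts : {__String}, minCount_Integer: 2] :=
  Intersection @@ (frequentWords[#, minCount] & /@ texts);

(* words that are frequent in text i but in none of the other texts *)
uniqueWords[texts : {__String}, i_Integer, minCount_Integer: 2] :=
  Complement[frequentWords[texts[[i]], minCount],
   Sequence @@ (frequentWords[#, minCount] & /@ Delete[texts, i])];
```

With many texts the intersection shrinks quickly, so the threshold would probably need tuning, or "common" could be relaxed to mean "frequent in most texts".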

Another suggestion: an n-gram word cloud would contain more useful and readable information than a one-gram word cloud. http://hack-r.com/n-gram-wordclouds-in-r/

I saw that in Eric's post on the Wolfram blog, one word cloud image contains multi-word math terms. Is that an n-gram word cloud? http://blog.wolfram.com/2016/12/22/the-semantic-representation-of-pure-mathematics/

POSTED BY: Updating Name

Thank you, @Jesse! Yes, I'd like to do that soon. I am considering first upgrading the function KeywordsGraph with an option to count different grammatical forms as one keyword (for instance, "cat" and "cats"). BTW, great example with ReadabilityScore and American presidents’ inaugural addresses!
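
One possible way to merge grammatical forms, sketched here only as an illustration and not as how KeywordsGraph does or will do it, is to reduce each word to its stem with the built-in `WordStem` before counting. The name `stemmedWordCounts` is hypothetical.

```wolfram
(* Hypothetical helper: count keywords after reducing every word to its stem *)
stemmedWordCounts[text_String] :=
  Module[{words},
   (* lowercase, split into words, drop stopwords *)
   words = DeleteStopwords[TextWords[ToLowerCase[text]]];
   (* "cat" and "cats" share one stem, so they are tallied together *)
   ReverseSort[Counts[WordStem /@ words]]
   ]
```

Stems are not always dictionary words, so for display it may be preferable to map each stem back to its most frequent original form.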

POSTED BY: Vitaliy Kaurov
Posted 9 years ago

I am extremely new to anything and everything Wolfram. And perhaps I am just in a "mood" (apologies), but, ehh...

Could this be useful as a "BS" detector? Maybe even provide an actual computable definition of "BS"???

POSTED BY: Victor Lewis
Posted 9 years ago

Hi Vitaliy,

This is a great idea with many possible extensions. This code could have interesting applications in literature. My incompetence with the Cyrillic alphabet and Slavic languages leads me to an idea whose results may interest many people.

These are difficult times throughout the world, when we are all hoping for the best possible leadership. It could be a mistake to think that the best voices will be from those who win the contest for public or private office. Maybe you have already guessed what I'm hinting at: yes, Bulgakov, the samizdat hero, from Kiev, Ukraine.

A quick Google search turns up many English translations of "Heart of a Dog", and even a few PDF files that could be mined for plaintext. Side-by-side comparisons of the translations would give us a quantitative idea of the fluctuations between texts. Do you think this is possible using your code? Can your code operate on Cyrillic texts?

"Heart of a Dog" is a masterpiece of science fiction. I'm sure it's a risk to say out loud, but I think more people in the English-speaking world should read Bulgakov. That being said, science fiction can sometimes have a gender-biased audience. In particular, I think the details of this story might be too grotesque and masculine for some women. But I have recently heard that the woman poet Anna Akhmatova is also a worthwhile Ukrainian-born samizdat hero, also active "during the terrible years of the Yezhovshchina". Again, many translations exist, so another question: could your method be adapted to compare translations of short-form writing such as a poem?

Thanks, Brad.

POSTED BY: Brad Klee