Analytics of Republican Debate and network percolation

Posted 11 years ago

Alan Joyce sent me some neat code analyzing the Republican Debate of September 16, 2015. Please do see his analysis below. Transcripts of the debate can be found online. Alan mined the most popular words used by the candidates, filtered and re-weighted by different criteria. Properly weighted WordClouds are a good way to grasp the key topics.

I just wanted to point to a graph-and-networks take on the data. I thought that some candidates may share some of the top words they use. So if the candidates are nodes, then a weighted edge between them reflects how many top words they share. If you consider one top word per candidate, the graph is completely disconnected, since each candidate has a unique single top word. As you increase the top-words pool, some words become common to several candidates and links between nodes appear.

Percolation is the moment when, driven by the top-words pool size, all candidates become connected. In the opposite limit of a large pool size every candidate is linked to every other and we get a complete graph. Below is the percolation moment, which happens at 5 top words per candidate. It is indicative of which candidates speak about the same top subjects.
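The threshold of 5 words can be found programmatically: rebuild the shared-top-words graph for increasing pool sizes and stop at the first connected one. This is a minimal sketch, assuming `highFrequencyForCloud` and `candidates` as defined in Alan's analysis below; the helper names `topWordsAt` and `graphAt` are mine, not from the original notebook.

```mathematica
(* top n high-frequency words per candidate, by descending count *)
topWordsAt[n_] := Sort[Normal[highFrequencyForCloud[#]], 
     #1[[2]] > #2[[2]] &][[;; n]][[All, 1]] & /@ candidates;

(* graph whose edge weights count shared top words at pool size n *)
graphAt[n_] := With[{tw = topWordsAt[n]},
   WeightedAdjacencyGraph[candidates,
    Outer[Length[Intersection[#1, #2]] &, tw, tw, 1] 
      (1 - IdentityMatrix[Length[candidates]]) /. 0 -> Infinity]];

(* smallest pool size at which all candidates are connected;
   assumes each candidate has at least that many high-frequency words *)
First[Select[Range[20], ConnectedGraphQ[graphAt[#]] &]]
```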

CommunityGraphPlot[HighlightGraph[SetProperty[g, EdgeLabels -> None], Table[Style[e, Opacity[.7], 
    Thickness[.005 PropertyValue[{g, e}, EdgeWeight]]], {e, EdgeList[g]}]], 
 CommunityBoundaryStyle -> Directive[Red, Dashed, Thick], 
 CommunityRegionStyle -> {Directive[Opacity[.1], Red], 
   Directive[Opacity[.1], Yellow], Directive[Opacity[.1], Blue]}]

[Image: CommunityGraphPlot of the candidate network at the percolation point]

The edge thickness reflects the number of common words, and the grouping shows clustering of candidates around common words. The vertex size comes from DegreeCentrality, which assigns high centrality to vertices of high degree, so candidates whose top words overlap with more other candidates get larger vertices. The clustered CommunityGraphPlot was derived from the top words:
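As a quick sanity check on the sizing rule: DegreeCentrality of a vertex is simply its degree, so `VertexSize -> "DegreeCentrality"` scales each vertex by its number of neighbors. On toy graphs:

```mathematica
(* path graph: endpoints have one neighbor, interior vertices two *)
DegreeCentrality[PathGraph[Range[4]]]  (* {1, 2, 2, 1} *)

(* cycle graph: every vertex has exactly two neighbors *)
DegreeCentrality[CycleGraph[4]]        (* {2, 2, 2, 2} *)
```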

topWords = Sort[Normal[highFrequencyForCloud[#]], #1[[2]] > #2[[2]] &][[;; 5]][[All, 1]] & /@ candidates;
TableForm[topWords, TableHeadings -> {candidates, None}]

[Image: table of the top 5 words per candidate]

(refining the text filters would narrow the top words more precisely) and a WeightedAdjacencyGraph:

(* elementwise multiply zeroes the diagonal; 0 -> Infinity marks absent edges for WeightedAdjacencyGraph *)
mocw = Outer[Length[Intersection[#1, #2]] &, topWords, topWords, 1] (1 - IdentityMatrix[Length[candidates]]) /. 0 -> Infinity;
mocw // MatrixForm

g = WeightedAdjacencyGraph[candidates, mocw, VertexLabels -> "Name", 
  EdgeLabels -> "EdgeWeight",EdgeLabelStyle -> 15, VertexLabelStyle -> 14, 
  VertexSize -> "DegreeCentrality", GraphStyle -> "ThickEdge", 
  GraphLayout -> "CircularEmbedding", VertexStyle -> Directive[Opacity[.8], Orange]]

[Image: MatrixForm of the shared-word adjacency matrix mocw]

[Image: WeightedAdjacencyGraph of candidates with circular embedding]

For the top-word extraction and further refinement, see Alan's analysis right below. The notebook is attached to his post.

POSTED BY: Vitaliy Kaurov

This uses a preliminary transcript of the September 16 debate, with some manual editing to make it easier to process — the edited text is included in the attached notebook, assigned to the variable textRaw. Feel free to try out the simple public app, or do some additional experimentation on the text. It'll be interesting to start watching trends over time, as the campaign season progresses and more debates occur.

Manipulate[
 BarChart[ReplaceAll[<|# -> allCandidateCounts[#][ToLowerCase[word]] & /@ 
     Keys[allCandidateCounts]|>, _Missing -> 0], BarOrigin -> Left, 
  ChartLabels -> Automatic, 
  PlotLabel -> "Word frequency in the Sept. 16, 2015 Republican Debate"], {word, 
  "freedom", InputField[#, String] &}]

CloudDeploy[%, Permissions -> "Public"]

===> CloudObject["https://www.wolframcloud.com/objects/d1b62bc5-f686-42b3-bab9-bb70436d7e02"]

[Image: deployed Manipulate bar chart of word frequency by candidate]

Basic counts

(* drop stage directions and filler words, put each speaker's turn on its
   own line, then gather the lines by speaker *)
debateBySpeaker = 
  StringJoin /@ GroupBy[StringSplit[#, ": ", 2] & /@ 
      StringTrim /@ StringSplit[StringDelete[
         StringReplace[StringReplace[StringDelete[textRaw, 
            "\n" | "(APPLAUSE)" | "(LAUGHTER)" | "(CROSSTALK)" | 
             "(COMMERCIAL BREAK)" | "UNKNOWN" | "know" | "going" | 
             "think" | "people" | "say" | "said" | "country" | 
             "want" | "need"], "..." -> " "], 
          name : RegularExpression["[A-Z ]+:"] :> "\n" <> name], 
         RegularExpression["\\[[a-z ]+\\]"]], "\n"], First][[All, All, 2]];

candidates = {"TRUMP", "BUSH", "WALKER", "RUBIO", "CHRISTIE", 
   "CARSON", "PAUL", "CRUZ", "KASICH", "FIORINA"};

Multicolumn[Labeled[Framed@
     WordCloud[DeleteStopwords@debateBySpeaker[#], IgnoreCase -> True,
       ImageSize -> 300], Style[#, "Section"], Top] & /@ candidates, 3]

[Image: word clouds of all non-stopword words, one per candidate]

Only words whose frequency exceeds the overall mean

allCandidateCounts = <|# -> 
      Sort[Counts[
        DeleteStopwords[
         TextWords[
          ToLowerCase@
           StringReplace[debateBySpeaker[#], "." -> " "]]]]] & /@ 
    candidates|>;

meanCounts = Merge[Values[allCandidateCounts], N[Mean[#]] &];

candidateVsMean = <|# -> 
      With[{cand = allCandidateCounts[#]}, <|# -> {cand[#], meanCounts[#]} & /@ 
         Keys[cand]|>] & /@ candidates|>;

highFrequencyPerCandidate = 
  Select[#, #[[1]] > #[[2]] &] & /@ candidateVsMean;

highFrequencyForCloud = <|# -> 
      highFrequencyPerCandidate[#][[All, 1]] & /@ candidates|>;

Multicolumn[
 Labeled[Framed@WordCloud[highFrequencyForCloud[#], IgnoreCase -> True, 
      ImageSize -> 300], Style[#, "Section"], Top] & /@ candidates, 3]

[Image: word clouds restricted to above-mean-frequency words]

Individual frequency divided by overall mean frequency:

candidateOverMean = <|# -> 
      With[{cand = 
         allCandidateCounts[#]}, <|# -> cand[#]/meanCounts[#] & /@ 
         Keys[cand]|>] & /@ candidates|>;

Multicolumn[
 Labeled[Framed@WordCloud[candidateOverMean[#], IgnoreCase -> True, 
      ImageSize -> 300], Style[#, "Section"], Top] & /@ candidates, 3]

[Image: word clouds weighted by frequency divided by overall mean]

Unique words: Only show words that none of the other participants used

uniqueKeys = <|# -> 
      Fold[Complement, Keys[allCandidateCounts[#]], 
        Keys[allCandidateCounts[#]] & /@ 
         Complement[Keys[allCandidateCounts], {#}]] & /@ 
    Keys[allCandidateCounts]|>;

Multicolumn[
 Labeled[Framed@WordCloud[KeyTake[allCandidateCounts[#], uniqueKeys[#]], 
      IgnoreCase -> True, ImageSize -> 300], Style[#, "Section"], Top] & /@ candidates, 3]

[Image: word clouds of each candidate's unique words]
