Text analysis of US political party platforms

Posted 9 years ago

I discovered the American Presidency Project's collection of historical party platforms over the holiday weekend, and couldn't resist diving into some simple text analysis and visualization. For example, I could see how word usage by the Democratic and Republican parties changed over time:

[Image: word usage by the Democratic and Republican parties over time]

Let's see how we can start building this visualization. Given a specific party platform page ID at presidency.ucsb.edu, import the page and pull out the main platform text:

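partyPlatformImporter[pageID_] := Block[{raw, text},
  (* Download the raw HTML source of the platform page *)
  raw = Import[
    "http://www.presidency.ucsb.edu/ws/index.php?pid=" <> pageID, 
    "Source"];
  (* Keep only the markup inside the "displaytext" span, then convert it to plain text *)
  text = ImportString[
    StringCases[raw, 
      "<span class=\"displaytext\">" ~~ x : ___ ~~ 
        "</span><hr noshade=\"noshade\" size=\"1\">" :> x][[1]], 
    "HTML"]]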
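
To see what the extraction step does without hitting the network, here is a minimal sketch run on a made-up page fragment. The `sampleSource` string is invented for illustration and is not real site content; `Shortest` and the empty-match guard are additions that are not part of the function above.

```wolfram
(* Hypothetical stand-in for a downloaded page; not real site content *)
sampleSource = "<html><body><span class=\"displaytext\"><p>We pledge <b>peace</b>.</p></span><hr noshade=\"noshade\" size=\"1\"></body></html>";

(* Same delimiters as partyPlatformImporter; Shortest stops at the first closing marker *)
matches = StringCases[sampleSource,
   "<span class=\"displaytext\">" ~~ Shortest[x___] ~~
     "</span><hr noshade=\"noshade\" size=\"1\">" :> x];

(* Guard against pages where the marker is missing, then strip the markup *)
sampleText = If[matches === {},
   Missing["NotFound"],
   ImportString[First[matches], "HTML"]]
```

The guard matters mostly if the page layout ever changes, in which case `[[1]]` on an empty match list would raise an error instead of returning a value you can test for.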

Import the HTML source of the index page for party platforms:

In[106]:= platformsRaw = 
  Import["http://www.presidency.ucsb.edu/platforms.php", "Source"];

Find all table spans in the source (allowing overlapping matches), then split the ninth match on its closing table tags:

In[107]:= rawCases = 
  StringCases[platformsRaw, "<table" ~~ ___ ~~ "</table>", Overlaps -> True];

In[108]:= tables = StringSplit[rawCases[[9]], "</table>"];
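
A note on why taking one match and then splitting it works: with `Overlaps -> True`, `StringCases` returns one match per starting position, and because `___` is greedy, each match runs to the last `</table>` in the page. The toy string below is hypothetical (it is not the real index page) and only illustrates that behavior.

```wolfram
(* Hypothetical miniature page with three sibling tables *)
toy = "<table>A</table><table>B</table><table>C</table>";

(* One greedy match per starting "<table", each running to the final "</table>" *)
greedy = StringCases[toy, "<table" ~~ ___ ~~ "</table>", Overlaps -> True]
(* {"<table>A</table><table>B</table><table>C</table>",
    "<table>B</table><table>C</table>", "<table>C</table>"} *)

(* Splitting the 2nd match yields the tables from the 2nd one onward *)
StringSplit[greedy[[2]], "</table>"]
(* {"<table>B", "<table>C"} *)
```

If the tables of interest are not nested inside other tables, `StringCases[toy, "<table" ~~ Shortest[___] ~~ "</table>"]` may be a simpler way to get one string per table.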

The Democratic and Republican platforms happen to be in the first two tables in this list, so split each by row and create an Association of years and page IDs for each party:

In[109]:= demRaw = StringSplit[tables[[1]], "<tr>"];

In[154]:= demPageIDs = 
 "Democratic" -> <|
   StringCases[#, 
      "<a href=\"http://www.presidency.ucsb.edu/ws/index.php?pid=" ~~ 
        url : RegularExpression["\\d+"] ~~ "\">" ~~ 
        text : RegularExpression["\\d+"] ~~ RegularExpression[" ?\\w*"] ~~ 
        "</a>" :> text -> url] & /@ demRaw|>

Out[154]= "Democratic" -> <|"2012" -> "101962", "2008" -> "78283", "2004" -> "29613", 
  "2000" -> "29612", "1996" -> "29611", "1992" -> "29610", "1988" -> "29609", 
  "1984" -> "29608", "1980" -> "29607", "1976" -> "29606", "1972" -> "29605", 
  "1968" -> "29604", "1964" -> "29603", "1960" -> "29602", "1956" -> "29601", 
  "1952" -> "29600", "1948" -> "29599", "1944" -> "29598", "1940" -> "29597", 
  "1936" -> "29596", "1932" -> "29595", "1928" -> "29594", "1924" -> "29593", 
  "1920" -> "29592", "1916" -> "29591", "1912" -> "29590", "1908" -> "29589", 
  "1904" -> "29588", "1900" -> "29587", "1896" -> "29586", "1892" -> "29585", 
  "1888" -> "29584", "1884" -> "29583", "1880" -> "29582", "1876" -> "29581", 
  "1872" -> "29580", "1868" -> "29579", "1864" -> "29578", "1860" -> "29577", 
  "1856" -> "29576", "1852" -> "29575", "1848" -> "29574", "1844" -> "29573", 
  "1840" -> "29572"|>

In[111]:= repRaw = StringSplit[tables[[2]], "<tr>"];

In[113]:= repPageIDs = 
 "Republican" -> <|
   StringCases[#, 
      "<a href=\"http://www.presidency.ucsb.edu/ws/index.php?pid=" ~~ 
        url : RegularExpression["\\d+"] ~~ "\">" ~~ 
        text : RegularExpression["\\d+"] ~~ RegularExpression[" ?\\w*"] ~~ 
        "</a>" :> text -> url] & /@ repRaw|>

Out[113]= "Republican" -> <|"2012" -> "101961", "2008" -> "78545", "2004" -> "25850", 
  "2000" -> "25849", "1996" -> "25848", "1992" -> "25847", "1988" -> "25846", 
  "1984" -> "25845", "1980" -> "25844", "1976" -> "25843", "1972" -> "25842", 
  "1968" -> "25841", "1964" -> "25840", "1960" -> "25839", "1956" -> "25838", 
  "1952" -> "25837", "1948" -> "25836", "1944" -> "25835", "1940" -> "29640", 
  "1936" -> "29639", "1932" -> "29638", "1928" -> "29637", "1924" -> "29636", 
  "1920" -> "29635", "1916" -> "29634", "1912" -> "29633", "1908" -> "29632", 
  "1904" -> "29631", "1900" -> "29630", "1896" -> "29629", "1892" -> "29628", 
  "1888" -> "29627", "1884" -> "29626", "1880" -> "29625", "1876" -> "29624", 
  "1872" -> "29623", "1868" -> "29622", "1864" -> "29621", "1860" -> "29620", 
  "1856" -> "29619"|>
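
The two extraction cells above differ only in the table they read, so the shared pattern could be factored into a helper. This is a sketch, not part of the original notebook: `extractPageIDs` and `sampleTable` are hypothetical names, and the sample markup is a guess at the row structure rather than a copy of the real page.

```wolfram
(* Hypothetical helper: turn one table's HTML into <|year -> pageID, ...|> *)
extractPageIDs[tableHTML_String] := Association[Flatten[
   StringCases[StringSplit[tableHTML, "<tr>"],
    "<a href=\"http://www.presidency.ucsb.edu/ws/index.php?pid=" ~~
      id : RegularExpression["\\d+"] ~~ "\">" ~~
      year : RegularExpression["\\d+"] ~~ RegularExpression[" ?\\w*"] ~~
      "</a>" :> year -> id]]]

(* Made-up two-row table using page IDs that appear in the output above *)
sampleTable = "<table><tr><td><a href=\"http://www.presidency.ucsb.edu/ws/index.php?pid=29602\">1960</a></td></tr><tr><td><a href=\"http://www.presidency.ucsb.edu/ws/index.php?pid=29601\">1956</a></td></tr></table>";

extractPageIDs[sampleTable]
(* <|"1960" -> "29602", "1956" -> "29601"|> *)
```

With such a helper, the per-party rules would reduce to something like `"Democratic" -> extractPageIDs[tables[[1]]]` and `"Republican" -> extractPageIDs[tables[[2]]]`.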

Then join them together:

In[353]:= IDset = <|demPageIDs, repPageIDs|>

Out[353]= <|"Democratic" -> <|"2012" -> "101962", "2008" -> "78283", "2004" -> "29613", 
   "2000" -> "29612", "1996" -> "29611", "1992" -> "29610", "1988" -> "29609",
    "1984" -> "29608", "1980" -> "29607", "1976" -> "29606", 
   "1972" -> "29605", "1968" -> "29604", "1964" -> "29603", "1960" -> "29602",
    "1956" -> "29601", "1952" -> "29600", "1948" -> "29599", 
   "1944" -> "29598", "1940" -> "29597", "1936" -> "29596", "1932" -> "29595",
    "1928" -> "29594", "1924" -> "29593", "1920" -> "29592", 
   "1916" -> "29591", "1912" -> "29590", "1908" -> "29589", "1904" -> "29588",
    "1900" -> "29587", "1896" -> "29586", "1892" -> "29585", 
   "1888" -> "29584", "1884" -> "29583", "1880" -> "29582", "1876" -> "29581",
    "1872" -> "29580", "1868" -> "29579", "1864" -> "29578", 
   "1860" -> "29577", "1856" -> "29576", "1852" -> "29575", "1848" -> "29574",
    "1844" -> "29573", "1840" -> "29572"|>, 
 "Republican" -> <|"2012" -> "101961", "2008" -> "78545", "2004" -> "25850", 
   "2000" -> "25849", "1996" -> "25848", "1992" -> "25847", "1988" -> "25846",
    "1984" -> "25845", "1980" -> "25844", "1976" -> "25843", 
   "1972" -> "25842", "1968" -> "25841", "1964" -> "25840", "1960" -> "25839",
    "1956" -> "25838", "1952" -> "25837", "1948" -> "25836", 
   "1944" -> "25835", "1940" -> "29640", "1936" -> "29639", "1932" -> "29638",
    "1928" -> "29637", "1924" -> "29636", "1920" -> "29635", 
   "1916" -> "29634", "1912" -> "29633", "1908" -> "29632", "1904" -> "29631",
    "1900" -> "29630", "1896" -> "29629", "1892" -> "29628", 
   "1888" -> "29627", "1884" -> "29626", "1880" -> "29625", "1876" -> "29624",
    "1872" -> "29623", "1868" -> "29622", "1864" -> "29621", 
   "1860" -> "29620", "1856" -> "29619"|>|>

Define a pattern of common words we want to exclude:

In[356]:= commonwords = 
 "America" | "American" | "Americans" | "Democratic" | "Republican" | 
  "Administration" | "Federal" | "Government" | "government" | "programs";

And generate a WordCloud for each party in a specific year (in this case, 1960):

Row[WordCloud[
    DeleteStopwords@
     StringDelete[partyPlatformImporter[IDset[#party]["1960"]], 
      commonwords], IgnoreCase -> True, 
    ColorFunction -> ColorData[#color]] & /@ {<|
    "party" -> "Democratic", "color" -> "AtlanticColors"|>, <|
    "party" -> "Republican", "color" -> "ValentineTones"|>}]

[Image: word clouds for the 1960 Democratic and Republican platforms]

I've attached a notebook with some more exploration. For instance, I wanted to see which words skew most strongly towards each party in a given year, so I got word counts by party for a single year. I then selected words that appear in both platforms where the difference in word counts is at least 10 and at least one party uses the word more than 20 times, merged those Associations, and plotted the result:

[Image: plot of the words that skew most strongly towards each party in a given year]
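
The notebook code for this step is not shown in the post, so the following is only a sketch of the selection described above. The counts are invented toy numbers standing in for real `Counts` output, and the names `demCounts`, `repCounts`, `shared`, `skewed`, and `skew` are hypothetical.

```wolfram
(* Toy word counts standing in for real per-party Counts; the numbers are invented *)
demCounts = <|"peace" -> 30, "labor" -> 25, "tax" -> 8, "farm" -> 12|>;
repCounts = <|"peace" -> 12, "labor" -> 22, "tax" -> 24, "defense" -> 40|>;

(* Words present in both platforms, paired as word -> {dem, rep} *)
shared = Merge[KeyIntersection[{demCounts, repCounts}], Identity];

(* Keep words where the gap is at least 10 and one party exceeds 20 uses *)
skewed = Select[shared, Abs[#[[1]] - #[[2]]] >= 10 && Max[#] > 20 &];

(* Signed difference: positive leans Democratic, negative leans Republican *)
skew = Sort[(#[[1]] - #[[2]] &) /@ skewed]
(* <|"tax" -> -16, "peace" -> 18|> *)

BarChart[Values[skew], ChartLabels -> Keys[skew], BarOrigin -> Left]
```

A horizontal bar chart is just one plausible presentation; the original plot may have been drawn differently.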

See the attached notebook for further details including how to make the top figure in this post. Enjoy...
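
The notebook itself is not reproduced in this post. As a rough sketch of how a usage-over-time series for one word might be assembled (not necessarily how the top figure was made), the code below computes a word's share of each platform's words by year. `wordShare`, `toyPlatforms`, and `series` are hypothetical names, and the texts are toy strings rather than real platform text.

```wolfram
(* Hypothetical helper: fraction of a text's words equal to the given word *)
wordShare[text_String, word_String] :=
 With[{words = TextWords[ToLowerCase[text]]},
  If[words === {}, 0,
   N[Count[words, ToLowerCase[word]]/Length[words]]]]

(* Toy stand-ins for platform texts keyed by party and year; not real platform text *)
toyPlatforms = <|
   "Democratic" -> <|"1956" -> "peace and labor and peace",
     "1960" -> "labor and farms"|>,
   "Republican" -> <|"1956" -> "defense and taxes",
     "1960" -> "peace and defense"|>|>;

(* For each party, a list of {year, share} points for the word "peace" *)
series = Function[byYear,
    KeyValueMap[{FromDigits[#1], wordShare[#2, "peace"]} &, byYear]] /@
   toyPlatforms;

ListLinePlot[Values[series], PlotLegends -> Keys[series],
 PlotMarkers -> Automatic, AxesLabel -> {"Year", "Share of words"}]
```

With the real data, `toyPlatforms` would be replaced by the imported texts, for example by mapping `partyPlatformImporter` over each party's entries in `IDset`, which downloads every platform page and may take a while.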

POSTED BY: Alan Joyce
2 Replies

Great idea... I think this will do it (starting from functions and data created in the notebook above):

text = <|# -> 
      partyPlatformImporter[IDset[#]["1960"]] & /@ {"Democratic", 
     "Republican"}|>;
counts = Counts /@ (TextWords[ToLowerCase[DeleteStopwords[#]]] & /@ text);
demOnly = 
  Complement[Keys[counts["Democratic"]], Keys[counts["Republican"]]];
repOnly = 
  Complement[Keys[counts["Republican"]], Keys[counts["Democratic"]]];

Row[WordCloud[KeyTake[counts[#party], #keys], 
    ColorFunction -> ColorData[#color], ImageSize -> 400] & /@ {<|
    "party" -> "Democratic", "keys" -> demOnly, 
    "color" -> "AtlanticColors"|>, <|"party" -> "Republican", 
    "keys" -> repOnly, "color" -> "ValentineTones"|>}]

[Image: word clouds of the words unique to each party's 1960 platform]

POSTED BY: Alan Joyce

This is interesting. How would you generate word clouds that were specific only to one party and had no overlap?

POSTED BY: heidi kellner