Measuring public interest in Syria and Ukraine from Wikipedia data

I intend to show how to use programming to data-mine current events, and I will not delve into the politics per se. Recently The Economist published an article that attempts to measure relative public interest in Syria and Ukraine from various internet sources, such as news media, search terms, and Wikipedia. Below, the first image is from The Economist and the second is made with Wolfram Language (WL). It is easy to create such infographics, especially with handy Wolfram|Alpha data and new functions like FindPeaks and TimelinePlot. Please take note of the WL interactivity in the second image. The tutorial is below the images.

enter image description here

enter image description here

I will concentrate on the third and last plot from The Economist. Wolfram|Alpha servers hold a lot of curated data, including Wikipedia popularity (weekly hits per day) for many specific English-language pages. I will access the popularity data for Ukraine and Syria as

{ukraine, syria} = 
  WolframAlpha["ukraine syria", {{"PopularityPod:WikipediaStatsData", 1}, "ComputableData"}];

Here is a sample; as you can see, it contains date stamps and hits per day:

Short[ukraine]

enter image description here

Here is the plot:

DateListPlot[{ukraine, syria},
 PlotRange -> All, PlotTheme -> "Detailed", AspectRatio -> 1/4,
 ImageSize -> 800, PlotLegends -> {"ukraine", "syria"}]

enter image description here

I am interested in the events starting from the second half of 2013, and the TimeSeriesWindow function will help me cut those data out:

{ukraineW, syriaW} =
  TimeSeriesWindow[TimeSeriesResample[TimeSeries[#]], {DateObject[{2013, 6}], Now}] & /@
   {ukraine, syria};
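
To see what the windowing step does in isolation, here is a minimal sketch on made-up numbers rather than the real Wikipedia data. The series `ts` and its values are hypothetical and serve only to illustrate TimeSeriesWindow:

```wolfram
(* Made-up quarterly values; NOT real Wikipedia data *)
ts = TimeSeries[{
    {{2013, 1, 1}, 1}, {{2013, 4, 1}, 2}, {{2013, 7, 1}, 3},
    {{2013, 10, 1}, 4}, {{2014, 1, 1}, 5}}];

(* Keep only the part of the series between 1 June and 31 December 2013 *)
window = TimeSeriesWindow[ts, {{2013, 6, 1}, {2013, 12, 31}}];

(* Inspect which values survived the cut *)
window["Values"]
```

Only the samples whose time stamps fall inside the window are kept, so the earlier and later quarters are dropped.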

FindPeaks is a nice function that will help me find only those peaks whose value is above 10^4 hits per day:

peaksU = FindPeaks[TimeSeriesResample[TimeSeries[ukraineW]], 0, 
   Quantity[0, IndependentUnit["hits"]/("Days")], Quantity[10000, IndependentUnit["hits"]/("Days")]];
peaksS = FindPeaks[TimeSeriesResample[TimeSeries[syriaW]], 0, 
   Quantity[0, IndependentUnit["hits"]/("Days")], Quantity[10000, IndependentUnit["hits"]/("Days")]];
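
The role of the last three arguments of FindPeaks is easier to see on a plain list. The following is a minimal sketch on made-up numbers (the list `hits` is hypothetical and unitless, unlike the Quantity-valued series above): the second argument is the smoothing scale, the third the minimum sharpness, and the fourth the value threshold.

```wolfram
(* Made-up daily values; NOT real Wikipedia data *)
hits = {1, 5, 2, 12, 3, 8, 1};

(* sigma = 0: no smoothing; s = 0: no sharpness requirement *)
allPeaks = FindPeaks[hits, 0, 0]

(* t = 6: keep only peaks whose value is greater than 6 *)
bigPeaks = FindPeaks[hits, 0, 0, 6]
```

Each result is a list of {position, value} pairs. The threshold of 6 is meant to discard the small peak at 5 and keep the ones at 12 and 8, in the same way the 10000 hits per day threshold filters the real series.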

I can even visualize these peaks with the help of TimelinePlot:

TimelinePlot[{Labeled @@@ Normal[peaksU], Labeled @@@ Normal[peaksS]}, 
AxesOrigin -> Center, PlotLegends -> {"Ukraine", "Syria"}, PlotLayout -> "Vertical"]

enter image description here

Or I can put the peaks on the time series plots. It is also better to use a log scale to see smaller patterns in the data:

p1 = DateListLogPlot[{ukraineW, peaksU, syriaW, peaksS},
 PlotStyle -> {
   Automatic, Directive[Blue, PointSize[.01]],
   Pink, Directive[Red, PointSize[.01]]},
 Joined -> {True, False, True, False},
 PlotMarkers -> {"", {"\[FilledCircle]", 10}, "", {"\[FivePointedStar]", 12}}, AspectRatio -> 1/4, 
 ImageSize -> 800, Filling -> Bottom, PlotRange -> {{DateObject[{2013, 6}], Now}, All},
 PlotRangePadding -> {0, .15}, PlotLegends -> {"Ukraine", "peaks U", "Syria", "peaks S"}]

enter image description here

The point is to see which historical events those peaks correspond to. The Economist lists a few important ones. Below I list my modified version, but we should remember these are just guesses and there is no proof of what really induces spikes in Wikipedia visits. There are many events close together in time, and it is easy to miss or misinterpret one. Also note the slight shifts in dates between an event and the corresponding peak in the Wikipedia data, probably an indication of some duration and inertia in the process event >> mass media >> Wikipedia. Events I found reading online articles:

p2 = TimelinePlot[
  <|<|
    Style["S: Chemical weapons \n suspected by UN", Red] -> DateObject[{2013, 9}],
    Style["S: Palmyra UNESCO lost", Red] -> DateObject[{2015, 5, 21}],
    Style["S: Refugees rush out", Red] -> DateObject[{2015, 9, 15}],
    Style["S: Russian military intervention", Red] -> DateObject[{2015, 9, 30}]
    |>,
   <|
    "U: Maidan protests started " -> DateObject[{2013, 11, 25}],
    "U: New Anti-Protest Laws" -> DateObject[{2014, 1, 6}],
    "U: Maidan Exiles President" -> DateObject[{2014, 2, 21}],
    "U: Russia's annexation of Crimea" -> DateObject[{2014, 3, 18}],
    "U: War in Donbass" -> DateObject[{2014, 4, 18}],
    "U: Malaysia flight MH17 shot" -> DateObject[{2014, 7, 17}],
    "U: Donetsk airport lost" -> DateObject[{2015, 1, 21}],
    "U: Minsk II ceasefire" -> DateObject[{2015, 2, 11}]
    |>|>,
  PlotRange -> {DateObject[{2013, 6}], Now}, PlotRangePadding -> 0,
  AspectRatio -> 1/3.5, ImageSize -> 800, AxesOrigin -> Top];
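
The shift between an event date and the nearest peak in the Wikipedia data can be put into numbers. This is a minimal sketch that assumes the peak dates have already been extracted. The peak dates below are hypothetical placeholders, not values read from the real data, and the names `event`, `peakDates`, `nearestPeak` and `lagDays` are made up for illustration:

```wolfram
(* Event date taken from the timeline above: MH17, 17 July 2014 *)
event = {2014, 7, 17};

(* HYPOTHETICAL peak dates; replace with dates extracted from the real peaks *)
peakDates = {{2014, 3, 3}, {2014, 7, 20}, {2015, 2, 14}};

(* Pick the peak closest in time to the event *)
nearestPeak = First[
   MinimalBy[peakDates, Abs[AbsoluteTime[#] - AbsoluteTime[event]] &]];

(* Lag in days; positive means the peak comes after the event *)
lagDays = QuantityMagnitude[DateDifference[event, nearestPeak, "Day"]]
```

With these placeholder dates the nearest peak is three days after the event. Such a lag is only a rough hint at the delay and does not show that the event caused the peak.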

Note how carefully I select the same values for PlotRange and PlotRangePadding in the p1 and p2 plots. I also cheated a bit to save space: you will need to set PlotLegends to None in p1 and remove Frame and Axes. But in the end, here is the final line to make the second image (animation) at the top:

Panel[Grid[{{p1}, {p2}}, Spacings -> {0, 0}]]

Please share your own ideas and code on similar data analysis (but not political views, please).

POSTED BY: Vitaliy Kaurov
6 Replies

Wow, that's huge. Thanks for the study, that's why I love data science.

POSTED BY: Chrisss Tinerd

Another post of yours has been selected for the Staff Picks group, congratulations! We are happy to see you at the top of the "Featured Contributor" board. Thank you for your wonderful contributions, and please keep them coming!

POSTED BY: EDITORIAL BOARD

Thanks, Kay. That call is constructed automatically, so you do not need to know the syntax. It is explained how to get it in:

Generally you can always check

WolframAlpha["Einstein", "DataRules"]

to see what computable data are available. While page hits are not yet available there, also take a look at the other data in WikipediaData.

POSTED BY: Vitaliy Kaurov

Vitaliy, nice, but I don't really understand the alpha call:

WolframAlpha["ukraine syria", {{"PopularityPod:WikipediaStatsData", 1}, "ComputableData"}]

Is there documentation for this? What other sources are available besides "WikipediaStatsData"? In other words, how do I conduct my own research on some topic? Thanks.

POSTED BY: Kay Herbert

Interesting question. If I understand correctly what you mean, then visit data for foreign-language Wikipedia pages (especially in languages native to the conflict zones), compared with the English page, would probably show the difference. But how much of that is bias versus noise is a harder question. Do you have any ideas for bias detection? It would be interesting to try.

POSTED BY: Vitaliy Kaurov

Really informative analysis.

Can you detect a bias in coverage among your sources?

POSTED BY: Peter Barendse