Group Abstract

Message Boards

WOLFRAM COMMUNITY

27.1K Views

6 Replies

26 Total Likes

View groups...

Follow this post

Share this post:

GROUPS:

Staff Picks Data Science Economics Social Science Curated Data Graphics and Visualization Wolfram Language Computational Humanities

Measuring public interest in Syria and Ukraine from Wikipedia data

Vitaliy Kaurov

Vitaliy Kaurov, WOLFRAM Research

Posted 11 years ago

I intend to show some how to use programming to data mine current events, and I will not delve into the politics per se. Recently The Economist published an article where it attempts to measuring relative public interest in Syria and Ukraine from various internet sources, such as new media, search terms, and Wikipedia. Below the first image is from The Economist and the second is made with Wolfram Language (WL). It is easy to create such infographics, especially with handy Wolfram\|Alpha data and functions new like FindPeaks and TimelinePlot. Please take a note of WL interactivity in the second image. Tutorial is below the images. I will concentrate on the third, the last, plot from The Guardian. Wolfram\|Alpha servers have numerous curated data including Wikipedia popularity (weekly hits per day) for many specific English language pages. I will access the popularity data for Ukraine and Syria as {ukraine, syria} = WolframAlpha["ukraine syria", {{"PopularityPod:WikipediaStatsData", 1}, "ComputableData"}]; here is a sample, that as you can see contains date stamps and hits per day: Short[ukraine] Here is the plot: DateListPlot[{ukraine, syria}, PlotRange -> All, PlotTheme -> "Detailed", AspectRatio -> 1/4, ImageSize -> 800, PlotLegends -> {"ukraine", "syria"}] I will be interested in the events starting second half of 2013 and TimeSeriesWindow function will help us to cut those data out: {ukraineW, syriaW} = TimeSeriesWindow[TimeSeriesResample[TimeSeries[#]], {DateObject[{2013, 6}], Now}] & /@ {ukraine, syria}; FindPeaks is a nice function that will help me to find only those peaks whose value is above 10^4 hits per day: peaksU = FindPeaks[TimeSeriesResample[TimeSeries[ukraineW]], 0, Quantity[0, IndependentUnit["hits"]/("Days")], Quantity[10000, IndependentUnit["hits"]/("Days")]]; peaksS = FindPeaks[TimeSeriesResample[TimeSeries[syriaW]], 0, Quantity[0, IndependentUnit["hits"]/("Days")], Quantity[10000, IndependentUnit["hits"]/("Days")]]; I can even visualize these peaks with help of TimelinePlot: TimelinePlot[{Labeled @@@ Normal[peaksU], Labeled @@@ Normal[peaksS]}, AxesOrigin -> Center, PlotLegends -> {"Ukraine", "Syria"}, PlotLayout -> "Vertical"] Or I can put the peaks on the time series plots. It is also better to use log-scale to see smaller data patterns: p1 = DateListLogPlot[{ukraineW, peaksU, syriaW, peaksS}, PlotStyle -> { Automatic, Directive[Blue, PointSize[.01]], Pink, Directive[Red, PointSize[.01]]}, Joined -> {True, False, True, False}, PlotMarkers -> {"", {"\[FilledCircle]", 10}, "", {"\[FivePointedStar]", 12}}, AspectRatio -> 1/4, ImageSize -> 800, Filling -> Bottom, PlotRange -> {{DateObject[{2013, 6}], Now}, All}, PlotRangePadding -> {0, .15}, PlotLegends -> {"Ukraine", "peaks U", "Syria", "peaks S"}] The point is to see to what historical events those peaks correspond. The Guardian lists a few important ones. Below I list my modified version, but we should remember these are just guesses and there is no proof what really induces spikes in Wikipedia visits. There are many close events and it is easy to miss or misinterpret. Also note slight shifts in dates between the event and Wikipedia data peak, - probably an indication of some duration and inertia in the process event >> mass media >> Wikipedia. Events I found reading online articles: p2 = TimelinePlot[ <\|<\| Style["S: Chemical weapons \n suspected by UN", Red] -> DateObject[{2013, 9}], Style["S: Palmyra UNESCO lost", Red] -> DateObject[{2015, 5, 21}], Style["S: Refugees rush out", Red] -> DateObject[{2015, 9, 15}], Style["S: Russian military intervention", Red] -> DateObject[{2015, 9, 30}] \|>, <\| "U: Maidan protests started " -> DateObject[{2013, 11, 25}], "U: New Anti-Protest Laws" -> DateObject[{2014, 1, 6}], "U: Maidan Exiles President" -> DateObject[{2014, 2, 21}], "U: Russia's annexation of Crimea" -> DateObject[{2014, 3, 18}], "U: War in Donbass" -> DateObject[{2014, 4, 18}], "U: Malaysia flight MH17 shot" -> DateObject[{2014, 7, 17}], "U: Donetsk airport lost" -> DateObject[{2015, 1, 21}], "U: Minsk II ceasefire" -> DateObject[{2015, 2, 11}] \|>\|>, PlotRange -> {DateObject[{2013, 6}], Now},PlotRangePadding -> 0, AspectRatio -> 1/3.5, ImageSize -> 800, AxesOrigin -> Top]; Note how carefully I select the sam values for PlotRange and PlotRangePadding in p1 and p2 plots. I also cheated a bit to save the space - you will need to set PlotLegends to None in p1 and remove Frame and Axes. But in the end here is the final line to make the 2nd image (animation) at the top: Panel[Grid[{{p1}, {p2}}, Spacings -> {0, 0}]] Please share your own ideas and code on similar data analysis (but not the political views please). Attachments:

POSTED BY: Vitaliy Kaurov

6 Replies

Sort By:

Chrisss Tinerd

Chrisss Tinerd, Work as Data Scientist for a company

Posted 9 years ago

Wow that's Huge. Thanks for the studies, that's why i love Data Science.

POSTED BY: Chrisss Tinerd

EDITORIAL BOARD

EDITORIAL BOARD, WOLFRAM

Posted 9 years ago

- another post of yours has been selected for the Staff Picks group, congratulations! We are happy to see you at the top of the "Featured Contributor" board. Thank you for your wonderful contributions, and please keep them coming!

POSTED BY: EDITORIAL BOARD

Kay Herbert

Kay Herbert, Chief scientist, Sloan Valve Co.

Posted 11 years ago

Vitaliy, nice, but I don't really understand the alpha call: WolframAlpha["ukraine syria", {{"PopularityPod:WikipediaStatsData", 1}, "ComputableData"}] Is there documentation for this? what other sources are available but "Wikepediastatsdata"? In other words, how do I conduct my own research on some topic. Thanks.

POSTED BY: Kay Herbert

Vitaliy Kaurov

Vitaliy Kaurov, WOLFRAM Research

Posted 11 years ago

Thanks, Kay. That call constructed automatically, you do not need to know the syntax. It is explained how to get it in: My reply to How to get FinancialData[] Exchange Rates? Data Formats in Wolfram\|Alpha tutorial - search for "computable data". Generally you can always check WolframAlpha["Einstein", "DataRules"] to see what computable data are available. While page hits are not yet available, still take a look also at other data at WikipediaData.

POSTED BY: Vitaliy Kaurov

Peter Barendse

Peter Barendse, Wolfram Research

Posted 11 years ago

really informative analysis. can you detect a bias in coverage among your sources?

POSTED BY: Peter Barendse

Vitaliy Kaurov

Vitaliy Kaurov, WOLFRAM Research

Posted 11 years ago

Interesting question. If I understand right what you mean, then probably visits' data on foreign (especially native to conflict zones) language Wikipedia pages and relative comparison with English page would show the difference. But how much of that is bias versus noise is a harder question. Do you have any ideas of bias detection? - It'd be interesting to try.

POSTED BY: Vitaliy Kaurov

Reply to this discussion

Reply Preview

Attachments

Remove Add a file to this post

Follow this discussion

or Discard

Feedback