Wikipedia Category browser 10.1 update with WikipediaData

Posted 9 years ago

I updated my Wikipedia category traffic browser to use the new WikipediaData function in 10.1, which shortened the code significantly. I also recently switched to monthly summary traffic files instead of the frequently slow web service, which requires fetching the traffic one page at a time. This gave a huge performance increase in exchange for an hour of initialization and 4 GB of memory usage. I now have the luxury of looking into any category that briefly catches my curiosity.

Recent uses by others:

  • A friend from college works at an architecture firm. He learned about several significant, new architects from the most popular articles in the 21st century architects category.
  • My sister is going to nursing school. She clicked on Medicine->Medicine in society->Medical scandals and learned about the Chicago Tylenol murders because that article floated near the top.

A short preview GIF animation is below, followed by the code. Enjoy!

[Animated GIF: preview of the category browser]

(* download and extract monthly traffic file from \
http://dumps.wikimedia.org/other/pagecounts-ez/merged/ *)
str = OpenRead[
   "E:\\Wiki\\traffic\\Uncompressed\\pagecounts-2015-02-views-ge-5-\
totals"];
(* can take almost an hour to generate the article traffic \
association, uses about 4 GB of memory *)
pageTraffic = 
  Association@
   Reap[While[True, 
      Read[str, {Word, Word, Number}] // 
       If[# === EndOfFile, Break[], 
         If[#[[1]] == "en.z", Sow[URLDecode@#[[2]] -> #[[3]]]]] &]][[
    2, 1]];
Close[str]; (* release the file stream once the association is built *)

traffic[category_] := <|"name" -> #, 
    "traffic" -> pageTraffic@StringReplace[#, " " -> "_"]|> & /@ 
  WikipediaData["Category" -> category, "CategoryMembers"]

updatePages[category_] := (AppendTo[history, category]; 
  pages = traffic[current = category])
updatePages[category_, "Append"] := 
 pages = DeleteDuplicates@Join[pages, traffic@category]

history = {}; updatePages@"Main topic classifications"; \
onlyCategories = False;

Panel@Column@{Dynamic[
    ToString@
      Length@If[onlyCategories, 
        Select[pages, StringMatchQ[#name, "Category:*"] &], pages] <> 
     " pages"], 
   Row@{Button["<", updatePages[current = history[[-2]]]; 
      history = history[[;; -3]], 
      Enabled -> Dynamic[Length@history > 1]], 
     InputField[Dynamic[current, updatePages@# &], String], 
     " Only categories:", Checkbox@Dynamic@onlyCategories}, 
   Pane[Dynamic@
     Grid@MapIndexed[{Button["x", pages = DeleteCases[pages, #]], 
         If[StringMatchQ[#name, "Category:*"], 
          Button["+", 
           updatePages[StringDrop[#name, StringLength@"Category:"], 
            "Append"]; pages = DeleteCases[pages, #]]], 
         If[StringMatchQ[#name, "Category:*"], 
          Button[">", 
           updatePages@
            StringDrop[#name, StringLength@"Category:"]]], #2[[1]], 
         Hyperlink[#name, 
          "http://en.wikipedia.org/wiki/" <> 
           URLEncode@StringReplace[#name, " " -> "_"]], #traffic} &, 
       SortBy[If[onlyCategories, 
         Select[pages, StringMatchQ[#name, "Category:*"] &], 
         pages], -#traffic &]], ImageSize -> {500, 600}, 
    Scrollbars -> {False, Automatic}]}
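
Since generating the association can take almost an hour, one option is to cache it on disk after the first build and reload it in later sessions. The following is a minimal sketch, not part of the browser code above, using DumpSave and Get. The cache path and the toy `demoTraffic` association (with made-up counts) are assumptions for illustration only.

```mathematica
(* Hypothetical cache location; any writable path would do. *)
cacheFile = FileNameJoin[{$TemporaryDirectory, "pageTraffic-demo.mx"}];

If[FileExistsQ[cacheFile],
 (* Get restores the saved definition of demoTraffic from the MX file. *)
 Get[cacheFile],
 (* First run: build the association (toy values here), then save it. *)
 demoTraffic = <|"Moon" -> 12345, "Chicago_Tylenol_murders" -> 678|>;
 DumpSave[cacheFile, demoTraffic]];

(* Look up a title exactly as the browser does with pageTraffic. *)
demoTraffic["Moon"]
```

In the real workflow, `demoTraffic` would be `pageTraffic` and the toy association would be replaced by the Reap/While build above. Note that MX files written by DumpSave are generally tied to the Wolfram Language version and platform that created them, so the cache may need to be rebuilt after an upgrade.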
POSTED BY: Michael Hale
2 Replies

Great, Michael, thanks for sharing! What do you think about WikipediaData? I haven't had a chance to play with it yet. Is there anything you wish it had?

POSTED BY: Sam Carrettie
Posted 9 years ago

I think it's a very nice wrapper around the MediaWiki API. There's nothing I would immediately add. An interesting future development would be integrating Wikidata information into the Entity-related functions, if Wikidata matures well and gains significant crowdsourced momentum.

For example, the first example in the WikipediaData documentation uses the Moon. If I look at

EntityProperties[Entity["PlanetaryMoon", "Moon"]]

I get a pretty long list. If I click on the Wikidata item link for the Moon on the left side of the article, I get a much shorter list, but it has a few potentially nice additions: a richer class hierarchy, the age since formation, and a link to an image gallery with permissive licenses.

POSTED BY: Michael Hale