Analyzing a Dataset of Game Releases

POSTED BY: Rob Lockhart
7 Replies

You earned the "Featured Contributor" badge, congratulations!

This is a great post and it has been selected for the curated Staff Picks group. Your profile is now distinguished by a "Featured Contributor" badge and displayed on the "Featured Contributor" board.

POSTED BY: Moderation Team

Oh, one more thing. Motivated by a post on the Wolfram Blog by Matthias Odisio, you can, of course, also work with the covers of the video games. You can download a zip file with the covers from

http://www.gametdb.com/download.php?FTP=GameTDB-wii_cover-EN-2015-07-22.zip

You can then unzip the file and, after adjusting your file path, run:

covers = Import["~/Desktop/cover/EN/*", "PNG"];

There are more than 3000 covers in that dataset. Running the following on all of them takes too long, so I only use 40 random covers to illustrate the idea, i.e. the same code that Matthias Odisio used:

covers2 = RandomSample[covers, 40]; (* distinct covers; RandomChoice may repeat one *)
imagedistances = ConstantArray[0., {Length[covers2], Length[covers2]}];
Monitor[
 Do[d = ImageDistance[covers2[[i]], covers2[[j]], DistanceFunction -> "EarthMoverDistance"];
  imagedistances[[i, j]] = imagedistances[[j, i]] = d,
  {i, 1, Length[covers2] - 1}, {j, i + 1, Length[covers2]}],
 {i, j}];
(* upper-triangular entries, i.e. each pair of covers once *)
allimagedistances = Flatten[Table[Diagonal[imagedistances, k], {k, 1, Length[covers2] - 1}]];

He then plotted everything like so:

thr = FindThreshold[allimagedistances, Method -> {"BlackFraction", .05}];
adjmatrix = 1 - Unitize[Threshold[imagedistances, thr]] - IdentityMatrix[Length[covers2]];
GraphPlot[adjmatrix, VertexRenderingFunction -> (Inset[covers2[[#2]], #, Center, .5] &), Method -> "SpringEmbedding", ImageSize -> Full]

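To see what the adjacency-matrix line above does, here is a minimal sketch on a made-up 3×3 distance matrix. The names toydistances, toythr, far, near and toyadj, as well as the cutoff 0.5, are assumptions for illustration only; in the real code the cutoff comes from FindThreshold.

```mathematica
(* hypothetical pairwise distances between three covers *)
toydistances = {{0., 0.2, 0.9}, {0.2, 0., 0.7}, {0.9, 0.7, 0.}};
toythr = 0.5; (* assumed cutoff, stands in for thr *)

(* Threshold zeroes the entries smaller than the cutoff *)
far = Unitize[Threshold[toydistances, toythr]]; (* 1 where two covers are far apart *)
near = 1 - far; (* 1 where they are close, including the diagonal *)
toyadj = near - IdentityMatrix[Length[toydistances]]; (* drop the self-loops *)

toyadj // MatrixForm
```

Only the pair with distance 0.2 lies below the cutoff, so only covers 1 and 2 should end up connected in the resulting graph.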

None of this is my idea; all of the credit goes to Matthias Odisio. I only post it here, because the idea seems to fit nicely.

Cheers,

Marco

POSTED BY: Marco Thiel

There is, of course, a lot more you can do. For example we can use the following website:

http://thegamesdb.net

This allows us to cross-check the data we have looked at before. So if we take the names list from before:

data = Import["http://pastebin.com/DG1CsVXk", "Data"];
Quiet[names = (StringSplit[#, "("] & /@ data[[2, 2, 3 ;;]][[1 ;;]])[[All, 1]]];

We can use:

smalldataset = 
 Quiet[{"id" -> 
      Flatten[StringSplit[StringSplit[#, "<id>"], "</id>"]][[1]], 
     "GameTitle" -> 
      Flatten[StringSplit[StringSplit[#, "<GameTitle>"], 
         "</GameTitle>"]][[2]], 
     If[StringContainsQ[
       Flatten[StringSplit[StringSplit[#, "<ReleaseDate>"], 
          "</ReleaseDate>"]][[2]], "Platform"], 
      "ReleaseDate" -> "Missing", 
      "ReleaseDate" -> 
       Interpreter["Date"][ 
        Flatten[StringSplit[StringSplit[#, "<ReleaseDate>"], 
           "</ReleaseDate>"]][[2]]]]} & /@ ((StringSplit[#, 
         "<Game>\n"] & @(Import[
           "http://thegamesdb.net/api/GetGamesList.php?name=" <> #] \
&@ RandomChoice[names]))[[2 ;;]])]

This makes a nice list of rules. Note that your database is much larger, so many queries on http://thegamesdb.net will return empty sets or, worse, errors. Anyway, we can then use fancy things like

TimelinePlot[Association["GameTitle" -> "ReleaseDate" /. smalldataset]]

This produces a timeline of the release dates.
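The string splitting in the smalldataset code can probably be replaced by proper XML parsing. The following is only a sketch: the sample string is made up, and it assumes that the API returns Game elements with id, GameTitle and (optionally) ReleaseDate children, as the tags in the code above suggest. The helper name field is hypothetical.

```mathematica
(* made-up response; the second game has no ReleaseDate *)
xml = "<Data><Game><id>1</id><GameTitle>Alpha</GameTitle><ReleaseDate>11/21/1998</ReleaseDate><Platform>PC</Platform></Game><Game><id>2</id><GameTitle>Beta</GameTitle><Platform>PC</Platform></Game></Data>";

doc = ImportString[xml, "XML"];

(* text content of the first child element called tag, or Missing[...] *)
field[game_, tag_] := 
  FirstCase[game, XMLElement[tag, _, {value_String}] :> value, 
   Missing["NotAvailable"], Infinity];

(* one association per Game element *)
games = Cases[doc, 
   g : XMLElement["Game", _, _] :> <|"id" -> field[g, "id"], 
     "GameTitle" -> field[g, "GameTitle"], 
     "ReleaseDate" -> field[g, "ReleaseDate"]|>, Infinity]
```

A missing release date then shows up as Missing["NotAvailable"], so no check for the string "Platform" is needed. The dates stay strings here and could be passed through Interpreter["Date"] as in the original code.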

This command gives 100 games:

smalldataset = 
 Quiet[{"id" -> 
      Flatten[StringSplit[StringSplit[#, "<id>"], "</id>"]][[1]], 
     "GameTitle" -> 
      Flatten[StringSplit[StringSplit[#, "<GameTitle>"], 
         "</GameTitle>"]][[2]], 
     If[StringContainsQ[
       Flatten[StringSplit[StringSplit[#, "<ReleaseDate>"], 
          "</ReleaseDate>"]][[2]], "Platform"], 
      "ReleaseDate" -> "Missing", 
      "ReleaseDate" -> 
       Interpreter["Date"][ 
        Flatten[StringSplit[StringSplit[#, "<ReleaseDate>"], 
           "</ReleaseDate>"]][[2]]]]} & /@ ((StringSplit[#, 
         "<Game>\n"] & @(Import[
           "http://thegamesdb.net/api/GetGamesList.php?name=" <> #] & /@ Import[
          "http://thegamesdb.net/api/GetGamesList.php?platform=PC"]))[[2 ;;]])]

We can again use TimelinePlot:

TimelinePlot[Association[Select["GameTitle" -> "ReleaseDate" /. smalldataset, DateObjectQ[#[[2]]] &]]]

The plot is much nicer when it is interactive in the notebook.

That shows quite nicely how much the market has grown. It also suggests clusters of release dates.

It is very easy to make a nice, orderly dataset out of this:

Dataset[Association /@ smalldataset]

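Once the data is in a Dataset, the clustering of release dates can be checked with a simple query. This is a sketch on hypothetical rows (the titles and dates below are invented); with the real data one would start from Dataset[Association /@ smalldataset] instead.

```mathematica
(* invented rows with the same keys as smalldataset *)
games = Dataset[{
    <|"id" -> "1", "GameTitle" -> "Alpha", "ReleaseDate" -> DateObject[{1998, 11, 21}]|>,
    <|"id" -> "2", "GameTitle" -> "Beta", "ReleaseDate" -> "Missing"|>,
    <|"id" -> "3", "GameTitle" -> "Gamma", "ReleaseDate" -> DateObject[{2007, 6, 1}]|>,
    <|"id" -> "4", "GameTitle" -> "Delta", "ReleaseDate" -> DateObject[{1998, 3, 5}]|>}];

(* keep only the rows whose release date was interpreted successfully *)
dated = games[Select[DateObjectQ[#ReleaseDate] &]];

(* number of releases per year *)
dated[GroupBy[DateValue[#ReleaseDate, "Year"] &], Length]
```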

There is certainly lots more to discover here.

Cheers,

Marco

POSTED BY: Marco Thiel

Hi Rob,

This is really nice. I haven't had much time, but I liked this representation:

data = Import["http://pastebin.com/DG1CsVXk", "Data"];
Quiet[names = (StringSplit[#, "("] & /@ data[[2, 2, 3 ;;]][[1 ;;]])[[All, 1]]];
WordCloud[DeleteStopwords@(ToString /@ DeleteCases[Flatten@(TextWords /@ DeleteDuplicates[Select[names, StringQ[#] &]]), {}])]


It gives an idea of what people are interested in when it comes to games. The Quiet function indicates that I was too lazy to clean the data properly.

It is easy to generate a word cloud for different periods in time.
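A minimal sketch of how that could look, assuming the titles are available together with their release years. The list titlesWithYears is invented for illustration, and grouping by decade is just one possible choice of period.

```mathematica
(* hypothetical {title, year} pairs; the real data would supply these *)
titlesWithYears = {{"Space Quest", 1986}, {"Space Invaders", 1978}, 
   {"Star Control", 1990}, {"King's Quest", 1984}};

(* group the titles by decade *)
byDecade = GroupBy[titlesWithYears, (10 Floor[Last[#]/10] &) -> First];

(* one word cloud per decade, built from the words in the titles *)
clouds = Map[WordCloud[DeleteStopwords[Flatten[TextWords /@ #]]] &, byDecade]
```

The result is an association from decade to word cloud, so clouds[1980] would give the cloud for the 1980s titles.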

Cheers,

Marco

POSTED BY: Marco Thiel

Well done. Thanks! Do you mind if I post this on Gamasutra.com (where this article is cross-posted) as well?

With attribution of course.

POSTED BY: Rob Lockhart

Sure, no problem at all.

If you could provide your table with all the words in each year, or your initial data file as an attachment, we could make a very nice BubbleChart diagram over different years. I scraped the data in a very crude way, because the website did not load properly in my browser.

You might also be able to use Google Trends to cross-check the popularity of these games. Also, many of these games are described in great detail on Wikipedia, which in Mathematica 10.2 is easy to access directly from the Wolfram Language. You have all the titles of the games and could scrape useful information from Wikipedia. With that, some really cool diagrams should be possible.
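As a rough sketch of the Wikipedia idea, assuming WikipediaData is available in your version and that each title resolves to an article. The two titles are just examples, not taken from the dataset.

```mathematica
(* example titles; in practice these would come from the names list *)
titles = {"Tetris", "Myst"};

(* plain-text summary of each article (requires an internet connection) *)
summaries = AssociationMap[WikipediaData[#, "SummaryPlaintext"] &, titles];

(* a first crude measure: number of words in each summary *)
WordCount /@ summaries
```

Ambiguous or misspelled titles will not resolve to an article, so the real names list would need some cleaning first.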

Cheers, Marco

POSTED BY: Marco Thiel

Dear Rob,

Very nice! Thanks for sharing. A small comment: you might want to use DeleteStopwords instead of your nontrivial.

Thanks,

Marco

POSTED BY: Marco Thiel