
Analyzing a Dataset of Game Releases

POSTED BY: Rob Lockhart
7 Replies

Congratulations, you earned the "Featured Contributor" badge!

This is a great post and it has been selected for the curated Staff Picks group. Your profile is now distinguished by a "Featured Contributor" badge and displayed on the "Featured Contributor" board.

POSTED BY: Moderation Team

Oh, one more thing. Motivated by a post on the Wolfram Blog by Matthias Odisio, you can, of course, also work with the covers of the video games. You can download a zip file with the covers from

http://www.gametdb.com/download.php?FTP=GameTDB-wii_cover-EN-2015-07-22.zip

You can then unzip the file and, after adjusting the file path, run:

covers = Import["~/Desktop/cover/EN/*", "PNG"];

There are more than 3000 covers in that dataset. Running the following on all of them takes too long, so I only use 40 random covers to illustrate the idea. This is the same code that Matthias Odisio used:

covers2 = RandomSample[covers, 40]; (* 40 distinct covers; RandomChoice could pick the same one twice *)
imagedistances = ConstantArray[0., {Length[covers2], Length[covers2]}];
(* fill the symmetric matrix of pairwise distances *)
Monitor[
 Do[d = ImageDistance[covers2[[i]], covers2[[j]], DistanceFunction -> "EarthMoverDistance"];
  imagedistances[[i, j]] = imagedistances[[j, i]] = d,
  {i, 1, Length[covers2] - 1}, {j, i + 1, Length[covers2]}];,
 {i, j}];
(* collect the entries above the diagonal into a flat list *)
allimagedistances = Flatten[Table[Diagonal[imagedistances, k], {k, 1, Length[covers2] - 1}]];

He then plotted everything like so:

thr = FindThreshold[allimagedistances, Method -> {"BlackFraction", .05}];
adjmatrix = 1 - Unitize[Threshold[imagedistances, thr]] - IdentityMatrix[Length[covers2]];
GraphPlot[adjmatrix, VertexRenderingFunction -> (Inset[covers2[[#2]], #, Center, .5] &), Method -> "SpringEmbedding", ImageSize -> Full]

[Image: GraphPlot output]
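To see what the thresholding step does, here is a minimal sketch on a made-up 3×3 distance matrix. The matrix `dist` and the fixed threshold `0.5` are assumptions for illustration only; the code above derives the threshold with FindThreshold instead.

```wolfram
(* made-up symmetric distances between three "covers" *)
dist = {{0., 0.2, 0.9}, {0.2, 0., 0.7}, {0.9, 0.7, 0.}};
thr = 0.5; (* hypothetical threshold, chosen by hand *)

(* Threshold zeroes the small distances, Unitize marks what is left with 1, *)
(* so 1 - Unitize[...] marks the close pairs; subtracting the identity *)
(* matrix removes the self-loops on the diagonal *)
adj = 1 - Unitize[Threshold[dist, thr]] - IdentityMatrix[Length[dist]];

(* the same matrix as an undirected graph *)
g = AdjacencyGraph[adj];
```

Only the first two covers are closer than the threshold, so they are the only pair joined by an edge.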

None of this is my idea; all of the credit goes to Matthias Odisio. I only post it here because the idea seems to fit nicely.

Cheers,

Marco

POSTED BY: Marco Thiel

There is, of course, a lot more you can do. For example, we can use the following website:

http://thegamesdb.net

This allows us to cross-check the data we have looked at before. So if we take the names list from before:

data = Import["http://pastebin.com/DG1CsVXk", "Data"];
Quiet[names = (StringSplit[#, "("] & /@ data[[2, 2, 3 ;;]][[1 ;;]])[[All, 1]]];

We can use:

smalldataset = 
 Quiet[{"id" -> 
      Flatten[StringSplit[StringSplit[#, "<id>"], "</id>"]][[1]], 
     "GameTitle" -> 
      Flatten[StringSplit[StringSplit[#, "<GameTitle>"], 
         "</GameTitle>"]][[2]], 
     If[StringContainsQ[
       Flatten[StringSplit[StringSplit[#, "<ReleaseDate>"], 
          "</ReleaseDate>"]][[2]], "Platform"], 
      "ReleaseDate" -> "Missing", 
      "ReleaseDate" -> 
       Interpreter["Date"][ 
        Flatten[StringSplit[StringSplit[#, "<ReleaseDate>"], 
           "</ReleaseDate>"]][[2]]]]} & /@ ((StringSplit[#, 
         "<Game>\n"] & @(Import[
           "http://thegamesdb.net/api/GetGamesList.php?name=" <> #] \
&@ RandomChoice[names]))[[2 ;;]])]

This makes a nice list of rules. Note that your database is much larger, so many queries on http://thegamesdb.net will give empty sets or, worse, errors. Anyway, we can then use fancy things like

TimelinePlot[Association["GameTitle" -> "ReleaseDate" /. smalldataset]]

to obtain

[Image: TimelinePlot output]
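The nested StringSplit calls above are sensitive to the exact layout of the response. As an alternative sketch, the reply can be read as XML and the fields picked out by tag name. The tag names are the ones used in the code above; the sample string and the helper `field` are made up for illustration, and I have not checked them against the live API.

```wolfram
(* made-up sample in the shape the code above expects *)
xml = "<Data>" <>
   "<Game><id>1</id><GameTitle>Example One</GameTitle>" <>
   "<ReleaseDate>01/02/2003</ReleaseDate><Platform>PC</Platform></Game>" <>
   "<Game><id>2</id><GameTitle>Example Two</GameTitle>" <>
   "<Platform>PC</Platform></Game>" <>
   "</Data>";

tree = ImportString[xml, "XML"];

(* hypothetical helper: text of the first child element named tag, *)
(* or Missing["NotAvailable"] if there is no such child *)
field[children_List, tag_String] :=
  FirstCase[children, XMLElement[tag, _, {text_String}] :> text,
   Missing["NotAvailable"]];

(* one list of rules per <Game> element, like smalldataset above *)
games = Cases[tree,
   XMLElement["Game", _, children_] :> {
     "id" -> field[children, "id"],
     "GameTitle" -> field[children, "GameTitle"],
     "ReleaseDate" -> field[children, "ReleaseDate"]},
   Infinity];
```

A game without a release date then simply gets a Missing value instead of picking up the text of a neighbouring tag. The dates stay strings here; Interpreter["Date"] can be applied afterwards as in the code above.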

This command gives 100 games:

smalldataset = 
 Quiet[{"id" -> 
      Flatten[StringSplit[StringSplit[#, "<id>"], "</id>"]][[1]], 
     "GameTitle" -> 
      Flatten[StringSplit[StringSplit[#, "<GameTitle>"], 
         "</GameTitle>"]][[2]], 
     If[StringContainsQ[
       Flatten[StringSplit[StringSplit[#, "<ReleaseDate>"], 
          "</ReleaseDate>"]][[2]], "Platform"], 
      "ReleaseDate" -> "Missing", 
      "ReleaseDate" -> 
       Interpreter["Date"][ 
        Flatten[StringSplit[StringSplit[#, "<ReleaseDate>"], 
           "</ReleaseDate>"]][[2]]]]} & /@ ((StringSplit[#, 
         "<Game>\n"] & @(Import[
           "http://thegamesdb.net/api/GetGamesList.php?name=" <> #] & /@ Import[
          "http://thegamesdb.net/api/GetGamesList.php?platform=PC"]))[[2 ;;]])]

We can again make a TimelinePlot:

TimelinePlot[Association[Select["GameTitle" -> "ReleaseDate" /. smalldataset, DateObjectQ[#[[2]]] &]]]

It is much nicer when it is interactive in the notebook, but it looks like this:

[Image: TimelinePlot output]

That shows quite nicely how much the market has grown. It also suggests clusters of release dates.

It is very easy to make a nice, orderly dataset out of this:

Dataset[Association /@ smalldataset]

[Image: Dataset output]
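Once the rules are in a Dataset, filtering and column selection become short queries. A minimal sketch on two made-up entries (the list `toy` is an assumption; it only mimics the shape of smalldataset):

```wolfram
(* made-up entries in the same shape as smalldataset *)
toy = {
   {"id" -> "1", "GameTitle" -> "Example One",
    "ReleaseDate" -> DateObject[{2003, 1, 2}]},
   {"id" -> "2", "GameTitle" -> "Example Two",
    "ReleaseDate" -> "Missing"}};

ds = Dataset[Association /@ toy];

(* keep only rows with a proper date, then keep two columns *)
dated = ds[Select[DateObjectQ[#ReleaseDate] &], {"GameTitle", "ReleaseDate"}];

(* plain list of the remaining titles *)
titles = Normal[dated[All, "GameTitle"]];
```

This is the same DateObjectQ filter as in the TimelinePlot call above, just expressed as a Dataset query.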

There is certainly lots more to discover here.

Cheers,

Marco

POSTED BY: Marco Thiel

Hi Rob,

This is really nice. I haven't had much time, but I liked this representation:

data = Import["http://pastebin.com/DG1CsVXk", "Data"];
Quiet[names = (StringSplit[#, "("] & /@ data[[2, 2, 3 ;;]][[1 ;;]])[[All, 1]]];
WordCloud[DeleteStopwords@(ToString /@ DeleteCases[Flatten@(TextWords /@ DeleteDuplicates[Select[names, StringQ[#] &]]), {}])]

[Image: WordCloud output]

It gives an idea of what people are interested in when it comes to games. The Quiet function indicates that I was too lazy to clean the data properly.

It is easy to generate a word cloud for different periods in time.
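A rough sketch of how that could look, assuming each title comes with a release year. The `{title, year}` pairs below are made up; the real pairs would have to be extracted from the imported data first.

```wolfram
(* made-up {title, year} pairs standing in for the real data *)
titled = {{"Space Quest", 1986}, {"Space Trader", 1988},
   {"Farm Story", 2011}, {"Farm Quest", 2012}};

(* group the titles by decade: key is the decade, values are the titles *)
byDecade = GroupBy[titled, (10 Floor[Last[#]/10] &) -> First];

(* one word cloud per decade, built as in the code above *)
clouds = WordCloud[DeleteStopwords[Flatten[TextWords /@ #]]] & /@ byDecade;
```

`clouds` is an association from decade to word cloud, so `clouds[1980]` shows the cloud for the 1980s titles.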

Cheers,

Marco

POSTED BY: Marco Thiel

Well done. Thanks! Do you mind if I post this on Gamasutra.com (where this article is cross-posted) as well?

With attribution of course.

POSTED BY: Rob Lockhart
POSTED BY: Marco Thiel

Dear Rob,

Very nice! Thanks for sharing. A small comment: you might want to use DeleteStopwords instead of your nontrivial.

Thanks,

Marco

POSTED BY: Marco Thiel