I am developing a tool in Wolfram Language to correlate user ratings on Board Game Geek (BGG, www.boardgamegeek.com). Users on the site rate games on a scale from 1 to 10, and my initial goal is to let someone compare their ratings against those of other users to find users whose tastes generally match their own.
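To make the end goal concrete, here is a minimal sketch of comparing two users once their ratings are in hand. It assumes each user's ratings are stored as an `Association` from game ID to rating. It uses Pearson correlation over the games both users rated, which is one reasonable similarity measure, not necessarily the one the final tool should use. `tasteCorrelation`, `alice`, `bob` and their data are hypothetical.

```mathematica
(* Hypothetical helper: Pearson correlation of two users' ratings,
   restricted to the games both of them have rated.
   Each argument is an Association gameID -> rating. *)
tasteCorrelation[ratingsA_Association, ratingsB_Association] :=
 Module[{common = Intersection[Keys[ratingsA], Keys[ratingsB]]},
  (* with fewer than 3 shared games the correlation is trivially +/-1 or undefined *)
  If[Length[common] < 3,
   Missing["TooFewCommonGames"],
   Correlation[N@Lookup[ratingsA, common], N@Lookup[ratingsB, common]]]]

(* Made-up example data *)
alice = <|"177590" -> 7.5, "68448" -> 6.5, "1231" -> 8, "13" -> 6|>;
bob = <|"177590" -> 8, "68448" -> 7, "1231" -> 9, "9209" -> 5|>;
tasteCorrelation[alice, bob]
```

Because both lists are built with `Lookup` over the same `common` key list, the two rating vectors line up game by game regardless of the order in which each user's games were stored.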
Part of the job, of course, is retrieving all of a user's rated games. BGG's XML API provides this; a request looks like this:
```mathematica
userName = "skutsch";
urlUser = "https://www.boardgamegeek.com/xmlapi2/collection?username=" <> userName <> "&rated=1";
r1 = Import[urlUser, "XML"];
```
When that request is made, BGG checks whether a data file already exists for that user. If so, it serves the data as XML. If not, it responds with HTTP status 202 and a message saying that the data is being prepared. Requesting the URL again a few seconds later then usually returns status 200 along with the XML. (BGG keeps the file until the user changes their collection or a certain number of days have passed. I've been using these users as test cases, so their data will probably be served on the first try.)
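The 202-then-200 handshake can be automated. Below is a minimal sketch that assumes the behavior described above: status 202 while the file is being prepared, and 200 once the XML is ready. It uses `URLRead` so the status code can be inspected before the body is parsed with `ImportString`. `collectionURL`, `fetchCollection`, the retry count, and the pause length are hypothetical choices, not anything BGG prescribes.

```mathematica
(* Hypothetical helper: build the collection URL; URLBuild encodes the query values *)
collectionURL[user_String] :=
 URLBuild["https://www.boardgamegeek.com/xmlapi2/collection",
  {"username" -> user, "rated" -> "1"}]

(* Hypothetical helper: BGG answers 202 while it prepares the file, so wait
   and re-request until it answers 200 or maxTries requests have been made *)
fetchCollection[url_String, maxTries_Integer : 5, wait_ : 3] :=
 Module[{resp = URLRead[url], tries = 1},
  While[resp["StatusCode"] === 202 && tries < maxTries,
   Pause[wait];
   resp = URLRead[url];
   tries++];
  (* only a 200 response carries the XML; anything else is reported as $Failed *)
  If[resp["StatusCode"] === 200,
   ImportString[resp["Body"], "XML"],
   $Failed]]
```

Usage would be `r1 = fetchCollection[collectionURL[userName]]`. A `$Failed` result means BGG never answered 200 within the allowed attempts.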
When it comes to the XML results, I am clueless. I've read the Wolfram Language documentation on XML but haven't managed to digest it.
The data very roughly looks like this:
```xml
<items totalitems="318" ... >
  <item objecttype="thing" objectid="177590" subtype="boardgame"...>
    <name ...>
    <blah>
    <blah>
    <stats ...>
      <rating value="7.5">
        <blah/>
        <blah/>
      </rating>
    </stats>
    <blah>
  </item>
  <item objecttype="thing" objectid="68448" subtype="boardgame"...>
    <name ...>
    <blah>
    <blah>
    <stats ...>
      <rating value="6.5">
        <blah/>
        <blah/>
      </rating>
    </stats>
    <blah>
  </item>
  <...many more items...>
</items>
```
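For orientation, `Import[..., "XML"]` returns symbolic XML. Every element becomes `XMLElement[tag, {attribute -> value, ...}, {children...}]`, and the whole document is wrapped in `XMLObject["Document"][prolog, root, epilog]`. This is why positional indices such as `r1[[2, 3, i]]` work at all. A tiny made-up sample (not real BGG output) makes the shape visible:

```mathematica
(* Made-up, heavily trimmed sample in the same shape as the BGG response *)
sample = ImportString[
   "<items totalitems=\"1\">" <>
    "<item objectid=\"177590\" subtype=\"boardgame\">" <>
    "<stats><rating value=\"7.5\"/></stats></item></items>", "XML"];

(* Part 2 of the document is the root <items> element *)
sample[[2]]
(* XMLElement["items", {"totalitems" -> "1"},
    {XMLElement["item", {"objectid" -> "177590", "subtype" -> "boardgame"},
      {XMLElement["stats", {}, {XMLElement["rating", {"value" -> "7.5"}, {}]}]}]}] *)
```

Note that attribute values come back as strings, so a rating has to be converted to a number explicitly.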
As the `<blah>` placeholders in the outline above indicate, I'm only interested in a few values here. What I want to extract is:
- for every `<item>` where `subtype="boardgame"`:
  - get `<item objectid="xxxxx">`
  - get `<rating value="yy">`
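One way to do exactly this without depending on positions is to match on the shape of the elements with `Cases` and `FirstCase`, searching at all levels. The sketch below assumes `r1` holds the imported document as above; `ratingsFromXML` is a hypothetical name. Unlike index-based access, it does not care how many other children an `<item>` has or in what order they appear.

```mathematica
(* Pull gameID, userName and rating out of every <item subtype="boardgame">,
   matching elements by tag name and attribute rather than by position *)
ratingsFromXML[xml_, user_String] :=
 Cases[xml,
  XMLElement["item", attrs : {___, "subtype" -> "boardgame", ___}, children_] :>
   <|"gameID" -> Lookup[attrs, "objectid"],
     "userName" -> user,
     (* <rating> sits somewhere inside <stats>; search for it at any depth *)
     "rating" -> FirstCase[children,
       XMLElement["rating",
         {___, "value" -> (v_String /; StringMatchQ[v, NumberString]), ___}, _] :>
        ToExpression[v],
       Missing["NotRated"], Infinity]|>,
  Infinity]
```

Usage would be `ratings = ratingsFromXML[r1, userName]`, giving one association per board game. Items with a different `subtype` are skipped, and a rating that is not a number comes back as `Missing["NotRated"]` instead of throwing an error.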
I tried to do this by brute force, like so:
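```mathematica
(* positional access: assumes every <item> has the same children in the same order *)
a = <|"gameID" -> Values[r1[[2, 3, i, 2, 2]]], "userName" -> userName,
  "rating" -> ToExpression @@ Values[r1[[2, 3, i, 3, 5, 3, 1, 2]]]|>
```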
That is, I went straight to the exact position of each value and iterated `i` from 1 up to the value of the `totalitems` attribute of `<items>`. Not great, but it seemed to work, and since I'm lazy, I liked that it spared me from having to figure out the XML.
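Rather than looping `i` up to a count read from `totalitems`, the `<item>` elements themselves can drive the iteration, with `totalitems` kept only as a consistency check. The sketch below assumes `r1` is the imported document with `<items>` as its root element; `itemCountCheck` is a hypothetical name.

```mathematica
(* Compare the count BGG reports in totalitems with the number of
   <item> children actually present under the root <items> element *)
itemCountCheck[doc_] :=
 Module[{root = doc[[2]]},
  <|"reported" -> ToExpression@Lookup[root[[2]], "totalitems"],
    "found" -> Count[root[[3]], XMLElement["item", _, _]]|>]
```

`itemCountCheck[r1]` should report matching numbers. Mapping over `Cases[r1[[2, 3]], XMLElement["item", _, _]]` then visits each item without needing an index at all.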
Unfortunately there's a snag. If you look at the data for `userName = "Legomancer"`, you hit a problem at item 129 (`<item objectid="1231">`), which has an extra element that the others don't have:

```xml
<item objecttype="thing" objectid="1231" subtype="boardgame" collid="6163073">
  <name sortindex="1">Bandu</name>
  <originalname>Bausack</originalname>
  <yearpublished>1987</yearpublished>
```
That `<originalname>` element shifts the positions of all the children after it, which wrecks my brainless strategy and throws an error. So I guess I need to learn how to use XML after all.
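If you would rather keep the structure of the original loop, a smaller change is to look children up by tag name instead of by index. An extra element such as `<originalname>` then no longer shifts anything. This is a sketch; `childByTag` and `ratingOfItem` are hypothetical helper names.

```mathematica
(* Hypothetical helper: the first direct child of an element with a given tag,
   wherever it sits among the other children *)
childByTag[XMLElement[_, _, children_List], tag_String] :=
 FirstCase[children, XMLElement[tag, _, _], Missing["NoElement", tag]]
(* let a Missing[...] pass through so lookups can be chained *)
childByTag[m_Missing, _String] := m

(* Hypothetical helper: <item> -> <stats> -> <rating value="...">, found by name *)
ratingOfItem[item_XMLElement] :=
 With[{rating = childByTag[childByTag[item, "stats"], "rating"]},
  If[MissingQ[rating], rating, ToExpression@Lookup[rating[[2]], "value"]]]
```

Inside the existing loop, with `item = r1[[2, 3, i]]`, `ratingOfItem[item]` would replace the positional rating lookup, and `Lookup[item[[2]], "objectid"]` would replace the positional game-ID lookup.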
So while I read up on XML again and attack it with some trial and error, any pointers in the right direction would be super helpful.