Analysis of the Wolfram Community

The Wolfram Community is now ~4.5 years 'old'. So time to do some analysis… let's go!

Let's download all the thread titles, their votes, their authors, et cetera:

SetDirectory[NotebookDirectory[]];
$HistoryLength=1;
xml=Import["http://community.wolfram.com/dashboard/-/discussions-list/all+groups/Any+discussions/none/active/full/10000/1/filter","XMLObject"];
Export["website.mx",xml]

This will save the output to a website.mx file next to your notebook.
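
Since the listing is a large download, it can help to fetch it only when no cached copy exists yet. A minimal sketch, assuming a hypothetical helper named cachedXML that is not part of the original post:

```wolfram
(* Hypothetical helper: reuse the cached file when it exists,
   otherwise download the page as an XMLObject and cache it. *)
ClearAll[cachedXML]
cachedXML[url_String, file_String] := If[FileExistsQ[file],
  Import[file],
  With[{xml = Import[url, "XMLObject"]},
   If[xml =!= $Failed, Export[file, xml]]; (* do not cache a failed download *)
   xml]]
```

Calling cachedXML with the URL above and "website.mx" would then hit the server only on the first evaluation.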

Here is a small function that extracts the relevant data:

ClearAll[GetThreadProperties]
GetThreadProperties[threadxml_]:=Module[{url,title,creator,creatorurl,views,replies,votes},
    {url,title}=FirstCase[threadxml,XMLElement["h3",{"class"->"asset-title"},{_,XMLElement["a",{"shape"->"rect","href"->url_},{title_}],_}]:>{url,title},{Missing[],Missing[]},\[Infinity]];
    {creator,creatorurl}=FirstCase[threadxml,XMLElement["span",{"class"->"metadata-entry"},{_,XMLElement["span",{"class"->"asset-meta-bold"},{"CREATED BY: "}],_,XMLElement["span",{"class"->"asset-meta-normal"},{_,XMLElement["a",{"shape"->"rect","href"->creatorprofileurl_},{creator_}],_}],_}]:>{creator,creatorprofileurl},{Missing[],Missing[]},\[Infinity]];
    views=FirstCase[threadxml,XMLElement["div",{"class"->"views stats"},{_,XMLElement["div",{"class"->_},{views_}],_,XMLElement["div",{"class"->_},{"VIEWS"}],_}]:>views,Missing[],\[Infinity]];
    replies=FirstCase[threadxml,XMLElement["div",{"class"->"replies stats"},{_,XMLElement["div",{"class"->_},{replies_}],_,XMLElement["div",{"class"->_},{"REPLIES"}],_}]:>replies,Missing[],\[Infinity]];
    votes=FirstCase[threadxml,XMLElement["div",{"class"->"votes stats"},{_,XMLElement["div",{"class"->_},{votes_}],_,XMLElement["div",{"class"->_},{"VOTES"}],_}]:>votes,Missing[],\[Infinity]];
    views=If[StringEndsQ[views,"K"],1000 ToExpression[StringDrop[views,-1]],ToExpression@views];
    replies=If[StringEndsQ[replies,"K"],1000 ToExpression[StringDrop[replies,-1]],ToExpression@replies];
    votes=If[StringEndsQ[votes,"K"],1000 ToExpression[StringDrop[votes,-1]],ToExpression@votes];
    <|"url"->url,"title"->title,"creator"->creator,"views"->views,"replies"->replies,"votes"->votes,"creatorurl"->creatorurl|>
]
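
To see how the FirstCase patterns and the "K" conversion work, here is a miniature version on a hand-made element. The element toy and the helper parseCount are illustrative assumptions; they are not taken from the real page or from the function above:

```wolfram
(* Hand-made stand-in for one "votes stats" block; the real page markup may differ. *)
toy = XMLElement["div", {"class" -> "votes stats"}, {
    XMLElement["div", {"class" -> "count"}, {"1.2K"}],
    XMLElement["div", {"class" -> "label"}, {"VOTES"}]}];

(* Extract the string inside the first matching child, or Missing[] if none matches. *)
raw = FirstCase[toy,
   XMLElement["div", {"class" -> "count"}, {v_String}] :> v,
   Missing[], Infinity];

(* Hypothetical helper: turn "87" or "1.2K" into a number;
   anything that is not a string (e.g. Missing[]) passes through unchanged. *)
ClearAll[parseCount]
parseCount[s_String] := If[StringEndsQ[s, "K"],
   1000 ToExpression[StringDrop[s, -1]],
   ToExpression[s]]
parseCount[other_] := other

parseCount /@ {raw, "87", Missing[]}
```

Unlike the inline If expressions in GetThreadProperties, this variant leaves Missing[] untouched instead of passing it to StringEndsQ.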

Import the data from the .mx file, extract the relevant values with the function above, and store them in a Dataset:

tmp=Import["website.mx"];
tmp=Cases[tmp,XMLElement["div",{"class"->"asset-abstract default-asset-publisher","style"->_},___],\[Infinity]];
ds=Dataset[GetThreadProperties/@tmp];

Here is an example:

[Image: example of the extracted data]

Let's start simple and get the number of threads, the number of views, votes, and replies:

Length[ds]
ds[Total,{"views","votes","replies"}]

[Image: output showing the number of threads and the total views, votes, and replies]

Nearly 8000 topics and close to 15 million views!

Let's check the number of replies for each topic:

ListLogPlot[Tally[Normal[ds[All, "replies"]]], AxesLabel -> {"Number of replies", "Number of threads"}]

giving:

[Image: log plot of the number of threads against the number of replies]
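
As a small illustration of what Tally feeds into the plot, here it is on made-up reply counts (the numbers are invented for the example):

```wolfram
(* Invented reply counts for six threads. *)
toyReplies = {0, 1, 1, 3, 0, 0};

(* Each pair is {number of replies, number of threads with that many replies}. *)
Tally[toyReplies]
(* {{0, 3}, {1, 2}, {3, 1}} *)
```

Each pair becomes one point in the ListLogPlot, with the thread count on the logarithmic vertical axis.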

We can also plot the number of votes vs rank:

ListLogLogPlot[Flatten[Normal[Values[ds[Reverse@*SortBy[#votes&]][All,{"votes"}]]]],PlotRange->All,AxesLabel->{"Rank","Number of votes"},PlotMarkers->{Automatic,Medium}]

[Image: log-log plot of the number of votes against rank]
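
Rank here is simply the position in the list of vote counts sorted in descending order. A minimal sketch with invented numbers:

```wolfram
(* Invented vote counts for four threads. *)
toyVotes = {3, 10, 1, 7};

(* Sort descending, so position 1 holds the most-voted thread. *)
ranked = Reverse[Sort[toyVotes]]
(* {10, 7, 3, 1} *)

(* Plotting a plain list uses the positions 1, 2, ... as x values, i.e. the rank. *)
ListLogLogPlot[ranked, AxesLabel -> {"Rank", "Number of votes"}]
```

Note that threads with zero votes cannot appear on a logarithmic axis.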

Let's look into my posts:

ds[Select[#creator=="Sander Huisman"&]/*Reverse@*SortBy[#votes&]]

[Image: dataset of Sander Huisman's threads, sorted by votes]

I'm very happy to see some of my posts got ~50 votes.

Let's have a look at the authors: who is the most active in starting threads?

data=SortBy[Tally[Flatten[Normal[Values[ds[[All,{"creator"}]]]]]],Minus@*Last][[;;50]];
BarChart[Association[Rule@@@Reverse@data],ChartLabels->Automatic,BarOrigin->Left,Frame->True,PerformanceGoal->"Speed",AspectRatio->GoldenRatio/2,ChartStyle->Directive[EdgeForm[{Thickness[Medium],Black,Opacity[1]}],RGBColor[0,0.5,1]],FrameStyle->Black,FrameTicks->{{Automatic,Automatic},{All,All}},ImageSize->650,PlotRange->{0,160},PlotRangePadding->{None,{None,Scaled[0.01]}},BarSpacing->None,PlotLabel->Style["Number of threads",16,Black]]

[Image: bar chart of the number of threads for the 50 most active authors]
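
The tally-and-sort step can also be written with Counts and TakeLargest. A sketch on invented author names, shown as an alternative rather than as the method used above:

```wolfram
(* Invented thread creators. *)
toyCreators = {"Ann", "Bob", "Ann", "Cy", "Ann", "Bob"};

(* Same idea as above: tally, then sort by descending count. *)
SortBy[Tally[toyCreators], Minus@*Last]
(* {{"Ann", 3}, {"Bob", 2}, {"Cy", 1}} *)

(* Alternative: Counts gives an association, TakeLargest keeps the top n entries. *)
TakeLargest[Counts[toyCreators], 2]
(* <|"Ann" -> 3, "Bob" -> 2|> *)
```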

@Clayton Shonkwiler is by far the author with the most posts.

We can also check by number of votes:

BarChart[Sort[GroupBy[Normal[Values[ds[[All,{"creator","votes"}]]]],First->Last,Total]][[-100;;]],ChartLabels->Automatic,BarOrigin->Left,Frame->True,PerformanceGoal->"Speed",ScalingFunctions->"Log",AspectRatio->GoldenRatio,ChartStyle->Directive[EdgeForm[{Thickness[Medium],Black,Opacity[1]}],RGBColor[0,0.5,1]],FrameStyle->Black,FrameTicks->{{Automatic,Automatic},{All,All}},ImageSize->650,PlotRange->{20,3000},PlotRangePadding->{None,{None,Scaled[0.01]}},PlotLabel->Style["Total number of votes",16,Black]]

[Image: bar chart of the total number of votes for the 100 authors with the most votes, log scale]
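
The GroupBy call sums the votes per author. On invented {creator, votes} pairs it behaves like this:

```wolfram
(* Invented {creator, votes} pairs. *)
toyPairs = {{"Ann", 3}, {"Bob", 5}, {"Ann", 4}};

(* Group by the first element, keep the last element, and total each group. *)
totals = GroupBy[toyPairs, First -> Last, Total]
(* <|"Ann" -> 7, "Bob" -> 5|> *)

(* Sort orders an association by its values, so the largest totals come last. *)
Sort[totals]
(* <|"Bob" -> 5, "Ann" -> 7|> *)
```

Because the largest totals end up last, [[-100;;]] selects the 100 authors with the most votes.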

I was surprised to see I end up second in this list!

Finally, let's make a word cloud of the topic titles:

WordCloud[ToLowerCase[StringRiffle[Flatten[Normal[Values[ds[[All,{"title"}]]]]]]],MaxItems->150]

[Image: word cloud of the thread titles]
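
WordCloud sizes words by how often they occur. To inspect the counts behind such a cloud, one option is the sketch below on invented titles; it only approximates whatever preprocessing WordCloud performs internally:

```wolfram
(* Invented thread titles. *)
toyTitles = {"Plotting a list of data", "Import data from a file",
   "Plotting with Manipulate"};

(* Join the titles into one lowercase string, as in the WordCloud call. *)
text = ToLowerCase[StringRiffle[toyTitles]];

(* Drop common stopwords, split into words, and count each word. *)
wordCounts = Counts[TextWords[DeleteStopwords[text]]];

(* Look up the counts of two words of interest. *)
Lookup[wordCounts, {"plotting", "data"}]
```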

I hope you enjoyed this little exploration. A further analysis would be to download all the threads themselves, but that would be quite the undertaking without direct access to the database…

POSTED BY: Sander Huisman
3 Replies

Congratulations! This post is now a Staff Pick, as distinguished by a badge on your profile! Thank you, keep it coming!

POSTED BY: EDITORIAL BOARD

Thanks Henrik! Now I have to update the post ;-)

POSTED BY: Sander Huisman

Very nice and interesting! ==> One more vote for Sander Huisman!

POSTED BY: Henrik Schachner