Message Boards Message Boards

GROUPS:

Open Source Deep Learning framework GitHub Stars

Posted 5 months ago
1059 Views
|
3 Replies
|
13 Total Likes
|

I use Wolfram Language as a Deep Learning framework. When I joined "Machine Learning Webinar Series" in April, I knew that many participants used open source Deep Learning frameworks in the questionnaire before the start. TensorFlow is used by around me, but I didn't know PyTorch is popular. So I checked the popularity of Deep Learning framework : TensorFlow, Caffe, PyTorch, CNTK, MXNet. The method is the number of stars of each GitHub repository. It is so easy for Wolfram Language to scrape, crawl and visualize them like star-history.

enter image description here

Access token

The GitHub API has the following restriction. I will call many api from now on, so I need to get an access token.

For API requests using Basic Authentication or OAuth, you can make up to 5000 requests per hour. For unauthenticated requests, the rate limit allows for up to 60 requests per hour.

token = "token abcdefghijklmnopqrstuvwxyz0123456789abcd";  (* sample *)

Total number of stars

In the case of TensorFlow the total number of the stars and the date when the last star has been updated can be found below.

totalstarsURL[r_String] := 
  StringTemplate["https://api.github.com/repos/`r`"][<|"r" -> r|>];
gettotalstars[r_String] := Module[{req, res},
  req = HTTPRequest[totalstarsURL[r],
      <|"Headers" -> {"Authorization" -> token}|>];
  res = Association@Import[req, "JSON"];
  {DateObject@res["updated_at"], res["stargazers_count"]}
  ]
gettotalstars["tensorflow/tensorflow"] 

enter image description here

History of star-gazers

The URL that gets 100 star-gazers per one page is

stargazersURL[r_String, n_Integer] := 
  totalstarsURL[r] <> "/stargazers?page=" <> ToString[n] <> "&per_page=100";

The URL that collects the first 100 star-gazers of TensorFlow is

stargazersURL["tensorflow/tensorflow", 1]

enter image description here

The GitHub API will tell me the maximum number of pages in the Link Header.

getpageMax[r_String] := Module[{req, res, max},
  req = HTTPRequest[stargazersURL[r, 1],
      <|"Headers" -> 
      {"Authorization" -> token, 
       "Accept" -> "application/vnd.github.v3.star+json"}|>];
  res = URLRead[req, "Headers"];
  max = StringCases[(Association@res)[["link"]], 
     RegularExpression["next.*?page=(\\d+).*?last"] -> "$1"][[1]]; 
  ToExpression[max]
  ]
getpageMax["tensorflow/tensorflow"]

enter image description here

The total number of the stars of TensoFlow is 100,266. So the number of pages could be 1,003(100,266/100). However, the maximum is 400 due to the restriction of the GitHub API.

I define the function that gets one -page and all the pages. Then I define the function that gets the date when each star has been updated from pages.

getpage[r_String, n_Integer] := Module[{req, res},
  req = HTTPRequest[stargazersURL[r, n],
      <|"Headers" -> 
      {"Authorization" -> token, 
       "Accept" -> "application/vnd.github.v3.star+json"}|>];
  Association /@ Import[req, "JSON"]
  ]    
getpages[r_String] := Module[{pageMax, pages},
  pageMax = getpageMax[r];
  pages = getpage[r, #] & /@ Range[pageMax]
  ]    
getstardates[pages_List] := Module[{res},
  res = #["starred_at"] & /@ Flatten[pages];
  Select[DateObject /@ res, DateObjectQ]
  ]

Now that we are ready, let's get all pages and dates on TensorFlow.

pages = getpages["tensorflow/tensorflow"];
dates = getstardates[pages];

I convert these dates into a list for DateListPlot. However, I need to consider that I can only get up to 40, 000(400*100) dates. That is, if the total number of stars exceeds 40, 000, I need to append the date of the last star and the total number of stars to the list. All the dates are unnecessary if it is only to show a history of stars, some samples are enough like star-history. However, I will use all the pages for another purpose later.

plotlist[r_String, dates_List] := 
 Module[{pageMax, total, lastdate, pl},
  pageMax = getpageMax[r];
  {lastdate, total} = gettotalstars[r];
  pl = Transpose[{dates, Range[Length[dates]]}];
  If[pageMax == 400 && dates[[-1]] != lastdate, 
   AppendTo[pl, {lastdate, total}], pl]
  ]

Let's show the history of stars of TensorFlow.

tfd = plotlist["tensorflow/tensorflow", dates]; 
DateListPlot[tfd, "Day", PlotRange -> All, 
 FrameLabel -> {None, "GitHub Stars"}, PlotTheme -> "Detailed"  ]

enter image description here

The similar processes are performed for MXNet, CNTK, Caffe and PyTorch. As of May 2018 it is as follows. TensorFlow is overwhelming popular, and PyTorch seems to be high attention recently. This trend is the same as the above questionnaire result.

enter image description here

Further : Locations of star-gazers

I continue to use the GitHub API. I check the locations of star-gazers.

First of all, I will get URLs of star-gazers from all the pages.

getpersonURLs[pages_List] := Module[{},
  Union[("url" /. ("user" /. #)) & /@ Flatten[pages]]
  ]
purls = getpersonURLs[pages];

Next, I will get location information from each page of star-gazers.

getlocations[url_String] := Module[{req, res, loc},
  req = HTTPRequest[url,
      <|"Headers" -> {"Authorization" -> token}|>];
  res = Import[req, "JSON"];
  loc = "location" /. res
  ]

In the case of TensorFlow the number of URLs of star-gazers is 40,000, so it may have to be splitted every 5,000 in one hour depending on the network performance.

locs = getlocations /@ purls;

And then I will get the country names from them by using SemanticInterpretation. Since the city name may be written as a location, I will get the country name from the city name with EntityValue.

getCountry[loc_String] := Module[{s, tmp},
  s = SemanticInterpretation[loc];
  tmp = If[s[[1]] == "Country", s, EntityValue[s, "Country"]]
  ]
countries = getCountry /@ locs;

Because countries contain noises that are not country names, I remove this noise. And then I select Top 8.

s = Select[countries, SameQ[#[[1]], "Country"] &];
country8tf = Sort[Tally[s], #1[[2]] > #2[[2]] &][[1 ;; 8]]

enter image description here

I will visualize this a little more with the flags.

Grid[{Prepend[Range[8], ""],
  Prepend[
   ImageResize[#, 50] & /@ (#["Flag"] & /@ 
      Transpose[country8tf][[1]]), "TensorFlow"]}, Frame -> All]

enter image description here

It can be easily visualized by continent.

continents = {EntityClass["Country", "Asia"], 
   EntityClass["Country", "NorthAmerica"], 
   EntityClass["Country", "Europe"], 
   EntityClass["Country", "SouthAmerica"], 
   EntityClass["Country", "Oceania"], 
   EntityClass["Country", "Africa"]};
continentcountries = CountryData /@ continents;
classifycontinents[c_, continentcountries_] := Module[{},
  Boole /@ (IntersectingQ[{c}, #] & /@ continentcountries)
  ]

Japan is classified as Asian Continent.

classifycontinents[Entity["Country", "Japan"], continentcountries]

enter image description here

All countries are classified by continent and visualized its distribution by BarChart.

tt = Total[classifycontinents[#, continentcountries] & /@ countries];
continentstf = N[#, 3] & /@ (tt*100/Total[tt]) ; 
Grid[{{BarChart[{continentstf}, ChartLayout -> "Stacked", 
    BarOrigin -> Left, Joined -> True, BarSpacing -> 0.5,
    ChartElementFunction -> "GlassRectangle", ChartStyle -> "Pastel", 
    ChartLabels -> {{"TensorFlow"}, None}, AxesLabel -> "%"],
   SwatchLegend["Pastel", continents]}}]

enter image description here

The similar processes are performed for MXNet, CNTK, Caffe and PyTorch. As of May 2018 it is as follows.

enter image description here

enter image description here

3 Replies

You should add MXNet to the list: this is what the Wolfram Language is using underneath.

I added MXNet. And I checked the current number of stars.

Partition[
  Flatten@{{"TensorFlow", gettotalstars["tensorflow/tensorflow"]},
    {"Caffe", gettotalstars["BVLC/caffe"]},
    {"PyTorch", gettotalstars["pytorch/pytorch"]},
    {"CNTK", gettotalstars["Microsoft/CNTK"]},
    {"MXNet", gettotalstars["apache/incubator-mxnet"]}}, 
  3] // TableForm

enter image description here

enter image description here - Congratulations! This post is now a Staff Pick as distinguished by a badge on your profile! Thank you, keep it coming!

Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.
Reply Preview
Attachments
Remove
or Discard

Group Abstract Group Abstract