
Parsing a Wikipedia 'Infobox' block with WikipediaData[]?

Posted 8 years ago

I have been trying to extract and parse the 'Infobox' part of Wikipedia articles.

First, I tried the Mathematica functions WikipediaData and WikipediaSearch. I initially expected WikipediaData to extract the 'Infobox' block directly. In fact, I do not fully understand how these two functions work, especially WikipediaData.

Then I looked for a tailor-made solution on Stack Exchange. It looks like some attempts have been made, but in other languages (Python, JavaScript, …), not in Mathematica.

Finally, I wrote a rough, brute-force solution, fpars, which is not syntax-oriented and merely extracts the Infobox block from the rest of the article ("SummaryWikicode"). My 'solution' fpars produces, for

infobox = fpars@WikipediaData["Frederic Bastiat", "SummaryWikicode"];

the following output:

"{{Infobox economist | name = Frédéric Bastiat | school_tradition = [[Classical liberalism]] <!-- not supported: | color = green -->| color = green --> | image = Bastiat.jpg | caption = | birth_name = Claude-Frédéric Bastiat | birth_date = {{Birth date|1801|06|29|df=y}} | birth_place = [[Bayonne]], [[French First Republic|France]] | death_date = {{Death date and age|1850|12|24|1801|06|29|df=y}} | death_place = [[Rome]], [[Papal States]] | nationality = French | religion = [[Roman Catholicism]] | contributions = [[The Law (book)|The Law]] (''La Loi'') | influences = [[Richard Cobden]], [[Adam Smith]], [[Jean-Baptiste Say]], [[Charles Comte]], [[Charles Dunoyer]] | influenced = [[Arthur Latham Perry]], [[Gustave de Molinari]], [[Ludwig von Mises]], [[Henry Hazlitt]], [[Ron Paul]], [[Rand Paul]], [[Thomas Sowell]], [[Mark Spitznagel]], [[Walter E. Williams]] }}"

and, for

infobox = fpars@WikipediaData["Adam Smith", "SummaryWikicode"];

the output is:

"{{Infobox philosopher | region = [[Western philosophy]] | image = Adam Smith The Muir portrait.jpg | alt = A sketch of Adam Smith facing to the right | name = Adam Smith | honorific_suffix = {{post-nominals|country=GBR|FRSA|size=100%}} | signature = Adam Smith signature 1783.svg | birth_date = 16 June 1723 [[New Style|NS]]
<small>(5 June 1723 [[Old Style|OS]])</small> | birth_place = [[Kirkcaldy]], [[Fife]], [[Scotland]] | death_date = {{death date and age|df=yes|1790|07|17|1723|06|16}} | death_place = [[Edinburgh]], Scotland | school_tradition = [[Classical economics]] | alma_mater = [[University of Glasgow]]
[[Balliol College, Oxford]] | institution = [[University of Glasgow]] | nationality = Scottish<ref>{{cite web|title=Adam Smith|url=http://www.econlib.org/library/Enc/bios/Smith.html|website=The Concise Encyclopedia of Economics|publisher=Liberty Fund, Inc.|accessdate=29 July 2015}}</ref> | notable_works = ''[[The Wealth of Nations]]'', ''[[The Theory of Moral Sentiments]]'' | main_interests = [[Political philosophy]], ethics, economics | influences = [[Aristotle]]{{·}} [[David Hume|Hume]]{{·}} [[Francis Hutcheson (philosopher)|Hutcheson]]{{·}} [[Bernard de Mandeville|Mandeville]]{{·}} [[François Quesnay|Quesnay]]{{·}}[[Jean-Jacques Rousseau|Rousseau]]{{·}} [[John Locke|Locke]]{{·}}[[Edmund Burke|Burke]]{{·}}[[Voltaire]] | influenced = [[Frédéric Bastiat|Bastiat]]{{·}} [[Milton Friedman|Friedman]]{{·}} [[Friedrich von Hayek|Hayek]]{{·}} [[Ludwig Von Mises|Mises]]{{·}} [[Murray Rothbard|Rothbard]]{{·}} [[Ayn Rand|Rand]]{{·}} [[Paul Krugman|Krugman]]{{·}} [[Thomas Sowell|Sowell]]{{·}} [[Georg Wilhelm Friedrich Hegel|Hegel]]{{·}} [[Thomas Hodgskin|Hodgskin]]{{·}} [[John Maynard Keynes|Keynes]]{{·}} [[Thomas Malthus|Malthus]]{{·}} [[Karl Marx|Marx]] {{·}} [[John Stuart Mill|Mill]]{{·}} [[David Ricardo|Ricardo]]{{·}} [[Henri de Saint-Simon|Saint-Simon]]{{·}} [[Jean-Baptiste Say|Say]]{{·}} [[Founding Fathers of the United States|US Founding Fathers]]{{·}} [[Noam Chomsky|Chomsky]] | notable_ideas = [[Classical economics]],
modern [[free market]],
[[division of labour]],
the \"[[invisible hand]]\" }}"

However, it does not work at all for

fpars@WikipediaData["John Maynard Keynes", "SummaryWikicode"]

The reason, of course, is that I have no specification of the Infobox syntax: I wrote the parser merely by inspecting the WikipediaData output for simple cases such as

WikipediaData["Adam Smith", "SummaryWikicode"]
WikipediaData["Frederic Bastiat", "SummaryWikicode"]

and others.
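One way to depend less on inspection is to rely on the one rule every Infobox follows: it is a template call that opens with "{{" and ends at the matching "}}". The sketch below is a hypothetical helper, extractInfobox (written for illustration; it is not part of WikipediaData and not the fpars function), that scans the wikitext and tracks the nesting depth of "{{" and "}}", so nested templates such as {{Birth date|...}} do not end the block early. It assumes the input is a single string, and it will miscount if "{{" or "}}" appear inside HTML comments or <nowiki> sections.

```mathematica
(* Hypothetical helper: return the first {{Infobox ...}} template in a
   wikitext string, found by tracking the nesting depth of "{{" and "}}".
   Assumes "{{" / "}}" never occur inside comments or <nowiki> blocks. *)
extractInfobox[wikitext_String] :=
 Module[{pos, chars, n, i = 1, depth = 0, stop = None},
  (* position of the first "{{Infobox"; IgnoreCase also accepts "{{infobox" *)
  pos = StringPosition[wikitext, "{{Infobox", 1, IgnoreCase -> True];
  If[pos === {},
   Missing["NotFound"],
   chars = Characters[StringDrop[wikitext, pos[[1, 1]] - 1]];
   n = Length[chars];
   While[stop === None && i < n,
    Which[
     (* "{{" opens a template: one level deeper *)
     chars[[i]] === "{" && chars[[i + 1]] === "{", depth++; i += 2,
     (* "}}" closes a template; depth 0 means the Infobox itself closed *)
     chars[[i]] === "}" && chars[[i + 1]] === "}", depth--; i += 2;
      If[depth == 0, stop = i - 1],
     True, i++]];
   (* Missing["Unbalanced"] if the matching "}}" never arrives *)
   If[stop === None, Missing["Unbalanced"], StringJoin[Take[chars, stop]]]]]
```

For example, extractInfobox[WikipediaData["Adam Smith", "SummaryWikicode"]] is meant to return the whole {{Infobox philosopher ...}} block, provided that "SummaryWikicode" returns one string. Whether this is enough for the Keynes article has not been checked; if that summary contains no "{{Infobox" at all, the result is Missing["NotFound"].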

The WikipediaData documentation does not say whether this extraction can be done with that function. So my question is: am I missing something in the documentation of WikipediaData, or, alternatively, is there any package or utility written in Mathematica for this purpose?

Wikipedia's own documentation offers nothing that could properly be called a syntax for 'Infobox', at most a sketchy template.
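Although there is no Infobox-specific grammar, an Infobox is called with ordinary MediaWiki template syntax, {{Template name | param1 = value1 | param2 = value2 ...}}, in which only the pipes at the top level of the template separate parameters; pipes inside [[...|...]] links or nested {{...}} templates belong to those constructs. The sketch below is a hypothetical helper, infoboxToAssociation (not an existing utility), that uses only this rule to turn a block like the ones shown above into an Association. It assumes the input starts with "{{" and ends with "}}" and that every parameter is named; values stay raw wikitext, so links, nested templates and <ref> tags are not expanded.

```mathematica
(* Hypothetical helper: split one infobox string into an Association of
   parameter -> raw wikitext value. Only a "|" at the top level of the
   template separates parameters. Assumes the input is "{{...}}". *)
infoboxToAssociation[box_String] :=
 Module[{text, chars, n, i = 1, depth = 0, cur = {}, parts = {}, pair},
  (* remove HTML comments such as <!-- not supported: ... --> *)
  text = StringTrim[StringDelete[box, Shortest["<!--" ~~ ___ ~~ "-->"]]];
  (* strip the outer "{{" and "}}" so that top-level pipes sit at depth 0 *)
  chars = Characters[StringTake[text, {3, -3}]];
  n = Length[chars];
  While[i <= n,
   Which[
    (* "{{" or "[[" goes one level deeper *)
    i < n && MemberQ[{"{{", "[["}, chars[[i]] <> chars[[i + 1]]],
     depth++; AppendTo[cur, chars[[i]] <> chars[[i + 1]]]; i += 2,
    (* "}}" or "]]" comes back up one level *)
    i < n && MemberQ[{"}}", "]]"}, chars[[i]] <> chars[[i + 1]]],
     depth--; AppendTo[cur, chars[[i]] <> chars[[i + 1]]]; i += 2,
    (* a top-level pipe ends the current parameter *)
    chars[[i]] === "|" && depth == 0,
     AppendTo[parts, StringJoin[cur]]; cur = {}; i++,
    True,
     AppendTo[cur, chars[[i]]]; i++]];
  AppendTo[parts, StringJoin[cur]];
  (* split a named parameter at its first "=" *)
  pair[p_String] := With[{k = StringPosition[p, "=", 1][[1, 1]]},
    StringTrim[StringTake[p, k - 1]] -> StringTrim[StringDrop[p, k]]];
  (* the first piece is the template name, e.g. "Infobox economist" *)
  Association[pair /@ Select[Rest[parts], StringContainsQ[#, "="] &]]]
```

With the Bastiat block, for instance, the "birth_date" entry would be the raw string "{{Birth date|1801|06|29|df=y}}", which still needs template-specific handling, and the text left over after the HTML comment would produce an extra "color" entry. Choosing an Association keeps the parameter order and allows lookups by name.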

Thanks.

POSTED BY: E Martin
