WOLFRAM COMMUNITY

Automatically-generated timelines

Jesse Friedman, Wolfram Research

Posted 10 years ago

Inspired by the recent Wolfram Blog post, I created a program that automatically generates timelines based on mentions of notable events in Wikipedia articles. Try it out below:

generateAutoTimeline[text_String] := 
 Block[{yearsentences}, 
  yearsentences = 
   Select[TextSentences[text], 
    StringMatchQ[#, 
      RegularExpression[".*?(?:I|i)n (\\d{4}).*?\\."]] &]; 
  TimelinePlot[
   Tooltip[DateObject[{ToExpression@
         StringCases[#, RegularExpression["\\d{4}"]][[1]]}], #] & /@ 
    yearsentences]]

generateAutoTimeline[WikipediaData["Natural language processing"]]

[Image: Timeline of notable events in natural language processing]

6 Replies

Vitaliy Kaurov, Wolfram Research

Posted 2 years ago

I just recently used this again; such a useful function! I needed to scan a long historical Wikipedia article and focus on the dates only. This did the whole job for me and worked like a charm. Two minor suggestions: (1) let the function inherit the options of `TimelinePlot`, so that something like `PlotLayout -> "Vertical"` would work; (2) add it to the Wolfram Function Repository :-) Thanks, Jesse, this idea has already saved me a lot of time.
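One way the option-inheritance suggestion might look is sketched below. This is not code from the original post: the name `generateAutoTimelineWithOptions` and the sample text are made up for illustration, and the body otherwise mirrors Jesse's definition. It assumes that simply reusing `Options[TimelinePlot]` and forwarding the options unchanged is enough.

```wolfram
(* Hypothetical variant of the original function; the name is illustrative only *)
Options[generateAutoTimelineWithOptions] = Options[TimelinePlot];

generateAutoTimelineWithOptions[text_String, opts : OptionsPattern[]] :=
 Module[{yearSentences},
  (* keep only sentences that mention "in <four-digit year>" *)
  yearSentences =
   Select[TextSentences[text],
    StringMatchQ[#,
      RegularExpression[".*?(?:I|i)n (\\d{4}).*?\\."]] &];
  (* forward every option unchanged to TimelinePlot *)
  TimelinePlot[
   Tooltip[DateObject[{ToExpression@
         First@StringCases[#, RegularExpression["\\d{4}"]]}], #] & /@
    yearSentences, opts]]

(* made-up sample text, so the example does not need a network connection *)
sampleText =
  "In 1950, Alan Turing published a famous paper. The ELIZA program appeared in 1966.";

generateAutoTimelineWithOptions[sampleText, PlotLayout -> "Vertical"]
```

Because the function's options are copied from `TimelinePlot`, any option that `TimelinePlot` accepts can be passed straight through, and unknown options are flagged in the usual way.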

EDITORIAL BOARD, WOLFRAM

Posted 2 years ago

You have earned a *Featured Contributor Badge*. Your exceptional post has been selected for our editorial column *Staff Picks* (http://wolfr.am/StaffPicks), and your profile is now distinguished by a *Featured Contributor Badge* and is displayed on the Featured Contributor Board. Thank you!

Christopher Carlson, Wolfram Research

Posted 10 years ago

Very cool idea! And it works for topics you might not expect it to be useful for:

Sam Carrettie, Freelancer

Posted 10 years ago

This is a very neat idea, Jesse. Do you mean that `Select[TextSentences[nlp], StringMatchQ[#, RegularExpression[".*?(?:I|i)n (\\d{4}).*?\\."]] &]` is more robust at finding sentences with year dates than the method given in the blog, `TextCases[nlp, Containing["Sentence", "Number"]]`? I am not that familiar with regex. Could you briefly explain how to read `RegularExpression[".*?(?:I|i)n (\\d{4}).*?\\."]`?

Jesse Friedman, Wolfram Research

Posted 10 years ago

I started out with `TextCases`, but (at least for me) it runs really slowly on even a relatively small text, like the Wikipedia article. I think this is because it has to connect to the cloud to use semantic interpretations of numbers like "two thousand and four." For me, regex is much, much faster and more adaptable. Here's a breakdown of the regex:

- `.*?` means "match any characters, but as few as necessary." It should work with just `.*`; the lazy quantifier is a holdover from when I was fine-tuning the regex.
- `(?:I|i)` means "match either capital I or lowercase i." The `?:` is just a formality, preventing the creation of a capture group.
- the character `n`
- the character [space]
- `(\\d{4})` means "match four digits." The actual code for a digit is `\d`, but the backslash has to be escaped in a Wolfram Language string.
- `.*?` again
- `\\.` means "match the character [period]." The period has to be escaped because it's a regex special character, and so does the backslash, since otherwise the Wolfram Language will think I want to insert a special character.

I could probably get away with just `".*(I|i)n \\d{4}.*"` for the regex, but I needed the other parts for previous iterations of the code and never bothered to take them out. A Wolfram Language pattern translation of the above is `"I" | "i" ~~ "n " ~~ Repeated[DigitCharacter, {4}]`.
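To make the breakdown concrete, here is a small sketch that applies the regex from the post to a few sentences. The sample sentences and the names `yearRegex`, `samples`, and `yearPattern` are invented for illustration; the string pattern is one possible full-sentence rendering of the translation given above, not something from the original post.

```wolfram
(* The regex from the post; the sample sentences are made up *)
yearRegex = RegularExpression[".*?(?:I|i)n (\\d{4}).*?\\."];

samples = {
   "In 1950, Alan Turing published his famous paper.",
   "The system was released in 1966.",
   "Since 1980 the field has grown.",
   "It was founded in 1999"};

(* StringMatchQ requires the whole sentence to match *)
StringMatchQ[samples, yearRegex]
(* {True, True, False, False}: the third has no "in " directly before the year,
   and the fourth has no period *)

(* Pull out the year that follows "in " by referring to the capture group *)
StringCases[samples[[2]], RegularExpression["(?:I|i)n (\\d{4})"] -> "$1"]
(* {"1966"} *)

(* A string-pattern version of the same whole-sentence test *)
yearPattern =
  ___ ~~ ("I" | "i") ~~ "n " ~~ Repeated[DigitCharacter, {4}] ~~ ___ ~~ ".";
StringMatchQ[samples, yearPattern]
(* {True, True, False, False} *)
```

Note that the regex only checks for the letters "in" followed by a space, so a word that merely ends in "in" (for example "begin 1980") would also match.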

Vitaliy Kaurov, Wolfram Research

Posted 10 years ago

Thanks, Jesse, very instructive!
