# Automatically-generated timelines

Posted 8 years ago
7472 Views | 6 Replies | 19 Total Likes
 Inspired by the recent Wolfram Blog post, I created a program that automatically generates timelines based on mentions of notable events in Wikipedia articles. Try it out below:

    generateAutoTimeline[text_String] := Block[{yearsentences},
      yearsentences = Select[TextSentences[text],
        StringMatchQ[#, RegularExpression[".*?(?:I|i)n (\\d{4}).*?\\."]] &];
      TimelinePlot[
        Tooltip[DateObject[{ToExpression@StringCases[#, RegularExpression["\\d{4}"]][[1]]}], #] & /@ yearsentences]]

    generateAutoTimeline[WikipediaData["Natural language processing"]]
Posted 3 months ago
 I just recently used this again; such a useful function! I needed to scan a long historical Wikipedia article and focus on dates only. This did the whole job for me and worked like a charm. Two minor suggestions: first, let the function inherit the options of TimelinePlot, so that something like PlotLayout -> "Vertical" would work; second, add it to the Wolfram Function Repository :-) Thanks, Jesse, this idea has already saved me a lot of time.
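 A minimal sketch of the first suggestion, assuming the regex and plotting logic from the original post are kept as they are. The name generateAutoTimelineWithOptions is hypothetical (it does not appear in the thread); the wrapper simply declares TimelinePlot's options as its own and forwards whatever it receives:

```wolfram
(* Hypothetical wrapper name; not part of the original post. *)
(* Declare that the function accepts every option TimelinePlot accepts. *)
Options[generateAutoTimelineWithOptions] = Options[TimelinePlot];

generateAutoTimelineWithOptions[text_String, opts : OptionsPattern[]] :=
 Module[{yearSentences, events},
  (* Same sentence filter as the original post. *)
  yearSentences = Select[TextSentences[text],
    StringMatchQ[#, RegularExpression[".*?(?:I|i)n (\\d{4}).*?\\."]] &];
  (* One tooltip-labelled date per sentence, using the first 4-digit run. *)
  events = Tooltip[
      DateObject[{FromDigits@First@StringCases[#, RegularExpression["\\d{4}"]]}],
      #] & /@ yearSentences;
  (* Forward only the options that TimelinePlot itself understands. *)
  TimelinePlot[events, Sequence @@ FilterRules[{opts}, Options[TimelinePlot]]]]
```

 It could then be called with a made-up test string instead of a Wikipedia article, for example:

```wolfram
generateAutoTimelineWithOptions[
 "In 1999 the first version appeared. In 2004 it was rewritten.",
 PlotLayout -> "Vertical"]
```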
Posted 3 months ago
 You have earned a Featured Contributor Badge. Your exceptional post has been selected for our editorial column Staff Picks (http://wolfr.am/StaffPicks), and your profile is now distinguished by a Featured Contributor Badge and displayed on the Featured Contributor Board. Thank you!
Posted 8 years ago
 Very cool idea! And it works for topics you might not expect it to be useful for:
Posted 8 years ago
 This is a very neat idea, Jesse. Do you mean that your approach,

    Select[TextSentences[nlp], StringMatchQ[#, RegularExpression[".*?(?:I|i)n (\\d{4}).*?\\."]] &]

 is more robust at finding sentences with year dates than the method given in the blog post?

    TextCases[nlp, Containing["Sentence", "Number"]]

 I am not that familiar with regex. Could you briefly explain how to read RegularExpression[".*?(?:I|i)n (\\d{4}).*?\\."]?
Posted 8 years ago
 I started out with TextCases, but (at least for me) it runs really slowly on even a relatively small text, like the Wikipedia article. I think this is because it has to connect to the cloud to use semantic interpretations of numbers like "two thousand and four." For me, regex is much, much faster and more adaptable.

 Here's a breakdown of the regex:

- .*? means "match any characters, but as few as necessary." It should work with just ".*"; the lazy quantifier is a holdover from when I was fine-tuning the regex.
- (?:I|i) means "match either capital I or lowercase i." The "?:" is just a formality, preventing the creation of a capture group.
- n matches the character n.
- [space] matches a single space.
- (\\d{4}) means "match four digits." The actual regex code for a digit is "\d", but the backslash has to be escaped in a Wolfram Language string.
- .*? again, as above.
- \\. means "match the character [period]." The period has to be escaped because it's a regex special character, and so does the backslash, since otherwise the Wolfram Language will think I want to insert a special character.

 I could probably get away with just ".*(I|i)n \\d{4}.*" for the regex, but I needed the other parts for previous iterations of the code and never bothered to take them out.

 A Wolfram Language pattern translation of the core of the above is ("I" | "i") ~~ "n " ~~ Repeated[DigitCharacter, {4}].
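 To make the breakdown concrete, here is a small sketch that runs the regex from the original post on a few sentences. The sentences and the names sentences, yearRegex, extractYear, and yearPattern are made up for illustration and are not from the thread. It also shows one possible way to pull out the year via a capture group, and a string-pattern version that includes the space after "n":

```wolfram
(* Made-up test sentences; the last one contains no year. *)
sentences = {
   "In 1999 the first version appeared.",
   "The project was rewritten in 2004 by a new team.",
   "No year is mentioned here."};

(* The regex from the original post. StringMatchQ must match the whole string. *)
yearRegex = RegularExpression[".*?(?:I|i)n (\\d{4}).*?\\."];
StringMatchQ[sentences, yearRegex]
(* {True, True, False} *)

(* Assumption: extracting the year through the capture group ("$1"), so that
   only the digits following "in " are returned. *)
extractYear[s_String] :=
  StringCases[s, RegularExpression["(?:I|i)n (\\d{4})"] -> "$1"];
extractYear[sentences[[2]]]
(* {"2004"} *)

(* A string-pattern version of the full match; ___ stands for any run of characters. *)
yearPattern = ___ ~~ ("I" | "i") ~~ "n " ~~ Repeated[DigitCharacter, {4}] ~~ ___ ~~ ".";
StringMatchQ[sentences, yearPattern]
(* {True, True, False} *)
```

 One difference worth keeping in mind: in a regex, "." does not match a newline by default, whereas ___ in a string pattern matches any characters, so the two versions can disagree on multi-line strings.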
Posted 8 years ago
 Thanks, Jesse, very instructive !