Computer Analysis of Poetry — Part 1: Metrical Pattern

Posted 2 years ago
11166 Views | 25 Replies | 32 Total Likes
This is the first part of an ongoing series. The second part can be found here.

Poets pay attention to the natural stresses in words, and sometimes they arrange words so that the stresses form patterns. Typical patterns stress every other syllable (duple meter) or every third syllable (triple meter). Conventions exist to further classify poetic lines according to a unit of two or three syllables, called a foot. I choose not to follow this convention, instead looking at the line of poetry as a continuous pattern. The goal of step 1 is to display the pattern of a line of poetry graphically around the printed syllables.

The function below accepts a line of English poetry (or prose) and returns the stress pattern with syllables. It gets the stress information from the "PhoneticForm" property in WordData and the syllabification information from the "Hyphenation" property. Sometimes words are not in WordData, or the database doesn't have phonetic or hyphenation values for the word. Much of the code deals with how to guess at those values when they are missing. Also, one-syllable words are stressed in the database, but stopwords are usually unstressed in context, so the code demotes single-syllable stopwords from stressed to undetermined. A series of replacement rules then attempts to resolve syllables that the program has not yet determined to be stressed or unstressed.

    analyzeMeter[verse_] := {
      (* IPA vowel symbols, diphthongs first *)
      ipaVowels = {"a?", "a?", "e?", "??", "o?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "a", "æ", "e", "i", "o", "", "ø", "u", "y"};
      words = ToLowerCase[TextWords[verse]];

      (* Returns {word, stress list, syllable list} for one word *)
      getWordInfo[wd_] := {
        ipa = WordData[wd, "PhoneticForm"];
        str = If[StringQ[ipa],
          (* Phonetic form available: primary stress -> 1, secondary -> .5, unmarked -> 0 *)
          vow = StringCases[ipa, "ˈ" | "ˌ" ... ~~ ipaVowels];
          ToExpression[StringReplace[vow, {"ˈ" ~~ __ -> "1", "ˌ" ~~ __ -> ".5", __ -> "0"}]],
          (* No phonetic form: estimate the syllable count from the spelling, all undetermined *)
          dips = {"ae", "ai", "au", "ay", "ea", "ee", "ei", "eu", "ey", "ie", "oa", "oe", "oi", "oo", "ou", "oy", "ue", "ui", "uy"};
          vows = {"a", "e", "i", "o", "u", "y"};
          Table[.5, Total[ToExpression[Characters[StringReplace[wd,
            {StartOfString ~~ "y" -> "0", "e" ~~ EndOfString -> "0", dips -> "1", vows -> "1", _ -> "0"}]]]]]];
        hyp = WordData[wd, "Hyphenation"];
        (* Fallback syllables: equal-length chunks of the word *)
        fauxSyl = StringPartition[wd, UpTo[Ceiling[StringLength[wd]/Length[str]]]];
        syl = If[ListQ[hyp] && Length[hyp] == Length[fauxSyl], hyp, fauxSyl];
        {wd, str, syl}};
      wordInfo = getWordInfo[#][[1]] & /@ words;

      (* Phonetic forms of single-syllable stopwords *)
      stops1IPA = Select[DeleteMissing[WordData[#, "PhoneticForm"] & /@ WordData[All, "Stopwords"]], StringCount[#, ipaVowels] < 2 &];
      (* Demote single-syllable stopwords to undetermined (.5) *)
      wordInfo = wordInfo /. {a_, b_List, c_} /; MemberQ[stops1IPA, WordData[a, "PhoneticForm"]] -> {a, {.5}, c};
      (* Promote other words whose only syllable is undetermined to stressed (1) *)
      wordInfo = wordInfo /. {a_, b_List, c_} /; ! MemberQ[stops1IPA, WordData[a, "PhoneticForm"]] && b == {.5} -> {a, {1}, c};
      preMeter = wordInfo[[;; , 2]] // Flatten;

      (* Resolve the remaining undetermined syllables from their neighbors *)
      meter = preMeter //. {
        {a___, .5, 1, 1, b___} -> {a, 0, 1, 1, b},
        {a___, 1, 1, .5, b___} -> {a, 1, 1, 0, b},
        {a___, 1, .5, 1, b___} -> {a, 1, 0, 1, b},
        {a___, 0, .5, 0, b___} -> {a, 0, 1, 0, b},
        {a___, .5, 1, b___} -> {a, 0, 1, b},
        {a___, 1, .5, b___} -> {a, 1, 0, b},
        {a___, 0, .5} -> {a, 0, 1},
        {.5, 0, b___} -> {1, 0, b},
        {a___, .5, 0, 1, 0, 1, b___} -> {a, 1, 0, 1, 0, 1, b},
        {a___, .5, 1, 0, 1, 0, b___} -> {a, 0, 1, 0, 1, 0, b},
        {a___, .5, 0, 0, 1, 0, 0, 1, b___} -> {a, 1, 0, 0, 1, 0, 0, 1, b},
        {a___, 1, 0, 1, 0, .5, b___} -> {a, 1, 0, 1, 0, 1, b},
        {a___, 0, 1, 0, 1, .5, b___} -> {a, 0, 1, 0, 1, 0, b},
        {a___, 1, 0, 0, 1, 0, 0, .5, b___} -> {a, 1, 0, 0, 1, 0, 0, 1, b},
        {a___, .5, .5, .5} -> {a, 0, 1, 0},
        {.5, .5, b___} -> {1, 0, b}};

      (* Zigzag line through {syllable index, stress}, with the syllables printed along the middle *)
      coords = Partition[Riffle[Range[Length[meter]], meter], 2];
      syllab = Flatten[wordInfo[[;; , 3]]];
      visual = Graphics[{Line[coords],
        MapIndexed[Style[Text[#1, {#2[[1]], .5}], 15, FontFamily -> "Times"] &, syllab]},
        ImageMargins -> {{10, 10}, {0, 0}}, ImageSize -> 48*Length[meter]]
    };

    analyzeMeter["Once upon a midnight dreary, while I pondered, weak and weary,"]

Thanks to Edgar Allan Poe for his poem "The Raven." The zigzag line zigs up for stressed syllables and down for unstressed. The program analyzes this verse without error or deviation from the expected meter. However, poets don't always follow the expected pattern, and the program occasionally makes mistakes. Consider the program's output for the entire second stanza of "The Raven."

The graphic makes it easy to see deviations from the pattern. In the second line of this stanza, the program mistakenly considers "separate" to have three syllables, as if it were a verb. However, when "separate" is used as an adjective, as in "separate dying ember," it has only two syllables. In the third verse, the last syllable of "eagerly" is so weak that the program marks it as unstressed. This is a reasonable and arguably correct way to assess the syllable, though traditionally it should be marked as stressed.
The fifth verse also has an anomaly. Poe has added an extra syllable to the line with the word "radiant."

As an English teacher, I think this visual gives insight into such subtle poetic notions as elision, secondary stress, and masculine/feminine rhyme. A possible activity is for students to use the program to analyze the prevailing pattern in a stanza of poetry and then explain the variations from that pattern as nuances of the language (as in "eagerly"), deliberate deviations by the poet (as in "radiant"), or mistakes by the program (as in "separate").

"The Raven" follows a duple meter pattern of alternating stressed and unstressed syllables. The program can also handle poems that follow the other major metrical pattern, triple meter. Here are verses from "Evangeline" by Henry Wadsworth Longfellow and "'Twas the Night Before Christmas" by Clement Clarke Moore.

One would expect that the program would show free verse and prose as having no recognizable metrical pattern. Let's see. Here are two lines of Walt Whitman's free verse poem "When I Heard the Learn'd Astronomer":

And here is a sentence from the Wikipedia article on butterflies.

The traditional way to teach meter in poetry is to explain about iambs, trochees, etc. and then have students try to mark lines of poetry with those units. Students, who may be distinguishing stressed syllables for the first time, are hard pressed to find metrical feet in a verse. With this program, a student has a starting point to explore, analyze, interpret, and critique. It's like using Wolfram Alpha to understand the graph of a rational function rather than trying to sketch it yourself following the rules the teacher lectured about.

I would call the program a work in progress rather than a success. If you experiment with poems of your choice, you'll find that it sometimes fails to resolve a syllable, leaving it stuck halfway between stressed and unstressed.
Also, if it misinterprets a syllable, for instance marking it stressed when it shouldn't be, the error can spread to neighboring syllables and corrupt the interpretation of the whole line. It works more consistently with duple meter than triple meter.

Twice I tried to improve the program with machine learning. I thought that if machine learning could classify the unresolved pattern as either duple meter, triple meter, or neither, then the program could better resolve the undetermined syllables. I was encouraged when it had 99% confidence that lines from "The Raven" were duple meter, but then I realized it was just as certain that any input was duple meter. My second attempt was to make a neural net that accepted a word and returned a likely stress pattern. For instance, I would feed it "Lenore" and it would return {0,1}. I think this should be doable, training it on data from WordData, but I am not strong enough in machine learning to make it happen (yet).

I subtitled this "Part 1," which implies that there is more to come. I intend to follow this with a program that makes rhyme visible, including alliteration, assonance, and other sound features loosely associated with rhyme.

Thanks for sticking with this to the end,
Mark Greenberg
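
For readers who want to see the replacement-rule step of analyzeMeter in isolation, here is a minimal sketch. It uses only three of the rules from the post, and the input list is made up for illustration (an assumption, not output from WordData).

```wolfram
(* Hypothetical input: 1 = stressed, 0 = unstressed, .5 = undetermined *)
preMeter = {.5, 1, .5, 1, 0, .5, 0};

(* Three of the rules from analyzeMeter; //. reapplies them until nothing changes *)
meter = preMeter //. {
   {a___, .5, 1, b___} -> {a, 0, 1, b},
   {a___, 1, .5, b___} -> {a, 1, 0, b},
   {a___, 0, .5, 0, b___} -> {a, 0, 1, 0, b}}
(* {0, 1, 0, 1, 0, 1, 0} *)
```

The first rule fires twice (on syllables 1 and 3), and then the third rule resolves syllable 6 from its two unstressed neighbors. Because each result feeds the next pass, one wrong value can propagate along the line in the same way.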
25 Replies
Posted 2 years ago
 Congratulations! This post is now featured in our Staff Pick column, as distinguished by a Featured Contributor badge on your profile! Thank you, keep it coming, and consider contributing your work to The Notebook Archive!
Posted 2 years ago
 Nice post! It would be interesting to do this for Hungarian. It is phonetically spelt, so parsing it is easy. Vowel and consonant lengths are clearly marked. Finally, the language lends itself naturally to classical meters such as the hexameter.
Posted 2 years ago
 Thanks, Szabolcs. The code is specific to English in two ways. First, it relies on WordData, which is an English language database. Second, the replacement rules are based on English language features like the preference to avoid three stressed syllables in a row. I don't know anything about Hungarian, but maybe it is possible to rework this code to draw on a database of Hungarian words for the syllable and phonetic information and resolve ambiguous syllables through different replacement rules.
Posted 2 years ago
 Mark, please consider submitting this to the Wolfram Function Repository (at least a couple of us on the review team would like to see it there). PoeticMeterDiagram was one name that was suggested in some discussion.
Posted 2 years ago
 Hi, Daniel. Yes, now that the Wolfram Summer School is over, I'll turn my energies to writing such a function. My project at Summer School was to improve the code outlined here with machine learning and extend the analysis of poetry to include rhyme. The post is here. The meter function should be ready in a month or so. : )
Posted 2 years ago
 I look forward to receiving it. Thanks.
Posted 2 years ago
 I did submit the function, called ScansionDiagram, about two weeks ago to the Function Repository. It was more robust than what I posted here: more accurate, better graphics, more options. I'm assuming that it was rejected, because I have not seen it appear in the repository. Is there a way I can track the status of a submission and perhaps find out why it was rejected?
Posted 2 years ago
 What would be a reasonable name for the function in the function repository (where all functions share a namespace)?

This is indeed a very interesting function, and it would be nice to have it.

It is clear now that it is specific to English and the implementation does not generalize to other languages in a straightforward manner (even the concept of WordData doesn't generalize well to an agglutinative language like Hungarian).

AnalyzeMeter would be a nice name, but what name would other language implementations use?

AnalyzeEnglishMeter is a bit too long and ugly for my taste ...

Would it make sense to split out the reusable parts, e.g. have a separate function that can take a line annotated with syllable boundaries and lengths, and visualize it? Then have a language-specific function that takes a string and creates an annotated verse from it?

(I don't think it's a big deal even if it uses up a general name in this case, just thinking aloud ... as this is a more general issue with the function repo.)
Posted 2 years ago
 It might be called ScansionDiagram, TextMeter, or something else. I think that the words poetry, poem, and verse should not be in the name because it could be used for prose too.
Posted 2 years ago
 This is really nice. Would it be possible to do this with classical Hebrew poetry as well? And should it be transliterated?
Posted 2 years ago
 I know very little about ancient Hebrew poetry. Assuming it has meter, which seems to be a debated question, the approach I use should work. First you need the syllabification and stress information for each word. Then you try to string that together to tell the pattern for the entire line. I have improved the English version in my post here to include machine learning, and that also should be transferable to Hebrew. As for transliteration, no, that does not seem to be a good way to apply my method to Hebrew.
Posted 2 years ago
 Mark, you should have received an email notification. It was by no means rejected. But there was a serious problem: we were unable to get anything like the results you showed in the examples. Our request was that you try to figure out what the problem was, and we included the result we were obtaining.

I apologize for this not reaching you. We need to look into our email notification setup to try to diagnose the issue there.
Posted 2 years ago
 Thanks, Daniel, for your response. I looked through my email and did not find one from Wolfram Research about the function. Perhaps it was sent to the wrong address. Mine is mgreenberg1520@gmail.com. I would like to see the results you were getting so I can diagnose the problem.
Posted 3 months ago
 I enjoyed the post.
Posted 3 months ago
 Yes, my code uses 1 and 0 for stressed and unstressed syllables. You show 2 and 1, which amounts to the same thing.

I'm not sure that I agree entirely with the statement that shorter code indicates a more proficient programmer. In this case, the code needs to display the metrical pattern consistently for any given input, so how rarely it makes mistakes is also a measure of code quality. Other considerations would be readability of the diagram, readability of the code itself, and maintainability of the program.

I see that your comment refers to an article about meter in Arabic verse. It is interesting that Arabic verse can also be analyzed systematically. It is difficult to compare the metrical systems of languages. Latin and English, for instance, share a lot of vocabulary and a common Indo-European origin, but their metrical systems are vastly different. Writing a program that determines the meter of verse will also be influenced by how closely the written version of the language matches the phonetics. In English, the spelling is notoriously not aligned with the phonetics. In Spanish, Italian, Latin, and Hebrew (the few languages I know about well enough to make this statement), the phonetics, including the stresses, can be mostly determined by the way a word is written.

I am currently working on code to update and improve the program shown in this post. I hope to be done with it by June, fingers crossed.
Posted 3 months ago
 Thanks for your quick response. I feel that we are on the verge of an interesting dialogue. Yes, programming is based on 1 and 0. The question remains: why don't you plot the graph directly using these 1 and 0?

Thank you for your tolerance. I was banned from an English forum for presenting the idea and the comparison with Arabic. http://www.everypoet.org/pffa/showthread.php?16003-Does-it-work-in-English&s
Posted 3 months ago
 Khashan, the graphic produced by my code is a direct plot of the 1 and 0 values over time, wrapping at the end of each poetic line and aligned on the line's last stressed syllable. Then I put the orthographic representation of each syllable in the graphic so we humans can follow along. (In the new version I'm working on, I also put a tooltip of the IPA representation of each syllable.)

The Wolfram Language gives phonetic information on many English words with WordData[word, "PhoneticForm"]. This includes primary and secondary stress markers. Typical output of such an expression is fənˈɛtɪks. This is a starting point for what I'm doing, but for many reasons it isn't enough to figure out the stress patterns in poetry. As you mentioned, word boundaries are not a consideration in determining metrical patterns. I would argue that the Greek feet should also not be considered. A major obstacle for me has been how to consistently divide up the syllables. Another has been what to do with undetermined syllables that can be either stressed or unstressed, depending on context (or sometimes depending on how the reader chooses to read the passage).

From what I can tell, there are many similarities between your approach in Arabic and mine in English. I have even used some of the same notation (CVC, etc.). As long as we keep the discussion relevant to programming in the Wolfram Language, I imagine that a rich conversation can be sustained on this topic.

I don't think that you have to worry about being kicked off this forum for giving Arabic examples. This is a very diverse community. And last time I checked, my friend Ahmed was the moderator. : )
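
To make the role of the stress markers concrete, here is a minimal sketch that reads stress values off the phonetic string quoted above. It does not call WordData; the vowel list is an assumption covering only this one word, and the variable names are hypothetical.

```wolfram
ipa = "fənˈɛtɪks";  (* the phonetic form quoted above *)

(* Assumption: only the vowels needed for this word; a real list would be longer *)
vowels = "ə" | "ɛ" | "ɪ";

(* Each vowel, together with a stress marker if one sits directly in front of it *)
nuclei = StringCases[ipa, ("ˈ" | "ˌ" | "") ~~ vowels]
(* {"ə", "ˈɛ", "ɪ"} *)

(* Primary stress -> 1, secondary stress -> .5, unmarked -> 0 *)
stress = Which[StringStartsQ[#, "ˈ"], 1, StringStartsQ[#, "ˌ"], .5, True, 0] & /@ nuclei
(* {0, 1, 0} *)
```

This gives one value per vowel, which is only a starting point: it says nothing about where the syllable boundaries fall or how a one-syllable word behaves in context.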
Posted 3 months ago
 Please follow and correct me. I started with the impression that you reached the graph by a program based on the binary numbers 1 and 0, and that the program then produced the graph. I understand now that you used the Wolfram Language to produce an already traditionally plotted graph, using 1 and 0 for the y-axis, to prove the efficiency of the Wolfram Language. Am I correct?
Posted 3 months ago
 English syllables are stressed over a continuum from totally unstressed to undeniably stressed. In speech, the continuum of stresses becomes a nuance of pronunciation and musical intonation patterns. Our sense of rhythm in poetry, though, seems to rely on just two kinds of syllables: stressed and unstressed. Therefore, I try to boil every syllable down to either stressed (1) or unstressed (0). I use many factors to determine this, including the phonetic representation provided by WordData, whether there is a stop punctuation or end of line after the syllable, the consonant and vowel patterns, and some replacement rules that avoid too many consecutive stressed or unstressed syllables. I have also tried predicting what kind of rhythm it is, duple-meter or triple-meter, to resolve the remaining stresses, though my most recent version doesn't do this.

After that, I simply graph the 1's and 0's. In some ways what I'm doing is very traditional; in some ways not. I use Graphics[] to draw the output on a grid, but the lines are very much like an x-y plot of the stress values over time.
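
As a rough illustration of the duple-versus-triple question, assuming the stresses are already resolved to 1s and 0s, one could look at the most common distance between stressed syllables. This is a simple gap-counting heuristic chosen for illustration, not the method used in the program, and guessMeter is a hypothetical name.

```wolfram
(* Hypothetical helper: guess the meter from the most common gap between stressed syllables *)
guessMeter[meter_List] := Module[{gaps},
  gaps = Differences[Flatten[Position[meter, 1]]];
  If[gaps === {}, "neither",
   Switch[First[Commonest[gaps]], 2, "duple", 3, "triple", _, "neither"]]]

guessMeter[{1, 0, 1, 0, 1, 0, 1, 0}]        (* "duple" *)
guessMeter[{1, 0, 0, 1, 0, 0, 1, 0, 0, 1}]  (* "triple" *)
```

A heuristic like this only works once most syllables are resolved, which is the circular part of the problem: the meter guess is wanted precisely to resolve the syllables that are still undetermined.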
Posted 3 months ago
 Was the graph plotting achieved by two methods: traditional and Wolfram computational language?
Posted 3 months ago
 The graph plotting was done in this manner:
 create a grid on the coordinate plane {{{1,1}, {1,2}, {1,3}...}, {{2,1}, {2,2}, {2,3}}...}
 draw a line to the coordinate + {0, .5} for stressed syllables and + {0, -.5} for unstressed
 This is a very simple thing to do in the Wolfram Language.

It is traditional in the sense that meter is traditionally thought of as a series of stressed and unstressed syllables, forming so many feet, in a predictable pattern. I don't recognize the foot as a unit though.
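
The plotting steps described above can be sketched in a few lines. The stress values are made up, and the layout details (one row per line of verse, rows spaced 2 units apart, x as the syllable index) are assumptions for illustration, since the description does not spell them out.

```wolfram
(* Hypothetical stress values for two lines of verse: 1 = stressed, 0 = unstressed *)
stanza = {{1, 0, 1, 0, 1, 0}, {0, 1, 0, 1, 0, 1}};

(* One zigzag per line: x = syllable index, baseline y = -2*row, offset by +.5 or -.5 *)
zigzag[row_, stresses_] :=
  Line[MapIndexed[{First[#2], -2 row + If[#1 == 1, .5, -.5]} &, stresses]];

zigzag[1, {1, 0, 1}]
(* Line[{{1, -1.5}, {2, -2.5}, {3, -1.5}}] *)

Graphics[MapIndexed[zigzag[First[#2], #1] &, stanza]]
```

Printing the syllables along each baseline would then be a matter of adding Text primitives at {index, -2*row}, much as the original analyzeMeter does for a single line.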
Posted 3 months ago
 The way you plotted is what I call the traditional way, as in A in the attached shape. If this is the case, what is the need for the Wolfram Language, or where does it fit as far as the graph is concerned?

I am just trying to understand, because I expect this language will be of great use in discovering analogies or relations in many fields, like comparative prosody and architecture, or poetry and music. So far I have been doing that by traditional plotting. I will notify you once I publish a new subject about this. B is an introduction to verse and architecture.
Posted 3 months ago
 @Khashan Khashan your images and ideas are interesting. It would be great if you could post your Wolfram Language code in these posts here, like Mark did and everyone else does on this forum, without referring to external documents. Then it would be very fun to reproduce and explore your results. I would love to see your code. You can also embed your notebook into the post or attach it to the post. Thank you for sharing ideas.