Computer Analysis of Poetry — Part 1: Metrical Pattern

Posted 2 months ago

Poets pay attention to the natural stresses in words, and sometimes they arrange words so that the stresses form patterns. Typical patterns stress every other syllable (duple meter) or every third syllable (triple meter). Conventions exist to further classify poetic lines according to a unit of two or three syllables, called a foot. I choose not to follow this convention, instead looking at the line of poetry as a continuous pattern. The goal of this first part is to display the stress pattern of a line of poetry graphically, drawn around the printed syllables.

The function below accepts a line of English poetry (or prose) and returns the stress pattern with syllables. It gets the stress information from the "PhoneticForm" property in WordData and the syllabification information from the "Hyphenation" property. Sometimes words are not in WordData, or the database doesn't have phonetic or hyphenation values for the word. Much of the code deals with how to guess at those values when they are missing. Also, 1-syllable words are stressed in the database, but stopwords are usually unstressed in context. So the code demotes single-syllable stopwords from stressed to undetermined. A series of replacement rules attempts to resolve syllables that the program has not yet determined to be stressed or unstressed.
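The stress-extraction step described above can be tried in isolation. The following is a minimal sketch, not the code from the post: `stressFromIPA` and `demoVowels` are names introduced here for illustration, the vowel list is abbreviated, and the sample strings are hand-written inputs that assume the stress mark sits immediately before the vowel it modifies (the form the full function's pattern expects).

```wolfram
(* Abbreviated vowel inventory for the demo (an assumption, not the full list);
   diphthongs come first so they are tried before any single vowel. *)
demoVowels = Alternatives @@ {"aɪ", "aʊ", "eɪ", "ɔɪ", "oʊ", "ə", "ɛ", "ɪ",
    "ʊ", "ʌ", "æ", "i", "u"};

(* One value per vowel nucleus: 1 = primary stress,
   0.5 = secondary stress (left undetermined), 0 = unmarked. *)
stressFromIPA[ipa_String] := StringCases[ipa, {
    "ˈ" ~~ demoVowels -> 1,
    "ˌ" ~~ demoVowels -> 0.5,
    demoVowels -> 0}]

stressFromIPA["mˈɪdnˌaɪt"]  (* {1, 0.5} *)
stressFromIPA["hɛlˈoʊ"]     (* {0, 1} *)
```

Returning numbers directly from `StringCases` avoids the string-to-expression round trip, at the cost of diverging slightly from how the full function below is written.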

analyzeMeter[verse_] := {
   ipaVowels = {"aɪ", "aʊ", "eɪ", "ɔɪ", "oʊ", "ɐ", "ɑ", "ɒ", "ɔ", "ɘ",
      "ə", "ɛ", "ɜ", "ɝ", "ɞ", "ɤ", "ɨ", "ɪ", "ɯ", "ɵ", "ɶ", "ʉ", "ʊ",
      "ʌ", "ʏ", "a", "æ", "e", "i", "o", "œ", "ø", "u", "y"};
   words = ToLowerCase[TextWords[verse]];
   getWordInfo[wd_] := {
     ipa = WordData[wd, "PhoneticForm"];
     str = If[StringQ[ipa],
       vow = StringCases[ipa, "ˈ" | "ˌ" ... ~~ ipaVowels];
       ToExpression[
        StringReplace[
         vow, {"ˈ" ~~ __ -> "1", "ˌ" ~~ __ -> ".5", __ -> "0"}]],
       dips = {"ae", "ai", "au", "ay", "ea", "ee", "ei", "eu", "ey", 
         "ie", "oa", "oe", "oi", "oo", "ou", "oy", "ue", "ui", "uy"};
       vows = {"a", "e", "i", "o", "u", "y"};
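       (* guess one undetermined syllable per vowel group; Max keeps at
          least one syllable so vowel-less words do not yield an empty list *)
       Table[.5, 
        Max[1, Total[ToExpression[
           Characters[
            StringReplace[
             wd, {StartOfString ~~ "y" -> "0", 
              "e" ~~ EndOfString -> "0", dips -> "1", 
              vows -> "1", _ -> "0"}]]]]]]];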
     hyp = WordData[wd, "Hyphenation"];
     fauxSyl = 
      StringPartition[wd, UpTo[Ceiling[StringLength[wd]/Length[str]]]];
     syl = 
      If[ListQ[hyp] && Length[hyp] == Length[fauxSyl], hyp, fauxSyl];
     {wd, str, syl}};
   wordInfo = getWordInfo[#][[1]] & /@ words;
   stops1IPA = 
    Select[DeleteMissing[
      WordData[#, "PhoneticForm"] & /@ WordData[All, "Stopwords"]], 
     StringCount[#, ipaVowels] < 2 &];
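   (* Replace at level 1 so that only the {word, stress, syllables} triples
      are tested, never the enclosing list (which matters for 3-word lines) *)
   wordInfo = 
    Replace[wordInfo, {a_, b_List, c_} /; 
        MemberQ[stops1IPA, WordData[a, "PhoneticForm"]] -> {a, {.5}, c}, {1}];
   wordInfo = 
    Replace[wordInfo, {a_, b_List, 
         c_} /; ! MemberQ[stops1IPA, WordData[a, "PhoneticForm"]] && 
         b == {.5} -> {a, {1}, c}, {1}];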
   preMeter = wordInfo[[;; , 2]] // Flatten;
   meter =
    preMeter //. {
      {a___, .5, 1, 1, b___} -> {a, 0, 1, 1, b},
      {a___, 1, 1, .5, b___} -> {a, 1, 1, 0, b},
      {a___, 1, .5, 1, b___} -> {a, 1, 0, 1, b},
      {a___, 0, .5, 0, b___} -> {a, 0, 1, 0, b},
      {a___, .5, 1, b___} -> {a, 0, 1, b},
      {a___, 1, .5, b___} -> {a, 1, 0, b},
      {a___, 0, .5} -> {a, 0, 1},
      {.5, 0, b___} -> {1, 0, b},
      {a___, .5, 0, 1, 0, 1, b___} -> {a, 1, 0, 1, 0, 1, b},
      {a___, .5, 1, 0, 1, 0, b___} -> {a, 0, 1, 0, 1, 0, b},
      {a___, .5, 0, 0, 1, 0, 0, 1, b___} -> {a, 1, 0, 0, 1, 0, 0, 1, 
        b},
      {a___, 1, 0, 1, 0, .5, b___} -> {a, 1, 0, 1, 0, 1, b},
      {a___, 0, 1, 0, 1, .5, b___} -> {a, 0, 1, 0, 1, 0, b},
      {a___, 1, 0, 0, 1, 0, 0, .5, b___} -> {a, 1, 0, 0, 1, 0, 0, 1, 
        b},
      {a___, .5, .5, .5} -> {a, 0, 1, 0},
      {.5, .5, b___} -> {1, 0, b}};
   coords = Partition[Riffle[Range[Length[meter]], meter], 2];
   syllab = Flatten[wordInfo[[;; , 3]]];
   visual = 
    Graphics[{Line[coords], 
      MapIndexed[
       Style[Text[#1, {#2[[1]], .5}], 15, FontFamily -> "Times"] &, 
       syllab]}, ImageMargins -> {{10, 10}, {0, 0}}, 
     ImageSize -> 48*Length[meter]]
   };
analyzeMeter["Once upon a midnight dreary, while I pondered, weak and weary,"]

[Image: program output for the line above]

Thanks to Edgar Allan Poe for his poem "The Raven." The zigzag line zigs up for stressed syllables and down for unstressed. The program analyzes this verse without error or deviation from the expected meter. However, poets don't always follow the expected pattern, and the program occasionally makes mistakes. Consider the program's output for the entire second stanza of "The Raven."

[Image: program output for the second stanza of "The Raven"]

The graphic makes it easy to see deviations from the pattern. In the second line of this stanza, the program mistakenly considers "separate" to have three syllables as if it were a verb. However, when "separate" is used as an adjective, as in "separate dying ember," it only has two syllables. In the third verse, the last syllable of "eagerly" is so weak that the program marks it as unstressed. This is a reasonable and arguably correct way to assess the syllable, though traditionally it should be marked as stressed. The fifth verse also has an anomaly. Poe has added an extra syllable to the line with the word "radiant."

As an English teacher, I think this visual gives insight into such subtle poetic notions as elision, secondary stress, and masculine/feminine rhyme. A possible activity is for students to use the program to analyze the prevailing pattern in a stanza of poetry and then explain the variations from that pattern as nuances of the language (as in "eagerly"), deliberate deviations by the poet (as in "radiant"), or mistakes by the program (as in "separate").

"The Raven" follows a duple meter pattern of alternating stressed and unstressed syllables. The program can also handle poems that follow the other major metrical pattern, triple meter. Here are verses from "Evangeline" by Henry Wadsworth Longfellow and "'Twas the Night Before Christmas" by Clement Clarke Moore.

[Image: program output for verses from "Evangeline" and "'Twas the Night Before Christmas"]

One would expect that the program would show free verse and prose as having no recognizable metrical pattern. Let's see. Here are two lines of Walt Whitman's free verse poem "When I Heard the Learn'd Astronomer":

[Image: program output for two lines of "When I Heard the Learn'd Astronomer"]

And here is a sentence from the Wikipedia article on butterflies.

[Image: program output for a sentence from the Wikipedia article on butterflies]

The traditional way to teach meter in poetry is to explain about iambs, trochees, etc. and then have students try to mark lines of poetry with those units. Students, who may be distinguishing stressed syllables for the first time, are hard pressed to find metrical feet in a verse. With this program, a student has a starting point to explore, analyze, interpret, and critique. It's like using Wolfram Alpha to understand the graph of a rational function rather than trying to sketch it yourself following the rules the teacher lectured about.

I would call the program a work in progress rather than a success. If you experiment with poems of your choice, you'll find that it sometimes fails to resolve a syllable, leaving it stuck halfway between stressed and unstressed. Also, if it misinterprets a syllable, marking it stressed, for instance, when it shouldn't be, the error can spread to neighboring syllables and corrupt the interpretation of the whole line. It works more consistently with duple meter than triple meter.
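The "stuck halfway" behavior is easy to reproduce with a small subset of the replacement rules. This is only a sketch: `demoRules` and `unresolvedPositions` are names made up for the illustration, and the three rules are copied from the function above but the selection is arbitrary.

```wolfram
(* A subset of the post's replacement rules, chosen only for the demo. *)
demoRules = {
   {a___, 1, .5, 1, b___} -> {a, 1, 0, 1, b},
   {a___, .5, 1, b___} -> {a, 0, 1, b},
   {a___, 1, .5, b___} -> {a, 1, 0, b}};

(* Positions of syllables still marked .5 once no rule applies. *)
unresolvedPositions[meter_List] := Flatten[Position[meter, .5]]

resolved = {1, .5, 1, 0, .5} //. demoRules   (* {1, 0, 1, 0, 0.5} *)
unresolvedPositions[resolved]                 (* {5} *)
```

The second syllable is resolved by its stressed neighbors, but the last one has only an unstressed neighbor and none of the three rules covers that context, so it stays at .5. The full rule list includes a longer-context rule that would catch this particular case; a check like `unresolvedPositions` is one possible way to flag the lines where no rule does.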

Twice I tried to improve the program with machine learning. I thought that if machine learning could classify the unresolved pattern as either duple meter, triple meter, or neither, then the program could better resolve the undetermined syllables. I was encouraged when it had 99% confidence that lines from "The Raven" were duple meter, but then I realized it was just as certain that any input was duple meter. My second attempt was to make a neural net that accepted a word and returned a likely stress pattern. For instance, I would feed it "Lenore" and it would return {0,1}. I think this should be doable, training it on data from WordData, but I am not strong enough in machine learning to make it happen (yet).

I subtitled this "Part 1," which implies that there is more to come. I intend to follow this with a program that makes rhyme visible, including alliteration, assonance, and other sound features loosely associated with rhyme.

Thanks for sticking with this to the end,

Mark Greenberg

10 Replies

Congratulations! This post is now featured in our Staff Pick column, as distinguished by a badge on your profile of a Featured Contributor! Thank you, keep it coming, and consider contributing your work to The Notebook Archive!

Nice post! It would be interesting to do this for Hungarian. It is phonetically spelt, so parsing it is easy. Vowel and consonant lengths are clearly marked. Finally, the language lends itself naturally to classical meters such as the hexameter.

Thanks, Szabolcs. The code is specific to English in two ways. First, it relies on WordData, which is an English language database. Second, the replacement rules are based on English language features like the preference to avoid three stressed syllables in a row. I don't know anything about Hungarian, but maybe it is possible to rework this code to draw on a database of Hungarian words for the syllable and phonetic information and resolve ambiguous syllables through different replacement rules.

Mark, Please consider submitting this to the Wolfram Function Repository (at least a couple of us on the review team would like to see it there). PoeticMeterDiagram was one name that was suggested in some discussion.

Hi, Daniel. Yes, now that the Wolfram Summer School is over, I'll turn my energies to writing such a function. My project at Summer School was to improve the code outlined here with machine learning and extend the analysis of poetry to include rhyme. The post is here. The meter function should be ready in a month or so. : )

I look forward to receiving it. Thanks.

What would be a reasonable name for the function in the function repository (where all functions share a namespace)?

This is indeed a very interesting function, and it would be nice to have it.

It is clear now that it is specific to English and the implementation does not generalize to other languages in a straightforward manner (even the concept of WordData doesn't generalize well to an agglutinative language like Hungarian).

AnalyzeMeter would be a nice name, but what name would other language implementations use?

AnalyzeEnglishMeter is a bit too long and ugly for my taste ...

Would it make sense to split out the reusable parts, e.g. have a separate function that can take a line annotated with syllable boundaries and lengths, and visualize it? Then have a language specific function that takes a string and creates an annotated verse from it?
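One way such a split might look, as a rough sketch: a language-independent drawing function that takes syllables and stress values that have already been worked out, leaving the string-to-annotation step to a separate language-specific function. The name `meterDiagram` and its argument layout are hypothetical; the drawing code mirrors the Graphics expression in the original post.

```wolfram
(* Hypothetical helper: draw the zigzag for pre-annotated data.
   syllables: list of strings; stresses: one number (0, .5 or 1) per syllable.
   The definition applies only when both lists have the same length. *)
meterDiagram[syllables : {__String}, stresses : {__?NumericQ}] /; 
   Length[syllables] == Length[stresses] :=
 Graphics[{
   Line[Transpose[{Range[Length[stresses]], stresses}]],
   MapIndexed[
    Style[Text[#1, {First[#2], .5}], 15, FontFamily -> "Times"] &,
    syllables]},
  ImageMargins -> {{10, 10}, {0, 0}},
  ImageSize -> 48*Length[stresses]]

(* Hand-annotated input, so no word database is involved. *)
meterDiagram[{"once", "up", "on", "a", "mid", "night"}, {1, 0, 1, 0, 1, 0}]
```

With this shape, an English front end could keep using WordData, while a Hungarian or Hebrew front end would only need to produce the two lists.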

(I don't think it's a big deal even if it uses up a general name in this case, just thinking aloud ... as this is a more general issue with the function repo.)

It might be called ScansionDiagram, TextMeter, or something else. I think that the words poetry, poem, and verse should not be in the name because it could be used for prose too.

This is really nice. Would it be possible to do this with classical Hebrew poetry as well? And should it be transliterated?

I know very little about ancient Hebrew poetry. Assuming it has meter, which seems to be a debated question, the approach I use should work. First you need the syllabification and stress information for each word. Then you try to string that together to tell the pattern for the entire line. I have improved the English version in my post here to include machine learning, and that also should be transferable to Hebrew. As for transliteration, no, that does not seem to be a good way to apply my method to Hebrew.
