[?] Count syllables in speeches?

Posted 6 years ago

For a text analysis project, I want to count the syllables in speeches.

I saw WordData's hyphenation option, but the elementary task of applying it to my list of words is just out of my reach (I have only just started working with text analysis and the Wolfram Language).

thanks

POSTED BY: Jay Weininger
5 Replies
Posted 6 years ago

It looks like you are satisfied with the answers so far. I kept working on the problem, though, to come up with the best possible (non-machine-learning) solution. The first line sets the list of words: replace it with your own list, or run it as is to test on ten random words.

wds = RandomWord[10];
ipaVs = ipaVowels = {"a?", "a?", "e?", "??", "o?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?","?", "?", "?", "?", "?", "?", "?", "a", "æ", "e", "i", "o", "œ", "ø", "u", "y"};
dip = {"ai", "au", "ay", "ea", "ee", "ei", "eu", "ey", "ie", "io", "oa", "oe", "oi", "oo", "ou", "oy", "ua", "ue", "ui", "uy"};
vow = {"a", "e", "i", "o", "u", "y"};
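(* hyp: syllable count from WordData hyphenation; an empty list (count 0) when no data exists *)
hyp[wd_] := Module[{h = WordData[wd, "Hyphenation"] /. Missing[_] -> {}}, {h, Length[h]}];
(* ipa: syllable count from the number of vowel sounds in the phonetic form; "X" (count 0) when no data exists *)
ipa[wd_] := Module[{p = WordData[wd, "PhoneticForm"] /. Missing[_] -> "X"}, {p, StringCount[p, ipaVs]}];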
reg[wd_] := {wd, 
   Total@StringCases[wd,
     {"e" ~~ EndOfString -> Nothing, "-" -> Nothing, "qu" -> 0, "eness" -> 1, "ement" -> 1,
     {"p", "b", "c", "d", "f", "g", "k", "s", "t", "z"} ~~ "le" ~~EndOfString -> 1, 
     {"d", "t"} ~~ "ed" ~~ EndOfString -> 1, "ed" ~~ EndOfString -> 0, dip -> 1,
     vow -> 1, Except[vow] .. -> 0}]};
(* calcScr: combine the hyphenation (hy), phonetic (ph), and rule-based (rx) counts into {syllables, confidence} *)
calcScr[{hy_, ph_, rx_}] := Which[
   hy == ph > 0, {hy, "very high"},        (* hyphenation and phonetic counts agree *)
   hy == 1 && ph == 2, {2, "high"},
   hy == 0 && ph == rx, {ph, "very high"}, (* no hyphenation data; phonetic and rule-based counts agree *)
   hy > 0 && ph != hy, {hy, "high"},
   hy == 0 && ph == 0, {rx, "medium"},     (* only the rule-based count is available *)
   True, {ph, "high"}];
cts = {#, hyp[#][[2]], ipa[#][[2]], reg[#][[2]]} & /@ ToLowerCase[wds];
scr = ({#[[1]], calcScr[#[[2 ;; 4]]]} &) /@ cts;
dts = Dataset[
  Association[#[[1]] -> <|"syllables" -> #[[2, 1]],  "confidence" -> #[[2, 2]]|> & /@ scr]]

The output is a Dataset with one row per word, giving its syllable count and a confidence label, but you can change that format to suit your needs.

I have tested this on about 500 random words and only found 3 wrongly assessed words: nationalism, socialism, and nth. Though I think it is probably overkill for your question, it was a fun side project. Thanks. : )
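To run this on a speech rather than on random words, the word list in the first line has to come from the text itself. The following is only a minimal sketch of one way to do that: the sample sentence and the names speech, allWords, and freq are made up for illustration, and it assumes the built-in TextWords function is available in your version.

```wolfram
(* Hypothetical sample text standing in for a real speech *)
speech = "We will build a better future for our nation, and our nation will thank us.";

(* Split the text into words and lowercase them *)
allWords = ToLowerCase[TextWords[speech]];

(* Unique words, suitable as the word list in the first line of the code above *)
wds = DeleteDuplicates[allWords];

(* How often each word occurs, needed later to total syllables over the whole speech *)
freq = Counts[allWords];
```

The unique list keeps the number of WordData lookups down, while freq preserves the repetition information that the Association in the final Dataset would otherwise collapse.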

POSTED BY: Mark Greenberg

thank you

I worked on determining the syllables in famous English speeches, for the staff in that department.

I realized that many statements need a semicolon (;) at the end so as not to produce unneeded output.
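As a small illustration of the semicolon point (the variable name words is made up for this sketch):

```wolfram
(* Without a trailing semicolon, the notebook prints the value of the expression *)
words = {"nation", "future"}

(* With a trailing semicolon, the assignment still happens but no output cell is produced *)
words = {"nation", "future"};

(* The variable holds the same value either way *)
Length[words]
```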

POSTED BY: Jay Weininger

There is some nice work on this subject in "Un Divisor Silábico (Spanish)" from the Wolfram Demonstrations Project, http://demonstrations.wolfram.com/UnDivisorSilabicoSpanish/, contributed by Jaime Rangel-Mondragon. Its one drawback is that it is written in Spanish. He has devised a number of clever rules for the syllabic decomposition of Spanish words, and they work extremely well. He doesn't go further, but I did some (unpublished) work to apply his rules to longer texts and obtain interesting (IMHO) statistics. For example, the 86,016 words that make up the Spanish language dictionary used by Wolfram contain a total of 3,707 different syllables (including repetitions due to the presence of accents, proper names, and some words of foreign origin). Unfortunately, there was no enthusiastic, or even mildly interested, response from specialists in the field, so I didn't pursue the matter further.

POSTED BY: Tomas Garza

thank you

I now have a list of the form {nation, 3, future, 2}

I just need to add up the counts, so I am looking in the documentation.
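One possible way to add up the counts in a flat list of that shape is sketched below. It assumes the words are strings and the counts are integers; the names flat, pairs, total, and total2 are made up for illustration.

```wolfram
(* The flat list in the shape described above: word, count, word, count, ... *)
flat = {"nation", 3, "future", 2};

(* Group into {word, count} pairs: {{"nation", 3}, {"future", 2}} *)
pairs = Partition[flat, 2];

(* Add up the second element of every pair *)
total = Total[pairs[[All, 2]]]   (* 5 *)

(* Alternative: pick out the integer entries directly and add them *)
total2 = Total[Cases[flat, _Integer]]   (* 5 *)
```

Partition keeps each word next to its count, which is handy if the pairs are needed again later; the Cases version is shorter when only the sum matters.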

POSTED BY: Jay Weininger
Posted 6 years ago
POSTED BY: Mark Greenberg