WOLFRAM COMMUNITY

9.6K Views

5 Replies

2 Total Likes


[?] Count syllables in speeches?

Jay Weininger, Santa Fe College

Posted 6 years ago

For a text analysis project, I want to count syllables in speeches. I saw WordData's "Hyphenation" property, but the elementary task of applying it to my list of words is just out of my reach, as I have only just started working with text analysis and the Wolfram Language. Thanks.

POSTED BY: Jay Weininger
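
For readers with the same question, a minimal sketch of applying the "Hyphenation" property to a list of words might look like the following. The word list and the helper name syllableCount are hypothetical. The sketch assumes WordData returns a list of syllable strings for words it knows and a Missing[...] value otherwise.

```wolfram
(* Hypothetical sample list; replace with the words from your speech. *)
words = {"nation", "future", "liberty"};

(* syllableCount is an illustrative helper, not a built-in function. *)
syllableCount[w_String] :=
  With[{h = WordData[ToLowerCase[w], "Hyphenation"]},
   (* A list of syllable strings means the word was found. *)
   If[ListQ[h], Length[h], Missing["NotAvailable"]]];

(* Association of word -> syllable count, or Missing if there is no entry. *)
counts = AssociationMap[syllableCount, words]
```

Words that WordData does not know stay marked as Missing rather than being counted as zero, so they are easy to spot and handle separately.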


Mark Greenberg

Posted 6 years ago

It looks like you are satisfied with the answers so far. I continued to work on the problem, though, to come up with the best possible (non-machine-learning) solution. You can set wds in the first line to your own list of words, or just run the code as is.

wds = RandomWord[10];
(* Lookup lists: IPA vowel symbols, two-letter vowel groups in spelling, and single vowel letters. *)
ipaVs = ipaVowels = {"a?", "a?", "e?", "??", "o?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?","?", "?", "?", "?", "?", "?", "?", "a", "æ", "e", "i", "o", "", "ø", "u", "y"};
dip = {"ai", "au", "ay", "ea", "ee", "ei", "eu", "ey", "ie", "io", "oa", "oe", "oi", "oo", "ou", "oy", "ua", "ue", "ui", "uy"};
vow = {"a", "e", "i", "o", "u", "y"};
(* hyp and ipa each return {data, count}; the count is 0 when WordData has no entry for the word. *)
hyp[wd_] := (h = WordData[wd, "Hyphenation"] /. Missing[_] -> {}; {h, Length[h]});
ipa[wd_] := (p = WordData[wd, "PhoneticForm"] /. Missing[_] -> "X"; {p, StringCount[p, ipaVs]});
(* reg is a spelling-based fallback: each matched pattern contributes the number of syllables on its right-hand side. *)
reg[wd_] := {wd, 
   Total@StringCases[wd,
     {"e" ~~ EndOfString -> Nothing, "-" -> Nothing, "qu" -> 0, "eness" -> 1, "ement" -> 1,
     {"p", "b", "c", "d", "f", "g", "k", "s", "t", "z"} ~~ "le" ~~EndOfString -> 1, 
     {"d", "t"} ~~ "ed" ~~ EndOfString -> 1, "ed" ~~ EndOfString -> 0, dip -> 1,
     vow -> 1, Except[vow] .. -> 0}]};
(* calcScr receives {hyphenation count, IPA count, spelling count} and returns {syllables, confidence}. *)
calcScr[cts_] := (
   If[cts[[1]] == cts[[2]] > 0, Return[{cts[[1]], "very high"}]];
   If[cts[[1]] == 1 && cts[[2]] == 2, Return[{2, "high"}]];
   If[cts[[1]] == 0 && cts[[2]] == cts[[3]], Return[{cts[[2]], "very high"}]];
   If[cts[[1]] > 0 && cts[[2]] != cts[[1]], Return[{cts[[1]], "high"}]];
   If[cts[[1]] == 0 && cts[[2]] == 0, Return[{cts[[3]], "medium"}]];
   Return[{cts[[2]], "high"}]);
(* Compute the three counts for every word, score them, and collect the results in a Dataset. *)
cts = {#, hyp[#][[2]], ipa[#][[2]], reg[#][[2]]} & /@ ToLowerCase[wds];
scr = ({#[[1]], calcScr[#[[2 ;; 4]]]} &) /@ cts;
dts = Dataset[
  Association[#[[1]] -> <|"syllables" -> #[[2, 1]],  "confidence" -> #[[2, 2]]|> & /@ scr]]

Output is in the form of a dataset, but you can change that to suit your needs.

I have tested this on about 500 random words and only found 3 wrongly assessed words: nationalism, socialism, and nth. Though I think it is probably overkill for your question, it was a fun side project. Thanks. : )

POSTED BY: Mark Greenberg
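
To run the code above on a speech rather than on random words, one possible sketch is to split the text with TextWords and assign the result to wds in place of RandomWord[10]. The names speech, allWords, and freq are hypothetical, and the sample sentence is only a stand-in for a full speech.

```wolfram
(* Stand-in text; replace with the full speech as a single string. *)
speech = "Four score and seven years ago our fathers brought forth on this continent a new nation";

(* Split into words and normalize case. *)
allWords = ToLowerCase[TextWords[speech]];

(* Unique words, to be used in place of RandomWord[10]. *)
wds = DeleteDuplicates[allWords];

(* How often each word occurs, in case totals should be weighted by frequency later. *)
freq = Counts[allWords];
```

Removing duplicates first keeps the final dataset to one row per word. The frequency table is kept separately so that a total for the whole speech can still count repeated words.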

Jay Weininger, Santa Fe College

Posted 6 years ago

Thank you. I worked on determining syllables in famous English speeches for the staff in that department. I realized the need for ; at the end of many statements so as not to produce unneeded output.

POSTED BY: Jay Weininger

Tomas Garza, Retired, freelance

Posted 6 years ago

There is a nice piece of work on this subject, "Un Divisor Silábico (Spanish)", in the Wolfram Demonstrations Project (http://demonstrations.wolfram.com/UnDivisorSilabicoSpanish/), contributed by Jaime Rangel-Mondragon. It has the drawback of being written in Spanish. He has devised a number of clever rules for the syllabic decomposition of Spanish words, which work extremely well. He doesn't go further, but I did some (unpublished) work to apply his rules to longer texts and obtain interesting (IMHO) statistics. For example, the total number of different syllables in the 86,016 words that make up the Spanish-language dictionary used by Wolfram is 3,707 (including repetitions due to the presence of accents, proper names, and some words originating in foreign languages). Unfortunately, there was no hectic or even mildly enthusiastic response from specialists in the field, so I didn't pursue the matter further.

POSTED BY: Tomas Garza

Jay Weininger, Santa Fe College

Posted 6 years ago

Thank you. I now have a list of the form {nation, 3, future, 2}. I just need to add up the results, so I am looking in the documentation.

POSTED BY: Jay Weininger
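
For a flat list that alternates words and counts, as in the post above, the counts can be added up by taking every second element. This is only a sketch; the variable names results and pairs are hypothetical, and the list is the one quoted in the post.

```wolfram
(* Flat list of word, count, word, count, ... as quoted in the post. *)
results = {"nation", 3, "future", 2};

(* Take elements 2, 4, 6, ... and add them: 3 + 2 gives 5. *)
Total[results[[2 ;; ;; 2]]]

(* Alternative: reshape into {word, count} pairs first, then sum the second column. *)
pairs = Partition[results, 2];
Total[pairs[[All, 2]]]
```

The paired form is usually easier to work with afterwards, for example when sorting or filtering words by syllable count.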
