Hi Young,
Glad to hear that you are making progress. There are a few problems with the code.
allWords is a list of words, so a phrase like "based on discretion" will never match a single element of it. Phrases have to be matched against a string such as plainText, and the right function for that is StringCount, not Position. However, plainText contains newlines, so a phrase you are looking for can be split across lines and will not match exactly. The words and phrases in plainText may also be in mixed case, while the ones you are looking for are all lowercase. It is much better to convert allWords to lowercase and join the words with spaces to form a single string, then count against that string.
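A toy example may make the difference clearer. The text and the names sampleText and sampleWords are made up for illustration, and StringSplit only stands in for however allWords was actually built.

```mathematica
(* Made-up sample: mixed case, and a newline in the middle of the phrase *)
sampleText = "The award was Based on\ndiscretion of the committee.";

(* Assumption: allWords was produced by some word-splitting step like this one *)
sampleWords = StringSplit[sampleText];

(* Position looks for a list element equal to the whole phrase; there is none *)
Position[sampleWords, "based on discretion"]
(* {} *)

(* Counting in the raw text also fails: wrong case and a newline inside the phrase *)
StringCount[sampleText, "based on discretion"]
(* 0 *)

(* Lowercase the words, join them with single spaces, then count *)
StringCount[StringRiffle[ToLowerCase[sampleWords]], "based on discretion"]
(* 1 *)
```

The same approach applied to your keyword list: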
keyWords = {"judgment-based", "based on discretion", "exercise judgment", "exercising judgment",
"exercise discretion", "exercising discretion", "subjectivity", "discretion",
"not based on any mathematical", "not quantifiable", "adjusted upward", "adjusted up",
"upward adjustment", "qualitative", "non-financial", "nonfinancial"};
cleanPlainText = StringRiffle@ToLowerCase@allWords
keyWordCounts = AssociationMap[StringCount[cleanPlainText, #] &, keyWords]
(*
<|"judgment-based" -> 0, "based on discretion" -> 0,
"exercise judgment" -> 0, "exercising judgment" -> 0,
"exercise discretion" -> 0, "exercising discretion" -> 0,
"subjectivity" -> 0, "discretion" -> 2,
"not based on any mathematical" -> 0, "not quantifiable" -> 0,
"adjusted upward" -> 0, "adjusted up" -> 0, "upward adjustment" -> 0,
"qualitative" -> 0, "non-financial" -> 0, "nonfinancial" -> 0|>
*)
Do the same for keywords2 (give it a better name) and trapWords. Then add keyWordCounts and the counts for the other lists to the results association, e.g. "keywordsfrequency" -> keyWordCounts, and so on.
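One possible way to collect the counts is sketched below. The helper countPhrases, the key "trapwordsfrequency", and all the sample values are hypothetical placeholders, since I don't know what your other lists or your results association contain.

```mathematica
(* Hypothetical helper: count each phrase in an already cleaned, lowercase string *)
countPhrases[text_String, phrases_List] :=
  AssociationMap[StringCount[text, #] &, phrases];

(* Placeholder data standing in for your real text and lists *)
sampleCleanText = "management may exercise discretion; the discretion is qualitative";
sampleKeyWords = {"exercise discretion", "discretion", "qualitative"};
sampleTrapWords = {"formula"};

(* Stand-in for your existing results association *)
sampleResults = <||>;

(* One entry per list; the key names are only suggestions *)
sampleResults["keywordsfrequency"] = countPhrases[sampleCleanText, sampleKeyWords];
sampleResults["trapwordsfrequency"] = countPhrases[sampleCleanText, sampleTrapWords];

sampleResults
(*
<|"keywordsfrequency" -> <|"exercise discretion" -> 1, "discretion" -> 2, "qualitative" -> 1|>,
  "trapwordsfrequency" -> <|"formula" -> 0|>|>
*)
```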
You should manually check that the counts match by searching for the text in the browser to make sure there are no other edge cases to be considered.
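One edge case worth looking for during that check: StringCount matches substrings, so a shorter keyword is also counted inside a longer one ("adjusted up" inside "adjusted upward", "discretion" inside "exercise discretion"). Whether that is what you want depends on your analysis. If it is not, wrapping the phrase in WordBoundary is one option, shown here on a made-up sentence.

```mathematica
(* Made-up sample containing both "adjusted upward" and "adjusted up" *)
overlapSample = "pay was adjusted upward, then adjusted up again";

(* Plain substring count: also matches the start of "adjusted upward" *)
StringCount[overlapSample, "adjusted up"]
(* 2 *)

(* Requiring word boundaries on both sides excludes "adjusted upward" *)
StringCount[overlapSample, WordBoundary ~~ "adjusted up" ~~ WordBoundary]
(* 1 *)
```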