Select elements in my texts and count the frequency of each "@..." code?

Posted 5 years ago

Good morning. After processing a few texts, I ended up with an XML file containing many semantic annotation codes, each starting with @ followed by 7 or 8 digits, that identify some words in my texts.

Example: It follows an incident<incident,Noun@7307477[incident]> at UC Berkeley when police<police,Noun@8209687[police,police force,constabulary,law]>

Does anyone know an efficient and (semi)automatic way to extract all these @xxxxxxx and compile them in a list so that I can count the frequency of each code? I appreciate any help. Thank you.

POSTED BY: Eurov Stars

Does something like this do the job?

text = "xxx@1234 abcd xxx@2345 xxx@3456 xxx@3456 xxx@3456 xxx@1234 @xyz" // 
   StringSplit;
text // InputForm

{"xxx@1234", "abcd", "xxx@2345", "xxx@3456", "xxx@3456", "xxx@3456", "xxx@1234", "@xyz"}

codes = StringCases[text, ___ ~~ "@" ~~ code__ :> code] // Flatten

{"1234", "2345", "3456", "3456", "3456", "1234", "xyz"}

Counts[codes]

<|"1234" -> 2, "2345" -> 1, "3456" -> 3, "xyz" -> 1|>
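
One caveat: in the sample from the question the codes are not delimited by whitespace. They are followed directly by "[", so a catch-all pattern after "@" would also pick up the trailing text. A minimal sketch that captures only the digits after "@", assuming every code is a run of digits as the question describes (the file name in the comment is hypothetical):

```wolfram
(* Sample text taken from the question; for a real file one might use
   Import["annotated.xml", "Text"] instead (hypothetical file name). *)
sample = "It follows an incident<incident,Noun@7307477[incident]> at UC Berkeley when police<police,Noun@8209687[police,police force,constabulary,law]>";

(* Capture only the run of digits that follows each "@". *)
codes = StringCases[sample, "@" ~~ (d : DigitCharacter ..) :> d]
(* {"7307477", "8209687"} *)

(* Tally the codes and list the most frequent first. *)
Sort[Counts[codes], Greater]
```

Because the pattern is applied to the whole string, no StringSplit or Flatten step is needed here.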

All the best,

Ian

POSTED BY: Ian Williams