WOLFRAM COMMUNITY

Select elements in my texts and count the frequency of each code "@..."?

Eurov Stars

Posted 7 years ago

Good morning. After processing a few texts, I ended up with an XML file containing many semantic annotation codes, each starting with @ followed by 7 or 8 digits, that identify some of the words in my texts. Example:

It follows an incident<incident,Noun@7307477[incident]> at UC Berkeley when police<police,Noun@8209687[police,police force,constabulary,law]>

Does anyone know an efficient and (semi)automatic way to extract all these @xxxxxxx codes and compile them into a list so that I can count the frequency of each code? I appreciate any help. Thank you.

Ian Williams, GeoConsult Limited

Posted 7 years ago

Does something like this do the job?

text = "xxx@1234 abcd xxx@2345 xxx@3456 xxx@3456 xxx@3456 xxx@1234 @xyz" // 
   StringSplit;
text // InputForm

{"xxx@1234", "abcd", "xxx@2345", "xxx@3456", "xxx@3456", "xxx@3456", "xxx@1234", "@xyz"}

codes = StringCases[text, ___ ~~ "@" ~~ code__ -> code] // Flatten

{"1234", "2345", "3456", "3456", "3456", "1234", "xyz"}

Counts[codes]

<|"1234" -> 2, "2345" -> 1, "3456" -> 3, "xyz" -> 1|>
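The toy example above splits on whitespace and captures everything after the "@". On the annotated text from the question, some tags contain spaces (for example "police force") and each code is followed by "[...]", so it may be safer to match only the run of digits after each "@" in the whole string. This is a minimal sketch under that assumption; the variable name sample is just for illustration, and a real file would probably be read first with something like Import[file, "Text"].

```wolfram
(* Sample text copied from the question *)
sample = "It follows an incident<incident,Noun@7307477[incident]> at UC Berkeley when police<police,Noun@8209687[police,police force,constabulary,law]>";

(* Keep only the digits that directly follow each "@" *)
codes = StringCases[sample, "@" ~~ (digits : DigitCharacter ..) :> digits]
(* {"7307477", "8209687"} *)

(* Count how often each code occurs *)
counts = Counts[codes]
(* <|"7307477" -> 1, "8209687" -> 1|> *)

(* Optionally list the most frequent codes first *)
Reverse[Sort[counts]]
```

Because the pattern requires at least one digit after the "@", something like "@xyz" would be skipped here, unlike in the toy example.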

All the best,

Ian
