WordCharacter?

Seokin Yeh, Heritage High School

Posted 12 years ago

I've been writing a few essays in Mathematica, and I have been using the following code to count the number of words in an essay:

StringCount["Hello, my name is Kyle", WordCharacter ..]

I understand what StringCount does, but I am curious about what WordCharacter does. I looked at the documentation, but I don't understand what it means by "Represents a letter or digit character in StringExpression". Can someone provide a simpler explanation? Thank you.

POSTED BY: Seokin Yeh

David Reiss, Scientific Arts

Posted 12 years ago

I am happy to help.

POSTED BY: David Reiss

Seokin Yeh, Heritage High School

Posted 12 years ago

Thank you very much. I have learned quite a bit through this discussion. Thank you very much again for your help.

POSTED BY: Seokin Yeh

David Reiss, Scientific Arts

Posted 12 years ago

WordCharacter represents a single word character. So the string pattern

WordCharacter..

represents a sequence of one or more word characters. So your StringCount computation,

StringCount["Hello my name is Kyle",WordCharacter..]

is counting the "lumps", that is, the self-contained runs of consecutive word characters. This is what StringCount is designed to do.

?StringCount

StringCount["string", "sub"] gives a count of the number of times "sub" appears as a substring of "string".
StringCount["string", patt] gives the number of substrings in "string" that match the general string expression patt.
StringCount["string", {patt1, patt2, …}] counts the number of occurrences of any of the patti.
StringCount[{s1, s2, …}, p] gives the list of results for each of the si.

For fun compare your example with

StringCount["Hel8lo my name is Dav7id", LetterCharacter ..]
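
One way to see the "lumps" being counted is to list them with StringCases instead of counting them. This is a small sketch that uses only the strings from this thread; the outputs in the comments are for plain ASCII input like this.

```wolfram
(* List the runs of word characters that StringCount counts *)
StringCases["Hello, my name is Kyle", WordCharacter ..]
(* {"Hello", "my", "name", "is", "Kyle"} *)

(* Digits are word characters, so each word stays in one piece *)
StringCases["Hel8lo my name is Dav7id", WordCharacter ..]
(* {"Hel8lo", "my", "name", "is", "Dav7id"} *)

(* Digits are not letters, so LetterCharacter .. breaks at "8" and "7" *)
StringCases["Hel8lo my name is Dav7id", LetterCharacter ..]
(* {"Hel", "lo", "my", "name", "is", "Dav", "id"} *)
```

The comma and the spaces are neither letters nor digits, so they end one run and start the next. That is why the count of runs comes out as a word count.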

POSTED BY: David Reiss

Seokin Yeh, Heritage High School

Posted 12 years ago

Thank you very much. I ran a few tests, and I do believe it is the set of all the standard characters. However, if WordCharacter is the set of all standard characters, how does it count words in StringCount["Hello my name is Kyle",WordCharacter..]? It is definitely counting the number of words, although it doesn't count words composed only of non-standard characters. Thank you.

POSTED BY: Seokin Yeh

David Reiss, Scientific Arts

Posted 12 years ago

I think that the actual specification for WordCharacter (as well as LetterCharacter) applies only to the characters in the 256-character (8-bit, extended ASCII) set. If you stick to that interpretation (and assume that the documentation is in error), then you will be fine.

POSTED BY: David Reiss

David Reiss, Scientific Arts

Posted 12 years ago

The fact that this is either a documentation error or a bug was noted on Mathematica StackExchange: http://mathematica.stackexchange.com/questions/24077/how-to-count-words-in-a-greek-text/24080#24080

POSTED BY: David Reiss

David Reiss, Scientific Arts

Posted 12 years ago

Actually I am a bit confused by this. The documentation says

WordCharacter matches any character for which either LetterQ or DigitQ yields True.

So this

In[1]:= LetterQ /@ CharacterRange["\[Alpha]", "\[Omega]"]

Out[1]= {True, True, True, True, True, True, True, True, True, True, \
True, True, True, True, True, True, True, True, True, True, True, \
True, True, True, True}

would suggest that all lowercase Greek characters match WordCharacter.

However we have this:

In[2]:= StringMatchQ[#, WordCharacter] & /@ CharacterRange["\[Alpha]", "\[Omega]"]

Out[2]= {False, False, False, False, False, False, False, False, \
False, False, False, False, False, False, False, False, False, False, \
False, False, False, False, False, False, False}

So this clearly is not consistent with the statement in the documentation....

Any expert care to comment?
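
If the goal is only a word count, one possible way to sidestep the question is to split the text on whitespace rather than match character classes. This is a different technique from the WordCharacter pattern discussed in this thread, shown as a sketch. The variable names text and greek are made up for illustration.

```wolfram
(* With no second argument, StringSplit splits at whitespace *)
text = "Hello, my name is Kyle";
StringSplit[text]
(* {"Hello,", "my", "name", "is", "Kyle"} *)

Length[StringSplit[text]]
(* 5 *)

(* No character class is involved, so Greek words are counted too *)
greek = "\[Alpha]\[Beta]\[Gamma] \[Delta]\[Epsilon]";
Length[StringSplit[greek]]
(* 2 *)
```

Punctuation stays attached to the neighbouring word (note "Hello," above), and a punctuation mark surrounded by spaces would count as a word of its own, so the two approaches can give different counts on the same text.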

POSTED BY: David Reiss

Seokin Yeh, Heritage High School

Posted 12 years ago

So, in my understanding, WordCharacter stands for every letter and digit. So, if I use a Greek letter in my essay, or any word whose characters are not matched by WordCharacter, will that word not be counted?

POSTED BY: Seokin Yeh

David Reiss, Scientific Arts

Posted 12 years ago

It is one of the letters a through z (upper or lower case) or one of the integer digits 0 through 9. Here are the printable ASCII characters:

In[1]:= CharacterRange[" ", "~"]

Out[1]= {" ", "!", "\"", "#", "$", "%", "&", "'", "(", ")", "*", "+", ",", "-", ".", \
"/", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", ":", ";", "<", "=", \
">", "?", "@", "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", \
"M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z", "[", \
"\\", "]", "^", "_", "`", "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", \
"k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", \
"z", "{", "|", "}", "~"}

Of these, the ones that match WordCharacter are:

In[2]:= Select[CharacterRange[" ", "~"], StringMatchQ[#, WordCharacter] &]

Out[2]= {"0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "A", "B", "C", "D", "E", \
"F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", \
"U", "V", "W", "X", "Y", "Z", "a", "b", "c", "d", "e", "f", "g", "h", "i", \
"j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", \
"y", "z"}

Greek characters are not part of this:

In[3]:= Select[CharacterRange["\[CapitalAlpha]", "\[CapitalOmega]"], 
 StringMatchQ[#, WordCharacter] &]

Out[3]= {}
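
For the printable ASCII range, the documentation's description (a character for which LetterQ or DigitQ yields True) and the observed matches can be compared directly. The following is a minimal sketch of that check; the names chars, byPattern and byPredicate are made up for illustration, and the check says nothing about characters outside this range.

```wolfram
(* Printable ASCII characters, space through tilde *)
chars = CharacterRange[" ", "~"];

(* Characters that match the WordCharacter string pattern *)
byPattern = Select[chars, StringMatchQ[#, WordCharacter] &];

(* Characters that satisfy the documented description *)
byPredicate = Select[chars, LetterQ[#] || DigitQ[#] &];

(* Both selections are the same 62 characters: 10 digits and 52 letters *)
byPattern === byPredicate
(* True *)

Length[byPattern]
(* 62 *)
```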

POSTED BY: David Reiss
