WordCharacter?

Posted 11 years ago

I've been writing a few essays on Mathematica, and I have been using the following code to count the number of words in my essay:

StringCount["Hello, my name is Kyle", WordCharacter ..]

This gives me the number of words in my essay. I understand what StringCount does, but I am curious as to what WordCharacter does.

I looked at the documentation, but I don't understand what it means by "Represents a letter or digit character in StringExpression".

Can someone provide me a simpler explanation?

Thank you,

POSTED BY: Seokin Yeh
9 Replies

I think that the actual specification for WordCharacter (as well as LetterCharacter) applies only to characters in the 256-character (8-bit) extended ASCII set. If you stick to that interpretation (and assume that the documentation is in error), you will be fine.

POSTED BY: David Reiss

It is one of the letters a through z (upper or lower case) or one of the digits 0 through 9. Start with the printable ASCII characters:

In[1]:= CharacterRange[" ", "~"]

Out[1]= {" ", "!", "\"", "#", "$", "%", "&", "'", "(", ")", "*", "+", ",", "-", ".", \
"/", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", ":", ";", "<", "=", \
">", "?", "@", "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", \
"M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z", "[", \
"\\", "]", "^", "_", "`", "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", \
"k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", \
"z", "{", "|", "}", "~"}

Of these, the ones that match WordCharacter:

In[2]:= Select[CharacterRange[" ", "~"], StringMatchQ[#, WordCharacter] &]

Out[2]= {"0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "A", "B", "C", "D", "E", \
"F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", \
"U", "V", "W", "X", "Y", "Z", "a", "b", "c", "d", "e", "f", "g", "h", "i", \
"j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", \
"y", "z"}

Greek characters are not part of this:

In[3]:= Select[CharacterRange["\[CapitalAlpha]", "\[CapitalOmega]"], 
 StringMatchQ[#, WordCharacter] &]

Out[3]= {}

POSTED BY: David Reiss

Actually I am a bit confused by this. The documentation says

WordCharacter matches any character for which either LetterQ or DigitQ yields True. 

So this

In[1]:= LetterQ /@ CharacterRange["\[Alpha]", "\[Omega]"]

Out[1]= {True, True, True, True, True, True, True, True, True, True, \
True, True, True, True, True, True, True, True, True, True, True, \
True, True, True, True}

would suggest that all lowercase Greek characters are WordCharacters.

However we have this:

In[2]:= StringMatchQ[#, WordCharacter] & /@ CharacterRange["\[Alpha]", "\[Omega]"]

Out[2]= {False, False, False, False, False, False, False, False, \
False, False, False, False, False, False, False, False, False, False, \
False, False, False, False, False, False, False}

So this is clearly not consistent with the statement in the documentation.

Any expert care to comment?

POSTED BY: David Reiss

WordCharacter represents a single word character. So the string pattern

WordCharacter..

represents a sequence of one or more word characters. So your StringCount computation,

StringCount["Hello my name is Kyle",WordCharacter..]

is counting the "lumps", that is, the maximal self-contained runs of word characters. This is what StringCount is designed to do.
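To see the runs themselves rather than just their count, StringCases can be applied with the same pattern. This is a minimal sketch using the example string from this thread; the variable name text is only for illustration:

```mathematica
(* StringCases returns the matched substrings; StringCount returns how many there are *)
text = "Hello my name is Kyle";

StringCases[text, WordCharacter ..]
(* {"Hello", "my", "name", "is", "Kyle"} *)

StringCount[text, WordCharacter ..]
(* 5 *)
```

Each element of the list is one maximal run of word characters, which is why the count coincides with the number of words here.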

?StringCount

StringCount["string", "sub"] gives a count of the number of times "sub" appears as a substring of "string".
StringCount["string", patt] gives the number of substrings in "string" that match the general string expression patt.
StringCount["string", {patt₁, patt₂, …}] counts the number of occurrences of any of the pattᵢ.
StringCount[{s₁, s₂, …}, p] gives the list of results for each of the sᵢ.

For fun, compare your example with

StringCount["Hel8lo my name is Dav7id", LetterCharacter ..]

POSTED BY: David Reiss

So, in my understanding, WordCharacter stands for every letter and digit.

So, if I use a Greek letter in my essay, or any word made of characters that WordCharacter does not match, will that word not be counted?

POSTED BY: Seokin Yeh

The fact that this is either a documentation error or a bug was noted on Mathematica StackExchange:

http://mathematica.stackexchange.com/questions/24077/how-to-count-words-in-a-greek-text/24080#24080
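
One way to sidestep the question of which characters WordCharacter matches is to count whitespace-separated chunks instead. This is only a sketch, assuming words are separated by spaces or newlines; it is not necessarily the approach taken in the linked answer, and punctuation stays attached to the neighboring word:

```mathematica
(* Count words as whitespace-separated chunks, independent of WordCharacter *)
text = "Hello, my name is Kyle";

Length[StringSplit[text]]
(* 5 *)

StringCount[text, Except[WhitespaceCharacter] ..]
(* 5 *)

(* Greek letters are not whitespace, so words made of them are counted too *)
Length[StringSplit["\[Alpha]\[Beta] \[Gamma]"]]
(* 2 *)
```

The trade-off is that a stray standalone symbol, such as a dash surrounded by spaces, would also be counted as a word.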

POSTED BY: David Reiss

Thank you very much,

I ran a few tests and I do believe it is a set of all the standard characters.

However, if WordCharacter is a set of all standard characters, how does it function to count words in

StringCount["Hello my name is Kyle",WordCharacter..]

It is definitely counting the number of words, although it does not count words composed only of non-standard characters.

Thank you.

POSTED BY: Seokin Yeh

Thank you very much. I have learned quite a bit through this discussion.

Thank you very much again for your help.

POSTED BY: Seokin Yeh

I am happy to help.

POSTED BY: David Reiss