# WordCharacter?

Posted 9 years ago
I've been writing a few essays in Mathematica, and I have been using the following code to count the number of words in an essay:

    StringCount["Hello, my name is Kyle", WordCharacter ..]

This gives me the number of words in my essay. I understand what StringCount does, but I am curious about what WordCharacter does. I looked at the documentation, but I don't understand what it means by "Represents a letter or digit character in StringExpression". Can someone provide a simpler explanation? Thank you.
9 Replies
Posted 9 years ago
 I am happy to help.
Posted 9 years ago
Thank you very much. I have learned quite a bit through this discussion. Thank you again for your help.
Posted 9 years ago
WordCharacter represents a single word character, so the string pattern WordCharacter.. represents a sequence of one or more word characters. Your StringCount computation,

    StringCount["Hello my name is Kyle", WordCharacter ..]

is therefore returning the number of "lumps" that correspond to full, self-contained groups of word characters. This is what StringCount is designed to do:

    ?StringCount

    StringCount["string", "sub"] gives a count of the number of times "sub" appears as a substring of "string".
    StringCount["string", patt] gives the number of substrings in "string" that match the general string expression patt.
    StringCount["string", {patt1, patt2, …}] counts the number of occurrences of any of the patti.
    StringCount[{s1, s2, …}, p] gives the list of results for each of the si.

For fun, compare your example with

    StringCount["Hel8lo my name is Dav7id", LetterCharacter ..]
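To see which "lumps" each pattern actually finds, one can list the matches with StringCases rather than only counting them. A small sketch using the same test string as in the comparison above (the variable name `text` is only for illustration):

```mathematica
text = "Hel8lo my name is Dav7id";

(* Runs of letters or digits: a digit does not break a word *)
StringCases[text, WordCharacter ..]
(* {"Hel8lo", "my", "name", "is", "Dav7id"} *)

(* Runs of letters only: each digit splits its word in two *)
StringCases[text, LetterCharacter ..]
(* {"Hel", "lo", "my", "name", "is", "Dav", "id"} *)
```

So StringCount would report 5 for the first pattern and 7 for the second.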
Posted 9 years ago
Thank you very much. I ran a few tests, and I do believe it is the set of all the standard characters. However, if WordCharacter is a set of standard characters, how does it count words in StringCount["Hello my name is Kyle", WordCharacter ..]? It is definitely counting the number of words, although it does not count words composed only of non-standard characters. Thank you.
Posted 9 years ago
I think that the actual specification for WordCharacter (as well as LetterCharacter) applies only to the characters in the 256-character extended ASCII set. If you stick to that interpretation (and assume that the documentation is in error), then you will be fine.
Posted 9 years ago
The fact that this is either a documentation error or a bug was noted on Mathematica StackExchange: http://mathematica.stackexchange.com/questions/24077/how-to-count-words-in-a-greek-text/24080#24080
Posted 9 years ago
Actually I am a bit confused by this. The documentation says WordCharacter matches any character for which either LetterQ or DigitQ yields True. So this

    In[1]:= LetterQ /@ CharacterRange["\[Alpha]", "\[Omega]"]

    Out[1]= {True, True, True, True, True, True, True, True, True, True,
     True, True, True, True, True, True, True, True, True, True, True,
     True, True, True, True}

would suggest that all lowercase Greek characters are WordCharacters. However, we have this:

    In[2]:= StringMatchQ[#, WordCharacter] & /@ CharacterRange["\[Alpha]", "\[Omega]"]

    Out[2]= {False, False, False, False, False, False, False, False,
     False, False, False, False, False, False, False, False, False, False,
     False, False, False, False, False, False, False}

So this clearly is not consistent with the statement in the documentation. Would any expert care to comment?
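One way to list exactly where the two definitions disagree is to compare them character by character. This is only a sketch: the result depends on the Mathematica version, so no output is shown, and `chars` is just an illustrative name.

```mathematica
chars = CharacterRange["\[Alpha]", "\[Omega]"];

(* Keep the characters for which "LetterQ or DigitQ" and the
   WordCharacter string pattern give different answers *)
Select[chars, (LetterQ[#] || DigitQ[#]) =!= StringMatchQ[#, WordCharacter] &]
```

An empty list would mean the pattern agrees with the documented definition over this range; any characters returned are the ones where the two differ.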
Posted 9 years ago
So, in my understanding, WordCharacter stands for every letter and digit. So, if I use a Greek letter in my essay, or any word made of characters that WordCharacter does not match, will that word not be counted?
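If the essay may contain Greek or other non-ASCII letters, one possible workaround (a sketch, not necessarily what the documentation intends) is to count whitespace-separated chunks instead of relying on WordCharacter. StringSplit with a single argument splits at whitespace; `wordCount` is a made-up helper name.

```mathematica
(* Count whitespace-separated chunks; punctuation stays attached to its word *)
wordCount[s_String] := Length[StringSplit[s]]

wordCount["Hello, my name is Kyle"]
(* 5 *)

wordCount["\[Alpha]\[Beta]\[Gamma] \[Delta]\[Epsilon] hello"]
(* 3 *)
```

Note that this counts any isolated symbol surrounded by spaces, such as a lone dash, as a word of its own, so it is a rough count rather than an exact one.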
Posted 9 years ago
It is one of the letters a through z (upper or lower case) or one of the integer digits 0 through 9.

    In[1]:= CharacterRange[" ", "~"]

    Out[1]= {" ", "!", "\"", "#", "$", "%", "&", "'", "(", ")", "*", "+", ",", "-", ".",
     "/", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", ":", ";", "<", "=",
     ">", "?", "@", "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L",
     "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z", "[",
     "\\", "]", "^", "_", "`", "a", "b", "c", "d", "e", "f", "g", "h", "i", "j",
     "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y",
     "z", "{", "|", "}", "~"}

Those of these which match WordCharacter:

    In[2]:= Select[CharacterRange[" ", "~"], StringMatchQ[#, WordCharacter] &]

    Out[2]= {"0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "A", "B", "C", "D", "E",
     "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T",
     "U", "V", "W", "X", "Y", "Z", "a", "b", "c", "d", "e", "f", "g", "h", "i",
     "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x",
     "y", "z"}

Greek characters are not part of this:

    In[3]:= Select[CharacterRange["\[CapitalAlpha]", "\[CapitalOmega]"], StringMatchQ[#, WordCharacter] &]

    Out[3]= {}
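Within the printable ASCII range, the Select result above is just the digits followed by the upper-case and lower-case letters. A quick check of that observation, assuming the behavior shown in Out[2] (`asciiWordChars` is an illustrative name):

```mathematica
asciiWordChars = Join[
   CharacterRange["0", "9"],
   CharacterRange["A", "Z"],
   CharacterRange["a", "z"]];

(* Compare the explicit list with what WordCharacter selects from printable ASCII *)
Select[CharacterRange[" ", "~"], StringMatchQ[#, WordCharacter] &] === asciiWordChars
(* True, given the Out[2] shown above *)

Length[asciiWordChars]
(* 62 *)
```

Note that the underscore "_" is not in this list, so it separates words rather than joining them.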