# Stripping Strings to Get Numerics

Posted 9 years ago
8 Replies
I am importing a CSV file of tabulation data generated by a market research package. Each value field starts with a number, but then has character qualifiers and flags that relate to whether or not it is a percentage, whether the sample size is low, and so on. So example fields might include:

    55  12.7  27%  33.98%  89*  91%*abe

and so on. The numerical values are ALWAYS the first part of the field, so you never see abe91%, for example. I already know which fields represent percentages (so I don't need to preserve the percent sign, or even to know it is there).

I can think of several inelegant ways of stripping out the text to get at the raw number (and decimals). But I know there must be one, or several, elegant solutions. Does anyone know one?

Thanks in advance,
Brad
Posted 9 years ago
Another variation:

    iNumbers[str_String] := StringCases[str, x : Except[" ", (DigitCharacter | ".") ..] :> ToExpression[x]]
    iNumbers[str_List] := Flatten[iNumbers /@ str]
    iNumbers[str_] := str

Result:

    In[74]:= iNumbers[{"5 is a number. But 6 is not", 234, "55", "12.7", "27%", "33.98%", "91%abe"}]
    Out[74]= {5, 6, 234, 55, 12.7, 27, 33.98, 91}
Posted 9 years ago
How's this?

    list = {"55", "12.7", "27%", "33.98%", "89", "91%abe"};
    ToExpression /@ StringJoin /@ (StringCases[#, DigitCharacter | "."] & /@ list)

The result is

    {55, 12.7, 27, 33.98, 89, 91}

Or this one (I find it theoretically more elegant, but it's slightly longer):

    ToExpression@StringJoin@StringSplit[#, Except[DigitCharacter | "."]] & /@ list
Posted 9 years ago
I could add more exclusions, such as:

    extractNumbers[list_List] := ToExpression[StringSplit[#, {"%", " ", CharacterRange["a", "Z"] ..}][[1]]] & /@ list

which would make

    extractNumbers[{"5 is a number. But 6 is not"}]

give {5}. But this would end up patching every case individually.

Best wishes,
Marco
Posted 9 years ago
Yes, your solution should work. I should have realized that the import mechanism would already create numbers for those values that are unambiguously interpreted as numbers. As I was reading the first lines of your post I was already going to write a solution like that for you. But to be more confident that nothing odd slips through the generality of what you wrote (val_ is a pattern that matches anything), I'd use

    GetMyNumber[val_?NumberQ] := val

I think this should serve you well. Let us know if anything slips through...
Posted 9 years ago
Thanks, David and Marco.

I've employed David's solution, but it gives rise to another problem. Import[], which is how I get the CSV file into Mathematica in the first place, returns strings for values such as

    27%  91%abe

but returns what are effectively integers for values such as

    9356

and these choke on the str_String parameter specifier in David's solution.

I thought the quick fix would be to supply another function of the same name without the _String type specifier, which should be called automatically if I supply an integer or any other non-string value. So I coded

    GetMyNumber[val_] := val

This seems to work, but I am never entirely sure whether relying on such polymorphism is a more efficient route than, say, coding just one function with a non-type-specific parameter and sorting through the type alternatives at run time.

Any thoughts?
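To make the two alternatives in the question concrete, here is a minimal sketch of both. The names getNumber and getNumberWhich are hypothetical; the string-handling body just reuses the digit-collecting approach posted in this thread, and nothing here measures which route is faster.

```mathematica
(* Option 1: one definition per argument type; the evaluator selects by pattern. *)
(* getNumber is a hypothetical name used only for this illustration. *)
getNumber[s_String] := ToExpression[StringJoin[StringCases[s, DigitCharacter | "."]]]
getNumber[x_?NumberQ] := x

(* Option 2: a single definition that inspects its argument at run time. *)
getNumberWhich[v_] := Which[
  StringQ[v], ToExpression[StringJoin[StringCases[v, DigitCharacter | "."]]],
  NumberQ[v], v,
  True, $Failed]

(* Mixed input of the kind Import returns for a CSV file *)
mixed = {9356, "27%", "91%abe", 12.7};
getNumber /@ mixed        (* {9356, 27, 91, 12.7} *)
getNumberWhich /@ mixed   (* {9356, 27, 91, 12.7} *)
```

Both give the same results on this data. With the pattern-based definitions, an argument that matches neither rule is simply returned unevaluated, which makes stray values easy to spot; the Which version returns $Failed explicitly instead.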
Posted 9 years ago
Of course my function gives bizarre results in cases like this, so careful restriction to exactly the described cases is important:

    In[5]:= GetMyNumber["5 is a number. But 6 is not"]
    Out[5]= 5.6

Those cases cannot have either digits or periods after the actual leading number.
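One way to avoid that failure mode, since the numbers are always at the start of the field, is to match only a number anchored at the beginning of the string. This is a minimal sketch rather than anything posted in the thread: leadingNumber is a hypothetical name, and it assumes the built-in NumberString pattern covers the number formats in the file (plain integers and decimals).

```mathematica
(* leadingNumber is a hypothetical helper, not from the original posts. *)
(* Take only the number at the very start of the string; ignore everything after it. *)
leadingNumber[s_String] := First[
  StringCases[s, StartOfString ~~ (n : NumberString) :> ToExpression[n], 1],
  Missing["NoLeadingNumber"]]   (* returned when the string does not start with a number *)

(* Values that Import has already converted to numbers pass through unchanged. *)
leadingNumber[x_?NumberQ] := x

leadingNumber /@ {"55", "12.7", "27%", "33.98%", "89*", "91%*abe", 9356}
(* {55, 12.7, 27, 33.98, 89, 91, 9356} *)

leadingNumber["5 is a number. But 6 is not"]   (* 5 *)
leadingNumber["abe91%"]                        (* Missing["NoLeadingNumber"] *)
```

Because the match is anchored with StartOfString, later digits or periods in the field no longer get glued onto the leading number.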
Posted 9 years ago
David's solution is much more elegant and can deal with more cases, but I was already working toward the same thing, so I thought I might as well post my solution.

    extractNumbers[list_List] := ToExpression[StringSplit[#, {"%", CharacterRange["a", "Z"] ..}][[1]]] & /@ list

and then

    list = {"55", "12.7", "27%", "33.98%", "89", "91%abe"};
    extractNumbers[list]

which gives

    {55, 12.7, 27, 33.98, 89, 91}

I guess I did not know the function DigitCharacter.

Cheers,
Marco
Posted 9 years ago
I don't know if this is "elegant", but it certainly works because your numbers come first...

    In[1]:= GetMyNumber[str_String] := ToExpression[StringJoin@StringCases[str, DigitCharacter | "."]]

    In[2]:= someNumbers = {"55", "12.7", "27%", "33.98%", "89", "91%abe"}
    Out[2]= {"55", "12.7", "27%", "33.98%", "89", "91%abe"}

    In[3]:= GetMyNumber /@ someNumbers
    Out[3]= {55, 12.7, 27, 33.98, 89, 91}

Unfortunately, something more elegant like using Interpreter["Number"] on your strings will not work, since several of the forms you have are not known number forms. Also, Interpreter[...] can be quite slow, as it often uses the Cloud to do the interpretation. Here is an example of the problem for a more general (semantic interpreter) case, showing the Interpreter working on some of your examples and failing on others:

    In[4]:= Interpreter["InactiveSemanticExpression"] /@ someNumbers
    Out[4]= {55, 12.7, Inactive[Quantity][27, "Percent"],
      Inactive[Quantity][33.98, "Percent"], 89,
      Failure["InterpretationFailure",
        Association[
          "MessageTemplate" :> MessageName[Interpreter, "semantic"],
          "MessageParameters" -> Association["Input" -> "91%abe"],
          "Input" -> "91%abe", "Type" -> "Expression"]]}

Which may not format correctly in this forum.