How can I stop Mathematica from treating strings as numbers in scientific notation?

Posted 1 month ago

Hello everyone,

I have a large CSV data file that includes a variable that is an alphanumeric ID (CUSIP) for financial securities. When I import the file, Mathematica treats a CUSIP such as 0452E105 as a number in scientific notation, 452×10^105, which it is not. How can I stop Mathematica from doing this?

POSTED BY: Gregory Lypny
5 Replies
Posted 1 month ago

SemanticImport allows a CSV file to be imported while specifying the column types. The result can then be converted to a Tabular object. For example, with an appropriate test file:

ToTabular@ SemanticImport["data.csv", {"Integer", "Real", "String", "String"}]

Example files are attached.
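
The attachments are not reproduced here, so the following is only a rough sketch of how a comparable test file might be built. The file name, header names, and rows are all made up for illustration, and it assumes SemanticImport detects the header row automatically:

```wolfram
(* Hypothetical test file: names and values are invented for illustration *)
file = FileNameJoin[{$TemporaryDirectory, "cusip_test.csv"}];
Export[file, "id,price,cusip,name\n1,-2.34,0452E105,Acme\n2,5.10,1234E567,Beta\n", "Text"];

(* One type per column; the third column is declared as a string *)
ds = SemanticImport[file, {"Integer", "Real", "String", "String"}];
ToTabular[ds]
```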

POSTED BY: David Keith

In Version 14.2, together with the Tabular object, we introduced a "Schema" element and option for CSV Import. By default, Import interprets 0452E105 as a number:

In[1]:= ImportString["1, -2.34, 0452E105", "CSV"]

Out[1]= {{1, -2.34, 4.52*10^107}}

We can get the TabularSchema object using:

In[2]:= schema = ImportString["1, -2.34, 0452E105", {"CSV", "Schema"}]

Out[2]= TabularSchema[<|"ColumnProperties" -> {<|"ElementType" -> "Integer64"|>, 
<|"ElementType" -> "Real64"|>, <|"ElementType" -> "Real64"|>}, 
  "KeyColumns" -> None, "Backend" -> "WolframKernel"|>]

Notice that the type of the third column is "Real64". To change it to "String", we need to create a new TabularSchema and pass it to Import:

In[3]:= schema2 = TabularSchema[<|"ColumnProperties" -> {<|"ElementType" -> 
       "Integer64"|>, <|"ElementType" -> "Real64"|>, <|"ElementType" ->
        "String"|>}|>]

Out[3]= TabularSchema[<|"ColumnProperties" -> {<|"ElementType" -> "Integer64"|>,
 <|"ElementType" -> "Real64"|>, <|"ElementType" -> "String"|>}|>]

In[4]:= ImportString["1, -2.34, 0452E105", "CSV", "Schema" -> schema2]

Out[4]= {{1, -2.34, " 0452E105"}}

To work with large CSV data, I strongly suggest importing it as "Tabular" instead of "Data":

In[5]:= ImportString["1, -2.34, 0452E105", {"CSV", "Tabular"}, "Schema" -> schema2] // TabularQ

Out[5]= True
POSTED BY: Piotr Wendykier
Posted 1 month ago

Thank you, Gianluca and Carl, for your suggestions. Both work in that they force all elements of the CSV array to be imported as strings. The necessary extra step is then to convert the columns that are meant to be numeric back to integers or reals.
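
A minimal sketch of that conversion step, assuming a three-column layout (integer, real, ID) and a small hand-written sample standing in for the imported strings. The names raw, toNumber, and converted are hypothetical:

```wolfram
(* Hypothetical sample: every field has been imported as a string *)
raw = {{"1", "-2.34", "0452E105"}, {"2", "5.10", "1234E567"}};

(* Assumed layout: columns 1 and 2 are numeric, column 3 is the CUSIP *)
toNumber = Interpreter["Number"];

(* Convert the numeric columns and leave the ID column as a trimmed string *)
converted = {toNumber[StringTrim[#1]], toNumber[StringTrim[#2]], StringTrim[#3]} & @@@ raw
```

Interpreter is used here rather than ToExpression so that field contents are never evaluated as code. It may be slow on very large files, in which case declaring the column types at import time is probably the better route.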

POSTED BY: Gregory Lypny

Set the "Numeric" option to False.
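
For instance, applied to the sample line used elsewhere in this thread (a sketch; "data.csv" is only a placeholder file name):

```wolfram
(* With "Numeric" -> False, no field is converted to a number; all come back as strings *)
ImportString["1, -2.34, 0452E105", "CSV", "Numeric" -> False]

(* The same option on a file; "data.csv" is a placeholder *)
Import["data.csv", "CSV", "Numeric" -> False]
```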

POSTED BY: Carl Verdon

Among the CSV import elements I see "RawData", which imports as a "two-dimensional array of strings". Have you tried it?
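
A quick way to try it, sketched on the sample line used elsewhere in this thread:

```wolfram
(* Request the "RawData" element so that every field is returned as a string *)
ImportString["1, -2.34, 0452E105", {"CSV", "RawData"}]
```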

POSTED BY: Gianluca Gorni