
How to convert a string to a date object to sort the data?

Posted 3 years ago

Hello, I have been struggling with how to sort my data by date.
I have a CSV file in which one attribute is a date. I have tried importing the file using SemanticImport in order to obtain the date data as a date object, but to no avail.
Here is an example of my data:
A.Name 2022-01-01 1:02 Subject Message
Whenever I import the data as a dataset, the date is imported as a string. How can I convert it into a date object so that I can sort the entire dataset by date?
I have lost hope and will try to use some other language if this is not possible with Mathematica.
Any help would be much appreciated. Been struggling with this for 4 weeks.

POSTED BY: Rashmi Dhakad
6 Replies

I am providing the file. The problem is that others seem to be able to read that file using SemanticImport[]. When I edit the sample file so that it has 200 rows rather than 257, I am able to read it directly with SemanticImport[], but beyond that SemanticImport just hangs and I have to abort the evaluation.

In case it helps, my specs are:

Edition: Windows 11 Home
Version: 21H2
Installed on: 2/4/2022
OS build: 22000.856
Experience: Windows Feature Experience Pack 1000.22000.856.0

Attachments:
POSTED BY: Henrick Jeanty
Posted 3 years ago

POSTED BY: Eric Rimbey
Posted 3 years ago

Alternatively, using Eric's sample data file

rawData = Import["~/sampledata.csv", "Dataset", HeaderLines -> 1]
data = rawData[All, <|#, "date" -> DateObject[#date]|> &]

To sort by the date column

data[SortBy["date"]]
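As a self-contained variant of the snippet above, here is a minimal sketch that can be run without a CSV file. The three rows in `raw` are hypothetical sample data, not taken from the original file, and sorting on `AbsoluteTime` instead of on the `DateObject` itself is an assumption on my part: it may be a more defensive choice if the dates ever end up with mixed granularity, since the comparison is then made on a plain number.

```wolfram
(* Hypothetical sample rows standing in for the imported CSV *)
raw = Dataset[{
    <|"name" -> "A", "date" -> "2022-01-03 1:02"|>,
    <|"name" -> "B", "date" -> "2022-01-01 14:30"|>,
    <|"name" -> "C", "date" -> "2022-01-01 1:02"|>}];

(* Convert the string column to DateObject, as in the snippet above *)
data = raw[All, <|#, "date" -> DateObject[#date]|> &];

(* Sort on the numeric absolute time of each date rather than on the DateObject expression *)
sorted = data[SortBy[AbsoluteTime[#date] &]];

(* Inspect the resulting row order by name *)
Normal[sorted[All, "name"]]
```

For descending order, `data[ReverseSortBy[AbsoluteTime[#date] &]]` should work the same way.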
POSTED BY: Rohit Namjoshi

The last suggestion proposed by Rohit solved a problem I was having. I have a CSV file that contains typical stock data (Symbol, Sector, Date, Open, High, Low, Close, Volume) and a few indicators (e.g. Relative Strength, moving average). There are 49 columns, of which Symbol and Sector are strings. The third column ("Date") is a date represented as "Month/Day/Year". When I used SemanticImport as in:

rawData = SemanticImport[filename, HeaderLines -> 1]

Things worked well until the file size got to about 200 rows. I first noticed the problem when I tried SemanticImport on a file of 257 rows with 49 columns (as mentioned above). The cell where I executed the code above simply ran forever (meaning hours) until I had to abort the evaluation. However, with a file of 128 rows, SemanticImport worked and returned a Dataset. Not knowing what was wrong, I thought there might be a bad field in the file, but simply using:

rawData = Import[filename,"Dataset",HeaderLines->1]
data = rawData[All, <|#, "Date" -> DateObject[DateList[{#Date, {"Month","Day", "YearShort"}}]]|> &]

worked without a hitch. So I figured that the size of the file was not the problem, since Import[] was able to load the data and I could then convert all 256 date strings to date objects. But SemanticImport simply stopped working once the file had about 200 rows. Though your suggestion has allowed me to solve my problem, I would still like to understand why SemanticImport stopped working once my file reached a certain size, whereas Import[] was able to read the file without any issues.

POSTED BY: Henrick Jeanty
Posted 2 years ago

Hi Henrick,

Since the "Date" column imports fine, SemanticImport is probably failing or taking a long time to interpret data in some other column. Can you share a file that demonstrates the problem?

POSTED BY: Rohit Namjoshi
Posted 3 years ago

Hi Rashmi,

Could you please attach a sample (the first 10 rows) of your CSV file to your question? It is not obvious why SemanticImport would fail. For example, the date string from your post converts directly:

do = DateObject["2022-01-01 1:02"]
do // InputForm
(* DateObject[{2022, 1, 1, 1, 2}, "Minute", "Gregorian", -6.] *)
POSTED BY: Rohit Namjoshi