Message Boards Message Boards

0
|
2592 Views
|
11 Replies
|
4 Total Likes
View groups...
Share
Share this post:

How to perform a delete query on a dataset?

Posted 2 years ago

I have an inconsistent data with values which i want to remove. I performed the following query:

cleanfinal[Select[Length[ToCharacterCode[#Date]] != 19 &], All]

This gave me the result of the dataset that i want to remove from my data set. I tried using Delete Case:

xyz = Dataset[
  DeleteCases[Normal[cleanfinal], 
   cleanfinal[Select[Length[ToCharacterCode[#Date]] != 19 &], All]]]

But This doesn't work. I tried using query function:

Query[Select[Length[ToCharacterCode[#Date]] != 19]@cleanfinal]

But this also failed. Is there any other method through which i can execute the query on the dataset? Any help will be immensely appreciated.

POSTED BY: Rashmi Dhakad
11 Replies
Posted 2 years ago

Rashmi, we're struggling to understand your question. This is my understanding so far:

  • This isn't specific to dates, the query you want to perform could involve any part of the dataset of any type
  • You're comfortable creating queries for Dataset expressions, so you're not asking how to select rows that match a given criteria.

Okay, but then you say "I want to remove the rows...", but you already know how to do that with a Query expression. Please explain how removing rows is different than performing a filter Query.

Then you ask "how do I go about it in a dataset without converting it into a database". But that's exactly what you've been doing. And that's exactly what my and Rohit's answers do. When you use the term "dataset", are you referring to a Mathematica Dataset[...] expression? That's what I was assuming, but I'm now thinking that my assumption was wrong. Please explain what you mean by "dataset".

As for "database", are you having difficulty integrating with a database engine like MySQL? If that is the issue, then we can help with that. Please confirm if this is your challenge.

Another possibility that I can think of is that you want to import the csv data, clean it, and then export it back out as csv. Maybe when you say "dataset" you mean "csv file". If that's the case, we can help with that (or you can just look at the documentation for Export). Please confirm if this is your challenge.

POSTED BY: Eric Rimbey
Posted 2 years ago

Hi Eric, the solution provided by Rohit helped out. My challenge was to first check if the column is even convertible for the format i want, for example here, whether date column can be converted into date object. I am able to filter out the csv data. I was able to complete the task. Thank you once again for all you kind help. It is very very appreciated.

POSTED BY: Rashmi Dhakad
Posted 2 years ago

Some other thoughts after actually looking at the details of your query. Presumably #Date is expected to be a string. ToCharacterCode will produce a list of character codes corresponding to each character in the string. Length will then give you the length of that list. Thus, you seem to be wanting to filter on string length. There is a function for that: StringLength. So, maybe something like this?

cleanfinal[Select[StringLength[#Date] == 19 &]]
POSTED BY: Eric Rimbey
Posted 2 years ago

It would be easier to help if you showed the data and told us how you want to transform it.

POSTED BY: Eric Rimbey
Posted 2 years ago

Hello Eric,

Thank you for your quick answer. But my problem is not this particular query. If i were to perform any Select query for example:

cleanfinal[Select[#Date === "String" &], All]

This gives me the result. But i cannot figure out a way to perform deletecases on the dataset. Do i need to convert the dataset into a database anytime i were to perform Queries similar to SQL or any relational database?

POSTED BY: Rashmi Dhakad
Posted 2 years ago

I guess I'm confused. If you have a query that gives you the desired result, then you're done, right? You don't need "to figure out a way to perform deletecases", because you've already done it. Selecting the desired cases is the same as deleting the undesired cases. You're finished. What am I missing?

POSTED BY: Eric Rimbey
Posted 2 years ago

Oh, wait. Is this a variable update question? Do you want the variable cleanfinal to be updated with the results of your query? That's easy:

cleanfinal = cleanfinal[Select[StringLength[#Date] == 19 &]]
POSTED BY: Eric Rimbey
Posted 2 years ago

Are you sure this is right? It selects rows where the Date column has the literal value "String"

cleanfinal[Select[#Date === "String" &], All]

Did you mean this, which selects all rows where the Date column is a String.

cleanfinal[Select[Head@#Date === String &], All]

As both Eric and I have requested, please attach a sample of your data to your question.

POSTED BY: Rohit Namjoshi
Posted 2 years ago

Hello Rohit, Thank you for your quick answer. I have attached a sample copy of the data. This is a CSV file in which the date column is having different format for a lot of the records, i want to remove the rows where the date is not in the format that can be converted to date object. The problem is not the date format, if I were to perform another operation on a different column where I am required to delete rows which doesn't have the desired form of value, how do I go about it in a dataset without converting it into a database. For example, for any column that contains special characters, or contains space, etc. Thanks.

Attachments:
POSTED BY: Rashmi Dhakad
Posted 2 years ago

Hi Rashmi,

Here is one way to handle invalid values in the date column

rawData = Import["~/Downloads/sample.csv"]

(* Add column headers and convert to Dataset *)
dataset = 
 AssociationThread[{"Id", "Name", "Date", "Subject", "MessageId"} -> #] & /@ rawData // Dataset

(* Helper function to check if a string is convertible to DateObject *)
checkDate[date_] := 
 If[Quiet@Check[DateObject[date], False] === False, False, DateObject[date]]

(* Convert Date column to DateObject where possible, filter out unconverted *)
cleanDates = 
 dataset[All, <|#, "Date" -> checkDate[#Date]|> &][Select[#Date =!= False &]]

Other filter examples

(* Only rows starting with "compat:" (ignoring case) in the Subject column *)
compatData = 
 cleanDates[Select[StringStartsQ[#Subject, "compat:", IgnoreCase -> True] &]]

(* Name is sophia *)
sophiaData = cleanDates[Select[#Name == "sophia" &]]

(* With "version" <anything> "999" in the Subject *)
v999Data = 
 cleanDates[Select[StringContainsQ[#Subject, ___ ~~ "version" ~~ ___ ~~ "999"] &]]

You can also use regular expressions for string matching and filtering.

POSTED BY: Rohit Namjoshi
Posted 2 years ago

If

cleanfinal[Select[Length[ToCharacterCode[#Date]] != 19 &], All]

gave you the result to be removed, then just negate the select condition:

cleanfinal[Select[Length[ToCharacterCode[#Date]] == 19 &], All]

Also, the All is probably superfluous. I haven't actually tested this query, because I don't know what your data is, so don't actually know if this works as you want.

POSTED BY: Eric Rimbey
Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.
Reply Preview
Attachments
Remove
or Discard

Group Abstract Group Abstract