

How can I get rid of raw double quotes within strings?

Posted 9 years ago

Hi everyone,

I imported some data from a spreadsheet (CSV), but what I got in Mathematica was not a rectangular array but a list of lists of varying length. I know that each list should have 300 items, so I selected the lists that have fewer. Some of those lists contain a raw double quote, and others a raw double quote and a backslash. When I remove these manually, the number of items reverts to what it is supposed to be, 300. There are a lot of these bothersome lists. How can I remove the raw double quotes using Mathematica functions? I can't enter a raw double quote in StringReplace as the character to be replaced, because I'd have to surround it with quotes and Mathematica treats that as an error.

Any help would be much appreciated,

Gregory

POSTED BY: Gregory Lypny
5 Replies

Try using the Import function with the following additional option:

Import[file,  "TextDelimiters" -> {}]

Let me know if this works for you.

Take a look at the discussion of "TextDelimiters" in this documentation:

http://reference.wolfram.com/language/ref/format/CSV.html
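
To see what the option changes, here is a minimal sketch that imports the same text twice, once with the default settings and once with "TextDelimiters" -> {}. The sample string and the variable names are made up for illustration, and the exact fields returned may differ between Mathematica versions, so compare the field counts on your own data rather than relying on this example.

```wolfram
(* Hypothetical one-row CSV sample with a quoted field that contains a comma *)
sample = "a,\"b,c\",d";

(* Default CSV import: double quotes act as text delimiters *)
withDelimiters = ImportString[sample, "CSV"];

(* With no text delimiters, double quotes should be read as ordinary characters *)
withoutDelimiters = ImportString[sample, "CSV", "TextDelimiters" -> {}];

(* Compare how many fields each variant finds in the first row *)
Length /@ {First[withDelimiters], First[withoutDelimiters]}
```

If the two counts differ, stray quotes are probably changing where the fields are split.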

POSTED BY: David Reiss

Gregory:

I understand that you want to clean a comma-separated value file by first removing what may actually be escaped double quotes inside a text field. You may not need to do this. For background on CSV, see the RFC and the wiki article. A good way to represent fields in CSV is to surround every field with double quotes. That way a field can hold long text containing commas, double quotes, and even tabs and line breaks. So you may want to try importing the file as a "Table". Here is a quick example:

ImportString["\"1\",\"2\r3\",\"2ec\"\n\"2\",\"3\",\"3d3\"", "Table", 
{"FieldSeparators" -> ",", "TextDelimiters" -> {"\""}, 
  "LineSeparators" -> {"\n"}, "CharacterEncoding" -> "ASCII", 
  "HeaderLines" -> 0, "EmptyField" -> "", 
  "RepeatedSeparators" -> False, "Numeric" -> False}]

This should return something like this

{{"1", "2
  3", "2ec"}, {"2", "3", "3d3"}}

If your data contains dates, you may need to set the "DateStringFormat" option. You will need to experiment with these options, and use Import[] instead of ImportString[].

Hans
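
As a rough sketch of the same idea with Import[], the snippet below writes a small sample to a temporary file, imports it as a "Table", and tallies the row lengths so that short or long rows stand out. The file name sample.csv and its contents are hypothetical; substitute your own file and adjust the options as needed.

```wolfram
(* Hypothetical sample file written to the temporary directory *)
file = FileNameJoin[{$TemporaryDirectory, "sample.csv"}];
Export[file, "\"a\",\"b,c\",\"d\"\n\"1\",\"2\",\"3\"", "Text"];

(* Import as a table, treating double quotes as text delimiters *)
data = Import[file, "Table",
   "FieldSeparators" -> ",",
   "TextDelimiters" -> {"\""},
   "Numeric" -> False];

(* Each entry is {row length, number of rows with that length} *)
Tally[Length /@ data]
```

For the real data, any length other than 300 in the tally points to rows that still need attention.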

POSTED BY: Hans Michel

If you are just trying to clean the list after it has been imported into Mathematica and you wish to use StringReplace, then just escape the raw double quote with a backslash:

StringReplace["bad\" stuff \"from \"\"copy paste", "\"" -> ""]

Does this help?
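
The same replacement can be applied to a whole imported list. The sketch below uses a small made-up nested list in place of the real data and strips both double quotes and backslashes, touching only the string entries so that numbers are left alone.

```wolfram
(* Hypothetical stand-in for imported data: nested lists mixing strings and numbers *)
data = {{"abc\"", 1, "d\\ef"}, {"\"x\"", 2, "ok"}};

(* Remove every double quote and backslash from string entries only *)
cleaned = data /. s_String :> StringReplace[s, {"\"" -> "", "\\" -> ""}]
(* -> {{"abc", 1, "def"}, {"x", 2, "ok"}} *)
```

Note that this only tidies the contents of each field. It does not re-split rows whose length was already wrong after import.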

POSTED BY: Hans Michel
Posted 9 years ago

Hi David and Hans,

Thanks for the good insights on formatting characters. By playing with different combinations of Import (CSV vs. Table) and the options you suggested, I was able to reduce 139 bad cases to 55. All of this resulted from my collaborator copying and pasting hand-collected data from web pages, especially HTML tables. You know, highlight a little too much and you grab the wrong thing.

I'll keep at it.

Thanks once again,

Gregory

POSTED BY: Gregory Lypny
Posted 9 years ago

Thanks yet again, Hans. It didn't clean up all of the corrupted lists, but it sure helped.

Regards,

Gregory

POSTED BY: Gregory Lypny