
GROUPS: Biological Sciences, Data Science, Wolfram Language, Machine Learning

Parse a .txt file through the Classify function and assign a class?

Posted 5 years ago

I have 2 txt files that contain 500 DNA sequences each. Both files use 2 hard returns to separate the sequences, like so:

```
ATCGATCT...

ATCGATCT...

ATCGATCT...
```

I want to pass these two files to the `Classify` function so that each item in the txt files is assigned a class, `Gene1` or `Gene2`. I have already separated the two classes of DNA into their respective txt files. Because of this, I was thinking I could classify all the items in `txt1` as `Gene1` and all the items in `txt2` as `Gene2`. As for how to do this, I am not sure. Does Mathematica allow items from a txt file to be passed to the `Classify` function?

So far, I have this line of code for importing and classifying the contents of my files:

```mathematica
c = Classify[Join@@Map[Thread[StringSplit[Import["txt1.txt"<>#<>"txt2.txt"], "\n\n"]-> "Gene"<>#]&, ToString/@Range[2]]]
```

I'm getting 2 errors in the output:

1) it won't import the 2 files,
2) `StringSplit` fails.

If anyone has a suggestion on how to do this, or an easier way to import files and classify their contents, please let me know!

POSTED BY: Sam Liu

3 Replies


Posted 5 years ago

`Import` can only import one file at a time. The import of multiple files happens via the surrounding `Map` over `Range[2]`. Break apart the expression to understand what is going on:

```mathematica
Map[Thread["txt" <> # <> ".txt" -> "Gene" <> #] &, ToString /@ Range[2]]
```

```
{"txt1.txt" -> "Gene1", "txt2.txt" -> "Gene2"}
```

Now, instead of `"txt1.txt" -> "Gene1"`, we want the contents of the file `"txt1.txt"`; that is done via `Import` and splitting on the two newline characters.

If no path is specified, `Import` will use the default directory. Evaluate `Directory[]` to see what that is set to on your system. If the files are in a different directory, that needs to be specified, e.g.

```mathematica
classes = Join @@ Map[Thread[StringSplit[Import["C:\\filepath\\txt" <> # <> ".txt"], "\n\n"] -> "Gene" <> #] &, ToString /@ Range[2]];
```

For an introduction to how `Map` and pure functions work, take a look at this and this.
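A minimal sketch of the same pipeline run on in-memory strings instead of files, so each step can be inspected without anything on disk. The sample sequences and the `texts` association are hypothetical stand-ins for what `Import["txt1.txt"]` and `Import["txt2.txt"]` would return.

```mathematica
(* Hypothetical stand-ins for Import["txt1.txt"] and Import["txt2.txt"]:
   sequences separated by two newlines, i.e. one blank line *)
texts = <|"1" -> "ATCGATCT\n\nATCGATGG", "2" -> "GGCATTAC\n\nGGCATTAA"|>;

(* Step 1: StringSplit turns one string into a list of sequences *)
StringSplit[texts["1"], "\n\n"]
(* {"ATCGATCT", "ATCGATGG"} *)

(* Step 2: Thread pairs every sequence with the same label *)
Thread[StringSplit[texts["1"], "\n\n"] -> "Gene1"]
(* {"ATCGATCT" -> "Gene1", "ATCGATGG" -> "Gene1"} *)

(* Step 3: Map repeats this for "1" and "2"; Join @@ merges the two lists *)
classes = Join @@ Map[Thread[StringSplit[texts[#], "\n\n"] -> "Gene" <> #] &, ToString /@ Range[2]]
(* {"ATCGATCT" -> "Gene1", "ATCGATGG" -> "Gene1", "GGCATTAC" -> "Gene2", "GGCATTAA" -> "Gene2"} *)
```

The result is a flat list of `sequence -> label` rules, which is a form `Classify` accepts directly as training data.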

POSTED BY: Rohit Namjoshi

Posted 5 years ago

I'm confused by your `Import` argument. Is the first `"txt"` telling `Import` that I am passing in a `.txt` file? Also, in your `Import` statement, where would I specify the 2 `.txt` files I want to import? This is how I tried to import the 2 files after your suggestion, but it still gave me the same errors. I am also getting an error saying that 4 arguments were given when 2 were expected.

```mathematica
Import["txt" <> # <> "C:filepath\\file1.txt" "C:filepath\\file2.txt"]
```

POSTED BY: Sam Liu

Posted 5 years ago

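The argument passed to `Import` is incorrect. The `"txt"` is not a file-type hint: `"txt" <> # <> ".txt"` joins strings to build the file name, so when `#` is `"1"` it becomes `"txt1.txt"` and when `#` is `"2"` it becomes `"txt2.txt"`. You do not list the two files yourself; the `Map` over `ToString /@ Range[2]` supplies `"1"` and `"2"`. This assumes the files are named `txt1.txt` and `txt2.txt`. Try this:

```mathematica
classes = Join @@ Map[Thread[StringSplit[Import["txt" <> # <> ".txt"], "\n\n"] -> "Gene" <> #] &, ToString /@ Range[2]];
classes // Short
```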
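A fuller sketch that also trains and applies the classifier. Assumptions: the files are named `txt1.txt` and `txt2.txt`; `readSequences` is a hypothetical helper name; the setup lines write small made-up files into a fresh temporary directory only so the example runs on its own, and with real data `dir` should point at the folder that holds your files. `FileNameJoin` builds the path portably, so no hand-written `\\` separators are needed. The `"\r\n"` replacement and `StringTrim` are precautions in case the files have Windows line endings or trailing whitespace.

```mathematica
(* Demo setup only: write two small hypothetical files into a new temporary
   directory. With real data, set dir to the folder containing txt1.txt and
   txt2.txt instead (e.g. dir = Directory[]) and drop the Export lines. *)
dir = CreateDirectory[];
Export[FileNameJoin[{dir, "txt1.txt"}], "ATCGATCT\n\nATCGATGG\n\nATCGATCC\n", "Text"];
Export[FileNameJoin[{dir, "txt2.txt"}], "GGCATTAC\n\nGGCATTAA\n\nGGCATTGC\n", "Text"];

(* Hypothetical helper: import one file as a single string, normalize
   Windows line endings, split on blank lines, and trim each sequence *)
readSequences[file_String] :=
  StringTrim /@ StringSplit[StringReplace[Import[file, "Text"], "\r\n" -> "\n"], "\n\n"];

(* For "1" and "2": read txt<n>.txt and label every sequence "Gene<n>" *)
classes = Join @@ Map[
    Thread[readSequences[FileNameJoin[{dir, "txt" <> # <> ".txt"}]] -> "Gene" <> #] &,
    ToString /@ Range[2]];

(* Train the classifier on the labeled sequences *)
c = Classify[classes];

(* Predict the class of a new sequence; returns either "Gene1" or "Gene2" *)
c["ATCGATCA"]
```

With only six made-up sequences the prediction itself is not meaningful; the example just shows the call pattern. With the real files, `classes` should hold 1000 rules (500 per gene), and a held-out portion can be kept aside to check accuracy.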

POSTED BY: Rohit Namjoshi
