Parse a .txt file through the Classify function and assign a class?

Posted 20 days ago

I have 2 txt files that contain 500 DNA sequences each. Both files use 2 hard returns to separate the sequences like so:

ATCGATCT...

ATCGATCT...

ATCGATCT...

I want to parse these two files into the Classify function such that each item in the txt file is assigned a class, Gene1 and Gene2. I have already separated the two classes of DNA in their respective txt files.

Because of this, I was thinking I can classify all the items in txt1 as Gene1 and all the items in txt2 as Gene2. As for how to do this, I am not sure. Does Mathematica allow parsing of items from a txt file into the Classify function?

So far, I have this line of code for importing and classifying the contents of my files:

c = Classify[Join@@Map[Thread[StringSplit[Import["txt1.txt"<>#<>"txt2.txt"], "\n\n"]-> 
"Gene"<>#]&, ToString/@Range[2]]]

I'm getting 2 errors from the output: 1) Import won't import the 2 files, and 2) StringSplit fails.

If anyone has a suggestion on how to do this or an easier way to import files and classify their contents, please let me know!

3 Replies
Posted 19 days ago

The argument passed to Import is incorrect. Try this:

classes = Join @@ Map[Thread[StringSplit[Import["txt" <> # <> ".txt"], "\n\n"] -> "Gene" <> #] &, ToString /@ Range[2]];

classes // Short
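Once a list of sequence -> label rules like this exists, it can be handed to Classify directly. Here is a minimal sketch of that step that runs without any files: the strings text1 and text2 are made-up stand-ins for the imported file contents, and the short sequences in them are invented purely for illustration.

```mathematica
(* Hypothetical stand-ins for Import["txt1.txt"] and Import["txt2.txt"] *)
text1 = "ATCGATCT\n\nATCGGGCT\n\nATCGATTT";
text2 = "GGCCTTAA\n\nGGCCTAAA\n\nGGCCTTAG";

(* Split each text on the blank line and attach one label per file *)
examples = Join[
   Thread[StringSplit[text1, "\n\n"] -> "Gene1"],
   Thread[StringSplit[text2, "\n\n"] -> "Gene2"]];

(* Train on the labeled rules, then ask for the class of a new sequence *)
c = Classify[examples];
c["ATCGATCA"]
```

With only six toy examples the prediction is not meaningful; the point is only the shape of the data that Classify accepts.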

I'm confused by your Import argument. Is the first "txt" telling the Import argument that I am parsing in a .txt file? Also, in your Import statement, where would I specify the 2 .txt files I want to import?

This is how I tried to import 2 files after your suggestion, but it still gave me the same errors. Also, I notice that I am getting an error where I enter 4 arguments when 2 arguments were expected.

Import["txt" <> # <> "C:filepath\\file1.txt" "C:filepath\\file2.txt"]

Posted 18 days ago

Import can only import one file at a time. The import of multiple files happens via the surrounding Map over Range[2]. Break apart the expression to understand what is going on.

Map[Thread["txt" <> # <> ".txt" -> "Gene" <> #] &, ToString /@ Range[2]]

{"txt1.txt" -> "Gene1", "txt2.txt" -> "Gene2"}

Now, instead of the file name "txt1.txt" in "txt1.txt" -> "Gene1", we want the contents of the file "txt1.txt". That is what Import provides, and StringSplit then splits the text on the two newline characters. If no path is specified, Import looks in the current working directory; evaluate Directory[] to see what that is set to on your system. If the files are in a different directory, the path needs to be specified, e.g.

classes = Join @@ Map[Thread[StringSplit[Import["C:\\filepath\\txt" <> # <> ".txt"], "\n\n"] -> "Gene" <> #] &, ToString /@ Range[2]];
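If Import still fails, it may help to check the paths before importing anything. The sketch below assumes the same placeholder directory "C:\\filepath" and the file names txt1.txt and txt2.txt used above; substitute your actual folder and names.

```mathematica
(* Where Import looks when no directory is given *)
Directory[]

(* Placeholder directory from the example above; replace with your real folder *)
dir = "C:\\filepath";

(* Build the two full file names, one per class *)
files = FileNameJoin[{dir, "txt" <> ToString[#] <> ".txt"}] & /@ Range[2];

(* Each entry should be True before Import is attempted *)
FileExistsQ /@ files
```

Any False in the last result means the corresponding path is wrong, which would also explain StringSplit failing afterwards, since it would then be handed $Failed instead of a string.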

For an introduction to how Map and pure functions work, take a look at this and this.
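As a small illustration of the pure function used above: "txt" and ".txt" are ordinary pieces of the file name, and # is the slot that Map fills with each element of the list in turn. The name f is only for this example.

```mathematica
(* A pure function that builds a file name from its argument *)
f = "txt" <> # <> ".txt" &;

f["1"]               (* "txt1.txt" *)

(* Map applies f to every element of the list *)
Map[f, {"1", "2"}]   (* {"txt1.txt", "txt2.txt"} *)

(* /@ is shorthand for Map *)
f /@ {"1", "2"}      (* {"txt1.txt", "txt2.txt"} *)
```

So the first "txt" does not tell Import anything about the file type; it is simply joined with the number and the extension to form the name of the file to read.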
