I have 2 txt files that contain 500 DNA sequences each. Both files use 2 hard returns to separate the sequences like so:
ATCGATCT...

ATCGATCT...

ATCGATCT...
I want to feed these two files into the Classify function so that each sequence in a txt file is assigned a class, either Gene1 or Gene2. I have already separated the two classes of DNA into their respective txt files.
Because of this, I was thinking I could label all the items in txt1 as Gene1 and all the items in txt2 as Gene2. As for how to do this, I am not sure. Does Mathematica allow parsing of items from a txt file into the Classify function?
So far, I have this line of code for importing and classifying the contents of my files:
c = Classify[Join@@Map[Thread[StringSplit[Import["txt1.txt"<>#<>"txt2.txt"], "\n\n"]->
"Gene"<>#]&, ToString/@Range[2]]]
I'm getting two errors from the output: 1) it won't import the two files, and 2) StringSplit fails.
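For reference, here is a minimal sketch of what the line above seems to be aiming for. It assumes the files are literally named txt1.txt and txt2.txt and sit in the current working directory (see Directory[]). The helper name labeled is hypothetical and used only for illustration. One thing the sketch makes visible is that the file-name expression in the original line joins all three strings, which may be why the import fails and StringSplit then receives a failed import instead of a string.

```mathematica
(* Assumption: files are named txt1.txt and txt2.txt in Directory[]. *)

(* In the original line, "txt1.txt" <> # <> "txt2.txt" with # = "1"
   evaluates to the single string "txt1.txt1txt2.txt",
   which is unlikely to be the name of an existing file. *)

(* Hypothetical helper: import one file as plain text, split it on blank
   lines, and attach the class label "Gene<i>" to every sequence. *)
labeled[i_String] := Thread[
  StringSplit[Import["txt" <> i <> ".txt", "Text"], "\n\n"] -> "Gene" <> i]

(* Join the two labeled lists into one training set and train the classifier. *)
c = Classify[Join @@ (labeled /@ {"1", "2"})]
```

If this works, something like c["ATCGATCT"] should return either "Gene1" or "Gene2" for a new sequence. If the files use different line endings, the "\n\n" separator may need adjusting.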
If anyone has a suggestion on how to do this or an easier way to import files and classify their contents, please let me know!