Merging Large Text Files Without Reading Entire File

Posted 10 years ago

I have some large data files that each contain 2 columns of data, as shown below (the values in each row are separated by spaces). I need to merge these files into one large file that contains 4 columns separated by tabs. Is there an efficient built-in function that will let me do this using file streams, without the need to read these large files in completely?

thanks

input1.txt

613 -112
1737 392
1685 -176
1443 -813
891 -39
273 1761
356 2007
-8 -198
-882 -1634

input2.txt

1044 249
-716 207
-841 989
1076 1446
899 969
-1574 1003
-1833 1192
241 -406
804 -1836

Desired output file

613	-112	1044	249
1737	392	-716	207
1685	-176	-841	989
1443	-813	1076	1446
891	-39	899	969
273	1761	-1574	1003
356	2007	-1833	1192
-8	-198	241	-406
-882	-1634	804	-1836

POSTED BY: Bob Stephens
9 Replies
Posted 10 years ago

Sorry, I missed the export to a file. Thanks for your help.

POSTED BY: Bob Stephens
Posted 10 years ago

Yes, here is another solution I came up with. One last question: what is the best way to export this result to a file if we read everything in at once?

[Image not preserved]

POSTED BY: Bob Stephens

A direct (but not very elegant) way could be:

ClearAll["Global`*"]
SetDirectory[NotebookDirectory[]];
input1 = Import["first.txt", "Data"];
input2 = Import["second.txt", "Data"];
result = Join[input1, input2, 2];
Export["result.txt", 
  StringReplace[
   ToString[result], {"}, {" -> "\n", ", " -> "\t", "{{" -> "", 
    "}}" -> ""}]];

But I have to confess that this does not solve your primary question: this way the entire input files have to be read in first.

Henrik

POSTED BY: Henrik Schachner
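
For the streaming variant asked about in the original question, a line-by-line merge could look roughly like the sketch below. It is only a minimal sketch under a few assumptions: the file names `first.txt`, `second.txt` and `result.txt` are taken from this thread, the installed version provides `ReadLine`, `WriteLine` and `StringRiffle`, and both files have the same number of rows (the loop simply stops when either file runs out). The variable names `in1`, `in2`, `out`, `a` and `b` are illustrative.

```wolfram
(* Open both inputs and the output as streams; nothing is read yet *)
in1 = OpenRead["first.txt"];
in2 = OpenRead["second.txt"];
out = OpenWrite["result.txt"];

(* Read one line from each file per iteration; stop at the first EndOfFile *)
While[
  (a = ReadLine[in1]) =!= EndOfFile && (b = ReadLine[in2]) =!= EndOfFile,
  (* Split each line on whitespace, concatenate the fields, join with tabs *)
  WriteLine[out, StringRiffle[Join[StringSplit[a], StringSplit[b]], "\t"]]
];

(* Always close the streams so the output is flushed to disk *)
Close /@ {in1, in2, out};
```

Only one line from each input is held in memory at a time, so the file size should not matter. The fields are passed through as strings, which avoids any number parsing.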
Posted 10 years ago

Looks like the issue is that each row is currently a string, so the Join function is not getting the expected number of elements.

POSTED BY: Bob Stephens

Well, yes, according to your second posting I was assuming that your data are already in the form of a matrix, like so:

{{613, -112}, {1737, 
   392}, {1685, -176}, {1443, -813}, {891, -39}, {273, 1761}, {356, 
   2007}, {-8, -198}, {-882, -1634}} // TableForm

(*output:
613	-112
1737	392
1685	-176
1443	-813
891	-39
273	1761
356	2007
-8	-198
-882	-1634*)

Obviously this is not the case in your example ...

POSTED BY: Henrik Schachner
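
If the rows are still strings, as seems to be the case here, they would have to be split into numbers before `Join` at level 2 can work. A minimal sketch, using the first three rows of each file from the original post; `rows1`, `rows2` and the helper `toMatrix` are hypothetical names introduced only for illustration:

```wolfram
(* Each row is one string, as it appears to be in the failing example *)
rows1 = {"613 -112", "1737 392", "1685 -176"};
rows2 = {"1044 249", "-716 207", "-841 989"};

(* Split every row on whitespace, then convert each field to a number *)
toMatrix[rows_List] := Map[ToExpression, StringSplit /@ rows, {2}];

(* Join at level 2 now concatenates corresponding rows *)
Join[toMatrix[rows1], toMatrix[rows2], 2]
(* {{613, -112, 1044, 249}, {1737, 392, -716, 207}, {1685, -176, -841, 989}} *)
```

Note that `ToExpression` evaluates whatever it is given, so this is only appropriate for files that are known to contain plain numbers.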
Posted 10 years ago

Adding sample files to discussion.

POSTED BY: Bob Stephens
Posted 10 years ago

So reading in the enclosed files and trying the join on the lists resulted in an error:

file1List = ReadList["first.txt", Number];
file2List = ReadList["second.txt", Number];
Join[file1List, file2List, 2];

This resulted in:

Expression {1836 -12,1012 -2661,-156 -2916,967 232,1224 1637,-1030 -754,-2002 -2651,-667 -1822,-267 -561,-[Ellipsis] 33,[Ellipsis] ,1292 1055,584 -469,584 -1311,1150 -974,-309 -767,-1949 -212,-719 1262,838 1754,-21 405} at position 1 is expected to have nonatomic subexpression at level 2. >>

POSTED BY: Bob Stephens
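
The message says that the elements of the first list are atomic, so there is nothing at level 2 to join. One possible way around this, sketched here under the assumption that every line of both files holds exactly two numbers and that both files have the same number of rows, is to ask `ReadList` for pairs of numbers. The `"TSV"` export at the end is likewise just one option for writing tab-separated output:

```wolfram
(* Read each file as a list of {number, number} records, i.e. an n x 2 matrix *)
file1List = ReadList["first.txt", {Number, Number}];
file2List = ReadList["second.txt", {Number, Number}];

(* Rows are now lists, so joining at level 2 gives an n x 4 matrix *)
result = Join[file1List, file2List, 2];

(* Write the matrix as tab-separated values *)
Export["result.txt", result, "TSV"];
```

This still reads both files completely, so it addresses the error rather than the memory concern from the original question.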

Try with:

Join[list1, list2, 2]

Cheers Henrik

POSTED BY: Henrik Schachner
Posted 10 years ago

Sorry, the text formatting in the web app (ironically) does not clearly illustrate the problem. See the image below.

thanks

[Image not preserved]

POSTED BY: Bob Stephens