Edit: Forgot to reply...
My first try was using ReadList, but the files were so massive that I couldn't tell whether anything was being done or the computer had frozen.
My goal was to get some feedback on the import status. A new version of this function is shown below.
With this it is possible to get a status update (which can be very helpful!). My files can take over 1 min to import, so it is important to know how "long" the import will take.
Options@ImportSequantial = {BatchSize -> 1024};
ImportSequantial[file_String, OptionsPattern[]] /; FileExistsQ@file :=
  Module[{stream, fsize = FileByteCount@file, l, lines = {}},
    stream = OpenRead@file;
    (* Progress bar: in-memory size of the data read so far vs. file size on disk *)
    PrintTemporary@ProgressIndicator[Dynamic@ByteCount@lines, {0, fsize}];
    While[True,
      (* Read the next batch of up to BatchSize records *)
      l = ReadList[stream, Record, OptionValue@BatchSize];
      If[Length@l == 0, Break[]];
      (* Nest instead of appending; the result is flattened once at the end *)
      lines = {lines, l};
    ];
    Close@stream;
    Flatten@lines
  ]
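One caveat with the progress bar above: ByteCount measures the in-memory size of the strings, which includes per-string overhead, so the bar can fill up before the file has been fully read. A minimal sketch of a variant that tracks the stream position instead is shown here. The name ImportWithProgress and the variable pos are hypothetical (not from the original function), and the file name in the usage line is a placeholder.

```mathematica
(* Hypothetical variant: progress is measured in bytes consumed from the file *)
Options[ImportWithProgress] = {BatchSize -> 1024};

ImportWithProgress[file_String, OptionsPattern[]] /; FileExistsQ[file] :=
  Module[{stream, fsize = FileByteCount[file], pos = 0, batch, lines = {}},
    stream = OpenRead[file];
    (* pos is reassigned after every batch, which triggers the Dynamic update *)
    PrintTemporary[ProgressIndicator[Dynamic[pos], {0, fsize}]];
    While[True,
      batch = ReadList[stream, Record, OptionValue[BatchSize]];
      If[batch === {}, Break[]];
      lines = {lines, batch};
      (* Current read position in the stream, in bytes *)
      pos = StreamPosition[stream]
    ];
    Close[stream];
    Flatten[lines]
  ]

(* Usage; "bigfile.txt" is a placeholder path *)
(* data = ImportWithProgress["bigfile.txt", BatchSize -> 4096]; *)
```

The batching logic is unchanged; only the quantity shown in the progress bar differs, so timings should be comparable, though I have not benchmarked this variant.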
Importing a 1 GB file using your benchmark gives:
ImportSequantial
Memory: 1350468128 Timing: 17.455
ReadList
$Aborted (after more than 5 min, with the kernel using >4 GB of memory!)
I took the precaution of quitting the kernel before each evaluation.
ReadList was so slow that I thought it was never going to finish (no patience), hence my first post.
With this new, improved version we can import in batches, which is pretty fast, and we get feedback as a status bar (an added bonus).