I have a problem in which I need to work with a very large data set. The data is available as multiple CSV files, totaling roughly 100 GB. The structure is relatively simple and would map easily onto a key -> value system. I want to be able to analyze the data, which would usually mean extracting subsets -- these may be around 10 GB -- and then analyzing the results by the usual statistical methods.
If this were a smaller data set, I would feel comfortable importing it into structured lists, as we did before V10, or importing it and assembling an Association with the new tools. But here I am concerned about the size, which will certainly exceed what can be kept in RAM.
I have considered trying to map it into a V10 Dataset. I have also wondered whether it would be possible to import the CSV files into an SQL database and then work with that instead.
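To make the SQL idea concrete, here is a minimal sketch of what I had in mind, using DatabaseLink with the bundled SQLite driver. The file names, directory, and two-column key/value schema are hypothetical, and I am assuming each individual CSV file fits in RAM even though the whole collection does not:

```mathematica
Needs["DatabaseLink`"]

(* Open (or create) an on-disk SQLite database; names are placeholders *)
conn = OpenSQLConnection[JDBC["SQLite", "data.db"]];
SQLExecute[conn, "CREATE TABLE IF NOT EXISTS kv (key TEXT, value REAL)"];

(* Load the files one at a time so only one file is ever in memory *)
Do[
  SQLInsert[conn, "kv", {"key", "value"}, Import[file, "CSV"]],
  {file, FileNames["*.csv", "datadir"]}
];

(* An index on the key column should make subset extraction fast *)
SQLExecute[conn, "CREATE INDEX IF NOT EXISTS idx_key ON kv (key)"];

(* Later: pull back only a subset small enough to analyze in memory *)
subset = SQLExecute[conn, "SELECT value FROM kv WHERE key = 'someKey'"];

CloseSQLConnection[conn];
```

Is something along these lines a reasonable way to keep the working set out of RAM, or is there a better-supported route?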
I would be grateful for any advice.