Import large data files for Machine Learning?

Posted 6 months ago
1358 Views | 11 Replies | 8 Total Likes

I want to run a machine learning task on my Windows 10 PC (16 GB RAM, Mathematica 11.3.0), but I am facing the following problem: the training set is a 10 GB CSV file containing 700,000,000 × 2 data points, and Mathematica simply stops during import via Import or ReadList. My idea is to split the input file into several smaller files that can be imported, and to load the smaller files in batches to feed the Predict function or perhaps a neural network. Any idea how to make that happen? Or do you have a better idea?
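One way to avoid a single giant Import is to read the CSV from a stream in fixed-size chunks. A minimal sketch, assuming a headerless two-column file named data.csv and a chunk size of 100,000 lines (both illustrative):

```wolfram
(* Read a large CSV in chunks instead of importing it all at once. *)
stream = OpenRead["data.csv"];
While[(lines = ReadList[stream, String, 100000]) =!= {},
  (* split each line on the comma and convert the fields to numbers *)
  chunk = Map[ToExpression, StringSplit[lines, ","], {2}];
  (* process or accumulate `chunk` here, e.g. Export it as a smaller file *)
];
Close[stream];
```

ReadList returns {} at end of file, which ends the loop; each chunk can be written to its own smaller file or fed directly into an incremental training loop.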

Many thanks in advance for support!

11 Replies

I uploaded my data to MySQL, then created views in MySQL and read them from Mathematica.

Posted 6 months ago

Good hint, thank you. I am currently trying to import the data into PostgreSQL. With this approach the Predict function or a neural network would get the data as a stream, with the consequence that not all of the data can be available at any one moment (due to limited RAM).
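For reference, batches can be pulled from PostgreSQL through DatabaseLink. A sketch, assuming a local database mldata with a table training(x, y); all connection details and names are illustrative:

```wolfram
Needs["DatabaseLink`"];
conn = OpenSQLConnection[JDBC["PostgreSQL", "localhost:5432/mldata"],
   "Username" -> "user", "Password" -> "pass"];
(* pull one batch at a time instead of the whole table *)
batch = SQLExecute[conn,
   "SELECT x, y FROM training LIMIT 100000 OFFSET 0"];
CloseSQLConnection[conn];
```

Increasing the OFFSET (or, better, paging on an indexed key) walks through the table batch by batch without ever holding the full data set in RAM.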

Is it possible for Mathematica to build a ML model based on a stream of data?

I don't know, but I know Mathematica can take advantage of Hadoop and MapReduce.

Posted 6 months ago

Is it possible for Mathematica to build a ML model based on a stream of data?

See my answer to this question.

Posted 6 months ago

That is very useful! Thank you very much. It seems that I can solve the problem with a database-backed stream of data and a neural network. I will give it a try.

I suggest using a generator function for training on large data sets, as described here:

Training on large data sets
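For reference, NetTrain accepts a generator function that is called once per training batch, so the full data set never has to fit in memory. A minimal sketch; the network and the random batch below are placeholders for a real batch read from disk or a database:

```wolfram
net = NetChain[{LinearLayer[64], Ramp, LinearLayer[1]}, "Input" -> 1];
(* the generator receives <|"BatchSize" -> ..., "Round" -> ...|>
   and must return one batch of training data *)
generator = Function[
   <|"Input" -> RandomReal[1, {#BatchSize, 1}],
     "Output" -> RandomReal[1, {#BatchSize, 1}]|>];
trained = NetTrain[net, generator,
   BatchSize -> 128, MaxTrainingRounds -> 10];
```

Replacing the RandomReal calls with a read from a file stream or a database query turns this into out-of-core training.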

Posted 4 months ago

Thank you for the hint.

Posted 4 months ago

I suggest using a generator function for training on large data sets, as described here:

Training on large data sets

I actually just made this account to say thanks for posting this; it really helped me, and I thought it deserved some recognition!

Another possibility is to use a MongoDB database, as described in the same link I gave above.

Posted 4 months ago

I am sorry for the late response. Well, I have to admit that I am not a MongoDB expert. It was not possible for me to import the entire CSV file into Mongo: the import stops after about 1% (5.5 × 10^6 records). Now I am trying to parse the CSV into JSON, and I hope the Mongo import will succeed with JSON.

Thanks again for support.

Posted 4 months ago

Thank you again to all participants for their contributions. After some trials I have found an efficient way to import the big CSV file into MongoDB via mongoimport.
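For anyone with the same problem, the mongoimport invocation can look like the following; the database, collection, and field names are illustrative, and the command assumes a running local MongoDB server:

```shell
# Bulk-load a headerless two-column CSV into MongoDB,
# naming the two columns x and y.
mongoimport --db mldata --collection training \
    --type csv --fields x,y --file data.csv
```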

The reference page already mentioned above is excellent for getting all the information needed to connect Mathematica to MongoDB and to work with data that does not fit into memory: https://reference.wolfram.com/language/tutorial/NeuralNetworksLargeDatasets.html
