

Import large data files for Machine Learning?

Posted 7 years ago

I want to run a machine learning task on my Windows 10 PC (16 GB RAM, Mathematica 11.3.0), but I am facing the following problem: the training set is a 10 GB CSV file with 700,000,000 rows of 2 values each. Mathematica simply stops during the import, with both Import and ReadList. My idea is to split the input file into several smaller files that can be imported, and then load the smaller files in batches to feed the Predict function or perhaps a neural network. Any idea how to make this happen? Do you have a better idea?
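A minimal sketch of the splitting idea, written in Python rather than Wolfram Language. It assumes the CSV has no header row and no quoted fields containing line breaks, so it can split safely on lines; the function name `split_csv` and the chunk file naming are hypothetical. The file is read one line at a time, so memory use stays small no matter how large the input is.

```python
import os


def split_csv(path, out_dir, lines_per_chunk):
    """Split a large CSV into numbered chunk files without loading it into RAM.

    Hypothetical helper; assumes no header row and one record per line.
    """
    os.makedirs(out_dir, exist_ok=True)
    chunk_paths = []
    out = None
    try:
        # newline="" keeps the original line endings untouched.
        with open(path, "r", newline="") as src:
            for i, line in enumerate(src):
                if i % lines_per_chunk == 0:
                    # Start a new chunk file every `lines_per_chunk` lines.
                    if out is not None:
                        out.close()
                    chunk_path = os.path.join(
                        out_dir, f"chunk_{len(chunk_paths):05d}.csv"
                    )
                    chunk_paths.append(chunk_path)
                    out = open(chunk_path, "w", newline="")
                out.write(line)
    finally:
        # Close the last chunk even if reading fails part-way.
        if out is not None:
            out.close()
    return chunk_paths
```

For scale: 10 GB over 700,000,000 rows is about 14 bytes per row, so `lines_per_chunk = 10_000_000` would give 70 chunk files of roughly 140 MB each, which should be much easier to import one at a time.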

Many thanks in advance for your support!

POSTED BY: Jürgen Kanz
11 Replies
Posted 6 years ago
POSTED BY: Jojen Bourgain
POSTED BY: Jürgen Kanz

Another possibility is to use a MongoDB database, as described in the same link I gave above.

POSTED BY: Wolfgang Hitzl

Thank you for the hint.

POSTED BY: Jürgen Kanz

I suggest using a generator function for training on large data sets, as described here:

Training on large data sets
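The linked tutorial describes this for NetTrain in Wolfram Language. The underlying idea, producing the training data one batch at a time instead of holding it all in memory, can be sketched in Python with a generator. This is an illustration of the concept, not the Wolfram API; it assumes two numeric columns and no header row, and `batch_generator` is a hypothetical name.

```python
import csv


def batch_generator(path, batch_size):
    """Yield lists of (x, y) float pairs, `batch_size` rows at a time.

    Only the current batch is held in memory; the file is streamed with
    csv.reader. Assumes two numeric columns and no header row.
    """
    batch = []
    with open(path, newline="") as f:
        for row in csv.reader(f):
            batch.append((float(row[0]), float(row[1])))
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:  # final, possibly shorter batch
        yield batch
```

A training loop then simply iterates over `batch_generator(path, 1024)` once per epoch, creating a fresh generator each time because a Python generator can only be consumed once.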

POSTED BY: Wolfgang Hitzl

That is very useful! Thank you very much. It seems that I can solve the problem with a database-based stream of data and a neural network. I will give it a try.

POSTED BY: Jürgen Kanz
Posted 7 years ago

Is it possible for Mathematica to build an ML model from a stream of data?

See my answer to this question.

POSTED BY: Rohit Namjoshi

I don't know, but I know Mathematica can take advantage of Hadoop and MapReduce.
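To illustrate what the MapReduce pattern buys here, a minimal pure-Python sketch (not Hadoop's API, and not something shown elsewhere in this thread): each chunk is "mapped" to a few running sums, the sums are "reduced" by adding them, and a least-squares line y = a*x + b is computed from the totals. Only one chunk has to be in memory at a time. The function names are hypothetical, and a straight-line fit is just an example model.

```python
from functools import reduce


def map_chunk(rows):
    """Map step: turn one chunk of (x, y) rows into sufficient statistics."""
    n = sx = sy = sxx = sxy = 0.0
    for x, y in rows:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
    return (n, sx, sy, sxx, sxy)


def combine(a, b):
    """Reduce step: statistics from two chunks add component-wise."""
    return tuple(p + q for p, q in zip(a, b))


def fit_line(chunks):
    """Least-squares fit of y = slope*x + intercept from an iterable of chunks."""
    n, sx, sy, sxx, sxy = reduce(combine, map(map_chunk, chunks))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept
```

Because `map` is lazy, `chunks` can be a generator that reads one file or one database batch at a time.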

Good hint, thank you. I am currently trying to import the data into PostgreSQL. With this approach, Predict or a neural network would receive the data as a stream, so (because of the limited RAM) not all of the data would be available at the same time.
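Reading a query result in fixed-size batches can be sketched with Python's standard DB-API `fetchmany`. The sketch uses the built-in `sqlite3` module as a stand-in for PostgreSQL so that it runs without a server; the table layout and the name `stream_rows` are assumptions. With PostgreSQL and psycopg2, a named (server-side) cursor would be needed so that the full result set is not transferred to the client up front; check the driver documentation before relying on this.

```python
import sqlite3


def stream_rows(conn, query, batch_size):
    """Yield query results in batches of `batch_size` rows via fetchmany."""
    cur = conn.cursor()
    try:
        cur.execute(query)
        while True:
            rows = cur.fetchmany(batch_size)
            if not rows:  # an empty list means the result set is exhausted
                break
            yield rows
    finally:
        cur.close()
```

Each yielded batch could be passed to a training step and then discarded, so memory use is bounded by the batch size.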

Is it possible for Mathematica to build an ML model from a stream of data?
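In general terms, yes: a model trained by mini-batch gradient descent only ever needs the current batch. The Python sketch below fits y ≈ w*x + b this way; it says nothing about whether Predict itself supports streaming, and `train_online` and its parameters are hypothetical. `make_batches` is called once per epoch and must return a fresh iterable of batches, for example a generator that reads from disk or a database.

```python
def train_online(make_batches, epochs, lr):
    """Fit y ~ w*x + b by mini-batch gradient descent on the mean squared error.

    `make_batches()` must return a fresh iterable of batches, each a list of
    (x, y) pairs, so the full data set never has to be in memory.
    """
    w = b = 0.0
    for _ in range(epochs):
        for batch in make_batches():
            m = len(batch)
            # Gradients of (1/m) * sum((w*x + b - y)**2) with respect to w and b.
            gw = sum(2 * (w * x + b - y) * x for x, y in batch) / m
            gb = sum(2 * (w * x + b - y) for x, y in batch) / m
            w -= lr * gw
            b -= lr * gb
    return w, b
```

The learning rate has to be small enough for the largest curvature in any batch; otherwise the updates diverge instead of converging.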

POSTED BY: Jürgen Kanz

I uploaded my data to MySQL, then created views in MySQL and read them from Mathematica.
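One way to use views for this, sketched in Python with the built-in `sqlite3` module standing in for MySQL: create one view per contiguous id range, then read each view as a separate, memory-sized chunk. The partitioning scheme, the `id` column, and `create_partition_views` are all assumptions for illustration; the post does not say how its views were defined. Table and view names are interpolated directly into the SQL, so they must come from trusted code, not user input.

```python
import sqlite3


def create_partition_views(conn, table, n_rows, rows_per_view):
    """Create one view per contiguous id range and return the view names.

    Assumes `table` has an INTEGER PRIMARY KEY column `id` numbered 1..n_rows
    and columns x, y. Names are interpolated, so they must be trusted.
    """
    names = []
    for k, start in enumerate(range(1, n_rows + 1, rows_per_view)):
        name = f"{table}_part_{k}"
        end = start + rows_per_view - 1
        conn.execute(
            f"CREATE VIEW {name} AS SELECT x, y FROM {table} "
            f"WHERE id BETWEEN {start} AND {end}"
        )
        names.append(name)
    return names
```

Each view can then be queried on its own (for example `SELECT x, y FROM data_part_0`), which keeps every individual import small.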
