
Use NetTrain[net,f] to read in a large set of rules (Is there an example?)

Posted 5 years ago

Hello, is there an example of using NetTrain[net, f] to read in a large file (1.8 million lines)? The file is in text format, with lines of the form

{17.,1.,34.,6.,14.,21.}->{26.,2.,53.,13.,20.,45.},
{16.,2.,3.,21.,17.,10.}->{13.,21.,49.,47.,51.,27.},
{4.,38.,40.,20.,5.,49.}->{15.,32.,51.,28.,50.,24.},
{37.,52.,29.,15.,39.,5.}->{34.,53.,25.,21.,47.,12.}

. . .

Thank You

Michel

POSTED BY: Michel Mesedahl
5 Replies

Thank you Okazaki-san

This is very much appreciated.

POSTED BY: Michel Mesedahl
Posted 5 years ago

It would be much easier and faster if each line did not have the trailing comma. Strip the commas off using sed or awk (e.g. sed 's/,$//'), which will be much faster than doing it in Mathematica. Then:

$fileStream = OpenRead["data.txt"];
(* generator: NetTrain calls this once per batch and trains on the 1000 rules it returns *)
NetTrain[net, Function[ReadList[$fileStream, Expression, #BatchSize]],
 BatchSize -> 1000, MaxTrainingRounds -> 1800] (* ~one pass over 1.8M lines if each round draws one batch *)
Close[$fileStream]

Training is probably going to take a while with such a large dataset, depending on the complexity of the model. So, if you have not already done so, you should first train / test on a random subset of the data.
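For example, once the rules are in memory, something along these lines would work (the list name rules, the 10,000-sample size, and the 80/20 split are placeholders, not values from the original post):

subset = RandomSample[rules, 10000];          (* random subset of the full data *)
{trainSet, testSet} = TakeDrop[subset, 8000]; (* simple 80/20 split *)
trained = NetTrain[net, trainSet, ValidationSet -> testSet]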

POSTED BY: Rohit Namjoshi

Michel, I hope this will help if you cannot delete the commas at the end of the lines.

in = Import["data.txt", "Words"];  (* each comma-free line is imported as one "word" *)
f = ToExpression@StringTrim[#, ","] & /@ in;  (* drop the trailing comma and parse each line into a Rule *)
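Assuming the full list fits in memory, the parsed rules can then be passed straight to NetTrain:

trained = NetTrain[net, f]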


POSTED BY: Kotaro Okazaki
Posted 5 years ago

Okazaki-san,

I assumed Michel is concerned that 1.8M training samples may not fit into available memory and was looking for a solution that loads the samples in smaller batches. Some techniques for dealing with this are documented here.
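For reference, here is a rough sketch of that generator approach which rewinds the stream when the file is exhausted, so training can make several passes over the data. It assumes the trailing commas have already been stripped from data.txt and that net is defined; "BatchSize" is the key NetTrain passes to a generator function, and the MaxTrainingRounds value is arbitrary.

$fileStream = OpenRead["data.txt"];
generator = Function[spec,
   Module[{batch = ReadList[$fileStream, Expression, spec["BatchSize"]]},
    If[Length[batch] < spec["BatchSize"], (* file ran out: start over and read a full batch *)
     SetStreamPosition[$fileStream, 0];
     batch = ReadList[$fileStream, Expression, spec["BatchSize"]]];
    batch]];
trained = NetTrain[net, generator, BatchSize -> 1000, MaxTrainingRounds -> 5000];
Close[$fileStream]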

If that is not an issue then certainly your solution is more straightforward.

POSTED BY: Rohit Namjoshi

Thank You Rohit Namjoshi

What you offered is what I needed.

Thank You

POSTED BY: Michel Mesedahl