
Use NetTrain[net,f] to read in a large set of rules (Is there an example?)

Posted 5 years ago

Hello, is there an example of using NetTrain[net, f] to read in a large file (1.8 million lines)? The file is in text format, with lines of the form

{17.,1.,34.,6.,14.,21.}->{26.,2.,53.,13.,20.,45.},
{16.,2.,3.,21.,17.,10.}->{13.,21.,49.,47.,51.,27.},
{4.,38.,40.,20.,5.,49.}->{15.,32.,51.,28.,50.,24.},
{37.,52.,29.,15.,39.,5.}->{34.,53.,25.,21.,47.,12.}

. . .

Thank You

Michel

POSTED BY: Michel Mesedahl
5 Replies

Thank you Okazaki-san

This is very much appreciated.

POSTED BY: Michel Mesedahl
Posted 5 years ago

It would be much easier and faster if each line did not have the trailing comma. Strip the commas off using sed or awk (e.g. sed 's/,$//'), which will be much faster than doing it in Mathematica. Then:

$fileStream = OpenRead["data.txt"];
(* generator: NetTrain calls this once per batch and trains on the 1000 rules it returns *)
NetTrain[net, Function[ReadList[$fileStream, Expression, #BatchSize]],
 BatchSize -> 1000, MaxTrainingRounds -> 1800] (* ~one pass over 1.8M lines if each round draws one batch *)
Close[$fileStream]

Training is probably going to take a while with such a large dataset, depending on the complexity of the model. So, if you have not already done so, you should first train / test on a random subset of the data.
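For example, once the rules are in memory, something along these lines would work (the list name rules, the 10,000-sample size, and the 80/20 split are placeholders, not values from the original post):

subset = RandomSample[rules, 10000];          (* random subset of the full data *)
{trainSet, testSet} = TakeDrop[subset, 8000]; (* simple 80/20 split *)
trained = NetTrain[net, trainSet, ValidationSet -> testSet]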

POSTED BY: Rohit Namjoshi

Michel, I hope this will help if you cannot delete the commas at the end of the lines.

in = Import["data.txt", "Words"];  (* each comma-free line is imported as one "word" *)
f = ToExpression@StringTrim[#, ","] & /@ in;  (* drop the trailing comma and parse each line into a Rule *)
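Assuming the full list fits in memory, the parsed rules can then be passed straight to NetTrain:

trained = NetTrain[net, f]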


POSTED BY: Kotaro Okazaki
Posted 5 years ago

Okazaki-san,

I assumed Michel is concerned that 1.8M training samples may not fit into available memory and was looking for a solution that loads the samples in smaller batches. Some techniques for dealing with this are documented here.
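For reference, here is a rough sketch of that generator approach which rewinds the stream when the file is exhausted, so training can make several passes over the data. It assumes the trailing commas have already been stripped from data.txt and that net is defined; "BatchSize" is the key NetTrain passes to a generator function, and the MaxTrainingRounds value is arbitrary.

$fileStream = OpenRead["data.txt"];
generator = Function[spec,
   Module[{batch = ReadList[$fileStream, Expression, spec["BatchSize"]]},
    If[Length[batch] < spec["BatchSize"], (* file ran out: start over and read a full batch *)
     SetStreamPosition[$fileStream, 0];
     batch = ReadList[$fileStream, Expression, spec["BatchSize"]]];
    batch]];
trained = NetTrain[net, generator, BatchSize -> 1000, MaxTrainingRounds -> 5000];
Close[$fileStream]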

If that is not an issue then certainly your solution is more straightforward.

POSTED BY: Rohit Namjoshi

Thank You Rohit Namjoshi

What you offered is what I needed.

Thank You

POSTED BY: Michel Mesedahl