Message Boards

Function to read batches into a neural network from a large file

Posted 5 years ago

After reading the documentation on Training on Large Data Sets, I am still at a loss about how to create a generator function that reads batches of 1000 records of the form

{1, 17, 39, 53, 44, 23} -> {18, 53, 50, 38, 6, 31}

from an entire 66 GB file. I have tried

ReadList[$fileStream, Expression, 1000]

but this only reads the first 1000 records.
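A single call like this returns just one batch; if the stream stays open, a further call appears to continue from where the previous one stopped, but I do not see how to turn that into a generator that NetTrain can call repeatedly. A minimal sketch, with "training.dat" as a stand-in name for my actual file:

stream = OpenRead["training.dat"]; (* stand-in for the real 66 GB file *)
batch1 = ReadList[stream, Expression, 1000]; (* records 1 .. 1000 *)
batch2 = ReadList[stream, Expression, 1000]; (* records 1001 .. 2000 *)
Close[stream];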

I posted this also on Stack Exchange and received the following response: Create a test:

f = OpenWrite["data.txt"];
SeedRandom[0];
(* write 10000 random rules of the form {...} -> {...}, one per line *)
Do[Write[f, RandomInteger[{0, 100}, 5] -> RandomInteger[{0, 100}, 5]], {10000}];
Close[f];
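A quick sanity check, assuming the file was written exactly as above, that it really contains 10000 one-line records:

Length[ReadList["data.txt", Record]] (* expect 10000 *)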

f = OpenRead["data.txt"];

generator = Function[
   Table[
      With[
       {r = Read[f, Record]},
       (* read one line per example; wrap back to the start at end of file *)
       If[r === EndOfFile,
        SetStreamPosition[f, 0]; Read[f, Record],
        r
        ]
       ],
      {#BatchSize}
      ] // ToExpression //
     (* parse the lines and split the rules into input and output batches *)
     <|"Input" -> #[[;; , 1]], "Output" -> #[[;; , 2]]|> &
   ];
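To see what the generator produces, it can be called by hand; only the "BatchSize" key of the argument is used, so a small illustrative association is enough (NetTrain passes extra keys that are simply ignored here):

SetStreamPosition[f, 0];
sample = generator[<|"BatchSize" -> 4|>];
Dimensions /@ sample (* expect <|"Input" -> {4, 5}, "Output" -> {4, 5}|> *)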

net = NetChain[
  {
   LinearLayer[16],
   LinearLayer[5]
   },
  "Input" -> 5,
  "Output" -> 5
  ]

SetStreamPosition[f, 0];
netT = NetTrain[
  net,
  {generator, "RoundLength" -> 10},
  All,
  BatchSize -> 1000, MaxTrainingRounds -> 10
  ]

This also only reads the first 1000 entries. It seems as though the function will not move further than 1000 entries into the file.

Any assistance would be appreciated.

Also, are there any good books on this subject?

POSTED BY: Michel Mesedahl

It might work better if you change "RoundLength" -> 10 to "RoundLength" -> 10000, which is the size of the training data. "RoundLength" tells NetTrain how many generated examples make up one training round, so with BatchSize -> 1000 the generator is called ten times per round and the whole file gets read.

netT = NetTrain[net, {generator, "RoundLength" -> 10000}, All, 
  BatchSize -> 1000, MaxTrainingRounds -> 10]
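As a rough sanity check (callCount and countingGenerator are just illustrative helpers, not part of the training code), wrapping the generator shows how many batches NetTrain requests per round:

callCount = 0;
countingGenerator = Function[callCount++; generator[#]];
SetStreamPosition[f, 0];
NetTrain[net, {countingGenerator, "RoundLength" -> 10000}, All,
  BatchSize -> 1000, MaxTrainingRounds -> 1];
callCount (* roughly 10 = 10000 examples per round / 1000 per batch; NetTrain may add a setup call *)

With ten batches of 1000 per round, all 10000 records in the test file are visited in each round.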


POSTED BY: Kotaro Okazaki