After reading the documentation on Training on Large Datasets, I am still at a loss as to how to create a generator function that reads batches of 1000 rules of the form

{1, 17, 39, 53, 44, 23} -> {18, 53, 50, 38, 6, 31}

from an entire 66 GB file. I have tried
ReadList[$fileStream, Expression,1000]
but this only reads the first 1000 entries.
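My understanding is that ReadList on an already-open InputStream advances the stream position, so repeated calls on the same stream should return successive batches (passing a file name instead of a stream restarts from the beginning each time). A minimal sketch of what I expected, assuming a placeholder file "data.txt" of such rules:

```mathematica
(* Sketch of expected behavior: repeated ReadList calls on one open
   stream should return successive batches, since ReadList advances
   the stream position. "data.txt" is a placeholder file of rules. *)
stream = OpenRead["data.txt"];
batch1 = ReadList[stream, Expression, 1000]; (* expressions 1 .. 1000 *)
batch2 = ReadList[stream, Expression, 1000]; (* expressions 1001 .. 2000 *)
Close[stream];
```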
I also posted this on Stack Exchange and received the following response:

Create a test file:
f = OpenWrite["data.txt"];
SeedRandom[0];
Do[Write[f, RandomInteger[{0, 100}, 5] -> RandomInteger[{0, 100}, 5]], {10000}];
Close[f];
f = OpenRead["data.txt"];
generator = Function[
    Table[
        With[{r = Read[f, Record]},
          If[r === EndOfFile,
            SetStreamPosition[f, 0]; Read[f, Record],
            r
          ]
        ],
        {#BatchSize}
      ] // ToExpression // (<|"Input" -> #[[;; , 1]], "Output" -> #[[;; , 2]]|> &)
  ];
net = NetChain[
    {
      LinearLayer[16],
      LinearLayer[5]
    },
    "Input" -> 5,
    "Output" -> 5
  ]
SetStreamPosition[f, 0];
netT = NetTrain[
    net,
    {generator, "RoundLength" -> 10},
    All,
    BatchSize -> 1000, MaxTrainingRounds -> 10
  ]
The code from this response also reads only the first 1000 entries. It is as if the generator will not move further than 1000 entries into the file.
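Rereading the NetTrain documentation, I wonder whether the problem is the "RoundLength" option: with a generator, "RoundLength" specifies how many examples make up one training round, so the generator is called roughly RoundLength/BatchSize times per round. With "RoundLength" -> 10 and BatchSize -> 1000, NetTrain would request only a single batch per round. If that is right, a possible fix (for the 10000-record test file above) would be:

```mathematica
(* Possible fix: set "RoundLength" to the number of examples in the
   file, so the generator is called RoundLength/BatchSize = 10 times
   per round and walks through the whole file. *)
SetStreamPosition[f, 0];
netT = NetTrain[
  net,
  {generator, "RoundLength" -> 10000},
  All,
  BatchSize -> 1000, MaxTrainingRounds -> 10
]
```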
Any assistance would be appreciated.
Also, can anyone recommend good books on this subject?