NetTrain exceeds size of dataset when using a generator function

Posted 1 year ago


Each sample in my dataset consists of 192 one-hot encoded feature bits that I have compressed to 24 bytes. The entire compressed feature set is loaded in RAM as a ByteArray, and the labels are loaded in RAM as a floating-point array. I have defined a generator function that decompresses the features one batch at a time and pairs them with the labels:

genTrain = 
   "Input" -> 
       Normal[trainfeatures[[(#Round - 1)* 24 * #BatchSize + 
            1 ;; #Round * 24 * #BatchSize ]]], {#BatchSize, 24}], 2, 
      8]], "Output" -> 
    trainlabels[[(#Round - 1) * #BatchSize + 
        1 ;; #Round * #BatchSize ]]|>]
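The decoding step on its own can be tried outside NetTrain. The following is only a minimal sketch: `unpackBatch` is a hypothetical helper, and it assumes the bytes are expanded most-significant bit first with IntegerDigits and that each sample should end up as one flat vector of 192 bits.

```wolfram
(* Hypothetical helper: turn a flat list of bytes into rows of npack*8 bits. *)
(* Assumption: each byte is expanded to 8 binary digits, MSB first.          *)
unpackBatch[bytes_List, npack_Integer : 24] :=
  Flatten /@ 
   IntegerDigits[ArrayReshape[bytes, {Length[bytes]/npack, npack}], 2, 8]

(* Two samples of 24 bytes each, taken from a small random ByteArray. *)
sample = ByteArray[RandomInteger[{0, 255}, 2*24]];
Dimensions[unpackBatch[Normal[sample]]]  (* {2, 192} *)
```

If the net expects a 24 x 8 array per sample instead, the `Flatten /@` step would simply be dropped.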

When I now train a network with this generator function:

ntrain = 100;   (* number of training samples *)
npack = 24;     (* bytes per sample, i.e. 192 bits *)
trainfeatures = 
 ByteArray[RandomInteger[{0, 255}, npack*ntrain]];
trainlabels = RandomReal[{0.0, 1.0}, ntrain];
batchsize = 10;
(* evaluates to 10, the number of batches in the dataset *)
roundlength = Length[trainfeatures]/(npack*batchsize);
trained = 
 NetTrain[net, {genTrain, "RoundLength" -> roundlength}, All, 
  BatchSize -> batchsize, TargetDevice -> "GPU"]

I get the error: 'Cannot take positions 2401 through 2640 in ByteArray'. Clearly NetTrain asks for batch 11 which does not exist, even if I change "RoundLength" to 1. What am I doing wrong? Perhaps I misinterpreted the use of RoundLength and Round?
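The positions in the error message are exactly what the index expression in the generator yields for #Round = 11 with the values above. This can be checked directly; the sketch below uses a hypothetical helper, `sliceRange`, that just repeats the arithmetic.

```wolfram
(* Hypothetical helper: first and last byte position requested for a round. *)
sliceRange[round_Integer, batchsize_Integer, npack_Integer : 24] :=
  {(round - 1)*npack*batchsize + 1, round*npack*batchsize}

sliceRange[10, 10]  (* {2161, 2400}: the last slice that fits in 2400 bytes *)
sliceRange[11, 10]  (* {2401, 2640}: the range quoted in the error *)
```

This is consistent with #Round being the number of the training round, which keeps increasing for as long as training runs, rather than a batch index that stops at 10.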

Regards, GW

3 Replies

Also, 'RoundLength' turns out to be 'the number of samples that is expected to be seen during a Round (epoch)', so it should be equal to the number of training samples, 'ntrain' in this case:

trained = 
 NetTrain[net, {genTrain, "RoundLength" -> ntrain}, All, 
  BatchSize -> batchsize, TargetDevice -> "GPU"]

With 'ntrain' set to 100000000 and 'batchsize' to 65536 you get 1525 batches. The Training Progress shows 'round 1/10, batch x/1525', as expected.
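The figure of 1525 is what integer division gives for these numbers. That NetTrain rounds down in general is an assumption here, based only on the progress display quoted above.

```wolfram
ntrain = 100000000;
batchsize = 65536;

N[ntrain/batchsize]          (* about 1525.88 *)
Quotient[ntrain, batchsize]  (* 1525 *)
```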

Regards, GW

Posted 1 year ago

Hi wiesenekker, I'm not an expert, but look at this part:

trainfeatures[[(#Round - 1)* 24 * #BatchSize + 
            1 ;; #Round * 24 * #BatchSize ]]

With #Round = 100 and #BatchSize = 10, the first index is trainfeatures[[99 * 24 * 10 + 1]] = trainfeatures[[23761]], whereas Length[trainfeatures] = 2400.

So I think the way #BatchSize enters the index may be the error.

POSTED BY: Mauro Bertani

The following generator function randomly selects a batch rather than relying on the 'Round' number:

batchsize = 10
nbatches = Length[trainfeatures]/ (npack  * batchsize)
genTrain = 
  With[{ibatch = RandomInteger[{1, nbatches}]}, <|
    "Input" -> 
        Normal[trainfeatures[[(ibatch - 1)* 24 * #BatchSize + 1 ;; 
            ibatch * 24 * #BatchSize ]]], {#BatchSize, 24}], 2, 8]], 
    "Output" -> 
     trainlabels[[(ibatch - 1) * #BatchSize + 1 ;; 
        ibatch* #BatchSize ]]|>]]

With this generator function the error is gone.
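If every batch should be visited in order rather than sampled at random, one option is to keep a counter and wrap it around. This is only a sketch under several assumptions: `batchCounter` and `genTrainSequential` are hypothetical names, the `Flatten /@ IntegerDigits[...]` decoding assumes the net takes a flat 192-bit vector per sample, and it assumes NetTrain calls the generator once per batch with #BatchSize equal to 'batchsize'.

```wolfram
(* Small random dataset in the same layout as in the question. *)
ntrain = 100; npack = 24; batchsize = 10;
trainfeatures = ByteArray[RandomInteger[{0, 255}, npack*ntrain]];
trainlabels = RandomReal[{0.0, 1.0}, ntrain];
nbatches = Quotient[Length[trainfeatures], npack*batchsize];

batchCounter = 0;  (* hypothetical global state, advanced on every call *)

genTrainSequential = Function[
   (* Mod wraps the counter, so ibatch cycles through 1, 2, ..., nbatches. *)
   Module[{ibatch = Mod[batchCounter++, nbatches] + 1},
    <|"Input" -> Flatten /@ IntegerDigits[
        ArrayReshape[
         Normal[trainfeatures[[(ibatch - 1)*npack*#BatchSize + 1 ;; 
            ibatch*npack*#BatchSize]]],
         {#BatchSize, npack}], 2, 8],
     "Output" -> 
      trainlabels[[(ibatch - 1)*#BatchSize + 1 ;; ibatch*#BatchSize]]|>]];

(* The generator can be tried by hand before passing it to NetTrain. *)
batch = genTrainSequential[<|"BatchSize" -> batchsize, "Round" -> 1|>];
Dimensions[batch["Input"]]  (* {10, 192} *)
```

Because the slice is chosen by the wrapped counter and not by #Round, the index never runs past the end of the array, however many rounds are trained.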

Regards, GW
