How to unpack a batch of bytes for training a neural network?

Posted 1 year ago

Hi,

I have a dataset of 192 binary inputs (0/1) and corresponding outputs. I have compressed each input to 24 8-bit integers so that the entire dataset fits into RAM, which greatly speeds up training of the neural network. Before a batch is trained, the 24 8-bit integers have to be expanded back to 192 integers. This can be done with NetEncoder (see https://community.wolfram.com/groups/-/m/t/3096538), so that part works. I can load the entire compressed dataset into RAM using ReadByteArray, but how can I now apply NetEncoder to a batch of 24-byte examples when training the neural network?

Regards, GW

3 Replies

I am confused by your problem setup. A single example is in -> out, where in is 192 binary digits. A single modified example is in$mod -> out, where in$mod is 24 integers stored as a ByteArray.

If this is correct, you can use Normal to unpack the ByteArray into a list:

NetEncoder[{"Function", 
  Flatten@IntegerDigits[Normal@#, 2, 8] &, {192}}]
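A quick way to check that this expression really inverts the packing is a round trip on one random example. This is only a sketch: it assumes the bytes were packed most significant bit first (the order IntegerDigits produces), and bits, packed and unpacked are illustrative names that do not come from the original post.

```wolfram
(* Hypothetical round-trip check; all names here are illustrative *)
bits = RandomInteger[1, 192];                                  (* one example: 192 binary inputs *)
packed = ByteArray[FromDigits[#, 2] & /@ Partition[bits, 8]];  (* 24 bytes, most significant bit first *)
unpacked = Flatten@IntegerDigits[Normal@packed, 2, 8];         (* same expression as inside the encoder *)

Length[packed]      (* 24 *)
unpacked === bits   (* True *)
```

If the dataset was packed least significant bit first, each group of 8 digits would have to be reversed before flattening.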

Now in the context of training, a batch of examples is a list of these 192-d vectors (or equally a list of 24-d byte arrays).

Let me know if this is incorrect; in that case, please add a minimal working example to the original post.

Hi,

Thank you for your reply. After reading the dataset into RAM, it exists as a flat array whose length is a multiple of 24 bytes. Suppose the batch size is 65536. For the next batch I have to take the next block of 65536 × 24 bytes of my compressed dataset, apply NetEncoder to the batch and train the network on the batch. However, it is unclear to me how I can configure NetTrain to do that. FYI, in TensorFlow I read the compressed dataset as a two-dimensional array in RAM, turn that array into a batched dataset and apply a map function to the dataset to batch-unpack the 24 bytes (see also https://discuss.tensorflow.org/t/can-you-bit-pack-and-then-unpack-binary-inputs/19224). TensorFlow's model.fit will then loop over the batched dataset.

Regards, GW

I think you should use a generator function, for example:

enc = NetEncoder[{"Function", Flatten[IntegerDigits[#, 2, 8]] &, {192}}]
b = ByteArray[Table[RandomInteger[{0, 255}], 10 * 24]]
genTrain = Function[ArrayReshape[Normal[b[[1 ;; 24 * #BatchSize]]], {#BatchSize, 24}]]
enc[genTrain[<|"BatchSize" -> 2|>]]
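To see what the generator hands over, its output can also be unpacked directly with IntegerDigits, without going through the encoder. A minimal sketch with the same toy data as above; batch and unpacked are illustrative names:

```wolfram
(* Toy data: 10 examples of 24 bytes each, as in the snippet above *)
b = ByteArray[RandomInteger[{0, 255}, 10*24]];
genTrain = Function[ArrayReshape[Normal[b[[1 ;; 24*#BatchSize]]], {#BatchSize, 24}]];

batch = genTrain[<|"BatchSize" -> 2|>];            (* 2 examples, 24 bytes each *)
unpacked = Flatten /@ IntegerDigits[batch, 2, 8];  (* 8 bits per byte, flattened per example *)

Dimensions[batch]      (* {2, 24} *)
Dimensions[unpacked]   (* {2, 192} *)
```

IntegerDigits threads over the whole matrix, so a batch is unpacked in one call rather than row by row.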

As the dataset resides in RAM I could use RandomSample to select a batch, but am I right that if I want to batch the dataset linearly I have to specify the "RoundLength" option of NetTrain as:

Length[b] / (24 * BatchSize)

and add a "Round" argument to the generator function:

genTrain = Function[ArrayReshape[Normal[b[[(#Round - 1) * 24 * #BatchSize + 1 ;; #Round * 24 * #BatchSize]]], {#BatchSize, 24}]]
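One possible way to write such a linear generator so that it wraps around when the data runs out is sketched below. It rests on assumptions that should be checked against the NetTrain documentation: that the generator is called once per batch with an increasing "Round" value (which, as I understand it, is the default when no "RoundLength" is given), and that "RoundLength" counts examples rather than batches. nExamples and genLinear are illustrative names, and the sketch returns inputs only; for training, each row would still have to be paired with its output.

```wolfram
b = ByteArray[RandomInteger[{0, 255}, 10*24]];   (* stand-in for the real dataset *)
nExamples = Length[b]/24;                         (* 10 examples *)

genLinear = Function[
   Module[{n = #BatchSize, k},
    (* 0-based batch index; wraps to the first batch after the last full one *)
    k = Mod[#Round - 1, Quotient[nExamples, n]];
    ArrayReshape[Normal[b[[24*k*n + 1 ;; 24*(k + 1)*n]]], {n, 24}]]];

(* Round 1 yields the first two examples *)
genLinear[<|"BatchSize" -> 2, "Round" -> 1|>] === Partition[Normal[b], 24][[1 ;; 2]]   (* True *)
(* With 5 batches per pass, round 6 wraps back to the first batch *)
genLinear[<|"BatchSize" -> 2, "Round" -> 6|>] === genLinear[<|"BatchSize" -> 2, "Round" -> 1|>]   (* True *)
```

Note that the span ends at 24*(k + 1)*n with no "- 1"; a span that is one byte short would make ArrayReshape silently pad the last entry with 0.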

Regards, GW
