How to unpack a batch of bytes for training a neural network?

Posted 3 months ago

Hi,

I have a dataset of 192 binary inputs (0/1) and corresponding outputs. I have packed each set of inputs into 24 8-bit integers, so that the entire dataset fits into RAM, which greatly speeds up training of the neural network. Before a batch is trained, the 24 8-bit integers obviously have to be expanded back to 192 integers. This can be done with NetEncoder (see https://community.wolfram.com/groups/-/m/t/3096538), so that part works. I can load the entire compressed dataset into RAM using ReadByteArray, but how can I then apply NetEncoder to a batch of 24-byte records for training the neural network?
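For reference, the packing I use is equivalent to this round trip (a sketch, assuming most-significant-bit-first order):

    bits = RandomInteger[1, 192];                       (* one example: 192 binary inputs *)
    bytes = FromDigits[#, 2] & /@ Partition[bits, 8];   (* packed: 24 integers in 0..255 *)
    unpacked = Flatten[IntegerDigits[#, 2, 8] & /@ bytes];
    unpacked === bits   (* True *)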

Regards, GW

3 Replies

I think you should use a generator function, for example:

    enc = NetEncoder[{"Function", Flatten[IntegerDigits[#, 2, 8]] &, {192}}];
    b = ByteArray[Table[RandomInteger[{0, 255}], 10*24]];  (* 10 examples, 24 bytes each *)
    genTrain = Function[ArrayReshape[Normal[b[[1 ;; 24*#BatchSize]]], {#BatchSize, 24}]];
    enc[genTrain[<|"BatchSize" -> 2|>]]
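To plug this into training, something along these lines should work (a sketch with a placeholder net and dummy targets; I use a random-batch generator here so only the #BatchSize key is needed — check the NetTrain documentation for the exact generator keys and the "RoundLength" option):

    n = 1000;  (* number of examples *)
    enc = NetEncoder[{"Function", Flatten[IntegerDigits[#, 2, 8]] &, {192}}];
    net = NetChain[{LinearLayer[1]}, "Input" -> enc];  (* placeholder net *)
    b = ByteArray[RandomInteger[{0, 255}, n*24]];      (* packed inputs *)
    out = RandomReal[1, {n, 1}];                       (* dummy targets *)
    gen = Function[With[{idx = RandomSample[Range[n], #BatchSize]},
        <|"Input" -> (Normal[b[[(# - 1)*24 + 1 ;; #*24]]] & /@ idx),
          "Output" -> out[[idx]]|>]];
    NetTrain[net, {gen, "RoundLength" -> n}, BatchSize -> 64, MaxTrainingRounds -> 1]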

As the dataset resides in RAM I could use RandomSample to select a batch, but am I right that if I want to iterate over the dataset linearly I have to specify the "RoundLength" option of NetTrain as:

    Length[b] / (24 * BatchSize)

and use the #Round slot in the generator function:

    genTrain = Function[ArrayReshape[
        Normal[b[[(#Round - 1)*24*#BatchSize + 1 ;; #Round*24*#BatchSize]]],
        {#BatchSize, 24}]]

Regards, GW

I am confused by your problem setup. A single example is in -> out, where in is 192 binary digits. A single packed example is in_packed -> out, where in_packed is 24 integers stored as a ByteArray.

If this is correct, you can use Normal to unpack the ByteArray into a list:

NetEncoder[{"Function", 
  Flatten@IntegerDigits[Normal@#, 2, 8] &, {192}}]

Now, in the context of training, a batch of examples is a list of these 192-dimensional vectors (or, equivalently, a list of 24-byte arrays).
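For example, with dummy data (a list of three 24-byte arrays), the encoder maps the whole batch at once:

    enc = NetEncoder[{"Function", Flatten@IntegerDigits[Normal@#, 2, 8] &, {192}}];
    batch = Table[ByteArray[RandomInteger[{0, 255}, 24]], 3];
    Dimensions[enc[batch]]   (* {3, 192} *)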

Let me know if this is incorrect; otherwise, please add a minimal working example to the original post.

Hi,

Thank you for your reply. After reading it into RAM, my dataset exists as a linear array of multiples of 24 bytes. Suppose the batch size is 65536. For each batch I have to take the next block of (65536, 24) bytes of the compressed dataset, apply NetEncoder to it, and train the network on that batch. However, it is unclear to me how to configure NetTrain to do this. FYI, in TensorFlow I read the compressed dataset as a two-dimensional array in RAM, turn that array into a batched dataset, and apply a map function to the dataset to batch-unpack the 24 bytes (see also https://discuss.tensorflow.org/t/can-you-bit-pack-and-then-unpack-binary-inputs/19224). TensorFlow's model.fit then loops over the batched dataset.

Regards, GW
