
[WSS16] Deep compression


The aim of my project is to implement within the Wolfram Language a version of the "deep compression" algorithm proposed by Han et al. in arXiv:1510.00149v5.

Neural networks are often memory-costly, which makes it difficult to embed them in systems with limited hardware resources. It would therefore be very desirable to have a procedure that compresses neural networks while minimizing the loss in accuracy. The "deep compression" algorithm by Han et al. is one possible solution to this problem.

The algorithm has three stages (illustrative Wolfram Language sketches of each stage follow the list):

  1. pruning: all weights at each layer of the neural net whose magnitude is smaller than a certain threshold are set to zero, and the neural net is retrained under these constraints;

  2. trained quantization: at each layer we perform a cluster analysis on the weights and replace the weights in each cluster with their centroid value;

  3. Huffman coding: we look at the distribution of weights and cluster indices across the whole neural net and use their frequencies to Huffman-encode them.
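
A minimal pruning sketch, assuming the net is a NetChain and the layer is addressed by name or position; the helper name pruneLayer and the threshold value are illustrative, not from the original project:

    pruneLayer[net_, layer_, threshold_] :=
     Module[{w = Normal@NetExtract[net, {layer, "Weights"}]},
      (* zero every weight whose magnitude falls below the threshold *)
      NetReplacePart[net,
       {layer, "Weights"} -> w*UnitStep[Abs[w] - threshold]]]

Since NetTrain does not expose per-weight gradient masks, retraining under the pruning constraint can be approximated by alternating short NetTrain runs with re-application of the zero mask.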
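
A weight-sharing sketch for the quantization stage, using FindClusters for the k-means step; the helper name quantizeLayer and the cluster count k are assumptions for illustration:

    quantizeLayer[net_, layer_, k_] :=
     Module[{w, flat, centroids, nf},
      w = Normal@NetExtract[net, {layer, "Weights"}];
      flat = Flatten[w];
      (* k-means clustering of the weight values *)
      centroids = Mean /@ FindClusters[flat, k, Method -> "KMeans"];
      (* under k-means, a weight's cluster centroid is its nearest centroid *)
      nf = Nearest[centroids];
      NetReplacePart[net, {layer, "Weights"} ->
        ArrayReshape[First@*nf /@ flat, Dimensions[w]]]]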
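
The Wolfram Language has no built-in Huffman encoder, so here is a generic sketch (the helper name huffmanLengths is hypothetical) that derives the code length of each symbol from its frequency; encoding then amounts to assigning prefix-free bit strings of those lengths:

    huffmanLengths[freqs_Association] :=
     Module[{queue = Sort[{#2, {#1}} & @@@ Normal[freqs]],
       lens = AssociationMap[0 &, Keys[freqs]], a, b},
      While[Length[queue] > 1,
       {a, b} = Take[queue, 2];
       (* merging two subtrees adds one bit to every symbol inside them *)
       Scan[lens[#]++ &, Join[a[[2]], b[[2]]]];
       queue = Sort@Append[Drop[queue, 2],
         {a[[1]] + b[[1]], Join[a[[2]], b[[2]]]}]];
      lens]

For example, huffmanLengths[<|1 -> 0.6, 2 -> 0.25, 3 -> 0.15|>] returns <|1 -> 1, 2 -> 2, 3 -> 2|>, so the most frequent symbol gets the shortest code.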

Han et al. verified the efficiency of their algorithm by applying it to two deep convolutional networks, AlexNet (240 MB) and VGG-16 (552 MB), achieving a compression factor of 35x (240 MB -> 6.9 MB) for the former and 49x (552 MB -> 11.3 MB) for the latter.

We applied the algorithm to a multilayer perceptron trained on the MNIST dataset. The network consists of three fully connected layers, the first with 100 neurons and the other two with 10 neurons each; we achieved a compression rate of ~25x with a loss in accuracy within 1%.
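
For reference, a sketch of how such a network could be built and trained in the Wolfram Language; the Ramp activations and the use of ResourceData["MNIST"] are assumptions, since the post does not specify them:

    net = NetChain[{
       FlattenLayer[],
       LinearLayer[100], ElementwiseLayer[Ramp], (* 100-neuron layer *)
       LinearLayer[10], ElementwiseLayer[Ramp],  (* first 10-neuron layer *)
       LinearLayer[10], SoftmaxLayer[]},         (* second 10-neuron layer *)
      "Input" -> NetEncoder[{"Image", {28, 28}, "Grayscale"}],
      "Output" -> NetDecoder[{"Class", Range[0, 9]}]];
    trained = NetTrain[net, ResourceData["MNIST"]]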
