Towards Accurate Binary Convolutional Neural Network


Xiaofan Lin, Cong Zhao and Wei Pan*
Paper: #261
Poster: Pacific Ballroom #101
Photos and videos are either original work or taken from Wikimedia, under Creative Commons license.


DJI drones use CNNs for many tasks

Challenges
- Large model size: hundreds of megabytes of floating point weight values
- Expensive computation: billions of floating point multiplication-accumulation operations

Challenges for DJI drones
- Limited resources for computation and power in DJI drones


Compression of deep neural networks for mobile applications

- Synapse and neuron pruning: sparse, irregular computation that is difficult to process efficiently
- Quantization: regular computation, smaller datapaths, fewer bits per weight and activation

[1] S. Han, H. Mao, and W. J. Dally. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. arXiv preprint.
[2] P. Molchanov, S. Tyree, T. Karras, et al. Pruning convolutional neural networks for resource efficient inference. ICLR.
[3] I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio. Quantized neural networks: Training neural networks with low precision weights and activations. arXiv preprint.
[4] S. Zhou, Y. Wu, Z. Ni, X. Zhou, H. Wen, and Y. Zou. DoReFa-Net: Training low bitwidth convolutional neural networks with low bitwidth gradients. arXiv preprint.
[5] A. G. Howard, M. Zhu, B. Chen, et al. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint.
[6] F. N. Iandola, S. Han, M. W. Moskewicz, et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and < 0.5 MB model size. arXiv preprint, 2016.

Floating point multiplication and accumulation is still the bottleneck!

Binarized neural networks (BNNs)

- The extreme case of quantization: binary weights and activations, +1 and -1 (using the sign() function)
- Key computation: binary matrix multiplication and accumulation

$x \cdot y = \mathrm{POPCOUNT}(x\ \mathrm{XNOR}\ y), \quad x_i, y_i \in \{-1, +1\},\ \forall i$

An example, with $x = [1, -1]$ and $y = [-1, 1]$:
- Floating point operation: $1 \times (-1) + (-1) \times 1$
- Bitwise operation: $\mathrm{POPCOUNT}(1\ \mathrm{XNOR}\ (-1),\ (-1)\ \mathrm{XNOR}\ 1)$

[XNOR truth table: inputs x and y, output x XNOR y]

[7] M. Courbariaux, I. Hubara, D. Soudry, R. El-Yaniv, and Y. Bengio. Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or -1. ICML.
[8] M. Rastegari, V. Ordonez, J. Redmon, and A. Farhadi. XNOR-Net: ImageNet classification using binary convolutional neural networks. ECCV 2016.
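The bitwise form of the dot product can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the helper names `encode` and `binary_dot` are hypothetical, and the bit encoding (+1 stored as bit 1, -1 as bit 0) is an assumption. Under that encoding, POPCOUNT of the XNOR counts the positions where the two vectors agree, and the signed dot product is recovered as 2 * POPCOUNT - n, where n is the vector length; the slide writes the relation without this rescaling.

```python
def encode(vec):
    """Pack a list of +1/-1 values into an int (assumed encoding: +1 -> bit 1, -1 -> bit 0)."""
    bits = 0
    for i, v in enumerate(vec):
        if v == 1:
            bits |= 1 << i
    return bits


def binary_dot(x, y):
    """Dot product of two +1/-1 vectors using only XNOR and a population count."""
    n = len(x)
    mask = (1 << n) - 1                      # keep only the n meaningful bits
    xnor = ~(encode(x) ^ encode(y)) & mask   # bit is 1 where x_i == y_i
    matches = bin(xnor).count("1")           # POPCOUNT
    return 2 * matches - n                   # matches contribute +1, mismatches -1


x = [1, -1]
y = [-1, 1]
float_result = sum(a * b for a, b in zip(x, y))  # 1*(-1) + (-1)*1
assert binary_dot(x, y) == float_result
```

On the slide's two-element example the vectors disagree in both positions, so the population count is zero and both routes give the same value.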

Prediction accuracy with BNNs

- Competitive on small benchmarks: MNIST (handwritten digits), SVHN (street numbers), CIFAR-10 (objects, 10 classes)
- Too much loss on large benchmarks: ImageNet (objects, 1000 classes)

                                        MNIST    SVHN     CIFAR-10  ImageNet
Binary weights & activations            99.04%   97.47%   89.85%    51.2%
Full-precision weights & activations    99.06%   98.31%   92.38%    69.3%
Accuracy loss                           0.02%    0.84%    2.53%     18.1%

Source: Rastegari et al. (2016) [8].

Binary matrix multiplication

Observation: too much accuracy loss when using sign() for binarization
- Real-valued weights, real-valued inputs: Top-1 accuracy 69.3%
- Binary weights, real-valued inputs: Top-1 accuracy 60.8%
- Binary weights, binary inputs: Top-1 accuracy 51.2%

Plan: approximate the weights and activations more precisely

Intuitive example: say we want to approximate a real number $x = 1.512$
- base 1: $f_1(x) = \mathrm{sign}(x) \Rightarrow x > 0$
- bases 1 and 2: $f_1(x) = \mathrm{sign}(x),\ f_2(x) = \mathrm{sign}(x - 1) \Rightarrow x > 1$
- bases 1, 2 and 3: $f_1(x) = \mathrm{sign}(x),\ f_2(x) = \mathrm{sign}(x - 1),\ f_3(x) = \mathrm{sign}(x - 2) \Rightarrow 1 < x < 2$
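The intuitive example can be replayed as a short sketch: each extra shifted sign function narrows the interval known to contain x. The function name `bracket` is hypothetical, and treating sign(0) as +1 is an assumption made here for definiteness.

```python
def sign(v):
    # Assumption: sign(0) is treated as +1.
    return 1 if v >= 0 else -1


def bracket(x, thresholds):
    """Return the interval [lo, hi) for x implied by the signs of x - t for each threshold t."""
    lo, hi = float("-inf"), float("inf")
    for t in thresholds:
        if sign(x - t) > 0:
            lo = max(lo, t)   # sign(x - t) = +1 tells us x >= t
        else:
            hi = min(hi, t)   # sign(x - t) = -1 tells us x < t
    return lo, hi


x = 1.512
one_base = bracket(x, [0])           # only sign(x)
two_bases = bracket(x, [0, 1])       # sign(x), sign(x - 1)
three_bases = bracket(x, [0, 1, 2])  # sign(x), sign(x - 1), sign(x - 2)
```

With three bases the interval is bounded on both sides, which is the sense in which more binary bases describe the real value more precisely.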

Approximate full precision weights using shift parameters

Construct binary weight bases by shift:

$B_1, B_2, \ldots, B_M \in \{-1, +1\}^{w \times h \times c_{in} \times c_{out}}$

$B_i = F_{u_i}(W) := \mathrm{sign}(W - \mathrm{mean}(W) + u_i\, \mathrm{std}(W))$

- The weights inside the sign() function are moved by the shift parameters $u_i$
- The shift parameters can be learned

Approximate full precision weights using binary bases:

$W \approx \alpha_1 B_1 + \alpha_2 B_2 + \cdots + \alpha_M B_M$
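A small pure-Python sketch of the weight approximation follows. Several pieces are assumptions rather than details from the slides: the shift values are fixed by hand instead of learned, the coefficients of the linear combination are fitted by ordinary least squares, the weight tensor is flattened to a short list, and all function names are hypothetical. The solver assumes the bases are linearly independent.

```python
from statistics import mean, pstdev


def sign(v):
    return 1.0 if v >= 0 else -1.0


def binary_bases(w, shifts):
    """One +1/-1 base per shift u: sign(w - mean(w) + u * std(w))."""
    mu, sigma = mean(w), pstdev(w)
    return [[sign(x - mu + u * sigma) for x in w] for u in shifts]


def solve(a, b):
    """Solve a small square system by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_coefficients(w, bases):
    """Least-squares coefficients via the normal equations (assumed fitting method)."""
    gram = [[sum(p * q for p, q in zip(bi, bj)) for bj in bases] for bi in bases]
    rhs = [sum(p * q for p, q in zip(bi, w)) for bi in bases]
    return solve(gram, rhs)


def squared_error(w, shifts):
    """Squared error between w and its best combination of the shifted binary bases."""
    bases = binary_bases(w, shifts)
    coeffs = fit_coefficients(w, bases)
    approx = [sum(c * b[k] for c, b in zip(coeffs, bases)) for k in range(len(w))]
    return sum((x - y) ** 2 for x, y in zip(w, approx))


weights = [-1.5, -0.5, 0.5, 1.5]                    # toy "weight tensor", flattened
err_one_base = squared_error(weights, [0.0])        # plain sign() binarization
err_three_bases = squared_error(weights, [-1.0, 0.0, 1.0])
assert err_three_bases < err_one_base
```

On this toy input a single base leaves a visible residual, while three shifted bases reproduce the four weights almost exactly, which mirrors the "more bases, better approximation" claim.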

Approximate full precision activations using shift parameters

Construct binary activation bases by shift:
1. $R = \mathrm{ReLU}(\mathrm{Input})$
2. $h_v(R) = \mathrm{clip}(R + v, 0, 1)$
3. $H_v(R) := 2\,\mathbb{I}_{h_v(R) \ge 0.5} - 1$

Approximate full precision activations using binary bases:

$A_1, \ldots, A_N = H_{v_1}(R), \ldots, H_{v_N}(R)$

$R \approx \beta_1 A_1 + \beta_2 A_2 + \cdots + \beta_N A_N$
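The three steps for building activation bases can be sketched as follows. This is an illustration under assumptions: the indicator is taken to fire when the clipped value is at least 0.5 and the result is rescaled to +1/-1, the shift values are hypothetical, and the function names are made up for this example.

```python
def relu(x):
    return max(x, 0.0)


def h(r, v):
    """Shift by v, then clip to the range [0, 1]."""
    return min(max(r + v, 0.0), 1.0)


def H(r, v):
    """Binarize to +1/-1 (assumed form: 2 * indicator(h >= 0.5) - 1)."""
    return 1 if h(r, v) >= 0.5 else -1


def activation_bases(inputs, shifts):
    """One binary base per shift, each the same length as the input."""
    rectified = [relu(x) for x in inputs]
    return [[H(r, v) for r in rectified] for v in shifts]


inputs = [-0.3, 0.2, 0.6, 1.4]
bases = activation_bases(inputs, [-0.5, 0.0, 0.4])  # hypothetical shift values
```

Each shift moves the point at which an activation flips from -1 to +1, so the bases together record coarse magnitude information that a single sign() would discard.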


Parallel & multiple binary convolution

$\mathrm{Conv}(W, R) \approx \mathrm{Conv}\left(\sum_{m=1}^{M} \alpha_m B_m,\ \sum_{n=1}^{N} \beta_n A_n\right) = \sum_{m=1}^{M} \sum_{n=1}^{N} \alpha_m \beta_n\, \mathrm{Conv}(B_m, A_n)$

The sum-sum operation can be computed in parallel.

Binary Conv: $x \cdot y = \mathrm{POPCOUNT}(x\ \mathrm{XNOR}\ y), \quad x_i, y_i \in \{-1, +1\},\ \forall i$

Advantages
1. Bitwise operations
2. More bases, better approximation
3. Parallel computation (sum-sum) is hardware friendly
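The sum-sum decomposition relies only on the linearity of convolution. The sketch below checks it on a single output position, where a convolution reduces to a dot product between a flattened filter and a flattened input patch; that reduction, the toy values, and the function names are assumptions made for illustration.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def combine(coeffs, bases):
    """Real-valued vector formed as the weighted sum of binary bases."""
    return [sum(c * b[k] for c, b in zip(coeffs, bases)) for k in range(len(bases[0]))]


def sum_sum(alphas, weight_bases, betas, activation_bases):
    """Sum over all (m, n) pairs of alpha_m * beta_n * dot(B_m, A_n).

    Every dot(B_m, A_n) involves only +1/-1 values and is independent of the
    others, so the M * N terms could be evaluated in parallel.
    """
    return sum(
        a * b * dot(B, A)
        for a, B in zip(alphas, weight_bases)
        for b, A in zip(betas, activation_bases)
    )


alphas = [0.5, 0.25]
weight_bases = [[1, -1, 1, -1], [1, 1, -1, -1]]
betas = [1.0, 0.5]
act_bases = [[1, 1, 1, -1], [-1, 1, 1, 1]]

direct = dot(combine(alphas, weight_bases), combine(betas, act_bases))
decomposed = sum_sum(alphas, weight_bases, betas, act_bases)
assert abs(direct - decomposed) < 1e-12
```

The real-valued product on the left and the weighted sum of binary products on the right agree, so the only floating point work left is the final scaling and accumulation of M * N binary results.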

Result: models on the ImageNet benchmark

Model                                                                                      Accuracy loss (Top-1)
Full-Precision ResNet-18 [full-precision weights and activations]
BWN [full-precision activation] [8] Rastegari et al. (2016)                                7.5%
DoReFa-Net [1-bit weight and 4-bit activation] [4] Zhou et al. (2016)                      10.1%
XNOR-Net [binary weight and activation] [8] Rastegari et al. (2016)                        18.1%
BNN [binary weight and activation] [7] Courbariaux et al. (2016)                           27.1%
Ours [3 weight bases, 3 activation bases]                                                  5.4%
Ours [5 weight bases, 5 activation bases]                                                  4.3%

Full-Precision ResNet-34 [full-precision weights and activations]
Ours [5 weight bases, 5 activation bases]                                                  4.9%

Future work
1. Applicability to other tasks, e.g., object detection, parsing, face recognition, speech recognition, etc.
2. Hardware acceleration on mobile
3. Software acceleration on cloud

Thank you!
