Nearly-tight VC-dimension bounds for piecewise linear neural networks

Transcription

1 Nearly-tight VC-dimension bounds for piecewise linear neural networks. Nicholas J. A. Harvey, Christopher Liaw, Abbas Mehrabian, University of British Columbia. COLT 2017, July 1, 2017

2 Neural networks. Each unit computes σ(w · x + b), where σ(x) = max{x, 0} is the ReLU activation; the output unit uses the identity activation. [Diagram: inputs x_1, ..., x_5, input layer, hidden layer 1, hidden layer 2, output layer.]
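As a concrete illustration (my own sketch, not part of the slides), here is a minimal NumPy forward pass for a network of the kind pictured: inputs x_1, ..., x_5, two hidden ReLU layers, and a single identity output unit. The hidden width and random weights are illustrative assumptions.

```python
import numpy as np

def relu(x):
    # sigma(x) = max{x, 0}, applied elementwise
    return np.maximum(x, 0.0)

def forward(x, params):
    """Two hidden ReLU layers followed by a linear (identity) output unit."""
    (W1, b1), (W2, b2), (w3, b3) = params
    h1 = relu(W1 @ x + b1)   # hidden layer 1
    h2 = relu(W2 @ h1 + b2)  # hidden layer 2
    return w3 @ h2 + b3      # output layer, identity activation

rng = np.random.default_rng(0)
d, h = 5, 4  # 5 inputs as in the diagram; hidden width 4 is an arbitrary choice
params = [(rng.standard_normal((h, d)), rng.standard_normal(h)),
          (rng.standard_normal((h, h)), rng.standard_normal(h)),
          (rng.standard_normal(h), 0.0)]
print(forward(rng.standard_normal(d), params))
```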

3–4 VC-dimension. Defn: If F is a family of functions, then VCdim(F) ≥ k iff there exists X = {x_1, ..., x_k} such that F achieves all 2^k sign patterns, i.e. {(sign(f(x_1)), ..., sign(f(x_k))) : f ∈ F} = {0, 1}^k. E.g., hyperplanes in R^d have VC-dimension d + 1: in R^2, three points in general position can be shattered, but it is impossible to shatter any 4 points. Thm [Fundamental thm. of learning]: F is learnable iff VCdim(F) < ∞. Moreover, the sample complexity is Θ(VCdim(F)).
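To make the definition concrete, here is a brute-force sketch (my own illustration, not from the talk): enumerate hyperplanes sign(w · x + b) in R^2 over a coarse grid of (w, b) and record the sign patterns they achieve on a candidate point set. The grid is an assumption; it suffices to exhibit shattering of 3 points, while 4 points can never be shattered by any hyperplanes in R^2.

```python
from itertools import product
import numpy as np

def sign_patterns(points, classifiers):
    """All distinct 0/1 patterns the classifiers achieve on the given points."""
    return {tuple(int(f(x) > 0) for x in points) for f in classifiers}

def is_shattered(points, classifiers):
    return len(sign_patterns(points, classifiers)) == 2 ** len(points)

# Hyperplanes in R^2: f(x) = w . x + b, classified by its sign.
grid = np.linspace(-3, 3, 13)
classifiers = [lambda x, w=np.array([w1, w2]), b=b: float(w @ x + b)
               for w1, w2, b in product(grid, grid, grid)]

three_pts = [np.array(p) for p in [(0, 0), (1, 0), (0, 1)]]
four_pts = [np.array(p) for p in [(0, 0), (1, 0), (0, 1), (1, 1)]]
print(is_shattered(three_pts, classifiers))  # True: VCdim is d + 1 = 3 in R^2
print(is_shattered(four_pts, classifiers))   # False: the XOR pattern is unreachable
```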

5–8 VC-dimension of NNs. W = # parameters/edges, L = # layers.
Known upper bounds: O(WL log W + WL^2) [BMM 98], O(W^2) [GJ 95].
Known lower bounds: Ω(WL) [BMM 98], Ω(W log W) [M 94].
Main Thm [HLM 17]: For a ReLU NN with W parameters and L layers, Ω(WL log(W/L)) ≤ VCdim ≤ O(WL log W). (The lower bound means there exists a NN with this VCdim.)
Independently proved by Bartlett '17.
Recently, lots of work on the power of depth for the expressiveness of NNs [T 16, ES 16, Y 16, LS 16, SS 16, CSS 16, LGMRA 17, D 17].
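For a rough sense of the remaining gap between the two bounds in the main theorem, the sketch below (mine; the hidden constant factors are set to 1, purely as an illustrative assumption) evaluates both expressions for a few choices of W and L. The ratio log W / log(W/L) is the factor the open questions at the end ask about.

```python
import math

def vc_bounds(W, L):
    """Evaluate the lower and upper bound expressions with constants set to 1."""
    lower = W * L * math.log2(W / L)  # Omega(W L log(W/L))
    upper = W * L * math.log2(W)      # O(W L log W)
    return lower, upper

for W, L in [(10_000, 10), (10_000, 100), (10_000, 1000)]:
    lo, up = vc_bounds(W, L)
    print(f"W={W}, L={L}: lower ~ {lo:.2e}, upper ~ {up:.2e}, ratio = {up / lo:.2f}")
```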

9–12 Lower bound (refinement of [BMM 98]).
Shattered set: S = {e_i}_{i ∈ [n]} × {e_j}_{j ∈ [m]}. Encode f with weights a_i = 0.a_{i,1} ... a_{i,m} (binary expansion), where a_{i,j} = f(e_i, e_j).
Given e_i, it is easy to extract a_i; an NN block then extracts the bits of a_i. [Diagram: inputs e_i, e_j; a block selecting bit j from a_i; the rest of the NN.]
Design a bit extractor to extract a_{i,j}. [BMM 98] do this 1 bit per layer, giving Ω(WL). More efficient: log(W/L) bits per layer, giving Ω(WL log(W/L)).
Thm [HLM 17]: Suppose a ReLU NN with W parameters and L layers extracts the m-th bit of its input. Then m = O(L log(W/L)).
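The following plain-Python sketch (my illustration; it is not the slides' ReLU circuit) shows the bit-extraction idea behind the lower bound: the values f(e_i, e_1), ..., f(e_i, e_m) are packed into the binary expansion of a single weight a_i and recovered one bit per step. In the actual construction the threshold comparison is implemented by ReLU units, and the refined extractor recovers log(W/L) bits per layer rather than one.

```python
def extract_bits(a, m):
    """Peel off the first m binary digits of a in [0, 1), one per step."""
    bits = []
    for _ in range(m):
        b = 1 if a >= 0.5 else 0  # current leading bit (a threshold comparison)
        bits.append(b)
        a = 2 * a - b             # shift the remaining bits left
    return bits

# Encode f(e_i, e_j) = 1, 0, 1, 1 for j = 1..4 as the weight a_i = 0.1011 (binary).
a_i = 1/2 + 0/4 + 1/8 + 1/16
print(extract_bits(a_i, 4))       # [1, 0, 1, 1]
```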

13–18 Upper bound (refinement of [BMM 98] for ReLU).
Fix a shattered set X = {x_1, ..., x_m}.
Partition the parameter space so that the input to the 1st hidden layer has constant sign: the ReLU can then be replaced with 0 (if < 0) or the identity (if > 0). The size of the partition is small, i.e. (Cm)^W [Warren 68]*.
Repeat the procedure for each layer to get a partition of size (CLm)^{O(WL)}.
In each piece, the output is a polynomial of degree L, so the total # of sign patterns is ≤ (CLm)^{O(WL)}.
Since X is shattered, we need 2^m ≤ (CLm)^{O(WL)}, which implies m = O(WL log W).
* C > 1 is some constant.
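To see how the final inequality pins down m, here is a small numeric sketch (mine; the constant C and the exponent WL stand in for the O(·) terms, which is an assumption) that scans for the largest m satisfying 2^m ≤ (CLm)^{WL}:

```python
import math

def max_shatterable_m(W, L, C=2.0):
    """Largest m with 2^m <= (C*L*m)^(W*L), i.e. the counting step above."""
    m = 1
    while m * math.log(2) <= W * L * math.log(C * L * m):
        m += 1
    return m - 1

for W, L in [(100, 2), (100, 5), (1000, 5)]:
    m = max_shatterable_m(W, L)
    print(f"W={W}, L={L}: m <= {m}, compare W*L*log2(W) = {W * L * math.log2(W):.0f}")
```

The resulting m grows like WL log(WL), i.e. O(WL log W) since L ≤ W, matching the bound on the slide.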

19–20 Open questions. Can we close the gap for ReLU NNs: Ω(WL log(W/L)) vs O(WL log W)? For polynomial NNs, we have Ω(WL) ≤ VCdim ≤ O(WL^2). Can we close this gap? Do poly NNs have higher VCdim than ReLU NNs? What about the VC dimensions of CNNs, RNNs, ResNets, etc.? Thank you!
