Asymptotic behavior of Support Vector Machine for spiked population model



Supplementary Material

This note contains the technical proofs of Propositions 1, 2, and 3.

Proof of Proposition 1

Denote $X = [x_1, \dots, x_M]^T$ and $y = (y_1, \dots, y_M)^T$. The joint probability density function of $(X, y)$ is
\[
p(X, y) = \prod_{i=1}^{M} p(x_i, y_i).
\]
In statistical mechanics, we start from the partition function
\[
Z_\beta(X, y) = \int dw \, \exp\Big\{-\frac{\beta}{2}\|w\|^2\Big\} \prod_{i=1}^{M} \Theta\Big(\frac{y_i x_i^T w}{\sqrt{N}} - 1\Big), \tag{S1}
\]
where $\beta$ is the inverse temperature and the SVM constraints are enforced strictly by the Heaviside step function $\Theta$, defined as $\Theta(t) = 1$ if $t \ge 0$ and $\Theta(t) = 0$ if $t < 0$. In the low-temperature limit $\beta \to \infty$, $Z_\beta(X, y)$ in (S1) is dominated by the solution vector $w$ of the hard-margin SVM algorithm (3). The properties of the SVM can be computed from the zero-temperature average free energy
\[
F = -\lim_{N \to \infty} \lim_{\beta \to \infty} \frac{1}{\beta N} \big\langle \log Z_\beta(X, y) \big\rangle_{X,y}, \tag{S2}
\]
where the bracket $\langle \cdot \rangle_{X,y}$ stands for the average over all training sets. To evaluate the average of the logarithm, we use the replica method, which is based on the identity
\[
\langle \log Z \rangle = \lim_{n \to 0} \frac{\langle Z^n \rangle - 1}{n} = \lim_{n \to 0} \frac{1}{n} \log \langle Z^n \rangle, \tag{S3}
\]
and rewrite (S2) as
\[
F = -\lim_{N \to \infty} \lim_{\beta \to \infty} \frac{1}{\beta N} \lim_{n \to 0} \frac{1}{n} \log \Xi_n(\beta), \tag{S4}
\]

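where
\[
\Xi_n(\beta) = \big\langle \{Z_\beta(X, y)\}^n \big\rangle_{X,y} = \int \{Z_\beta(X, y)\}^n \prod_{i=1}^{M} p(x_i, y_i)\, dx_i\, dy_i. \tag{S5}
\]
Equation (S4) follows from the fact that $\lim_{n \to 0} \Xi_n(\beta) = 1$ and from exchanging the order of the averaging and the differentiation with respect to $n$. In the replica method, we first evaluate $\Xi_n(\beta)$ for integer $n$, then extend the result to real $n$ and take the limit $n \to 0$. For integer $n$, to represent $\{Z_\beta(X, y)\}^n$ in the integrand of (S5), we use the identity
\[
\Big\{ \int f(x)\, p(dx) \Big\}^n = \int f(x_1) \cdots f(x_n)\, p(dx_1) \cdots p(dx_n),
\]
and obtain
\[
\{Z_\beta(X, y)\}^n = \int \Big[ \prod_{\nu=1}^{n} dw_\nu \exp\Big\{-\frac{\beta}{2}\|w_\nu\|^2\Big\} \Big] \prod_{i=1}^{M} \prod_{\nu=1}^{n} \Theta\Big(\frac{y_i x_i^T w_\nu}{\sqrt{N}} - 1\Big), \tag{S6}
\]
where we have introduced the replicated parameters $w_\nu = [w_{\nu,1}, \dots, w_{\nu,N}]$, $\nu = 1, \dots, n$. Exchanging the order of the two limits $N \to \infty$ and $n \to 0$ in (S4), we have
\[
F = -\lim_{\beta \to \infty} \frac{1}{\beta} \lim_{n \to 0} \lim_{N \to \infty} \frac{1}{nN} \log \Xi_n(\beta). \tag{S7}
\]
The integrations over $X$ and $y$ can be performed, which leads to
\[
\Xi_n(\beta) = \int \prod_{\nu=1}^{n} dw_\nu \exp\Big(-\frac{\beta}{2} \sum_{\nu=1}^{n} \|w_\nu\|^2\Big) \prod_{i=1}^{M} \underbrace{\Big\langle \prod_{\nu=1}^{n} \Theta\Big(\frac{y_i x_i^T w_\nu}{\sqrt{N}} - 1\Big) \Big\rangle_{x_i, y_i}}_{G_n}
= \int \prod_{\nu=1}^{n} dw_\nu \exp\Big(-\frac{\beta}{2} \sum_{\nu=1}^{n} \|w_\nu\|^2 + M \log G_n\Big), \tag{S8}
\]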

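where
\[
G_n = \Big\langle \prod_{\nu=1}^{n} \Theta\Big(\frac{y\, x^T w_\nu}{\sqrt{N}} - 1\Big) \Big\rangle_{x,y}
= \int dx \Big\{ \prod_{\nu=1}^{n} \Theta\Big(\frac{x^T w_\nu}{\sqrt{N}} - 1\Big) p(x \mid y = +1)\, p(y = +1) + \prod_{\nu=1}^{n} \Theta\Big(-\frac{x^T w_\nu}{\sqrt{N}} - 1\Big) p(x \mid y = -1)\, p(y = -1) \Big\}
= \Big\langle \prod_{\nu=1}^{n} \Theta\Big(\frac{x^T w_\nu}{\sqrt{N}} - 1\Big) \Big\rangle_{x \mid y = +1}.
\]
In the second equality, the two integrals inside $\{\cdot\}$ coincide under the reflection $x \to -x$, because $p(x \mid y = +1) = p(-x \mid y = -1)$ and $p(y = +1) + p(y = -1) = 1$. By introducing the Fourier integral representation of the $\Theta$ function,
\[
\Theta\Big(\frac{x^T w_\nu}{\sqrt{N}} - 1\Big) = \frac{1}{2\pi} \int_{1}^{\infty} dt_\nu \int d\hat{t}_\nu \exp\Big\{ i\, \hat{t}_\nu \Big(\frac{x^T w_\nu}{\sqrt{N}} - t_\nu\Big) \Big\},
\]
we have
\[
G_n = \frac{1}{(2\pi)^n} \int_{1}^{\infty} d^n t_\nu \int d^n \hat{t}_\nu \Big\langle \exp\Big\{ i \sum_{\nu=1}^{n} \hat{t}_\nu \Big(\frac{x^T w_\nu}{\sqrt{N}} - t_\nu\Big) \Big\} \Big\rangle_{x \mid y = +1}.
\]
By the Berry-Esseen central limit theorem, in the limit $N \to \infty$ the joint distribution of $x^T w_1/\sqrt{N}, \dots, x^T w_n/\sqrt{N}$ for fixed $w_1, \dots, w_n$ is multivariate normal with mean $(\mu^T w_1/\sqrt{N}, \dots, \mu^T w_n/\sqrt{N})$ and covariance matrix $\tilde{Q} = (\tilde{Q}_{\nu\nu'})$, $\tilde{Q}_{\nu\nu'} = w_\nu^T \Sigma w_{\nu'}/N$, where $\Sigma$ is the covariance matrix defined in the main text. Further define $Q_{\nu\nu'} = w_\nu^T w_{\nu'}/N$ and $R_{\nu,m} = w_\nu^T v_m/\sqrt{N}$. Since $\hat{\mu} = v_1$, one has $w_\nu^T \hat{\mu}/\sqrt{N} = R_{\nu,1}$, where $\hat{\mu} = \mu/\|\mu\|$ and, with a slight abuse of notation, $\mu = \|\mu\|$ in what follows. The integration over $x^T w_\nu$ and $\hat{t}$ gives
\[
G_n = \frac{1}{(2\pi)^{n/2}} \int_{1}^{\infty} d^n t_\nu \, (\det \tilde{Q})^{-1/2} \exp\Big\{ -\frac{1}{2} \sum_{\nu,\nu'} (t_\nu - \mu R_{\nu,1}) (\tilde{Q}^{-1})_{\nu\nu'} (t_{\nu'} - \mu R_{\nu',1}) \Big\}. \tag{S9}
\]
We have used a shorthand notation: $d^n t_\nu$ stands for $\prod_{\nu=1}^{n} dt_\nu$. Substituting the expression for the covariance matrix into $\tilde{Q}_{\nu\nu'}$, we have
\[
\tilde{Q}_{\nu\nu'} = \frac{\sigma^2}{N} \Big( w_\nu^T w_{\nu'} + \sum_{m=1}^{K} \lambda_m\, (v_m^T w_\nu)(v_m^T w_{\nu'}) \Big) = \sigma^2 \Big( Q_{\nu\nu'} + \sum_{m=1}^{K} \lambda_m R_{\nu,m} R_{\nu',m} \Big).
\]
Therefore $G_n$ depends on $w_\nu$ only through the order parameters $Q_{\nu\nu'}$ and $R_{\nu,m}$. The integrations over $w_\nu$ are performed in terms of integrations over $R_{\nu,m}$ and $Q_{\nu\nu'}$, i.e.
\[
\prod_{\nu} dw_\nu = \prod_{\nu} dw_\nu \prod_{\nu,m} dR_{\nu,m}\, \delta\Big(R_{\nu,m} - \frac{v_m^T w_\nu}{\sqrt{N}}\Big) \prod_{\nu \le \nu'} dQ_{\nu\nu'}\, \delta\Big(Q_{\nu\nu'} - \frac{w_\nu^T w_{\nu'}}{N}\Big).
\]
We rewrite these delta functions using their Fourier representations. In doing so, constant factors can be applied to the Fourier integration variables, and we choose factors that are convenient for the later calculations. After dropping irrelevant prefactors, we get
\[
\int \prod_{\nu} dw_\nu = \int dQ_{\nu\nu'}\, d\hat{Q}_{\nu\nu'}\, dR_{\nu,m}\, d\hat{R}_{\nu,m} \exp\Big\{ -\frac{1}{2} \sum_{\nu,\nu',m} \hat{R}_{\nu,m} (\hat{Q}^{-1})_{\nu\nu'} \hat{R}_{\nu',m} - \frac{N}{2} \log\det \hat{Q} + \frac{N}{2} \sum_{\nu,\nu'} \hat{Q}_{\nu\nu'} Q_{\nu\nu'} + i \sqrt{N} \sum_{\nu,m} \hat{R}_{\nu,m} R_{\nu,m} \Big\}.
\]
We have used a shorthand notation: $dX_{\nu,m}$ stands for $\prod_{m=1}^{K} \prod_{\nu=1}^{n} dX_{\nu,m}$, where $X$ is one of $R$, $\hat{R}$; $dQ_{\nu\nu'}$ and $d\hat{Q}_{\nu\nu'}$ stand for $\prod_{\nu \le \nu'} dQ_{\nu\nu'}$ and $\prod_{\nu \le \nu'} d\hat{Q}_{\nu\nu'}$, respectively. To obtain the leading-order contribution, we integrate over the Fourier variables $\hat{Q}_{\nu\nu'}$ and $\hat{R}_{\nu,m}$ and retain only the terms in the exponent of the integrand that are extensive in $N$. This gives
\[
\int \prod_{\nu} dw_\nu = \int dQ_{\nu\nu'}\, dR_{\nu,m} \exp(N T_n), \qquad T_n = \frac{1}{2} \log\det\Big( Q - \sum_{m=1}^{K} R_m R_m^T \Big), \tag{S10}
\]
where $R_m = (R_{1,m}, \dots, R_{n,m})^T$. We rewrite (S8) in terms of the integrations over $R_{\nu,m}$ and $Q_{\nu\nu'}$:
\[
\Xi_n(\beta) = \int dQ_{\nu\nu'}\, dR_{\nu,m} \exp\Big\{ N \Big( -\frac{\beta}{2} \sum_{\nu=1}^{n} Q_{\nu\nu} + T_n + \alpha \log G_n \Big) \Big\}.
\]
Now we apply the steepest descent method to the remaining integrations over $Q_{\nu\nu'}$ and $R_{\nu,m}$. According to Varadhan's proposition (Tanaka, 2002), only the saddle points of the exponent of the integrand contribute to the integration in the limit $N \to \infty$. However, searching for saddle points over the entire space is in general difficult. We assume replica symmetry for the saddle points, that is, that they are invariant under the exchange of any two replica indices $\nu$ and $\nu'$, $\nu \ne \nu'$. Under this symmetry assumption, the space is greatly reduced and the exponent of the integrand can be evaluated explicitly. We put
\[
Q_{\nu\nu} = q_0 \ \ \forall \nu, \qquad Q_{\nu\nu'} = q_1 \ \ \forall \nu, \nu',\ \nu \ne \nu', \qquad R_{\nu,m} = R_m \ \ \forall \nu. \tag{S11}
\]
The matrix in (S10) can then be simplified as
\[
Q - \sum_{m=1}^{K} R_m R_m^T = (q_0 - q_1) I + \Big( q_1 - \sum_{m=1}^{K} R_m^2 \Big) \mathbf{1}\mathbf{1}^T,
\]
where $\mathbf{1}$ is the $n$-vector of ones. Further define
\[
\hat{q}_0 = \sigma^2 \Big( q_0 + \sum_{m=1}^{K} \lambda_m R_m^2 \Big), \qquad \hat{q}_1 = \sigma^2 \Big( q_1 + \sum_{m=1}^{K} \lambda_m R_m^2 \Big).
\]
Substituting into (S9), $\log G_n$ can be simplified for small $n$ as
\[
\log G_n = n \int Dz\, \log \Phi(t), \tag{S12}
\]
where
\[
t = \frac{\mu R_1 - 1 - \sqrt{\hat{q}_1}\, z}{\sqrt{\hat{q}_0 - \hat{q}_1}}.
\]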

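Here $Dz = \frac{dz}{\sqrt{2\pi}} \exp(-z^2/2)$ denotes the standard normal measure and $\Phi(s) = \int_{-\infty}^{s} Dz$ denotes the standard normal cumulative distribution function. Similarly, we can simplify (S10) as
\[
T_n = \frac{n}{2} \Big\{ \log(q_0 - q_1) + \frac{q_1 - \sum_{m=1}^{K} R_m^2}{q_0 - q_1} \Big\}. \tag{S13}
\]
Substituting (S12) and (S13) into (S8), and then into (S7), we get
\[
F = \frac{q_0}{2} - \frac{\log(q_0 - q_1)}{2\beta} - \frac{q_1 - \sum_{m=1}^{K} R_m^2}{2\beta (q_0 - q_1)} - \frac{\alpha}{\beta} \int Dz\, \log \Phi(t).
\]
In order for $F$ to be non-trivial and well behaved in the limit $\beta \to \infty$, we introduce the scaled parameter $q = \beta (q_0 - q_1)$. In the limit $\beta \to \infty$, the free energy then becomes
\[
F = \frac{q_0}{2} - \frac{q_0 - \sum_{m=1}^{K} R_m^2}{2q} + \frac{\alpha \hat{q}}{2q} \int_{-\infty}^{z_c} Dz\, (z_c - z)^2, \tag{S14}
\]
where
\[
z_c = \frac{1 - \mu R_1}{\sigma \sqrt{\hat{q}}}, \qquad \hat{q} = q_0 + \sum_{m=1}^{K} \lambda_m R_m^2.
\]
From (S14), we get the saddle-point equations
\[
q_0 - \sum_{m=1}^{K} R_m^2 = \alpha \hat{q} \int_{-\infty}^{z_c} Dz\, (z_c - z)^2,
\]
\[
R_1 = \frac{\alpha \sqrt{\hat{q}}\, \mu}{\sigma} \int_{-\infty}^{z_c} Dz\, (z_c - z),
\]
\[
R_m = 0, \qquad m = 2, \dots, K.
\]
Denote by $\theta$ the angle between $w$ and $\mu$, and define $\rho = \cos\theta = R_1/\sqrt{q_0}$.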

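The saddle-point equations above then simplify to
\[
1 - \rho^2 = \alpha (1 + \lambda_1 \rho^2) \int_{-\infty}^{z_c} Dz\, (z_c - z)^2, \tag{S15}
\]
\[
\rho = \frac{\alpha \mu \sqrt{1 + \lambda_1 \rho^2}}{\sigma} \int_{-\infty}^{z_c} Dz\, (z_c - z). \tag{S16}
\]
Therefore, given $\alpha$, $\lambda_1$, $\mu$, $\sigma$, equations (S15) and (S16) can be solved for the two unknown parameters $\rho$ and $z_c$.

Proof of Proposition 2

The proof of Proposition 2 is similar to the proof of Proposition 1. We start from the partition function
\[
Z_\beta(X, y) = \int dw \int_{0}^{\infty} d^M \xi_i \, \exp\Big\{ -\beta \Big( \frac{w^T w}{2} + \tau \sum_{i=1}^{M} \xi_i \Big) \Big\} \prod_{i=1}^{M} \Theta\Big(\frac{y_i x_i^T w}{\sqrt{N}} - 1 + \xi_i\Big), \tag{S17}
\]
where $\tau$ is the SVM tuning parameter. In the low-temperature limit $\beta \to \infty$, $Z_\beta(X, y)$ in (S17) is dominated by the solution vector $w$ of the soft-margin SVM algorithm (4). The replicated partition function is
\[
\{Z_\beta(X, y)\}^n = \int \Big[ \prod_{\nu=1}^{n} dw_\nu \int_{0}^{\infty} d^M \xi_{\nu,i} \exp\Big\{ -\beta \Big( \frac{\|w_\nu\|^2}{2} + \tau \sum_{i=1}^{M} \xi_{\nu,i} \Big) \Big\} \Big] \prod_{i=1}^{M} \prod_{\nu=1}^{n} \Theta\Big(\frac{y_i x_i^T w_\nu}{\sqrt{N}} - 1 + \xi_{\nu,i}\Big).
\]
Integration over $X$ and $y$ gives
\[
\big\langle \{Z_\beta(X, y)\}^n \big\rangle_{X,y} = \int \prod_{\nu=1}^{n} dw_\nu \exp\Big(-\frac{\beta}{2} \sum_{\nu=1}^{n} \|w_\nu\|^2\Big) \prod_{i=1}^{M} \underbrace{\int_{0}^{\infty} \prod_{\nu=1}^{n} d\xi_{\nu,i} \exp\Big(-\beta\tau \sum_{\nu=1}^{n} \xi_{\nu,i}\Big) \Big\langle \prod_{\nu=1}^{n} \Theta\Big(\frac{y_i x_i^T w_\nu}{\sqrt{N}} - 1 + \xi_{\nu,i}\Big) \Big\rangle_{x_i, y_i}}_{G_n}
= \int \prod_{\nu=1}^{n} dw_\nu \exp\Big(-\frac{\beta}{2} \sum_{\nu=1}^{n} \|w_\nu\|^2 + M \log G_n\Big),
\]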

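where
\[
G_n = \int_{0}^{\infty} d^n \xi_\nu \exp\Big(-\beta\tau \sum_{\nu=1}^{n} \xi_\nu\Big) \Big\langle \prod_{\nu=1}^{n} \Theta\Big(\frac{y\, x^T w_\nu}{\sqrt{N}} - 1 + \xi_\nu\Big) \Big\rangle_{x,y}
= \int_{0}^{\infty} d^n \xi_\nu \exp\Big(-\beta\tau \sum_{\nu=1}^{n} \xi_\nu\Big) \Big\langle \prod_{\nu=1}^{n} \Theta\Big(\frac{x^T w_\nu}{\sqrt{N}} - 1 + \xi_\nu\Big) \Big\rangle_{x \mid y = +1}.
\]
Under the replica symmetry assumption (S11), in the limit $n \to 0$, a lengthy but standard calculation gives, for $\beta \to \infty$,
\[
\log G_n = \frac{n \beta \hat{q}}{2} \Big\{ \int_{-\infty}^{z_c - q\hat{\tau}} Dz\, \{ q\hat{\tau}^2 - 2\hat{\tau}(z_c - z) \} - \frac{1}{q} \int_{z_c - q\hat{\tau}}^{z_c} Dz\, (z_c - z)^2 \Big\}, \tag{S18}
\]
where $\hat{\tau} = \sigma\tau/\sqrt{\hat{q}}$ and $q = \beta (q_0 - q_1)$. Substituting (S18) and (S13) into (S8), and then into (S7), we find
\[
F = \frac{q_0}{2} - \frac{q_0 - \sum_{m=1}^{K} R_m^2}{2q} - \frac{\alpha \hat{q}}{2} \Big\{ \int_{-\infty}^{z_c - q\hat{\tau}} Dz\, \{ q\hat{\tau}^2 - 2\hat{\tau}(z_c - z) \} - \frac{1}{q} \int_{z_c - q\hat{\tau}}^{z_c} Dz\, (z_c - z)^2 \Big\},
\]
where
\[
z_c = \frac{1 - \mu R_1}{\sigma \sqrt{\hat{q}}}, \qquad \hat{\tau} = \frac{\sigma\tau}{\sqrt{\hat{q}}}.
\]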
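The "lengthy but standard calculation" leading to (S18) can be sketched as a zero-temperature (Laplace) evaluation. The sketch below is a reconstruction and is not taken from the original text: it assumes that for $\beta \to \infty$ both the integral over the slack variable $\xi$ and the Gaussian integral behind $\Phi$ are dominated by their maxima, and it uses the invariance of $Dz$ under $z \to -z$. The symbols $u(z)$, $E(z)$ and $\xi^*$ are introduced here only for illustration.

```latex
% Shortfall of the local field below the unit margin at \xi = 0 (illustrative notation)
\[
u(z) = \sigma\sqrt{\hat q}\,(z_c - z), \qquad
z_c = \frac{1 - \mu R_1}{\sigma\sqrt{\hat q}}, \qquad
\hat\tau = \frac{\sigma\tau}{\sqrt{\hat q}} .
\]
% For each z, the exponent of the integrand is -\beta E(z), with
\[
E(z) = \min_{\xi \ge 0}\Big\{ \tau\xi + \frac{[\,u(z) - \xi\,]_+^2}{2\sigma^2 q} \Big\},
\qquad [a]_+ = \max(a, 0).
\]
% Three regimes of the minimizer \xi^* (margin satisfied, quadratic penalty, linear penalty)
\[
\begin{aligned}
z \ge z_c: &\quad \xi^* = 0, & E(z) &= 0, \\
z_c - q\hat\tau \le z < z_c: &\quad \xi^* = 0, & E(z) &= \frac{\hat q}{2q}\,(z_c - z)^2, \\
z < z_c - q\hat\tau: &\quad \xi^* = u(z) - \tau\sigma^2 q, & E(z) &= \frac{\hat q}{2}\,\{ 2\hat\tau (z_c - z) - q\hat\tau^2 \}.
\end{aligned}
\]
% Averaging over z reproduces the two integrals in (S18)
\[
\log G_n \simeq -n\beta \int Dz\, E(z)
= \frac{n\beta\hat q}{2}\Big\{ \int_{-\infty}^{z_c - q\hat\tau} Dz\,\{ q\hat\tau^2 - 2\hat\tau(z_c - z) \}
- \frac{1}{q}\int_{z_c - q\hat\tau}^{z_c} Dz\,(z_c - z)^2 \Big\}.
\]
```

The threshold $z_c - q\hat{\tau}$ is where the slack first becomes active: the derivative of the bracket in $E(z)$ at $\xi = 0$ is $\tau - u(z)/(\sigma^2 q)$, which turns negative exactly when $u(z) > \tau\sigma^2 q$, that is, when $z < z_c - q\hat{\tau}$.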

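We then find the saddle-point equations
\[
q_0 - \sum_{m=1}^{K} R_m^2 - \alpha q^2 \hat{\tau}^2 \hat{q} \int_{-\infty}^{z_c - q\hat{\tau}} Dz - \alpha \hat{q} \int_{z_c - q\hat{\tau}}^{z_c} Dz\, (z_c - z)^2 = 0,
\]
\[
1 - \frac{1}{q} - \alpha \hat{\tau} \int_{-\infty}^{z_c - q\hat{\tau}} Dz\, z - \frac{\alpha}{q} \int_{z_c - q\hat{\tau}}^{z_c} Dz\, (z_c - z) z = 0,
\]
\[
\frac{R_1}{q \sqrt{\hat{q}}} - \frac{\alpha \hat{\tau} \mu}{\sigma} \int_{-\infty}^{z_c - q\hat{\tau}} Dz - \frac{\alpha \mu}{q \sigma} \int_{z_c - q\hat{\tau}}^{z_c} Dz\, (z_c - z) = 0,
\]
\[
R_m = 0, \qquad m = 2, \dots, K,
\]
which can be simplified as
\[
\frac{1 - \rho^2}{1 + \lambda_1 \rho^2} - \alpha q^2 \hat{\tau}^2 \int_{-\infty}^{z_c - q\hat{\tau}} Dz - \alpha \int_{z_c - q\hat{\tau}}^{z_c} Dz\, (z_c - z)^2 = 0, \tag{S19}
\]
\[
1 - \frac{1}{q} - \alpha \hat{\tau} \int_{-\infty}^{z_c - q\hat{\tau}} Dz\, z - \frac{\alpha}{q} \int_{z_c - q\hat{\tau}}^{z_c} Dz\, (z_c - z) z = 0, \tag{S20}
\]
\[
\frac{\rho}{q \sqrt{1 + \lambda_1 \rho^2}} - \frac{\alpha \hat{\tau} \mu}{\sigma} \int_{-\infty}^{z_c - q\hat{\tau}} Dz - \frac{\alpha \mu}{q \sigma} \int_{z_c - q\hat{\tau}}^{z_c} Dz\, (z_c - z) = 0, \tag{S21}
\]
where
\[
z_c = \frac{1/\sqrt{q_0} - \mu\rho}{\sigma \sqrt{1 + \lambda_1 \rho^2}}, \qquad \hat{\tau} = \frac{\sigma\tau}{\sqrt{q_0}\, \sqrt{1 + \lambda_1 \rho^2}}.
\]
Therefore, given $\alpha$, $\lambda_1$, $\mu$, $\sigma$, $\tau$, equations (S19), (S20), and (S21) can be solved for the three unknown parameters $\rho$, $q_0$, and $q$.

Proof of Proposition 3

We start from the partition function
\[
Z_\beta(X, y) = \int dw \, \exp\Big[ -\beta \Big\{ \frac{M}{2} \|w\|^2 - \frac{1}{r_+} \sum_{i=1}^{M_+} x_i^T w + \frac{1}{r_-} \sum_{i=1}^{M_-} x_i^T w \Big\} \Big]. \tag{S22}
\]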

In the low-temperature limit $\beta \to \infty$, $Z_\beta(X, y)$ in (S22) is dominated by the vector $\mu_c = \bar{x}_+ - \bar{x}_-$, where $\bar{x}_+$ and $\bar{x}_-$ represent the sample means for Class $+$ and Class $-$, respectively. Following the same procedure as in the proofs of Propositions 1 and 2, we obtain the free energy
\[
F = -\frac{q_0 - R^2}{2x} + \frac{\alpha q_0}{2} - \frac{\alpha \sigma^2 x}{2 r_+ r_-} - 2\alpha R \mu.
\]
The saddle-point equations are
\[
x = \frac{1}{\alpha}, \qquad R = 2\mu, \qquad q_0 = 4\mu^2 + \frac{\sigma^2}{\alpha r_+ r_-}.
\]
Therefore the squared distance between the two class means converges to the value stated in Proposition 3.

References

Tanaka, T. (2002). A statistical-mechanics approach to large-system analysis of CDMA multiuser detectors. IEEE Transactions on Information Theory 48.


Journal of Machine Learning Research 18 (2017) 1-21. Submitted 11/16; Revised 3/17; Published 4/17.
Department of Epidemiology and