UC Berkeley Department of Electrical Engineering and Computer Science Department of Statistics

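EECS 281B / STAT 241B: Advanced Topics in Statistical Learning Theory
Solutions 3, Spring 2009

Solution 3.1

For part (i), we use the fact that a convex loss function $\phi$ is classification-calibrated if and only if it is differentiable at $0$ and $\phi'(0) < 0$. For part (ii), we use the fact that if $\phi$ is convex and classification-calibrated, then $\Psi(\theta) = \phi(0) - H_\phi\left(\frac{1+\theta}{2}\right)$. For part (iii), as shown in class, the dual program has the form

$$\max_{\alpha \in \mathbb{R}^n} \left\{ -\frac{1}{\lambda} \sum_{i=1}^n \phi^*(-\lambda \alpha_i) - \frac{1}{2} \alpha^T \left( y y^T \circ K \right) \alpha \right\},$$

where $\circ$ denotes the elementwise product and $\phi^*(s) = \sup_{t \in \mathbb{R}} \{ st - \phi(t) \}$ is the conjugate dual function of $\phi$.

(a) $\phi(t) = \log(1 + \exp(-t))$.

(i) $\phi'(t) = -\frac{1}{e^t + 1}$ and $\phi''(t) = \frac{e^t}{(e^t + 1)^2} > 0$, hence $\phi$ is convex. $\phi'(0) = -\frac{1}{2} < 0$ implies $\phi$ is classification-calibrated.

(ii) By definition,

$$H_\phi(\eta) = \inf_{\alpha \in \mathbb{R}} \left\{ \eta \log(1 + e^{-\alpha}) + (1 - \eta) \log(1 + e^{\alpha}) \right\}.$$

The infimum is achieved at $\alpha = \log \frac{\eta}{1 - \eta}$ for $\eta \in (0, 1)$. Substituting it back gives

$$H_\phi(\eta) = -\eta \log \eta - (1 - \eta) \log(1 - \eta),$$

$$\Psi(\theta) = \phi(0) - H_\phi\left(\frac{1+\theta}{2}\right) = \frac{1 - \theta}{2} \log(1 - \theta) + \frac{1 + \theta}{2} \log(1 + \theta).$$

(iii) $st - \log(1 + \exp(-t))$ is unbounded above in $t$ if $s > 0$ or $s < -1$. For $s \in (-1, 0)$, $st - \log(1 + \exp(-t))$ achieves its maximum at $t = \log \frac{1+s}{-s}$, so we have $\phi^*(s) = (1 + s) \log(1 + s) - s \log(-s)$. For $s = 0$ or $s = -1$, $\phi^*(s) = 0$. With the interpretation $0 \log 0 = 0$, we have

$$\phi^*(s) = \begin{cases} (1 + s) \log(1 + s) - s \log(-s) & \text{if } s \in [-1, 0], \\ +\infty & \text{otherwise.} \end{cases}$$

Therefore, the dual program is

$$\max_{\alpha \in \mathbb{R}^n} \left\{ -\frac{1}{\lambda} \sum_{i=1}^n \left[ (1 - \lambda \alpha_i) \log(1 - \lambda \alpha_i) + \lambda \alpha_i \log(\lambda \alpha_i) \right] - \frac{1}{2} \alpha^T \left( y y^T \circ K \right) \alpha \right\}$$

$$\text{such that } 0 \le \alpha_i \le \frac{1}{\lambda}, \quad i = 1, \ldots, n.$$

(b) $\phi(t) = \exp(-t)$.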

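(i) $\phi'(t) = -e^{-t}$ and $\phi''(t) = e^{-t} > 0$, hence $\phi$ is convex. $\phi'(0) = -1 < 0$ implies $\phi$ is classification-calibrated.

(ii) By definition,

$$H_\phi(\eta) = \inf_{\alpha \in \mathbb{R}} \left\{ \eta e^{-\alpha} + (1 - \eta) e^{\alpha} \right\}.$$

The infimum is achieved at $\alpha = \frac{1}{2} \log \frac{\eta}{1 - \eta}$ for $\eta \in (0, 1)$. Substituting it back gives

$$H_\phi(\eta) = 2 \sqrt{\eta (1 - \eta)}, \qquad \Psi(\theta) = \phi(0) - H_\phi\left(\frac{1+\theta}{2}\right) = 1 - \sqrt{1 - \theta^2}.$$

(iii) $st - e^{-t}$ is unbounded above in $t$ if $s > 0$. For $s < 0$, $st - e^{-t}$ achieves its maximum at $t = -\log(-s)$, so we have $\phi^*(s) = s - s \log(-s)$. For $s = 0$, $\phi^*(s) = 0$. By the convention $0 \log 0 = 0$, we have

$$\phi^*(s) = \begin{cases} s - s \log(-s) & \text{if } s \le 0, \\ +\infty & \text{otherwise.} \end{cases}$$

Therefore, the dual program is

$$\max_{\alpha \in \mathbb{R}^n} \left\{ \sum_{i=1}^n \left[ \alpha_i - \alpha_i \log(\lambda \alpha_i) \right] - \frac{1}{2} \alpha^T \left( y y^T \circ K \right) \alpha \right\} \quad \text{such that } \alpha_i \ge 0, \quad i = 1, \ldots, n.$$

Solution 3.2

(a) For $t \in [0, 1]$,

$$\begin{aligned}
P\left[ \min_{i=1,\ldots,n} \|X_i - X\|_\infty > t \right]
&= 1 - P\left[ \min_{i=1,\ldots,n} \|X_i - X\|_\infty \le t \right] \\
&= 1 - P\left[ \bigcup_{i=1,\ldots,n} \left\{ \|X_i - X\|_\infty \le t \right\} \right] \\
&\ge 1 - n P\left[ \|X_1 - X\|_\infty \le t \right] \\
&= 1 - n P\left[ \bigcap_{j=1,\ldots,d} \left\{ |X_1^j - X^j| \le t \right\} \right] \\
&= 1 - n \left( P\left[ |X_1^1 - X^1| \le t \right] \right)^d \\
&= 1 - n \left[ 1 - (1 - t)^2 \right]^d = 1 - n (2t - t^2)^d \\
&\ge 1 - n 2^d t^d,
\end{aligned}$$

where the union bound is used in the first inequality.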
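The bound in Solution 3.2(a) can be sanity-checked by simulation. The sketch below is only an illustration under stated assumptions: it takes $X, X_1, \ldots, X_n$ to be i.i.d. uniform on $[0,1]^d$ and measures distance in the sup norm, and the function names `tight_bound`, `loose_bound`, and `estimate_prob` are hypothetical helpers introduced here, not part of the original solution. It compares a Monte Carlo estimate of $P[\min_i \|X_i - X\|_\infty > t]$ with the two lower bounds $1 - n(2t - t^2)^d$ and $1 - n 2^d t^d$.

```python
import random


def tight_bound(n, d, t):
    """Union-bound lower bound 1 - n (2t - t^2)^d (assumes uniform [0,1]^d)."""
    return 1.0 - n * (2.0 * t - t * t) ** d


def loose_bound(n, d, t):
    """Simplified lower bound 1 - n 2^d t^d."""
    return 1.0 - n * (2.0 * t) ** d


def estimate_prob(n, d, t, trials=4000, seed=0):
    """Monte Carlo estimate of P[min_i ||X_i - X||_inf > t]."""
    rng = random.Random(seed)
    count = 0
    for _ in range(trials):
        x = [rng.random() for _ in range(d)]
        all_far = True
        for _ in range(n):
            xi = [rng.random() for _ in range(d)]
            # Sup-norm distance between the query point and this sample.
            if max(abs(a - b) for a, b in zip(xi, x)) <= t:
                all_far = False
                break
        if all_far:
            count += 1
    return count / trials


if __name__ == "__main__":
    n, d, t = 20, 5, 0.2
    print("Monte Carlo estimate:", estimate_prob(n, d, t))
    print("1 - n(2t - t^2)^d   :", tight_bound(n, d, t))
    print("1 - n 2^d t^d       :", loose_bound(n, d, t))
```

Because the estimate is random, it should sit at or slightly above the first bound only up to Monte Carlo error; the gap to the second, looser bound is much larger.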

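(b) The quantity of interest is $P\left[ \min_{i=1,\ldots,n} \|X_i - X\|_\infty \le 1/4 \right]$. From part (a), we know that

$$P\left[ \min_{i=1,\ldots,n} \|X_i - X\|_\infty \le \frac{1}{4} \right] = 1 - P\left[ \min_{i=1,\ldots,n} \|X_i - X\|_\infty > \frac{1}{4} \right] \le n 2^d \left( \frac{1}{4} \right)^d = \frac{n}{2^d}.$$

To guarantee $P\left[ \min_{i=1,\ldots,n} \|X_i - X\|_\infty \le 1/4 \right] \ge 1/2$, we must make sure $1/2 \le n / 2^d$, which is equivalent to $n \ge 2^{d-1}$. Therefore, the number of samples must grow exponentially fast as the dimension increases.

(c) By Fubini's theorem,

$$\rho_{d,n} = \int_0^\infty P\left[ \min_{i=1,\ldots,n} \|X_i - X\|_\infty > t \right] dt \ge \int_0^{\frac{1}{2} n^{-1/d}} \left( 1 - n 2^d t^d \right) dt = \left[ t - \frac{n 2^d t^{d+1}}{d+1} \right]_0^{\frac{1}{2} n^{-1/d}} = \frac{d}{d+1} \cdot \frac{1}{2} n^{-1/d}.$$

The lower bounds for different values of $d$ and $n$ are as follows:

    n        d = 10    d = 20
    10^2     0.2868    0.3783
    10^3     0.2278    0.3371
    10^4     0.1810    0.3005
    10^5     0.1437    0.2678

Solution 3.3

In the following, we assume $P_\epsilon = \{ B_\epsilon(x_i),\ i = 1, \ldots, M(\epsilon; S, \rho) \}$ is one of the $\epsilon$-packings with the largest cardinality.

(a) Let $C_\epsilon = \{ B_\epsilon(y_j),\ j = 1, \ldots, N(\epsilon; S, \rho) \}$ be an $\epsilon$-covering with the smallest cardinality. By the definition of an $\epsilon$-covering, for each $x_i$, $i \in \{1, \ldots, M(\epsilon; S, \rho)\}$, there exists some $y_j$, $j \in \{1, \ldots, N(\epsilon; S, \rho)\}$, such that $x_i \in B_\epsilon(y_j)$. If $M(\epsilon; S, \rho) > N(\epsilon; S, \rho)$, then by the pigeonhole principle there must exist two distinct centers $x_i, x_{i'}$ for some $i, i' \in \{1, \ldots, M(\epsilon; S, \rho)\}$ and a $y_j$ for some $j \in \{1, \ldots, N(\epsilon; S, \rho)\}$ such that $x_i \in B_\epsilon(y_j)$ and $x_{i'} \in B_\epsilon(y_j)$. This is equivalent to $y_j \in B_\epsilon(x_i)$ and $y_j \in B_\epsilon(x_{i'})$, which contradicts the fact that $B_\epsilon(x_i) \cap B_\epsilon(x_{i'}) = \emptyset$. Therefore, $M(\epsilon; S, \rho) \le N(\epsilon; S, \rho)$.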
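Returning briefly to Solution 3.2(c), the closed-form lower bound $\frac{d}{d+1} \cdot \frac{1}{2} n^{-1/d}$ is easy to tabulate. The following is a minimal sketch; the function name `nn_distance_lower_bound` and the grid $d \in \{10, 20\}$, $n \in \{10^2, \ldots, 10^5\}$ are choices made here for illustration.

```python
def nn_distance_lower_bound(d, n):
    """Evaluate the lower bound d / (2 (d + 1)) * n^(-1/d)."""
    return d / (2.0 * (d + 1)) * n ** (-1.0 / d)


if __name__ == "__main__":
    dims = [10, 20]
    print("n".ljust(10) + "".join(f"d = {d}".ljust(12) for d in dims))
    for k in range(2, 6):
        n = 10 ** k
        row = "".join(
            f"{nn_distance_lower_bound(d, n):.4f}".ljust(12) for d in dims
        )
        print(f"10^{k}".ljust(10) + row)
```

Even with $n = 10^5$ samples the bound stays well above zero for $d = 20$, which illustrates how slowly the nearest-neighbor distance shrinks in high dimension.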

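(b) We claim $C_{2\epsilon} = \{ B_{2\epsilon}(x_i),\ i = 1, \ldots, M(\epsilon; S, \rho) \}$ is a $2\epsilon$-covering of the set $S$, which implies $N(2\epsilon; S, \rho) \le M(\epsilon; S, \rho)$. If there exists an $s \in S$ that cannot be covered by $C_{2\epsilon}$, then $\rho(s, x_i) > 2\epsilon$ for all $i = 1, \ldots, M(\epsilon; S, \rho)$. We can show that $B_\epsilon(s)$ is disjoint from all $\epsilon$-balls in $P_\epsilon$: suppose there exists $i \in \{1, \ldots, M(\epsilon; S, \rho)\}$ and a point $y \in B_\epsilon(s) \cap B_\epsilon(x_i)$; then $\rho(s, x_i) \le \rho(s, y) + \rho(y, x_i) \le 2\epsilon$, a contradiction. Therefore $P_\epsilon \cup \{ B_\epsilon(s) \}$ is also an $\epsilon$-packing, which contradicts the assumption that $P_\epsilon$ is a maximal $\epsilon$-packing.

Solution 3.4

In this problem, the packing number $M(\epsilon; S, \|\cdot\|_2)$ is defined as the maximum number of points in $S$ such that the $\ell_2$ distance between each pair is at least $\epsilon$ (it differs from the definition in Problem 3.3 by a factor of 2).

(a) For each point $x$ included in the packing, consider an $\ell_2$ ball centered at $x$ with radius $\epsilon/2$. We call these $\ell_2$ balls packing balls. Because the distance between each pair of points in the packing is at least $\epsilon$, the packing balls are mutually disjoint. Therefore the total volume covered by the packing balls is $M(\epsilon; S, \|\cdot\|_2) \, c \, (\epsilon/2)^d$, where $c$ is a scaling constant. Because each point in the packing must belong to $S$, all the packing balls are contained in an $\ell_2$ ball with radius $1 + \epsilon/2$. Hence,

$$M(\epsilon; S, \|\cdot\|_2) \, c \left( \frac{\epsilon}{2} \right)^d \le c \left( 1 + \frac{\epsilon}{2} \right)^d \quad \Longrightarrow \quad M(\epsilon; S, \|\cdot\|_2) \le \left( \frac{\epsilon + 2}{\epsilon} \right)^d,$$

which is at most $(4/\epsilon)^d$ whenever $\epsilon \le 2$.

(b) First we prove a concentration bound for chi-square distributed random variables. Assume $Z \sim \chi^2_d$. We notice that $E e^{sZ} = (1 - 2s)^{-d/2}$ for $s < 1/2$ and apply the Chernoff bound:

$$P\left[ \frac{Z}{d} \ge 1 + \delta \right] \le e^{-s d (1 + \delta)} (1 - 2s)^{-d/2} = \exp\left\{ -d \left[ s (1 + \delta) + \tfrac{1}{2} \log(1 - 2s) \right] \right\} = \exp\left\{ -\tfrac{d}{2} \left[ \delta - \log(1 + \delta) \right] \right\} \le \exp\left( -\frac{d \delta^2}{16} \right),$$

where the last equality uses the choice $s = \frac{\delta}{2(1 + \delta)}$ and the final inequality holds for $\delta \in (0, 1)$. Similarly, we have

$$P\left[ \frac{Z}{d} \le 1 - \delta \right] \le \exp\left( -\frac{d \delta^2}{16} \right).$$

Next we write $x_i = z_i / \|z_i\|_2$, where the $z_i$ are independent standard Gaussian vectors in $\mathbb{R}^d$, so that

$$\|x_i - x_j\|_2 = \left\| \frac{z_i}{\|z_i\|_2} - \frac{z_j}{\|z_j\|_2} \right\|_2 = \left\| \frac{z_i - z_j}{\|z_i\|_2} + z_j \left( \frac{1}{\|z_i\|_2} - \frac{1}{\|z_j\|_2} \right) \right\|_2 \ge \frac{\|z_i - z_j\|_2}{\|z_i\|_2} - \left| \frac{\|z_j\|_2}{\|z_i\|_2} - 1 \right|.$$

Notice that $\frac{1}{2} \|z_i - z_j\|_2^2 \sim \chi^2_d$, $\|z_i\|_2^2 \sim \chi^2_d$, and $\|z_j\|_2^2 \sim \chi^2_d$. Applying the concentration bound derived above, we get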

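$$P\left[ \frac{\|z_i - z_j\|_2^2}{2d} \le 1 - \delta \right] \le \exp\left( -\frac{d \delta^2}{16} \right),$$

$$P\left[ \frac{\|z_i\|_2^2}{d} \notin [1 - \delta, 1 + \delta] \right] \le 2 \exp\left( -\frac{d \delta^2}{16} \right), \qquad P\left[ \frac{\|z_j\|_2^2}{d} \notin [1 - \delta, 1 + \delta] \right] \le 2 \exp\left( -\frac{d \delta^2}{16} \right).$$

Therefore, with probability at least $1 - \exp(-d \delta^2 / 16) - 2 \cdot 2 \exp(-d \delta^2 / 16) = 1 - 5 \exp(-d \delta^2 / 16)$,

$$\|x_i - x_j\|_2 \ge f(\delta) := \sqrt{\frac{2 (1 - \delta)}{1 + \delta}} - \sqrt{\frac{1 + \delta}{1 - \delta}} + 1.$$

Now we can choose $\delta \in (0, 1)$ such that $f(\delta) > 1/4$ to guarantee

$$P\left[ \|x_i - x_j\|_2 \le \frac{1}{4} \right] \le 5 \exp\left( -\frac{d \delta^2}{16} \right).$$

This bound indicates that the probability that the distance between two uniformly random points on the surface of the unit $\ell_2$ ball is less than $1/4$ is exponentially small in $d$.

For part (ii), we use the following probabilistic argument: randomly choose $M$ points $x_1, \ldots, x_M$ on the surface of the unit $\ell_2$ ball. The probability that they form a $1/4$-packing is

$$P\left[ \bigcap_{i \ne j} \left\{ \|x_i - x_j\|_2 > \frac{1}{4} \right\} \right] = 1 - P\left[ \bigcup_{i \ne j} \left\{ \|x_i - x_j\|_2 \le \frac{1}{4} \right\} \right] \ge 1 - \frac{M (M - 1)}{2} P\left[ \|x_1 - x_2\|_2 \le \frac{1}{4} \right] > 1 - M^2 \cdot 5 \exp\left( -\frac{d \delta^2}{16} \right).$$

So for any integer $M \le \frac{1}{\sqrt{5}} \exp(d \delta^2 / 32)$, the probability above is strictly greater than $0$, which implies that there must exist a $1/4$-packing of size $M$. Hence

$$M\left( \tfrac{1}{4}; S, \|\cdot\|_2 \right) \ge \left\lfloor \frac{1}{\sqrt{5}} \exp\left( \frac{d \delta^2}{32} \right) \right\rfloor.$$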
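The probabilistic argument in Solution 3.4(b) can be illustrated numerically. The sketch below is an illustration, not part of the original solution: it draws points uniformly on the unit sphere by normalizing standard Gaussian vectors, reports the smallest pairwise $\ell_2$ distance, and checks on a grid that the per-dimension Chernoff exponent $\frac{1}{2}[\delta - \log(1 + \delta)]$ dominates $\delta^2 / 16$ on $(0, 1]$. The helper names and the choices $d = 50$, $M = 30$ are assumptions made here.

```python
import math
import random


def chernoff_exponent(delta):
    """Per-dimension exponent (delta - log(1 + delta)) / 2 from the Chernoff bound."""
    return 0.5 * (delta - math.log1p(delta))


def random_unit_vector(d, rng):
    """Uniform point on the unit sphere: normalize a standard Gaussian vector."""
    z = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in z))
    return [v / norm for v in z]


def min_pairwise_distance(points):
    """Smallest Euclidean distance over all distinct pairs of points."""
    best = float("inf")
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dist = math.sqrt(
                sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            )
            best = min(best, dist)
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    d, m = 50, 30  # illustrative dimension and number of points
    pts = [random_unit_vector(d, rng) for _ in range(m)]
    print("min pairwise distance:", min_pairwise_distance(pts))
    ok = all(
        chernoff_exponent(k / 100) >= (k / 100) ** 2 / 16 for k in range(1, 101)
    )
    print("exponent >= delta^2/16 on (0, 1]:", ok)
```

In moderately high dimension the sampled points are typically at distance close to $\sqrt{2}$ from one another, so a random draw forms a $1/4$-packing with overwhelming probability, in line with the existence argument.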