UC Berkeley, Department of Electrical Engineering and Computer Science / Department of Statistics


EECS 281B / STAT 241B: Advanced Topics in Statistical Learning Theory. Solutions 3, Spring 9.

Solution 3.1. For part (i), we will use the fact that a convex loss function $\phi$ is classification-calibrated if and only if it is differentiable at $0$ and $\phi'(0) < 0$; for part (ii), we will use the fact that if $\phi$ is convex and classification-calibrated, then $\Psi(\theta) = \phi(0) - H_\phi\left(\frac{1+\theta}{2}\right)$; for part (iii), as shown in class, the dual program has the form

$$\max_{\alpha \in \mathbb{R}^n} \; -\frac{1}{\lambda} \sum_{i=1}^n \phi^*(-\lambda \alpha_i) - \frac{1}{2} \alpha^T \left( y y^T \circ K \right) \alpha,$$

where $\phi^*(s) = \sup_{t \in \mathbb{R}} \{ st - \phi(t) \}$ is the conjugate dual function of $\phi$.

(a) $\phi(t) = \log(1 + \exp(-t))$.

(i) $\phi'(t) = -\frac{1}{e^t + 1}$ and $\phi''(t) = \frac{e^t}{(e^t + 1)^2} > 0$, hence $\phi$ is convex. $\phi'(0) = -\frac{1}{2} < 0$ implies $\phi$ is classification-calibrated.

(ii) By definition, $H_\phi(\eta) = \inf_{\alpha \in \mathbb{R}} \left\{ \eta \log(1 + e^{-\alpha}) + (1-\eta) \log(1 + e^{\alpha}) \right\}$. The infimum is achieved at $\alpha = \log\frac{\eta}{1-\eta}$ for $\eta \in (0,1)$. Substituting it back gives us $H_\phi(\eta) = -\eta \log \eta - (1-\eta) \log(1-\eta)$, and hence

$$\Psi(\theta) = \phi(0) - H_\phi\left(\frac{1+\theta}{2}\right) = \frac{1-\theta}{2} \log(1-\theta) + \frac{1+\theta}{2} \log(1+\theta).$$

(iii) $st - \log(1 + \exp(-t))$ is unbounded in $t$ if $s > 0$ or $s < -1$. For $s \in (-1, 0)$, $st - \log(1 + \exp(-t))$ achieves its maximum at $t = \log\frac{1+s}{-s}$, so we have $\phi^*(s) = (1+s)\log(1+s) - s\log(-s)$. For $s = 0$ or $s = -1$, $\phi^*(s) = 0$. With the interpretation $0 \log 0 = 0$, we have

$$\phi^*(s) = \begin{cases} (1+s)\log(1+s) - s\log(-s) & \text{if } s \in [-1, 0], \\ +\infty & \text{otherwise.} \end{cases}$$

Therefore, the dual program is

$$\max_{\alpha \in \mathbb{R}^n} \; -\frac{1}{\lambda} \sum_{i=1}^n \left[ (1 - \lambda\alpha_i)\log(1 - \lambda\alpha_i) + \lambda\alpha_i \log(\lambda\alpha_i) \right] - \frac{1}{2} \alpha^T \left( y y^T \circ K \right) \alpha$$

$$\text{such that } 0 \le \alpha_i \le \frac{1}{\lambda}, \quad i = 1, \ldots, n.$$

(b) $\phi(t) = \exp(-t)$.

(i) $\phi'(t) = -e^{-t}$ and $\phi''(t) = e^{-t} > 0$, hence $\phi$ is convex. $\phi'(0) = -1 < 0$ implies $\phi$ is classification-calibrated.

(ii) By definition, $H_\phi(\eta) = \inf_{\alpha \in \mathbb{R}} \left\{ \eta e^{-\alpha} + (1-\eta) e^{\alpha} \right\}$. The infimum is achieved at $\alpha = \frac{1}{2}\log\frac{\eta}{1-\eta}$ for $\eta \in (0,1)$. Substituting it back gives us $H_\phi(\eta) = 2\sqrt{\eta(1-\eta)}$, and hence

$$\Psi(\theta) = \phi(0) - H_\phi\left(\frac{1+\theta}{2}\right) = 1 - \sqrt{1 - \theta^2}.$$

(iii) $st - e^{-t}$ is unbounded in $t$ if $s > 0$. For $s < 0$, $st - e^{-t}$ achieves the maximum at $t = -\log(-s)$, so we have $\phi^*(s) = s - s\log(-s)$. For $s = 0$, $\phi^*(s) = 0$. By the convention $0 \log 0 = 0$, we have

$$\phi^*(s) = \begin{cases} s - s\log(-s) & \text{if } s \le 0, \\ +\infty & \text{otherwise.} \end{cases}$$

Therefore, the dual program is

$$\max_{\alpha \in \mathbb{R}^n} \; \sum_{i=1}^n \alpha_i - \sum_{i=1}^n \alpha_i \log(\lambda\alpha_i) - \frac{1}{2} \alpha^T \left( y y^T \circ K \right) \alpha \quad \text{such that } \alpha_i \ge 0, \; i = 1, \ldots, n.$$

Solution 3.2

(a)
$$P\left( \min_{i=1,\ldots,n} \|X_i - X\|_\infty > t \right) = 1 - P\left( \min_{i=1,\ldots,n} \|X_i - X\|_\infty \le t \right) = 1 - P\left( \bigcup_{i=1}^n \left\{ \|X_i - X\|_\infty \le t \right\} \right)$$

$$\ge 1 - n \, P\left( \|X_1 - X\|_\infty \le t \right) = 1 - n \, P\left( |X_{1j} - X_j| \le t, \; j = 1, \ldots, d \right) \ge 1 - n (2t)^d,$$

where the union bound is used in the first inequality; the last step holds because, conditionally on $X$, each coordinate satisfies $P(|X_{1j} - X_j| \le t) \le 2t$ and the coordinates are independent.
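The lower bound in Solution 3.2(a) can be sanity-checked by simulation. The sketch below is illustrative and not part of the original solutions; the function names and the particular choices $n = 10$, $d = 8$, $t = 0.2$ are ours. It estimates $P(\min_i \|X_i - X\|_\infty > t)$ by Monte Carlo and compares it against $1 - n(2t)^d$.

```python
import random

def min_linf_dist(n, d, rng):
    # One experiment: draw X and X_1..X_n uniform on [0,1]^d,
    # return min_i ||X_i - X||_inf.
    x = [rng.random() for _ in range(d)]
    best = float("inf")
    for _ in range(n):
        xi = [rng.random() for _ in range(d)]
        dist = max(abs(a - b) for a, b in zip(xi, x))
        best = min(best, dist)
    return best

def estimate(n, d, t, trials=2000, seed=0):
    # Monte Carlo estimate of P(min_i ||X_i - X||_inf > t).
    rng = random.Random(seed)
    hits = sum(min_linf_dist(n, d, rng) > t for _ in range(trials))
    return hits / trials

n, d, t = 10, 8, 0.2
emp = estimate(n, d, t)
lower = 1 - n * (2 * t) ** d          # bound from part (a)
print(f"empirical P(min > t) = {emp:.3f}, lower bound = {lower:.3f}")
assert emp >= lower - 0.05            # bound holds up to Monte Carlo error
```

With these parameters the bound $1 - n(2t)^d \approx 0.993$ is close to tight, since in high dimension almost all of the probability mass of $\|X_1 - X\|_\infty$ sits above small thresholds $t$.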

(b) The quantity of interest is $P\left( \min_{i=1,\ldots,n} \|X_i - X\|_\infty \le 1/4 \right)$. From part (a), we know that

$$P\left( \min_{i=1,\ldots,n} \|X_i - X\|_\infty \le \frac{1}{4} \right) = 1 - P\left( \min_{i=1,\ldots,n} \|X_i - X\|_\infty > \frac{1}{4} \right) \le n \left( 2 \cdot \frac{1}{4} \right)^d = \frac{n}{2^d}.$$

To guarantee $P\left( \min_i \|X_i - X\|_\infty \le 1/4 \right) \ge 1/2$, we must make sure $n/2^d \ge 1/2$, which is equivalent to $n \ge 2^{d-1}$. Therefore, the number of samples must grow exponentially fast as the dimension increases.

(c) By Fubini's theorem,

$$\rho(d,n) = E\left[ \min_{i=1,\ldots,n} \|X_i - X\|_\infty \right] = \int_0^\infty P\left( \min_{i=1,\ldots,n} \|X_i - X\|_\infty > t \right) dt \ge \int_0^{\frac{1}{2} n^{-1/d}} \left( 1 - n (2t)^d \right) dt = \frac{d}{d+1} \cdot \frac{1}{2} n^{-1/d}.$$

(The accompanying table of numerical lower bounds for various values of $d$ and $n$ did not survive transcription.)

Solution 3.3 In what follows, we assume $P_\epsilon = \{ B_\epsilon(x_i), \; i = 1, \ldots, M(\epsilon;S,\rho) \}$ is an $\epsilon$-packing with the largest cardinality.

(a) Let $C_\epsilon = \{ B_\epsilon(y_j), \; j = 1, \ldots, N(\epsilon;S,\rho) \}$ be an $\epsilon$-covering with the smallest cardinality. By the definition of an $\epsilon$-covering, we know that for each $x_i$, $i \in \{1, \ldots, M(\epsilon;S,\rho)\}$, there exists some $y_j$, $j \in \{1, \ldots, N(\epsilon;S,\rho)\}$, such that $x_i \in B_\epsilon(y_j)$. If $M(\epsilon;S,\rho) > N(\epsilon;S,\rho)$, then by the pigeonhole principle there must exist two distinct centers $x_i, x_{i'}$ for some $i, i' \in \{1, \ldots, M(\epsilon;S,\rho)\}$ and a $y_j$ for some $j \in \{1, \ldots, N(\epsilon;S,\rho)\}$ such that $x_i \in B_\epsilon(y_j)$ and $x_{i'} \in B_\epsilon(y_j)$. This is equivalent to $y_j \in B_\epsilon(x_i)$ and $y_j \in B_\epsilon(x_{i'})$, which contradicts the fact that $B_\epsilon(x_i) \cap B_\epsilon(x_{i'}) = \emptyset$. Therefore, $M(\epsilon;S,\rho) \le N(\epsilon;S,\rho)$.
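Since the numerical table for Solution 3.2(c) was lost, the bound is easy to recompute directly from the closed form $\rho(d,n) \ge \frac{d}{d+1} \cdot \frac{1}{2} n^{-1/d}$. The sketch below is illustrative; the particular $(d, n)$ pairs are our choices, not necessarily those of the original table.

```python
def rho_lower_bound(d, n):
    # Lower bound from Solution 3.2(c): (d/(d+1)) * (1/2) * n^(-1/d).
    return d / (d + 1) * 0.5 * n ** (-1.0 / d)

for d in (3, 4, 5, 6):
    for n in (100, 1000):
        print(f"d={d}, n={n}: rho(d,n) >= {rho_lower_bound(d, n):.4f}")
```

Note that for fixed $n$ the bound grows with $d$: the expected nearest-neighbor distance stays bounded away from zero unless $n$ grows exponentially in the dimension, which is the same curse-of-dimensionality message as part (b).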

(b) We claim $C_\epsilon = \{ B_{2\epsilon}(x_i), \; i = 1, \ldots, M(\epsilon;S,\rho) \}$ is a $2\epsilon$-covering of the set $S$, which implies $N(2\epsilon;S,\rho) \le M(\epsilon;S,\rho)$. If there exists an $s \in S$ that cannot be covered by $C_\epsilon$, then $\rho(s, x_i) > 2\epsilon$ for all $i = 1, \ldots, M(\epsilon;S,\rho)$. We can show that $B_\epsilon(s)$ is disjoint from all $\epsilon$-balls in $P_\epsilon$: suppose there exists $i \in \{1, \ldots, M(\epsilon;S,\rho)\}$ such that $y \in B_\epsilon(s) \cap B_\epsilon(x_i)$; then $\rho(s, x_i) \le \rho(s, y) + \rho(y, x_i) \le 2\epsilon$, a contradiction. Therefore $P_\epsilon \cup \{ B_\epsilon(s) \}$ is also an $\epsilon$-packing, which contradicts the fact that $P_\epsilon$ is the maximal $\epsilon$-packing.

Solution 3.4 In this problem, the definition of the packing number $M(\epsilon; S, \|\cdot\|_2)$ is the maximum number of points in $S$ such that the $\ell_2$ distance between each pair is at least $\epsilon$ (it differs from Problem 3.3 by a factor of $2$).

(a) For each point $x$ included in the packing, consider an $\ell_2$ ball centered at $x$ with radius $\epsilon/2$. We will call these $\ell_2$ balls packing balls. Because the distance between each pair of points in the packing is at least $\epsilon$, the packing balls are mutually disjoint. Therefore the total volume covered by the packing balls is $M(\epsilon; S, \|\cdot\|_2) \, c (\epsilon/2)^d$, where $c$ is a scaling constant. Because each point in the packing must belong to $S$, all the packing balls are indeed contained in an $\ell_2$ ball with radius $1 + \epsilon/2$. Hence,

$$M(\epsilon; S, \|\cdot\|_2) \, c \left( \frac{\epsilon}{2} \right)^d \le c \left( 1 + \frac{\epsilon}{2} \right)^d \quad \Longrightarrow \quad M(\epsilon; S, \|\cdot\|_2) \le \left( \frac{2 + \epsilon}{\epsilon} \right)^d.$$

(b) First we prove the concentration bound for a chi-square distributed random variable. Assume $Z \sim \chi_d^2$; we notice that $E[e^{sZ}] = (1-2s)^{-d/2}$ for $s < 1/2$ and apply the Chernoff bound:

$$P\left( Z \ge (1+\delta) d \right) \le e^{-s(1+\delta)d} (1-2s)^{-d/2} = \exp\left\{ -d \left[ s(1+\delta) + \tfrac{1}{2}\log(1-2s) \right] \right\} = \exp\left\{ -\tfrac{d}{2} \left[ \delta - \log(1+\delta) \right] \right\} \le \exp\left( -d\delta^2/6 \right),$$

where in the second step we set $s = \frac{\delta}{2(1+\delta)}$, and the last inequality uses $\delta - \log(1+\delta) \ge \delta^2/3$ for $\delta \in (0, 1/2]$. Similarly, we have $P\left( Z \le (1-\delta) d \right) \le \exp\left( -d\delta^2/6 \right)$.

Next, writing $x_i = z_i / \|z_i\|_2$ and $x_j = z_j / \|z_j\|_2$ with $z_i, z_j \sim N(0, I_d)$ independent, we have

$$\|x_i - x_j\|_2 = \left\| \frac{z_i}{\|z_i\|_2} - \frac{z_j}{\|z_j\|_2} \right\|_2 \ge \frac{\|z_i - z_j\|_2}{\|z_i\|_2} - \left\| \frac{z_j}{\|z_i\|_2} - \frac{z_j}{\|z_j\|_2} \right\|_2 = \frac{\|z_i - z_j\|_2 - \left| \|z_i\|_2 - \|z_j\|_2 \right|}{\|z_i\|_2}.$$

Notice that $\|z_i - z_j\|_2^2 / 2 \sim \chi_d^2$, $\|z_i\|_2^2 \sim \chi_d^2$, and $\|z_j\|_2^2 \sim \chi_d^2$. Applying the concentration bound derived above, we get

$$P\left( \frac{\|z_i - z_j\|_2^2}{2} \le (1-\delta) d \right) \le \exp\left( -d\delta^2/6 \right),$$

$$P\left( \|z_i\|_2^2 \ge (1+\delta) d \right) \le \exp\left( -d\delta^2/6 \right), \qquad P\left( \|z_i\|_2^2 \le (1-\delta) d \right) \le \exp\left( -d\delta^2/6 \right),$$

$$P\left( \|z_j\|_2^2 \ge (1+\delta) d \right) \le \exp\left( -d\delta^2/6 \right), \qquad P\left( \|z_j\|_2^2 \le (1-\delta) d \right) \le \exp\left( -d\delta^2/6 \right).$$

Therefore, with probability at least $1 - 5\exp\left( -d\delta^2/6 \right)$,

$$\|x_i - x_j\|_2 \ge \frac{\sqrt{2(1-\delta)} - \left( \sqrt{1+\delta} - \sqrt{1-\delta} \right)}{\sqrt{1+\delta}} =: f(\delta).$$

Now we can set an appropriate $\delta$ to let $f(\delta) \ge 1/4$, which guarantees

$$P\left( \|x_i - x_j\|_2 \le \frac{1}{4} \right) \le 5\exp\left( -d\delta^2/6 \right).$$

This bound indicates that the probability that the distance between two uniformly random points on the surface of the unit $\ell_2$ ball is less than $1/4$ is exponentially small. For part (ii), we use the following probabilistic argument: randomly choose $M$ points $x_1, \ldots, x_M$ on the surface of the unit $\ell_2$ ball. The probability that they form a $1/4$-packing is

$$P\left( \bigcap_{i \ne j} \left\{ \|x_i - x_j\|_2 > \frac{1}{4} \right\} \right) = 1 - P\left( \bigcup_{i \ne j} \left\{ \|x_i - x_j\|_2 \le \frac{1}{4} \right\} \right) \ge 1 - \binom{M}{2} P\left( \|x_1 - x_2\|_2 \le \frac{1}{4} \right) \ge 1 - \frac{M^2}{2} \cdot 5\exp\left( -d\delta^2/6 \right).$$

So for $M = \frac{1}{\sqrt{5}} \exp\left( d\delta^2/12 \right)$, we can guarantee the probability above is strictly greater than $0$, which implies there must exist a $1/4$-packing of size $M$. Hence $M(1/4; S, \|\cdot\|_2) \ge \frac{1}{\sqrt{5}} \exp\left( d\delta^2/12 \right)$.
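The phenomenon behind Solution 3.4(b) is easy to see empirically: in high dimension, independent uniform points on the unit sphere are nearly orthogonal, so pairwise distances concentrate near $\sqrt{2}$ and even a modest number of random points already forms a $1/4$-packing. A quick numerical illustration (not part of the original solutions; the choices $d = 200$, $M = 50$ are ours, and Gaussian normalization is used to sample uniformly from the sphere):

```python
import math
import random

def random_unit_vector(d, rng):
    # Normalizing a standard Gaussian vector gives a uniform
    # point on the surface of the unit l2 ball.
    z = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in z))
    return [v / norm for v in z]

def pairwise_min_dist(points):
    # Smallest l2 distance over all pairs of points.
    best = float("inf")
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dist = math.sqrt(sum((a - b) ** 2
                                 for a, b in zip(points[i], points[j])))
            best = min(best, dist)
    return best

rng = random.Random(1)
d, M = 200, 50
pts = [random_unit_vector(d, rng) for _ in range(M)]
m = pairwise_min_dist(pts)
print(f"min pairwise distance among {M} random points in d={d}: {m:.3f}")
assert m > 0.25   # the M random points form a 1/4-packing
```

With $d = 200$ the minimum pairwise distance is far above $1/4$, matching the claim that the failure probability for any single pair is exponentially small in $d$.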