
Problem 1 (Practice with Chebyshev and Chernoff bounds): When using concentration bounds to analyze randomized algorithms, one often has to approach the problem in different ways depending on the specific bound being used. Typically, Chebyshev is useful when dealing with more complicated random variables, in particular when they are only pairwise independent; Chernoff bounds are usually used along with the union bound for events which are easier to analyze. We'll now go back and look at a few of our older examples using both these techniques.

Part (a) (Number of collisions): Recall we showed that if we throw $m$ balls into $n$ bins, the average number of collisions $X_{m,n}$ is $\mu_{m,n} = \binom{m}{2}\frac{1}{n}$. Use Chebyshev's inequality to show that:
$$\Pr\left[|X_{m,n} - \mu_{m,n}| \ge c\sqrt{\mu_{m,n}}\right] \le 1/c^2.$$
Next, suppose we choose $m = \sqrt{2n}$, so that $\mu_{m,n} \approx 1$. Use Chernoff bounds plus the union bound to bound the probability that no bin has more than 1 ball. Compare this to the more exact analysis you did in Homework 1.

Solution: Let $\sigma_{m,n}$ be the standard deviation of $X_{m,n}$. As we did before, for every pair of balls $(i,j)$, let $Y_{i,j}$ be the indicator that the two balls collide. Note that since the $Y_{i,j}$ are $\{0,1\}$-valued r.v.s, we have $\mathbb{E}[Y_{i,j}^2] = \mathbb{E}[Y_{i,j}]$, and thus $\mathrm{Var}(Y_{i,j}) \le \mathbb{E}[Y_{i,j}]$. Moreover, observe that the $Y_{i,j}$ are pairwise independent, i.e., for any $(i,j) \ne (i',j')$, the r.v.s $Y_{i,j}$ and $Y_{i',j'}$ are independent (in particular, note that $Y_{i,j}$, $Y_{i,k}$ and $Y_{j,k}$ are pairwise independent; however, they are not mutually independent). Now we have $X_{m,n} = \sum_{i<j} Y_{i,j}$, and thus $\mu_{m,n} = \sum_{i<j}\mathbb{E}[Y_{i,j}]$ and $\mathrm{Var}(X_{m,n}) = \sum_{i<j}\mathrm{Var}(Y_{i,j}) \le \sum_{i<j}\mathbb{E}[Y_{i,j}] = \mu_{m,n}$. Thus, by Chebyshev's inequality, we have:
$$\Pr\left[|X_{m,n} - \mu_{m,n}| \ge c\sqrt{\mu_{m,n}}\right] \le \frac{\mathrm{Var}(X_{m,n})}{c^2\mu_{m,n}} \le 1/c^2.$$

To use Chernoff bounds, we instead consider the number of balls in each bin. Let $Z_{b,i}$ be the indicator that ball $i$ fell in bin $b$ (these are now mutually independent). Further, let $B_b = \sum_i Z_{b,i}$ be the number of balls in bin $b$; then we know that $\mathbb{E}[B_b] = m/n$, and moreover, using our standard Chernoff bound, we have:
$$\Pr[B_b \ge 2] = \Pr\left[B_b \ge \left(1 + \left(\tfrac{2n}{m} - 1\right)\right)\mathbb{E}[B_b]\right] \le \exp\left(-\frac{\left(\tfrac{2n}{m}-1\right)^2\left(\tfrac{m}{n}\right)}{2 + \left(\tfrac{2n}{m}-1\right)}\right) = \exp\left(-\frac{(2n-m)^2}{n(m+2n)}\right)$$
(using $\Pr[X \ge (1+\epsilon)\mu] \le \exp\left(-\frac{\epsilon^2\mu}{2+\epsilon}\right)$).

Finally, by the union bound, we have $\Pr[\text{Some bin has} \ge 2 \text{ balls}] \le n\exp\left(-\frac{(2n-m)^2}{n(m+2n)}\right)$. Now if $m = \sqrt{2n}$, we get $\exp\left(-\frac{(2n-m)^2}{n(m+2n)}\right) = \Theta(1)$, and thus this bound on $\Pr[\text{Some bin has} \ge 2 \text{ balls}]$ is not useful (as it grows with $n$). On the other hand, in the first homework, we did a direct calculation to show that $\Pr[\text{No collisions}] \approx \exp(-\mathbb{E}[X_{m,n}])$, and thus if $m = \sqrt{2n}$, we have $\Pr[\text{No collisions}] \approx 1/e$. Thus, the Chernoff bound does not give us the tightest scaling in this case.
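As a quick sanity check (not part of the original solution), the Chebyshev bound above is easy to compare against a small Monte Carlo simulation; the parameters below (n, m, c, number of trials) are arbitrary illustrative choices.

```python
import random
from math import comb, sqrt

def count_collisions(m, n, rng):
    """Throw m balls into n bins uniformly at random and count colliding pairs."""
    counts = [0] * n
    for _ in range(m):
        counts[rng.randrange(n)] += 1
    return sum(comb(c, 2) for c in counts)

def chebyshev_check(m=200, n=1000, c=3.0, trials=20000, seed=0):
    rng = random.Random(seed)
    mu = comb(m, 2) / n                      # E[X_{m,n}] = C(m,2)/n
    deviations = 0
    for _ in range(trials):
        x = count_collisions(m, n, rng)
        if abs(x - mu) >= c * sqrt(mu):
            deviations += 1
    print(f"mu = {mu:.2f}, empirical P[|X - mu| >= c*sqrt(mu)] = {deviations / trials:.4f}, "
          f"Chebyshev bound 1/c^2 = {1 / c**2:.4f}")

if __name__ == "__main__":
    chebyshev_check()
```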

Part (b) (Coupon collector): For $n$ bins, recall that we defined $T_i$ to be the first time when $i$ unique bins were filled, and used these random variables to show that $T_n$, i.e., the number of balls we need to throw before every bin has at least one ball, satisfies $\mathbb{E}[T_n] = nH_n = \Theta(n\log n)$. Using the same random variables, show that
$$\Pr\left[|T_n - \mathbb{E}[T_n]| \ge c\,\mathbb{E}[T_n]\right] \le \frac{\pi^2}{6c^2 H_n^2}.$$
Next, suppose we throw $m = n\log n + cn$ balls; using Chernoff bounds plus the union bound, choose $c$ such that no bin is empty with probability greater than $1-\delta$. (Hint: use $\sum_{i=1}^{\infty} 1/i^2 = \pi^2/6$.)

Solution: First, write $T_n = \sum_{i=0}^{n-1} t_i$, where $t_i$ is the (geometric) number of throws needed to go from $i$ filled bins to $i+1$ filled bins, with success probability $p_i = (n-i)/n$. Using the independence of the $t_i$'s, let's calculate $\mathrm{Var}(T_n)$:
$$\mathrm{Var}(T_n) = \sum_{i=0}^{n-1}\mathrm{Var}(t_i) = \sum_{i=0}^{n-1}\frac{1-p_i}{p_i^2} \le \sum_{i=0}^{n-1}\frac{1}{p_i^2} = \sum_{i=0}^{n-1}\frac{n^2}{(n-i)^2} = n^2\sum_{i=1}^{n}\frac{1}{i^2} \le \frac{\pi^2 n^2}{6}.$$
Now, using Chebyshev's inequality, we get:
$$\Pr\left[|T_n - \mathbb{E}[T_n]| \ge c\,\mathbb{E}[T_n]\right] = \Pr\left[|T_n - \mathbb{E}[T_n]| \ge c\,nH_n\right] \le \frac{\mathrm{Var}(T_n)}{(cnH_n)^2} \le \frac{\pi^2}{6c^2 H_n^2}.$$

Next, let $m = n\log n + cn$, and let $B_b$ be the number of balls in bin $b$. We know $\mathbb{E}[B_b] = \log n + c$. Then by the Chernoff bound, we have:
$$\Pr[B_b \le 0] = \Pr\left[B_b \le (1-1)\,\mathbb{E}[B_b]\right] \le \exp\left(-\frac{\mathbb{E}[B_b]}{2}\right) = \exp\left(-\frac{\log n + c}{2}\right).$$
Finally, using the union bound, we have $\Pr[\text{Some bin is empty}] \le n\Pr[B_b \le 0] = \exp\left(\frac{\log n - c}{2}\right)$, and we can set $c = \log n + 2\log(1/\delta)$ to get this less than $\delta$.

Here, a direct calculation is better than the Chernoff bound. In particular, we have:
$$\Pr[B_b = 0] = \left(1 - \frac{1}{n}\right)^m \le e^{-m/n} = \frac{e^{-c}}{n}.$$
By the union bound, we have $\Pr[\text{Some bin is empty}] \le e^{-c}$, and thus we only need $c = \log(1/\delta)$ to ensure this is less than $\delta$. The main takeaway, again, is that Chernoff bounds are fine when probabilities are small and we do not need the tightest bounds, but may be weak when probabilities are larger and we want tighter bounds.
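For intuition (again illustrative, not required by the problem), here is a minimal simulation comparing the direct bound $e^{-c}$ on the probability of an empty bin against the empirical frequency when we throw $m = n\log n + cn$ balls.

```python
import random
from math import log, exp

def any_bin_empty(m, n, rng):
    """Throw m balls into n bins; return True if some bin stays empty."""
    hit = [False] * n
    for _ in range(m):
        hit[rng.randrange(n)] = True
    return not all(hit)

def empty_bin_probability(n=200, c=2.0, trials=5000, seed=0):
    rng = random.Random(seed)
    m = int(n * log(n) + c * n)
    failures = sum(any_bin_empty(m, n, rng) for _ in range(trials))
    print(f"m = {m}: empirical P[some bin empty] = {failures / trials:.4f}, "
          f"direct union bound e^(-c) = {exp(-c):.4f}")

if __name__ == "__main__":
    empty_bin_probability()
```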

Problem 2 (The Hoeffding Extension): In class, we saw that for a r.v. $X_i \sim \mathrm{Bernoulli}(p_i)$, we have:
$$\mathbb{E}[e^{\theta X_i}] = 1 - p_i + p_i e^{\theta}.$$
Plugging this into the Chernoff bound and optimizing over $\theta$, we obtained a variety of bounds; in particular, for independent r.v.s $\{X_i\}$ with $X = \sum_i X_i$ and $\mu = \mathbb{E}[X] = \sum_i p_i$, we showed for any $\epsilon > 0$:
$$\Pr[X - \mu \ge \epsilon\mu] \le \exp\left(-\frac{\epsilon^2\mu}{2+\epsilon}\right).$$
We now extend this to more general bounded r.v.s.

Part (a): First, for any $\theta$, argue that the function $f(x) = e^{\theta x}$ for $x \in [0,1]$ is bounded above by the line joining $(0,1)$ and $(1, e^{\theta})$. Using this, find constants $\alpha, \beta$ such that for all $x \in [0,1]$: $e^{\theta x} \le \alpha x + \beta$.

Solution: Recall that $f(x) = e^{\theta x}$ is a convex function. Now, note that $f(0) = 1$ and $f(1) = e^{\theta}$. Therefore, for $x \in [0,1]$, it is bounded above by the line joining $(0,1)$ and $(1, e^{\theta})$, which means:
$$e^{\theta x} \le (e^{\theta} - 1)x + 1.$$

Part (b): Next, for any random variable $X_i$ taking values in $[0,1]$ such that $\mathbb{E}[X_i] = \mu_i$, show that:
$$\mathbb{E}[e^{\theta X_i}] \le 1 - \mu_i + \mu_i e^{\theta}.$$
Using this, for independent r.v.s $X_i$ taking values in $[0,1]$ with $\mathbb{E}[X_i] = \mu_i$, and defining $X = \sum_i X_i$, $\mu = \sum_i \mu_i$, show that:
$$\Pr[X - \mu \ge \epsilon\mu] \le \exp\left(-\frac{\epsilon^2\mu}{2+\epsilon}\right).$$
(Note: you can directly use the inequality for Bernoulli r.v.s; no need to redo the optimization.)

Solution: Let's take expectations of both sides of the inequality from part (a):
$$\mathbb{E}[e^{\theta X_i}] \le \mathbb{E}[(e^{\theta}-1)X_i + 1] = (e^{\theta}-1)\mathbb{E}[X_i] + 1 = (e^{\theta}-1)\mu_i + 1 = 1 - \mu_i + \mu_i e^{\theta}.$$
Now, let $X = \sum_i X_i$, $\mu = \sum_i \mu_i$. Since this is exactly the same MGF bound we had for Bernoulli r.v.s (with $\mu_i$ in place of $p_i$), using exactly the same method as we did in class, we get:
$$\Pr[X - \mu \ge \epsilon\mu] \le \exp\left(-\frac{\epsilon^2\mu}{2+\epsilon}\right).$$
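The two inequalities above are easy to verify numerically. The sketch below does so for a Beta-distributed random variable on $[0,1]$; the Beta distribution and the grid of $\theta$ values are arbitrary choices used only for illustration.

```python
import random
from math import exp

def mgf_bound_check(alpha=2.0, beta=5.0, thetas=(0.5, 1.0, 2.0),
                    samples=200_000, seed=0):
    """Empirically check E[e^(theta X)] <= 1 - mu + mu*e^theta for X in [0,1]."""
    rng = random.Random(seed)
    xs = [rng.betavariate(alpha, beta) for _ in range(samples)]
    mu = sum(xs) / samples
    for theta in thetas:
        mgf = sum(exp(theta * x) for x in xs) / samples
        bound = 1 - mu + mu * exp(theta)
        print(f"theta={theta:.1f}: E[e^(theta X)] ~ {mgf:.4f} <= bound {bound:.4f}")

if __name__ == "__main__":
    mgf_bound_check()
```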

Part (c) (Optional): Next, for any random variable $X_i$ taking values in $[a_i, b_i]$ such that $\mathbb{E}[X_i] = \mu_i$, a similar bounding technique can be used to show:
$$\mathbb{E}\left[e^{\theta(X_i - \mu_i)}\right] \le \exp\left(\frac{1}{8}\theta^2(b_i - a_i)^2\right).$$
This is sometimes referred to as Hoeffding's lemma (for the proof, see the Wikipedia article). Now consider independent r.v.s $X_i$ taking values in $[a_i, b_i]$ with $\mathbb{E}[X_i] = \mu_i$, and as before, let $X = \sum_i X_i$, $\mu = \sum_i \mu_i$. Using the above inequality, optimize over $\theta$ to show that:
$$\Pr[X - \mu \ge \epsilon\mu] \le \exp\left(-\frac{2\epsilon^2\mu^2}{\sum_{i=1}^{n}(b_i - a_i)^2}\right).$$

Solution: As in the standard Chernoff argument, we have:
$$\Pr[X - \mu \ge \epsilon\mu] = \Pr\left[e^{\theta(X-\mu)} \ge e^{\theta\epsilon\mu}\right] \le \min_{\theta>0}\ \mathbb{E}\left[e^{\theta(X-\mu)}\right]e^{-\theta\epsilon\mu} = \min_{\theta>0}\ \prod_i \mathbb{E}\left[e^{\theta(X_i-\mu_i)}\right]e^{-\theta\epsilon\mu} \le \min_{\theta>0}\ \exp\left(\frac{\theta^2}{8}\sum_i(b_i-a_i)^2 - \theta\epsilon\mu\right).$$
To optimize, we choose $\theta = 4\epsilon\mu/\sum_i(b_i-a_i)^2$ (the minimizer of the quadratic exponent), to get:
$$\Pr[X - \mu \ge \epsilon\mu] \le \exp\left(-\frac{2\epsilon^2\mu^2}{\sum_i(b_i-a_i)^2}\right).$$
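For completeness, the Hoeffding bound of part (c) can also be checked by simulation. The sketch below sums independent uniform random variables on intervals $[a_i, b_i]$ (an arbitrary illustrative choice) and compares the empirical upper tail against $\exp\left(-2\epsilon^2\mu^2/\sum_i(b_i-a_i)^2\right)$.

```python
import random
from math import exp

def hoeffding_demo(n=50, eps=0.1, trials=100_000, seed=0):
    rng = random.Random(seed)
    # Arbitrary bounded ranges [a_i, b_i], with X_i uniform on each.
    intervals = [(i % 3, i % 3 + 1 + (i % 2)) for i in range(n)]
    mu = sum((a + b) / 2 for a, b in intervals)          # E[X] for uniforms
    span_sq = sum((b - a) ** 2 for a, b in intervals)    # sum of (b_i - a_i)^2
    exceed = 0
    for _ in range(trials):
        x = sum(rng.uniform(a, b) for a, b in intervals)
        if x - mu >= eps * mu:
            exceed += 1
    bound = exp(-2 * (eps * mu) ** 2 / span_sq)
    print(f"empirical P[X - mu >= eps*mu] = {exceed / trials:.5f}, "
          f"Hoeffding bound = {bound:.5f}")

if __name__ == "__main__":
    hoeffding_demo()
```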

Problem 3 (A Weaker Sampling Theorem; adapted from MU Ex. 4.9): In class we saw the following sampling theorem for estimating the mean of a $\{0,1\}$-valued random variable: in order to get an estimate within $\pm\epsilon$ with confidence $1-\delta$, we need $n \ge \frac{2+\epsilon}{\epsilon^2}\ln\frac{2}{\delta}$ samples. A crucial component in this proof was using the Chernoff bound for Bernoulli($p$) random variables. Suppose instead we want to estimate a more general random variable $X$ (for example, the average number of hours of TV watched by a random person); we may not be able to use a Chernoff bound if we do not know the moment generating function. We now show how to get a similar sampling theorem which only uses knowledge of the mean and variance of $X$. We want to estimate a r.v. $X$ with mean $\mathbb{E}[X]$ and variance $\mathrm{Var}(X)$, given i.i.d. samples $X_1, X_2, \ldots$. Let $r = \sqrt{\mathrm{Var}(X)}/\mathbb{E}[X]$; we now show that we can estimate it up to accuracy $\pm\epsilon\,\mathbb{E}[X]$ and confidence $1-\delta$ using $O\left(\frac{r^2}{\epsilon^2}\ln\frac{1}{\delta}\right)$ samples.

Part (a): Given $n$ samples $X_1, \ldots, X_n$, suppose we use the estimator $\bar{X} = \left(\sum_{i=1}^{n} X_i\right)/n$. Show that $n = O(r^2/(\epsilon^2\delta))$ is sufficient to ensure:
$$\Pr\left[|\bar{X} - \mathbb{E}[X]| \ge \epsilon\,\mathbb{E}[X]\right] \le \delta.$$

Solution: By linearity of expectation, we have $\mathbb{E}[\bar{X}] = \sum_{i=1}^{n}\mathbb{E}[X_i]/n = \mathbb{E}[X]$. Moreover, since the $X_i$'s are i.i.d., we have $\mathrm{Var}(\bar{X}) = \sum_{i=1}^{n}\mathrm{Var}(X_i)/n^2 = \mathrm{Var}(X)/n = r^2\,\mathbb{E}[X]^2/n$. Now, using Chebyshev's inequality, we get:
$$\Pr\left[|\bar{X} - \mathbb{E}[X]| \ge \epsilon\,\mathbb{E}[X]\right] \le \frac{\mathrm{Var}(\bar{X})}{\epsilon^2\,\mathbb{E}[X]^2} = \frac{r^2}{n\epsilon^2}.$$
To ensure $\Pr[|\bar{X} - \mathbb{E}[X]| \ge \epsilon\,\mathbb{E}[X]] \le \delta$, we can choose $n$ such that $\frac{r^2}{n\epsilon^2} \le \delta$. Using $n = O(r^2/(\epsilon^2\delta))$ samples is thus sufficient.

Part (b): We say an estimator $\hat{X}$ is a weak estimator if it satisfies $\Pr[|\hat{X} - \mathbb{E}[X]| \ge \epsilon\,\mathbb{E}[X]] \le 1/4$. Using part (a), show that we need $O(r^2/\epsilon^2)$ samples to obtain a weak estimator. Now suppose we are given $m$ weak estimates $\hat{X}_1, \hat{X}_2, \ldots, \hat{X}_m$, and we define a new estimator $\tilde{X}$ to be the median of these weak estimates. Show that using $m = O(\ln(1/\delta))$ weak estimates gives us an estimate $\tilde{X}$ that satisfies:
$$\Pr\left[|\tilde{X} - \mathbb{E}[X]| \ge \epsilon\,\mathbb{E}[X]\right] \le \delta.$$
What could go wrong if we used the mean of $\hat{X}_1, \hat{X}_2, \ldots, \hat{X}_m$ instead of the median? (NOTE: I got the definition of weak estimator inverted in the homework; this is the correct definition. Essentially, we want the probability of a weak estimate being close to the mean to be $> 1/2$.)

Solution: If we take $\delta = 1/4$ in part (a), we see that $\bar{X}$ is a weak estimator if we use $n = O(r^2/\epsilon^2)$ samples. Now we want to show that we need $O(\log(1/\delta))$ such weak estimators to ensure that the median of these estimators is within $(1\pm\epsilon)\mathbb{E}[X]$. Let us introduce new r.v.s $Y_i$ such that $Y_i = 1$ if $|\hat{X}_i - \mathbb{E}[X]| \le \epsilon\,\mathbb{E}[X]$, and $Y_i = 0$ otherwise. Then $Y = \sum_{i=1}^{m} Y_i$ is the number of weak estimates that fall within $(1\pm\epsilon)\mathbb{E}[X]$; to ensure that the median is also within $(1\pm\epsilon)\mathbb{E}[X]$, it suffices to have $Y > m/2$. In other words, we have:
$$\Pr\left[|\tilde{X} - \mathbb{E}[X]| \ge \epsilon\,\mathbb{E}[X]\right] \le \Pr[Y \le m/2].$$
Moreover, $\Pr[Y_i = 1] \ge 3/4$, and hence $\mathbb{E}[Y] \ge 3m/4$. Now, for independent random variables $Z_i \sim \mathrm{Bernoulli}(p_i)$, with $Z = \sum_{i=1}^{n} Z_i$, we saw the following Chernoff bound:
$$\Pr[Z \le (1-\epsilon)\mathbb{E}[Z]] \le \exp\left(-\frac{\epsilon^2\,\mathbb{E}[Z]}{2}\right).$$
Using this, and noting $m/2 = (1-1/3)\cdot 3m/4 \le (1-1/3)\,\mathbb{E}[Y]$, we can write:
$$\Pr[Y \le m/2] \le \Pr[Y \le (1-1/3)\,\mathbb{E}[Y]] \le e^{-\mathbb{E}[Y]/18} \le e^{-m/24}.$$
Now, if we take $m \ge 24\ln(1/\delta) = O(\log(1/\delta))$, we get $\Pr[|\tilde{X} - \mathbb{E}[X]| \ge \epsilon\,\mathbb{E}[X]] \le \delta$. Note though that we only needed to count the weak estimates that fell outside $(1\pm\epsilon)\mathbb{E}[X]$; we did not need to assume they are bounded, or have bounded variance, etc. If instead we had used the mean of the weak estimates, we could have a problem if these bad estimates happened to take very large values. Using the median makes our estimate robust to such outliers.
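The weak-estimator-plus-median construction above is the classic median-of-means estimator; a minimal sketch follows. The heavy-tailed test distribution (Pareto) and the specific group count are illustrative assumptions, not part of the problem.

```python
import random
import statistics

def median_of_means(samples, num_groups):
    """Split samples into num_groups blocks, average each block, return the median."""
    k = len(samples) // num_groups
    means = [sum(samples[g * k:(g + 1) * k]) / k for g in range(num_groups)]
    return statistics.median(means)

def demo(seed=0):
    rng = random.Random(seed)
    # Heavy-tailed data: Pareto with shape 2.5 has finite mean and variance.
    data = [rng.paretovariate(2.5) for _ in range(30_000)]
    true_mean = 2.5 / (2.5 - 1)                   # mean of Pareto(2.5), x_min = 1
    plain_mean = sum(data) / len(data)
    mom = median_of_means(data, num_groups=30)    # ~ O(log(1/delta)) groups
    print(f"true mean {true_mean:.3f}, sample mean {plain_mean:.3f}, "
          f"median-of-means {mom:.3f}")

if __name__ == "__main__":
    demo()
```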

Problem 4 (Randomized Set-Cover): In this problem, we'll look at randomized rounding, which is a very powerful technique for solving large-scale combinatorial optimization problems. The main idea is that, given a problem which can be written as an optimization problem with integer constraints, we can sometimes solve the relaxed problem with non-integer constraints, and then round the solutions to get a good assignment. We will highlight this technique for the Minimum Set-Cover problem. We are given a collection of $m$ subsets $\{S_1, S_2, \ldots, S_m\}$ of some large set $U$ of $n$ elements, such that $\bigcup_i S_i = U$. The Minimum Set-Cover problem is that of selecting the smallest number of sets $C$ from the collection $\{S_1, S_2, \ldots, S_m\}$ such that they cover $U$, i.e., such that each element in $U$ lies in at least one of the sets in $C$.

Part (a): Argue that the minimum set-cover problem is equivalent to the following integer program:
$$\text{Minimize } \sum_i x_i \quad \text{subject to} \quad \sum_{i:\, e \in S_i} x_i \ge 1 \ \ \forall e \in U, \qquad x_i \in \{0,1\} \ \ \forall i \in \{1,2,\ldots,m\}.$$
Let the solution to this problem, i.e., the minimum set-cover, be denoted $OPT$. Next, argue that if we solve the same problem, but now change the last constraint to $x_i \in [0,1]$ for all $i$, then the resulting solution $OPT_{LP}$ of this relaxed problem obeys $OPT_{LP} \le OPT$. Note that the relaxed problem is an LP and hence can be solved efficiently.

Solution: For each set $S_i$, we associate a variable $x_i \in \{0,1\}$ that indicates whether we choose $S_i$ or not. We can then write the solutions of the Minimum Set-Cover problem as a vector $x \in \{0,1\}^m$. The objective is clearly to minimize the sum of these variables; moreover, since the sets we select must cover each element in $U$, we need one constraint to ensure this for each element. If we replace each constraint $x_i \in \{0,1\}$ with $x_i \in [0,1]$, then the resulting problem can be solved in polynomial time, as it is a Linear Program (LP). However, by replacing integer constraints with continuous domains, we have expanded the set of feasible solutions; thus, the resulting minimization problem must give a value that is no larger (as all feasible integer solutions are within our new feasible region), and hence we have that $OPT_{LP} \le OPT$.

Part (b): Given a solution $z$ to the relaxed LP, we now round the values to obtain a feasible solution for the original minimum set-cover problem. For each set $S_i$, we generate $k = c\log n$ i.i.d. $\mathrm{Bernoulli}(z_i)$ random variables $X_{i,1}, X_{i,2}, \ldots, X_{i,k}$; if any of them is 1, then we set $x_i = 1$, i.e., we add $S_i$ to our cover $C$. Prove that the resulting set-cover obeys $\mathbb{E}[|C|] \le c\log n \cdot OPT$.

Solution: First, from the definition of the rounding process, for any $i$ and any $j \in \{1,2,\ldots,k\}$, we have:
$$\mathbb{E}[X_{i,j}] = \Pr[X_{i,j} = 1] = z_i, \qquad \text{and} \qquad \sum_i z_i = OPT_{LP}.$$
Therefore, by linearity of expectation (and a union bound over the $c\log n$ draws for each set), we have:
$$\mathbb{E}[|C|] \le \sum_i \sum_{j=1}^{c\log n}\mathbb{E}[X_{i,j}] = c\log n \cdot OPT_{LP} \le c\log n \cdot OPT.$$

Part (c): Finally, choose $c$ to ensure that the probability that the resulting set-cover $C$ fails to cover some element $e \in U$ is less than $1/n$.

Solution: For any element $e \in U$, and for any $j \in \{1,2,\ldots,c\log n\}$, let $Y_{e,j} = \sum_{i:\, e \in S_i} X_{i,j}$, and let $Y_e = \sum_{j=1}^{c\log n} Y_{e,j}$. Note that for the sets $S_i$ containing element $e$, if any of the $X_{i,j} = 1$, then $e$ is covered; in other words, $\Pr[e \text{ is covered}] = \Pr[Y_e \ge 1]$. Moreover, $\mathbb{E}[Y_{e,j}] = \sum_{i:\, e \in S_i}\mathbb{E}[X_{i,j}] = \sum_{i:\, e \in S_i} z_i$. Let $k_e = |\{i : e \in S_i\}|$; since $e$ is fractionally covered in the LP, we have $z_1 + \cdots + z_{k_e} \ge 1$ (relabeling so that the sets containing $e$ come first), and thus $\mathbb{E}[Y_e] \ge c\log n$. Now, using our basic Chernoff bound, we have:
$$\Pr[Y_e \le 0] = \Pr[Y_e \le (1-1)\,\mathbb{E}[Y_e]] \le e^{-\mathbb{E}[Y_e]/2} \le e^{-(c\log n)/2} = \frac{1}{n^{c/2}}.$$
Thus $\Pr[e \text{ is not covered}] \le n^{-c/2}$. Moreover, by the union bound, we have that:
$$\Pr[\text{Some element } e \in U \text{ is not covered}] \le n^{1-c/2}.$$
If $c \ge 6$, we get $n^{1-c/2} \le n^{-2} \le 1/n$, and hence $\Pr[\text{Every element } e \in U \text{ is covered}] \ge 1 - \frac{1}{n}$.

Note: You can actually get a tighter bound of $c \ge 3$ using the following argument. First, observe that the probability of $e$ being covered is minimized when the $z_i$ are all equal, i.e., $z_1 = \cdots = z_{k_e} = 1/k_e$ (this is not obvious; try to prove it...). Thus we get, for any $j$:
$$\Pr[Y_{e,j} = 0] = (1 - z_1)\cdots(1 - z_{k_e}) \le \left(1 - \frac{1}{k_e}\right)^{k_e} \le \frac{1}{e}.$$
Therefore, each element is covered with probability at least $1 - 1/e$ if we draw only one sample of each $X_i$. Since we draw $c\log n$ samples, the probability that element $e$ is not covered is:
$$\Pr[e \text{ is not covered}] \le \left(\frac{1}{e}\right)^{c\log n} = \frac{1}{n^c}.$$
Now, via the union bound, we see that if we take $c = 3$, then we'll have:
$$\Pr[\text{Some element } e \in U \text{ is not covered}] \le \frac{1}{n}.$$
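Putting parts (a)–(c) together, here is a compact sketch of the full pipeline: solve the LP relaxation, then round each $z_i$ with $c\log n$ independent coin flips. It assumes SciPy's `linprog` is available for the LP step, and the instance at the bottom is a toy example.

```python
import math
import random
import numpy as np
from scipy.optimize import linprog

def randomized_set_cover(sets, universe, c=3, seed=0):
    """LP-relax the set-cover IP, then round via c*log(n) Bernoulli(z_i) draws."""
    rng = random.Random(seed)
    universe = list(universe)
    m, n = len(sets), len(universe)
    # Constraint for each element e: sum_{i: e in S_i} x_i >= 1, written as -A x <= -1.
    A = np.array([[-1.0 if e in s else 0.0 for s in sets] for e in universe])
    res = linprog(np.ones(m), A_ub=A, b_ub=-np.ones(n),
                  bounds=[(0, 1)] * m, method="highs")
    z = res.x
    k = max(1, math.ceil(c * math.log(n)))
    cover = [i for i in range(m)
             if any(rng.random() < z[i] for _ in range(k))]
    covered = set().union(*(sets[i] for i in cover)) if cover else set()
    return cover, covered >= set(universe)

if __name__ == "__main__":
    sets = [{1, 2, 3}, {3, 4}, {4, 5, 6}, {1, 6}, {2, 5}]
    cover, ok = randomized_set_cover(sets, universe=range(1, 7))
    print("chosen sets:", cover, "covers universe:", ok)
```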

Problem 5 (Papadimitriou's 2SAT Algorithm) (Optional): The 2SAT problem (Wikipedia entry), and more generally the boolean satisfiability (SAT) problem (Wikipedia entry), are among the cornerstones of theoretical algorithms, and also a very useful modeling tool for a variety of optimization problems. In the general SAT problem, we want to find a satisfying assignment for a given Boolean expression in $n$ Boolean (i.e., $\{0,1\}$, or FALSE/TRUE) variables $\{X_1, X_2, \ldots, X_n\}$, typically involving conjunctions (i.e., logical AND, denoted $\wedge$), disjunctions (i.e., logical OR, denoted $\vee$) and negations (logical NOT; typically $\overline{X}$ denotes the negation of a variable $X$). In 2SAT, the expression is restricted to being a conjunction (AND) of several clauses, where each clause is the disjunction (OR) of two literals (either a variable or its negation). For example, the expression $(X_1 \vee X_2) \wedge (X_1 \vee X_3) \wedge (X_2 \vee X_3)$ is a 2SAT formula, which has several satisfying assignments, including $(X_1 = 1, X_2 = 1, X_3 = 1)$. Note that in order to find a satisfying assignment, we need to set the variables such that each clause in the formula has at least one literal which is TRUE. Although the general SAT problem is known to be NP-complete, 2SAT can be solved in polynomial time; we will now see a simple randomized algorithm that demonstrates this fact.

(Papadimitriou's 2SAT Algorithm): Given a 2CNF formula $F$ involving $n$ Boolean variables, and an arbitrary assignment $\tau$, we check if $\tau$ satisfies $F$. If not, we pick an arbitrary unsatisfied clause, pick one of its two literals uniformly at random, and flip it to get a new assignment $\tau'$. We then repeat this until we find a satisfying assignment.
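A minimal sketch of the algorithm is below (not the course's reference implementation). Clauses are encoded as pairs of nonzero integers, with the assumption that $-v$ denotes the negation of variable $v$; the iteration cap is just a safety bound so the function terminates on unsatisfiable inputs.

```python
import random

def papadimitriou_2sat(clauses, n, max_steps=None, seed=0):
    """Random-walk 2SAT: repeatedly flip a random literal of an unsatisfied clause."""
    rng = random.Random(seed)
    if max_steps is None:
        max_steps = 100 * n * n                  # safety cap on iterations
    tau = {v: rng.choice([False, True]) for v in range(1, n + 1)}

    def lit_value(lit):
        return tau[abs(lit)] if lit > 0 else not tau[abs(lit)]

    for _ in range(max_steps):
        unsat = [cl for cl in clauses if not (lit_value(cl[0]) or lit_value(cl[1]))]
        if not unsat:
            return tau                           # satisfying assignment found
        lit = rng.choice(rng.choice(unsat))      # random literal of a random unsatisfied clause
        tau[abs(lit)] = not tau[abs(lit)]
    return None                                  # gave up (formula may be unsatisfiable)

if __name__ == "__main__":
    # (X1 or X2) and (X1 or X3) and (X2 or X3), satisfied e.g. by all-True
    formula = [(1, 2), (1, 3), (2, 3)]
    print(papadimitriou_2sat(formula, n=3))
```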

Part (a): Assume that $F$ has a unique satisfying assignment $\tau^*$, and for any assignment $\tau$, let $N(\tau)$ be the number of literals in $\tau$ which agree (i.e., have the same value) with the corresponding literal in $\tau^*$. Argue that each time we execute an iteration of Papadimitriou's algorithm with input assignment $\tau$, the new assignment $\tau'$ satisfies:
$$N(\tau') = \begin{cases} N(\tau) + 1 & \text{with probability at least } 1/2, \\ N(\tau) - 1 & \text{with probability at most } 1/2. \end{cases}$$

Solution: Given any unsatisfied clause, we know that in $\tau$ at least one of its two literals disagrees with $\tau^*$ (it could be that both do). Since we choose to flip a uniformly random literal out of the two, we are guaranteed to increase the agreement between our guess $\tau$ and $\tau^*$ by 1 with probability at least $1/2$.

Part (b): Based on the above, argue that the running time $T_n$ of the algorithm is upper bounded by the first time that a symmetric random walk starting from 0 hits $n$ or $-n$ (equivalently, $T_n = \arg\min_{K>0}\left\{\left|\sum_{i=1}^{K} X_i\right| = n\right\}$, where the $X_i$ are i.i.d. Rademacher random variables). Next, show that $\mathbb{E}[T_n] = n^2$, and thus prove that after $O(n^3)$ iterations, Papadimitriou's algorithm terminates with probability greater than $1 - 1/n$.

Solution: See this lecture (and the next) from Tim Roughgarden's algorithms course for a beautiful exposition of this proof. One thing you should note: although running the algorithm for $O(n^3)$ steps gives the desired probability, one can get much better bounds if we instead stop after $O(n^2)$ steps and then restart the process. Markov's inequality tells us that after $2n^2$ steps, we find the solution with probability at least $1/2$; now, from previous assignments, you know that doing $O(\log n)$ independent trials is sufficient to amplify the probability to $1 - 1/n$. However, the analysis works for any starting point, so this does suggest that $O(n^2\log n)$ steps of Papadimitriou's algorithm would have sufficed... Is this surprising? Not really; basically, what this says is that the failure probability does follow a Chernoff-style exponential decay. We could not show this directly, as the random variables were not independent; however, Markov's inequality still works, and we can exploit it in a clever way to get our result.
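To see the $n^2$ hitting time concretely, here is a tiny simulation of a symmetric $\pm 1$ random walk estimating the expected first time the walk reaches $\pm n$; the parameters are arbitrary illustrative choices.

```python
import random

def hitting_time(n, rng):
    """Steps until a symmetric +/-1 walk started at 0 first hits +n or -n."""
    position, steps = 0, 0
    while abs(position) < n:
        position += rng.choice((-1, 1))
        steps += 1
    return steps

def estimate(n=20, trials=2000, seed=0):
    rng = random.Random(seed)
    avg = sum(hitting_time(n, rng) for _ in range(trials)) / trials
    print(f"n = {n}: average hitting time ~ {avg:.1f} (theory: n^2 = {n * n})")

if __name__ == "__main__":
    estimate()
```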