Lecture 4: April 10, 2013


TTIC/CMSC 31150 Mathematical Toolkit, Spring 2013
Madhur Tulsiani
Lecture 4: April 10, 2013
Scribe: Haris Angelidakis

1 Chebyshev's Inequality recap

In the previous lecture, we used Chebyshev's inequality to get a bound on the probability that a random variable X deviates from its expected value µ to some extent. More precisely, if we denote the variance of X by σ² and assume it is finite and non-zero, then Chebyshev's inequality is the following:

    P[|X − µ| ≥ kσ] ≤ Var[X] / (k²σ²) = 1/k².    (1)

This is the so-called Second Moment Method, since it uses the second moment, i.e. the variance, of X. As an application of the above inequality, we presented some thresholds in the Erdős-Rényi G(n, p) model. We extend that example here to show something that at first looks like a paradox. So, we consider a random graph G ∼ G(n, p), let Z_1 be the number of copies of K_4 in G, and let Z_2 be the number of copies of the following 5-node graph G_0 in G:

[Figure: the 5-node graph G_0 on vertices v_1, ..., v_5; the vertices v_1, v_2, v_3, v_4 form a K_4, and v_5 is attached to it by a single edge, for 7 edges in total.]

Observe that the subgraph induced by the nodes v_1, v_2, v_3, v_4 is exactly K_4. Calculating the expectations of Z_1 and Z_2, we observe the following:

1. p ≪ n^{−2/3} ⟹ E[Z_1] → 0, and p ≫ n^{−2/3} ⟹ E[Z_1] → ∞.
2. p ≪ n^{−5/7} ⟹ E[Z_2] → 0, and p ≫ n^{−5/7} ⟹ E[Z_2] → ∞.

We now consider the case n^{−5/7} ≪ p ≪ n^{−2/3}. In this case, E[Z_1] → 0 and E[Z_2] → ∞. But K_4 is a subgraph of G_0, so in every appearance of G_0 we obviously have an appearance of K_4. So, did we calculate something wrong? Observing things more carefully, we can see that, given a fixed K_4, there can be many copies of G_0 made by this copy of K_4, simply by connecting each vertex of K_4 with as many different vertices as we want. For each such vertex, we get a distinct copy of G_0. This idea can be made precise so as to formally explain the above paradox, but we will skip the details.
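To see, at least heuristically, where the thresholds and the resolution come from, here is a rough first-moment computation (constants suppressed, and assuming, as in the figure, that G_0 is a K_4 with one extra vertex attached by a single edge):

    E[Z_1] = C(n, 4) · p^6 = Θ(n^4 p^6),               which → 0 for p ≪ n^{−2/3} and → ∞ for p ≫ n^{−2/3};
    E[Z_2] = C(n, 4) · 4 · (n − 4) · p^7 = Θ(n^5 p^7),  which → 0 for p ≪ n^{−5/7} and → ∞ for p ≫ n^{−5/7}.

In particular E[Z_2] = Θ(np) · E[Z_1]. In the range n^{−5/7} ≪ p ≪ n^{−2/3} the factor np grows fast enough that E[Z_2] → ∞ even though E[Z_1] → 0: the expectation of Z_2 is driven by the rare event that some copy of K_4 appears at all, in which case that one copy contributes about 4(n − 4)p = Θ(np) copies of G_0 in expectation.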

In today's lecture, we will consider even sharper concentration bounds, which come in the form of the Chernoff/Hoeffding bounds. But before presenting the Chernoff bounds, we will also state the very useful Jensen's inequality, as it is used in the context of probability theory.

Consider the definition of the variance of a random variable X. We have Var[X] = E[(X − µ)²]. We obviously have Var[X] ≥ 0. Thus:

    E[(X − µ)²] = E[X²] − E[X]² ≥ 0  ⟹  E[X²] ≥ E[X]².    (2)

The above inequality is just a special case of the well-known Jensen's inequality.

2 Jensen's Inequality

At first, we need to define what a convex real-valued function is.

Definition 2.1 (Convex function) Let f : ℝ → ℝ. We say that f is convex if for any λ ∈ [0, 1] and any x_1, x_2 ∈ ℝ we have that f(λx_1 + (1 − λ)x_2) ≤ λf(x_1) + (1 − λ)f(x_2).

The above property simply means that if we consider any two points (x_1, f(x_1)) and (x_2, f(x_2)), the line segment connecting these two points on the plane lies above the graph of the function f. We can now state, without proof, Jensen's inequality.

Theorem 2.2 (Jensen's Inequality) Let f : ℝ → ℝ be a convex function. Then for any random variable X, we have that E[f(X)] ≥ f(E[X]).

We can now see that (2) is just a special case of Jensen's inequality, where we plug in the convex function f(x) = x².
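As a side remark connecting Jensen's inequality to the next section: for any fixed t ∈ ℝ, the function f(x) = e^{tx} is also convex, so

    E[e^{tX}] ≥ e^{t·E[X]},

i.e. the moment generating function E[e^{tX}], the quantity we are about to work with, always dominates e^{tµ}.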

3 Chernoff/Hoeffding Bounds

We are now ready to get some sharper concentration bounds. We start by considering independent boolean random variables X_1, ..., X_n, with each X_i having value 1 with probability p_i. Let Z = Σ_{i=1}^n X_i. We set µ = E[Z] = Σ_{i=1}^n E[X_i] = Σ_{i=1}^n p_i, and p = µ/n = (1/n) Σ_{i=1}^n p_i. So, we now want to get a bound on P[Z ≥ (1 + δ)µ]. At first, we use the fact that e^x is strictly increasing, and so, for any t > 0,

    P[Z ≥ (1 + δ)µ] = P[e^{tZ} ≥ e^{t(1+δ)µ}] ≤ E[e^{tZ}] / e^{t(1+δ)µ},    (3)

where the last step is Markov's inequality. We now have:

    E[e^{tZ}] = E[e^{t(X_1 + ... + X_n)}] = E[∏_{i=1}^n e^{tX_i}] = ∏_{i=1}^n E[e^{tX_i}]   (by independence)
              = ∏_{i=1}^n (p_i e^t + 1 − p_i) = ∏_{i=1}^n (1 + p_i(e^t − 1)).

At this point, we utilize the simple but very useful inequality: for all x ∈ ℝ, 1 + x ≤ e^x. Since all the quantities in the previous calculation are non-negative, we can plug the above inequality into the previous calculation and we get:

    E[e^{tZ}] ≤ ∏_{i=1}^n e^{(e^t − 1)p_i} = e^{(e^t − 1)µ}.    (4)

From (3) and (4) we get

    P[Z ≥ (1 + δ)µ] ≤ e^{(e^t − 1)µ − t(1 + δ)µ}.    (5)

We now want to minimize the right-hand side of the above inequality with respect to t. Setting its derivative to zero, we get

    e^t µ − (1 + δ)µ = 0  ⟹  t = ln(1 + δ).

Using this value for t in (5), we get

    P[Z ≥ (1 + δ)µ] ≤ e^{(e^t − 1)µ − t(1 + δ)µ} = e^{δµ} (1 + δ)^{−(1+δ)µ} = ( e^δ / (1 + δ)^{1+δ} )^µ.

Similarly, we can get that

    P[Z ≤ (1 − δ)µ] ≤ ( e^{−δ} / (1 − δ)^{1−δ} )^µ.

Note that P[Z ≤ (1 − δ)µ] = P[e^{−tZ} ≥ e^{−t(1−δ)µ}] for any t > 0, so the same argument applies. But we would like some simpler expression for the bound. It can be easily proved that

    ( e^δ / (1 + δ)^{1+δ} )^µ ≤ e^{−δ²µ/3},   for 0 < δ < 1,

and so we finally get:

    P[Z ≥ (1 + δ)µ] ≤ e^{−δ²µ/3},   for 0 < δ < 1.    (6)

Similarly:

    P[Z ≤ (1 − δ)µ] ≤ e^{−δ²µ/2},   for 0 < δ < 1.    (7)

From (6) and (7) we get

    P[|Z − µ| ≥ δµ] ≤ 2e^{−δ²µ/3},   for 0 < δ < 1.    (8)

This last inequality is the version of the Chernoff/Hoeffding bounds that we are going to use the most in the following. We now move to some applications.

4 Coin Tosses

We will now compare the above bound with what we can get from Chebyshev's inequality. Let's assume that X_1, ..., X_n are independent coin tosses, with P[X_i = 1] = 1/2. We want to get a bound on the value of Z = Σ_{i=1}^n X_i. Using Chebyshev's inequality as stated in (1), we get that

    P[|Z − µ| ≥ δµ] ≤ Var[Z] / (δ²µ²).

And since in this particular case we have that Var[Z] = n/4 and µ = n/2, we get that

    P[|Z − µ| ≥ δµ] ≤ 1/(δ²n).

The above bound is only inversely polynomial in n, while the one in (8) is exponentially small in n. This fact will prove very useful when we want to use a union bound over a large collection of events, as we will see in the application that follows.
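To see the gap numerically, here is a small simulation sketch (not part of the notes; the function name and the choices of n, δ and the number of trials are arbitrary) comparing the empirical tail for n fair coin tosses with the Chebyshev bound 1/(δ²n) and the Chernoff bound 2e^{−δ²µ/3}:

    # Illustrative sketch (not from the lecture notes); parameter choices are arbitrary.
    import math
    import random

    def empirical_tail(n, delta, trials=2000):
        """Estimate P[|Z - mu| >= delta*mu] for Z = number of heads in n fair coin tosses."""
        mu = n / 2
        hits = 0
        for _ in range(trials):
            z = sum(random.getrandbits(1) for _ in range(n))
            if abs(z - mu) >= delta * mu:
                hits += 1
        return hits / trials

    n, delta = 2000, 0.2
    mu = n / 2
    chebyshev = 1 / (delta ** 2 * n)               # bound from (1): inverse-polynomial in n
    chernoff = 2 * math.exp(-delta ** 2 * mu / 3)  # bound from (8): exponentially small in n
    print("empirical:", empirical_tail(n, delta))
    print("Chebyshev bound:", chebyshev)
    print("Chernoff bound:", chernoff)

For these values the Chebyshev bound only says the tail is at most 1.25%, while the Chernoff bound (and the simulation) put it at essentially zero.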

5 Max-Cut in the Erdős-Rényi Model

Consider a graph G ∼ G(n, p). Let G = (V, E), with |V| = n, and let Z = |E| be the number of edges in G. We want to prove that the size of a Max-Cut of G is, with high probability, roughly equal to |E|/2, which is equal to the expected size of a random cut. At first, we can get a concentration bound on the number of edges in the graph, which will help us in the proof of the above statement. We have that

    Z = Σ_{{u,v}: u,v ∈ V, u ≠ v} X_{u,v},

where X_{u,v} is a random variable indicating whether there exists an edge between u and v. So we get E[Z] = C(n, 2)·p ≥ n²p/4. Using Chernoff bounds, we get that

    P[ |Z − E[Z]| ≥ ε·E[Z] ] ≤ 2e^{−ε²E[Z]/3} ≤ 2e^{−ε²n²p/12}.

So, if ε²n²p ≫ 1, then w.h.p the number of edges in G is some number in the interval ((1 − ε)E[Z], (1 + ε)E[Z]). To simplify the calculations that follow, we can replace E[Z] = C(n, 2)p by n²p/2, as we can always adjust ε to match the exact values. We also set m = n²p/2, which is roughly equal to E[Z].

So, in order to prove our statement about the size of a max-cut, it is sufficient to show that the number of edges crossing any cut belongs to the interval ((1 − δ)m/2, (1 + δ)m/2). In order to do so, we fix a cut (S, V \ S), with |S| = k and 1 ≤ k ≤ n/2. Let Z_S be the number of edges crossing (S, V \ S). Observe that Z_S can be written as a sum of indicator 0-1 variables, and so we can use the Chernoff bounds that we have already proved. We have that

    µ_S = E[Z_S] = p·k·(n − k),

which means that the expected size of the cut only depends on the size of S and not on the specific elements of S, which makes sense if we think of how we generate our graph. A useful fact here is that k(n − k) ≤ n²/4, and so µ_S ≤ pn²/4 = m/2. So, we have

    P[Z_S ≥ (1 + δ)m/2] ≤ P[Z_S ≥ (1 + δ)µ_S] ≤ e^{−δ²µ_S/3} = e^{−δ²·p·k(n−k)/3} ≤ e^{−δ²pkn/6},

since k(n − k) ≥ kn/2 for k ≤ n/2. Taking a Union Bound now over all possible cuts, we have

    P[Max-Cut ≥ (1 + δ)m/2] ≤ Σ_S P[Z_S ≥ (1 + δ)m/2]
                             = Σ_{k=1}^{n/2} Σ_{S: |S|=k} P[Z_S ≥ (1 + δ)m/2]
                             ≤ Σ_{k=1}^{n/2} C(n, k) · e^{−δ²pkn/6}
                             ≤ Σ_{k=1}^{n/2} n^k · e^{−δ²pkn/6}
                             = Σ_{k=1}^{n/2} e^{k(ln n − δ²pn/6)}.

Suppose now that δ²pn/6 ≥ 2 ln n, i.e. that p ≥ 12 ln n / (δ²n). Using this fact in the inequality above, we get

    P[Max-Cut ≥ (1 + δ)m/2] ≤ Σ_{k=1}^{n/2} e^{−k ln n} = Σ_{k=1}^{n/2} n^{−k} = O(1/n).

The above inequality shows that if the probability p is sufficiently large (to be more precise, if it is such that the resulting graph has Ω(n ln n) edges in expectation), then the size of a Max-Cut in this graph is w.h.p. very close to |E|/2. Observe that we cannot get this guarantee by using Chebyshev's inequality, as the number of events we are applying the union bound over is large, and the bound we can get from Chebyshev is only O(1/n) for a single cut.
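The regime covered by the analysis (p ≥ 12 ln n/(δ²n) with small δ) requires n far beyond what exhaustive search can handle, but the quantities can still be made concrete on a toy instance. A small illustrative sketch (not from the notes; the function names and the values of n and p are arbitrary) that samples G(n, p), computes the exact Max-Cut by enumeration, and compares it with |E|/2:

    # Illustrative sketch (not from the lecture notes); n and p are arbitrary small values.
    import itertools
    import random

    def gnp(n, p):
        """Sample the edge list of an Erdos-Renyi G(n, p) graph."""
        return [(u, v) for u in range(n) for v in range(u + 1, n) if random.random() < p]

    def max_cut_brute_force(n, edges):
        """Exact Max-Cut by enumerating every cut with vertex 0 on a fixed side."""
        best = 0
        for mask in itertools.product([0, 1], repeat=n - 1):
            side = (1,) + mask
            crossing = sum(1 for u, v in edges if side[u] != side[v])
            best = max(best, crossing)
        return best

    random.seed(0)
    n, p = 16, 0.5
    edges = gnp(n, p)
    m = len(edges)
    print("|E| =", m, " Max-Cut =", max_cut_brute_force(n, edges), " |E|/2 =", m / 2)

At such a tiny n the Max-Cut still exceeds |E|/2 by a noticeable additive term; the statement above only says that this excess becomes a negligible fraction of |E| as n grows.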

6 Randomized Routing in Networks

We will now consider a different application. Let's assume we have a network of nodes, placed on the vertices of an n-dimensional hypercube. In other words, we have a graph with node set V = {0, 1}^n, and edge set E = {(x, y) : d_H(x, y) = 1}, where d_H(x, y) is the Hamming distance of two n-bit strings, i.e. the number of bits in which x and y differ. We are given a permutation π : {0, 1}^n → {0, 1}^n, which simply translates to the fact that vertex x wants to send a packet to vertex π(x). So, we want to find a routing strategy such that each packet arrives at its destination in the minimum possible time. We will use a synchronous model, i.e., the routing occurs in discrete time steps, and in each time step, one packet is allowed to travel along each edge. We are interested in oblivious strategies, i.e. for each x, the path which the packet going from x to π(x) will use depends only on x and π(x) and on no other vertices. Observe that a packet from x to y takes time at least d_H(x, y), and since d_H(00...0, 11...1) = n, we have a worst-case lower bound of Ω(n) for any strategy.

For the above problem we have the following results:

Theorem 6.1 ([KKT90]) For any deterministic, oblivious routing strategy on the hypercube, there exists a permutation that requires Ω(√(2^n / n)) time steps.

This is a bad lower bound for the worst-case scenario. Fortunately, randomization can give a significant improvement.

Theorem 6.2 ([VB81]) There exists a randomized, oblivious routing strategy that terminates in O(n) time steps w.h.p.

The randomized strategy of the above theorem consists of two phases: In phase 1, each packet i is routed to an intermediate node σ(i), where σ(i) is chosen uniformly at random. In phase 2, each packet is routed from σ(i) to π(i).

In both phases, we use the bit-fixing paths strategy to route the packets (a short sketch of the path construction is given at the end of this section). In the bit-fixing strategy, we move from a vertex x to a vertex y by checking all bits of the two nodes from left to right, and at each bit in which x differs from y, we make the corresponding change so as to reduce the Hamming distance between the two strings. Observe that the paths that we obtain from this strategy are always shortest paths. Also, note that σ is not required to be a permutation of {0, 1}^n, so this strategy is oblivious, in the sense that each node doesn't care about which intermediate node is chosen by other nodes. This strategy breaks the symmetry in the problem by simply choosing a random intermediate destination for each packet. This makes it impossible for an adversary to select a bad permutation. The analysis of the above strategy will follow in the next lecture.
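To make the strategy concrete, here is a minimal sketch (not from the notes; the function names are illustrative) of the bit-fixing path construction and of the two random-intermediate-destination phases. It only builds the paths the packets would follow and does not simulate the synchronous, one-packet-per-edge-per-step delivery, whose analysis comes in the next lecture.

    # Illustrative sketch (not from the lecture): bit-fixing paths and the two-phase routing.
    import random

    def bit_fixing_path(x, y, n):
        """Bit-fixing path from x to y on the n-dimensional hypercube.

        Vertices are integers in [0, 2^n); bits are fixed from the most
        significant (leftmost) to the least significant one.
        """
        path = [x]
        cur = x
        for i in range(n - 1, -1, -1):           # scan bits left to right
            if (cur >> i) & 1 != (y >> i) & 1:   # bit i differs: flip it
                cur ^= 1 << i
                path.append(cur)
        return path                               # always a shortest x-y path

    def two_phase_routes(pi, n):
        """Two-phase routes x -> sigma(x) -> pi(x), both legs via bit fixing."""
        routes = {}
        for x in range(2 ** n):
            sigma_x = random.randrange(2 ** n)    # independent uniform intermediate node
            routes[x] = (bit_fixing_path(x, sigma_x, n),
                         bit_fixing_path(sigma_x, pi[x], n))
        return routes

    n = 3
    pi = {x: x ^ (2 ** n - 1) for x in range(2 ** n)}  # example permutation: flip all bits
    print(two_phase_routes(pi, n)[0])

Each returned leg has length equal to the Hamming distance between its endpoints, so a single packet in isolation needs at most 2n steps; the point of the analysis in the next lecture is to control the congestion when all 2^n packets move simultaneously.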

References

[KKT90] C. Kaklamanis, D. Krizanc and A. Tsantilas. Tight Bounds for Oblivious Routing in the Hypercube. In Proceedings of the Symposium on Parallel Algorithms and Architectures, 1990.

[VB81] L. G. Valiant and G. J. Brebner. Universal Schemes for Parallel Communication. In Proceedings of the 13th Annual ACM Symposium on Theory of Computing, 1981.