Lecture 9: March 26, 2014


COMS 6998-3: Sub-Linear Algorithms in Learning and Testing                Spring 2014
Lecturer: Rocco Servedio          Lecture 9: March 26, 2014          Scribe: Keith Nichols

1 Overview

1.1 Last Time

- Finished the analysis of the O(√n/ɛ)-query algorithm for monotonicity.
- Showed an Ω(√n) lower bound for one-sided non-adaptive monotonicity testers.
- Stated and proved (one direction of) Yao's Principle: Suppose there exists a distribution D over functions f : {−1,1}^n → {−1,1} (the inputs to the property testing problem) such that any q-query deterministic algorithm gives the right answer with probability at most c. Then, given any q-query non-adaptive randomized testing algorithm A, there exists some function f_A such that

    Pr[ A outputs the correct answer on f_A ] ≤ c.

1.2 Today: a lower bound for two-sided non-adaptive monotonicity testers

We will use Yao's Principle to show the following lower bound:

Theorem 1 (Chen–Servedio–Tan '14). Any 2-sided non-adaptive property tester for monotonicity, to ɛ_0-test, needs Ω(n^{1/5}) queries (where ɛ_0 > 0 is an absolute constant).

2 The Ω(n^{1/5}) lower bound: proving Theorem 1

2.1 Preliminaries

Recall the definition of the total variation distance between two distributions over the same set Ω:

    d_TV(D_1, D_2) = (1/2) ∑_{x∈Ω} | D_1(x) − D_2(x) |.
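For concreteness, here is a minimal Python sketch (ours, assuming nothing beyond the standard library) computing d_TV exactly for distributions over a small finite set:

```python
# Minimal sketch: exact total variation distance between two distributions
# over a finite set, each given as an {outcome: probability} dictionary.
def total_variation(d1, d2):
    support = set(d1) | set(d2)
    return 0.5 * sum(abs(d1.get(x, 0.0) - d2.get(x, 0.0)) for x in support)

# Example: a fair coin and a 60/40-biased coin are at distance 0.1.
print(total_variation({"H": 0.5, "T": 0.5}, {"H": 0.6, "T": 0.4}))
```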

As a homework problem from last time, we have the lemma below, which relates the probability of distinguishing between samples from two distributions to their total variation distance:

Lemma 2 (HW Problem). Let D_1, D_2 be two distributions over some set Ω, and A be any algorithm (possibly randomized) that takes x ∈ Ω as input and outputs "Yes" or "No". Then

    | Pr_{x∼D_1}[ A(x) = Yes ] − Pr_{x∼D_2}[ A(x) = Yes ] | ≤ d_TV(D_1, D_2),

where the probabilities are also taken over the possible randomness of A. (This is sometimes referred to as a data processing inequality for the total variation distance.)

To apply this lemma, recall that given a deterministic algorithm's set of queries Q = {z^(1), ..., z^(q)} ⊆ {−1,1}^n, a distribution D over Boolean functions induces a distribution D^Q over {−1,1}^q: a draw x from D^Q is obtained by drawing f ∼ D and outputting (f(z^(1)), ..., f(z^(q))) ∈ {−1,1}^q.

With this observation and Yao's Principle in hand, we can state and prove a key tool for proving lower bounds in property testing:

Lemma 3 (Key Tool). Fix any property P (a set of Boolean functions). Let D_Yes be a distribution over Boolean functions that belong to P, and D_No be a distribution over Boolean functions that all have dist(f, P) > ɛ. Suppose that for all q-query sets Q, one has d_TV(D_Yes^Q, D_No^Q) ≤ 1/4. Then any (2-sided) non-adaptive ɛ-tester for P must use at least q + 1 queries.

Proof. Let D be the mixture D = (1/2) D_Yes + (1/2) D_No (that is, a draw from D is obtained by tossing a fair coin, and returning accordingly a sample drawn either from D_Yes or from D_No). Fix a q-query deterministic algorithm A, and let

    p_Y = Pr_{f∼D_Yes}[ A accepts on f ],    p_N = Pr_{f∼D_No}[ A accepts on f ].

That is, p_Y is the probability that a random Yes function is accepted, while p_N is the probability that a random No function is accepted.

Via the assumption and the previous lemma, | p_Y − p_N | ≤ 1/4. However, this means that A cannot be a successful tester, as

    Pr_{f∼D}[ A gives the wrong answer ] = (1/2)(1 − p_Y) + (1/2) p_N = 1/2 − (1/2)(p_Y − p_N) ≥ 1/2 − 1/8 = 3/8 > 1/3.

So Yao's Principle tells us that any randomized non-adaptive q-query algorithm is wrong on some f in the support of D with probability at least 3/8; but a legitimate tester can only be wrong on any such f with probability less than 1/3. ∎

Exercise 4 (Generalization of Lemma 3; HW Problem). Relax the previous lemma slightly: prove that the conclusion still holds even under the weaker assumptions

    Pr_{f∼D_Yes}[ f ∈ P ] ≥ 99/100,    Pr_{f∼D_No}[ dist(f, P) > ɛ ] ≥ 99/100.

For our lower bound, we need to come up with a D_Yes over monotone functions (resp. a D_No over functions ɛ_0-far from monotone) such that for all Q ⊆ {−1,1}^n with |Q| = q,

    d_TV( D_Yes^Q, D_No^Q ) ≤ 1/4.

At a high level, we need to argue that both distributions "look the same." One may thus think of the Central Limit Theorem: the sum of many independent, "nice" real-valued random variables converges to a Gaussian in distribution (that is, in cumulative distribution function). For instance, a binomial distribution Bin(10^6, 1/2) has the same shape ("bell curve") as the corresponding Gaussian distribution N(10^6/2, 10^6/4). For our purpose, however, the convergence guarantees stated by the Central Limit Theorem will not be enough, as they do not give explicit bounds on the rate of convergence; we will use a quantitative version of the CLT, the Berry–Esséen Theorem.

First, recall the definition of a (real-valued) Gaussian random variable:

Definition 5 (One-dimensional Gaussian distribution). A real-valued random variable is said to be Gaussian with mean µ and variance σ² if it follows the distribution N(µ, σ²), which has probability density function

    f_{µ,σ}(x) = (1/(√(2π) σ)) e^{−(x−µ)²/(2σ²)},    x ∈ R.

Such a random variable indeed has expectation µ and variance σ²; furthermore, the distribution is fully specified by these two parameters.
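The Bin(10^6, 1/2) example above can be checked numerically. The following sketch (ours, assuming numpy and scipy are available) approximates the sup-distance between the two CDFs by evaluating them at integer points near the mean; the decay matches the 1/√n rate that the Berry–Esséen theorem below guarantees:

```python
import numpy as np
from scipy.stats import binom, norm

# Approximate Kolmogorov distance between Bin(n, 1/2) and the moment-matched
# Gaussian N(n/2, n/4), evaluated at integer points near the mean (where the
# gap is largest); a rough check only, since the true sup is over all reals.
for n in [100, 10_000, 1_000_000]:
    half_width = int(6 * np.sqrt(n))
    k = np.arange(n // 2 - half_width, n // 2 + half_width)
    gap = np.abs(binom.cdf(k, n, 0.5) - norm.cdf(k, loc=n / 2, scale=np.sqrt(n / 4)))
    print(f"n = {n:>9}: sup-gap ~ {gap.max():.2e}   1/sqrt(n) = {n ** -0.5:.2e}")
```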

[Figure 1: The standard Gaussian N(0,1): (a) its cumulative distribution function F_{0,1}(x); (b) its probability density function f_{0,1}(x).]

Extending to higher dimensions, one can similarly define a d-dimensional Gaussian random variable:

Definition 6 (d-dimensional Gaussian distribution). Fix a vector µ ∈ R^d and a symmetric non-negative definite matrix Σ ∈ R^{d×d}. A random variable taking values in R^d is said to be Gaussian with mean µ and covariance Σ if it follows the distribution N(µ, Σ), which has probability density function

    f_{µ,Σ}(x) = (1/√((2π)^d det Σ)) e^{−(1/2)(x−µ)^T Σ^{−1} (x−µ)},    x ∈ R^d.

As in the univariate case, µ and Σ uniquely define the distribution; further, one has that for X ∼ N(µ, Σ),

    Σ_{i,j} = Cov(X_i, X_j) = E[(X_i − EX_i)(X_j − EX_j)],    i, j ∈ [d].

Theorem 7 (Berry–Esséen). Let S = X_1 + ... + X_n be the sum of n independent (real-valued) random variables X_1, ..., X_n satisfying

    Pr[ |X_i − E[X_i]| ≤ τ ] = 1,

that is, every X_i is almost surely bounded. For i ∈ [n], define µ_i = E[X_i] and σ_i² = Var X_i, so that ES = ∑_{i=1}^n µ_i and Var S = ∑_{i=1}^n σ_i² (the last equality by independence). Finally, let G be an N(∑_{i=1}^n µ_i, ∑_{i=1}^n σ_i²) Gaussian random variable, matching the first two moments of S. Then, for all θ ∈ R,

    | Pr[ S ≤ θ ] − Pr[ G ≤ θ ] | ≤ O(τ) / √(∑_{i=1}^n σ_i²).

(There exist other versions of this theorem, with weaker assumptions or phrased in terms of the third moments of the X_i's; we only state here one tailored to our needs.)

In other terms, letting F_S (resp. F_G) denote the CDF of S (resp. G), one has

    ‖F_S − F_G‖_∞ ≤ O(τ) / √(∑_{i=1}^n σ_i²).

(This quantity ‖F_S − F_G‖_∞ is also referred to as the Kolmogorov distance between S and G.)

Remark. The constant hidden in the O(·) notation is actually very reasonable: one can take it to be equal to 1.

Application: baby step towards the lower bound. Fix any string z ∈ {−1,1}^n, and for i ∈ [n] let the (independent) random variables γ_i be defined as

    γ_i = +1 w.p. 1/2,    −1 w.p. 1/2.

Letting X_i = γ_i z_i, we have µ_i = EX_i = 0 and σ_i² = Var X_i = 1, and we can take τ = 1 to apply the Berry–Esséen theorem to X = X_1 + ... + X_n. This allows us to conclude that

    ∀θ ∈ R,    | Pr[ X ≤ θ ] − Pr[ G ≤ θ ] | ≤ O(1)/√n

for G ∼ N(0, n).

Now, consider a slightly different distribution than that of the γ_i's: for the same z ∈ {−1,1}^n, define the independent random variables ν_i by

    ν_i = −3 w.p. 1/10,    +1/3 w.p. 9/10,

and let Y_i = ν_i z_i for i ∈ [n], and Y = Y_1 + ... + Y_n. By our choice of parameters,

    EY_i = ( (1/10)·(−3) + (9/10)·(1/3) ) z_i = 0 = EX_i,
    Var Y_i = E[ Y_i² ] = (1/10)·9 + (9/10)·(1/9) = 9/10 + 1/10 = 1 = Var X_i.

So E[Y] = E[X] = 0 and Var Y = Var X = n; by the Berry–Esséen theorem (with τ set to 3, and G as before),

    ∀θ ∈ R,    | Pr[ Y ≤ θ ] − Pr[ G ≤ θ ] | ≤ O(1)/√n.
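As a sanity check of the two bounds just derived, one can simulate X and Y and compare their CDFs. A hedged sketch (ours, assuming numpy), taking z = (1, ..., 1) so that both sums reduce to binomial counts:

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 10_000, 500_000

# With z = (1,...,1): X = sum_i gamma_i = 2*Binom(n, 1/2) - n, and with
# K = #{i : nu_i = -3} ~ Binom(n, 1/10) we get Y = -3K + (n - K)/3.
# Both have mean 0 and variance n, so their CDFs agree up to O(1)/sqrt(n).
X = 2 * rng.binomial(n, 0.5, size=trials) - n
K = rng.binomial(n, 0.1, size=trials)
Y = -3 * K + (n - K) / 3
for theta in [-2 * np.sqrt(n), 0.0, 2 * np.sqrt(n)]:
    print(f"theta = {theta:+8.1f}   Pr[X <= theta] ~ {np.mean(X <= theta):.4f}"
          f"   Pr[Y <= theta] ~ {np.mean(Y <= theta):.4f}")
```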

Combining the two bounds above via the triangle inequality, we get

    ∀θ ∈ R,    | Pr[ X ≤ θ ] − Pr[ Y ≤ θ ] | ≤ O(1)/√n.    (1)

We can now define D_Yes and D_No based on this (that is, based respectively on a random draw of γ, ν ∈ R^n distributed as above): a function f_γ ∼ D_Yes is given by

    ∀z ∈ {−1,1}^n,    f_γ(z) = sign(γ_1 z_1 + ... + γ_n z_n),

and similarly for f_ν ∼ D_No:

    ∀z ∈ {−1,1}^n,    f_ν(z) = sign(ν_1 z_1 + ... + ν_n z_n).

With the notations above, X ≥ 0 if and only if f_γ(z) = 1, and Y ≥ 0 if and only if f_ν(z) = 1. This implies that for any fixed single query z,

    d_TV( D_Yes^{{z}}, D_No^{{z}} ) = (1/2) ( | Pr[ X ≥ 0 ] − Pr[ Y ≥ 0 ] | + | Pr[ X < 0 ] − Pr[ Y < 0 ] | ) ≤ O(1)/√n.

This almost looks like what we were aiming at; so why aren't we done? There are two problems with what we did above:

1. This only deals with the case q = 1; that is, it would only provide a lower bound against one-query algorithms. Fix: we will use a multidimensional version of the Berry–Esséen Theorem for sums of q-dimensional independent random variables (converging to a multidimensional Gaussian).

2. f_γ and f_ν are not monotone (indeed, both the γ_i's and the ν_i's can be negative). Fix: shift everything by 2:
   - γ_i ∈ {1, 3}: f_γ is monotone;
   - ν_i ∈ {−1, 7/3}: f_ν will be far from monotone with high probability (we will show this). A quick check that this shift preserves the moment matching appears below.
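A few lines of arithmetic (ours) verify that the shift by 2 keeps the first two moments matched, which is what the multidimensional argument will need:

```python
# gamma_i in {1, 3} w.p. 1/2 each; nu_i = -1 w.p. 1/10 and 7/3 w.p. 9/10.
# Both should have mean 2 and variance 1 (up to floating-point roundoff).
mean_g = 0.5 * 1 + 0.5 * 3
var_g = 0.5 * (1 - mean_g) ** 2 + 0.5 * (3 - mean_g) ** 2
mean_n = 0.1 * (-1) + 0.9 * (7 / 3)
var_n = 0.1 * (-1 - mean_n) ** 2 + 0.9 * (7 / 3 - mean_n) ** 2
print(mean_g, var_g, mean_n, var_n)  # 2.0 1.0 2.0 1.0
```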

2.2 The lower bound construction

Up until this point, everything has been a warmup; we are now ready to go into more detail.

D_Yes and D_No. As mentioned in the previous section, we need to (re)define the distributions D_Yes and D_No (that is, of γ and ν) to solve the second issue:

D_Yes: Draw f ∼ D_Yes by independently drawing, for i ∈ [n],

    γ_i = +3 w.p. 1/2,    +1 w.p. 1/2,

and setting f : x ∈ {−1,1}^n ↦ sign(∑_{i=1}^n γ_i x_i). Any such f is monotone, as the weights are all positive.

D_No: Similarly, draw f ∼ D_No by independently drawing, for i ∈ [n],

    ν_i = +7/3 w.p. 9/10,    −1 w.p. 1/10,

and setting f : x ∈ {−1,1}^n ↦ sign(∑_{i=1}^n ν_i x_i). f is not always far from monotone; actually, one of the functions in the support of D_No (the one with all weights set to 7/3) is even monotone. However, we shall argue that f ∼ D_No is far from monotone with overwhelming probability, and then apply the relaxation of the key tool (HW Problem 4) to conclude.

The theorem will stem from the following two lemmas, which state respectively that (i) No-functions are almost all far from monotone, and (ii) the two distributions are hard to distinguish:

Lemma 8. There exists a universal constant ɛ_0 > 0 such that

    Pr_{f∼D_No}[ dist(f, M) > ɛ_0 ] ≥ 1 − 1/2^{Θ(n)}

(note that this 1 − o(1) probability is actually stronger than what the relaxation from Problem 4 requires).

Lemma 9. Let A be any deterministic q-query algorithm. Then

    | Pr_{f_Yes∼D_Yes}[ A accepts ] − Pr_{f_No∼D_No}[ A accepts ] | ≤ O( q^{5/4} (log n)^{1/2} / n^{1/4} ),

so that if q = Õ(n^{1/5}) the RHS is at most 0.01, which implies, together with the earlier lemmas and discussion, that at least q + 1 queries are needed for any 2-sided non-adaptive randomized tester.
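Before the proof, here is a quick numerical illustration (ours, assuming scipy) of the "overwhelming probability" in Lemma 8: under D_No the number m of weights equal to −1 is Binom(n, 1/10), so it falls in [0.09n, 0.11n] except with probability exponentially small in n:

```python
from scipy.stats import binom

# Pr[0.09n <= m <= 0.11n] for m ~ Binom(n, 1/10); scipy's cdf floors its
# first argument, so this counts exactly the integers in the interval.
for n in [100, 1_000, 10_000, 100_000]:
    p_nice = binom.cdf(0.11 * n, n, 0.1) - binom.cdf(0.09 * n - 1, n, 0.1)
    print(f"n = {n:>7}: Pr[m in [0.09n, 0.11n]] = {p_nice:.6f}")
```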

Proof of Lemma 8. By an additive Chernoff bound, with probability at least 1 − 2^{−Θ(n)} the random variables ν_i satisfy

    m := | { i ∈ [n] : ν_i = −1 } | ∈ [0.09n, 0.11n].    (∗)

Say that any linear threshold function for which (∗) holds is nice. Fix any nice f in the support of D_No, and rename the variables so that the negative weights correspond to the first m variables:

    f(x) = sign( −(x_1 + ... + x_m) + (7/3)(x_{m+1} + ... + x_n) ),    x ∈ {−1,1}^n.

It is not difficult to show that for this f (remembering that m = Θ(n)), these first variables have high influence, roughly of the same order as for the MAJ function:

Claim 10 (HW Problem). For i ∈ [m], Inf_i[f] = Ω(1/√n).

Observe further that f is unate (i.e., monotone increasing in some coordinates, and monotone decreasing in the others). Indeed, any LTF g : x ↦ sign(w · x) is unate:
- non-decreasing in coordinate x_i if and only if w_i ≥ 0;
- non-increasing in coordinate x_i if and only if w_i ≤ 0.

We saw in previous lectures that, for g monotone, ĝ(i) = Inf_i[g]; it turns out the same proof generalizes to unate g (HW Problem), yielding

    ĝ(i) = ± Inf_i[g],

where the sign depends on whether g is non-decreasing or non-increasing in x_i. Back to our function f, this means that

    f̂(i) = + Inf_i[f] if ν_i = 7/3,    f̂(i) = − Inf_i[f] if ν_i = −1,

and thus, by Claim 10, f̂(i) = −Ω(1/√n) for all i ∈ [m].
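Claim 10 itself can be checked by simulation. A Monte Carlo sketch (ours, assuming numpy) that reduces the sum over the other n − 1 coordinates to two binomial counts:

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials = 10_000, 400_000
m = n // 10  # a "nice" split as in (*)

# Fix coordinate 1, which has weight -1. With A ~ Binom(m-1, 1/2) ones among
# the other negative-weight coordinates and B ~ Binom(n-m, 1/2) ones among the
# positive-weight ones, the rest of the sum is
#     S' = -(2A - (m-1)) + (7/3)(2B - (n-m)),
# and coordinate 1 is pivotal (up to ties at |S'| = 1) iff |S'| < 1.
A = rng.binomial(m - 1, 0.5, size=trials)
B = rng.binomial(n - m, 0.5, size=trials)
S_rest = -(2 * A - (m - 1)) + (7 / 3) * (2 * B - (n - m))
print(f"Inf_1[f] ~ {np.mean(np.abs(S_rest) < 1):.5f}   1/sqrt(n) = {n ** -0.5:.5f}")
```

The estimate comes out a small constant multiple of 1/√n, consistent with the Ω(1/√n) claim.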

Fix any monotone Boolean function g: we will show that dist(f, g) ≥ ɛ_0 for some choice of ɛ_0 > 0 independent of f and g. Indeed,

    4 dist(f, g) = E_{x∼U({−1,1}^n)}[ (f(x) − g(x))² ]
                 = ∑_{S⊆[n]} ( f̂(S) − ĝ(S) )²                      (Parseval)
                 ≥ ∑_{i=1}^{m} ( f̂(i) − ĝ(i) )²
                 = ∑_{i=1}^{m} ( −Inf_i[f] − Inf_i[g] )²            (g monotone)
                 = ∑_{i=1}^{m} ( Inf_i[f] + Inf_i[g] )²
                 ≥ ∑_{i=1}^{m} ( Inf_i[f] )²
                 = ∑_{i=1}^{m} Ω(1/n) = Ω(m/n) = Ω(1).    ∎

Proof (sketch) of Lemma 9. Fix any deterministic, non-adaptive q-query algorithm A, and view its q queries z^(1), ..., z^(q) ∈ {−1,1}^n as a q × n matrix Q ∈ {−1,1}^{q×n}, where z^(i) corresponds to the i-th row of Q:

    Q = ( z^(1)_1  z^(1)_2  z^(1)_3  ...  z^(1)_n )
        ( z^(2)_1  z^(2)_2  z^(2)_3  ...  z^(2)_n )
        (   ...      ...      ...    ...    ...   )
        ( z^(q)_1  z^(q)_2  z^(q)_3  ...  z^(q)_n )

Define the Yes-response vector R_Y, a random variable over {−1,1}^q, by the following process: (i) draw f_Yes ∼ D_Yes, where f_Yes(x) = sign(γ_1 x_1 + ... + γ_n x_n); (ii) set the i-th coordinate of R_Y to f_Yes(Q_{i,·}) (f_Yes evaluated on the i-th row of Q, i.e., on z^(i)). Similarly, define the No-response vector R_N over {−1,1}^q. Via Lemma 2 (the homework problem on total variation distance),

    (LHS of Lemma 9) ≤ d_TV(R_Y, R_N)

(abusing the notation of total variation distance by identifying the random variables with their distributions). Hence, our new goal is to show that

    d_TV(R_Y, R_N) ≤ (RHS of Lemma 9).
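For concreteness, the response-vector process can be written out directly; a minimal sketch (ours, assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(2)
q, n = 3, 1_000
Q = rng.choice([-1.0, 1.0], size=(q, n))  # the q queries, one per row

# Draw one function from each distribution and record its answers on the rows
# of Q. (np.sign maps an exact 0 to 0; the rare ties would need a convention.)
gamma = rng.choice([1.0, 3.0], size=n)                 # weights of f_Yes
nu = rng.choice([-1.0, 7 / 3], size=n, p=[0.1, 0.9])   # weights of f_No
R_Y, R_N = np.sign(Q @ gamma), np.sign(Q @ nu)
print(R_Y, R_N)  # two response vectors in {-1, +1}^q
```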

Multidimensional Berry–Esséen setup. For fixed Q as above, define two random variables S, T ∈ R^q as

    S = Qγ, with γ ∼ U({1, 3}^n);
    T = Qν, with, for each i ∈ [n] independently,

        ν_i = +7/3 w.p. 9/10,    −1 w.p. 1/10.

We will also need the following geometric notion:

Definition 11. An orthant in R^q is the analogue in q-dimensional Euclidean space of a quadrant in the plane R²; that is, it is a set of the form

    O = O_1 × O_2 × ... × O_q,

where each O_i is either R_+ or R_−. There are 2^q different orthants in R^q.

The random variable R_Y is fully determined by the orthant S lies in: the i-th coordinate of R_Y is the sign of the i-th coordinate of S, as S_i = (Qγ)_i = ⟨Q_{i,·}, γ⟩. Likewise, R_N is determined by the orthant T lies in. Abusing the notation slightly, we will write R_Y = sign(S) for "R_{Y,i} = sign(S_i) for all i ∈ [q]" (and similarly, R_N = sign(T)).

Now, it is enough to show that for any union O of orthants,

    | Pr[ S ∈ O ] − Pr[ T ∈ O ] | ≤ O( q^{5/4} (log n)^{1/2} / n^{1/4} ),    (∗∗)

as this is equivalent to proving that, for any subset U ⊆ {−1,1}^q,

    | Pr[ R_Y ∈ U ] − Pr[ R_N ∈ U ] | ≤ O( q^{5/4} (log n)^{1/2} / n^{1/4} )

(and the maximum of the LHS over all such U is by definition equal to d_TV(R_Y, R_N)). Note that for q = 1 we get back the regular Berry–Esséen Theorem; for q > 1, we will need a multidimensional Berry–Esséen theorem. The key will be to have random variables with matching means and covariances (instead of matching means and variances as in the one-dimensional case).

(Rest of the proof during next lecture.)
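To close the loop, the quantity d_TV(R_Y, R_N) that the next lecture will bound can be estimated empirically for a small fixed q. A Monte Carlo sketch (ours, assuming numpy); note the estimate is only meaningful above its sampling noise of order 1/√trials:

```python
import numpy as np

rng = np.random.default_rng(3)
q, n, trials, batch = 2, 4_000, 100_000, 1_000
Q = rng.choice([-1.0, 1.0], size=(q, n))  # a fixed query matrix

def pattern_distribution(weight_sampler):
    # Empirical distribution of the sign pattern of Qw over the 2^q orthants
    # (ties at 0 are lumped with the negative side for simplicity).
    counts = np.zeros(2 ** q)
    for _ in range(trials // batch):
        W = weight_sampler((batch, n))                   # one weight vector per row
        signs = (W @ Q.T) > 0                            # batch x q orthant indicators
        idx = signs.astype(int) @ (1 << np.arange(q))    # encode each pattern as an int
        counts += np.bincount(idx, minlength=2 ** q)
    return counts / trials

p_Y = pattern_distribution(lambda s: rng.choice([1.0, 3.0], size=s))
p_N = pattern_distribution(lambda s: rng.choice([-1.0, 7 / 3], size=s, p=[0.1, 0.9]))
print(f"estimated d_TV(R_Y, R_N) ~ {0.5 * np.abs(p_Y - p_N).sum():.4f}")
```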
