Lecture 9: March 26, 2014
COMS 6998-3: Sub-Linear Algorithms in Learning and Testing (Spring 2014)
Lecturer: Rocco Servedio
Lecture 9: March 26, 2014
Scribe: Keith Nichols

1 Overview

1.1 Last time

- Finished the analysis of the $O(n/\epsilon)$-query algorithm for monotonicity.
- Showed an $\Omega(\sqrt{n})$ lower bound for one-sided non-adaptive monotonicity testers.
- Stated and proved (one direction of) Yao's Principle: Suppose there exists a distribution $\mathcal{D}$ over functions $f\colon \{-1,1\}^n \to \{-1,1\}$ (the inputs to the property testing problem) such that any $q$-query deterministic algorithm gives the right answer with probability at most $c$. Then, given any $q$-query non-adaptive randomized testing algorithm $A$, there exists some function $f_A$ such that
$$\Pr[\, A \text{ outputs the correct answer on } f_A \,] \le c.$$

1.2 Today: lower bound for two-sided non-adaptive monotonicity testers

We will use Yao's Principle to show the following lower bound:

Theorem 1 (Chen–Servedio–Tan '14). Any 2-sided non-adaptive property tester for monotonicity, to $\epsilon_0$-test, needs $\Omega(n^{1/5})$ queries (where $\epsilon_0 > 0$ is an absolute constant).

2 The $\Omega(n^{1/5})$ lower bound: proving Theorem 1

2.1 Preliminaries

Recall the definition of the total variation distance between two distributions over the same set $\Omega$:
$$d_{\mathrm{TV}}(\mathcal{D}_1, \mathcal{D}_2) = \frac{1}{2} \sum_{x \in \Omega} \left| \mathcal{D}_1(x) - \mathcal{D}_2(x) \right|.$$
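To make this quantity concrete, here is a small illustrative sketch (mine, not part of the original notes) that computes $d_{\mathrm{TV}}$ between two distributions given as probability dictionaries; the two coin distributions are hypothetical examples.

```python
from fractions import Fraction

def total_variation(d1, d2):
    """d_TV(D1, D2) = (1/2) * sum_x |D1(x) - D2(x)| over a common support."""
    support = set(d1) | set(d2)
    return sum(abs(d1.get(x, 0) - d2.get(x, 0)) for x in support) / 2

# Example: a fair coin vs. a 60/40 coin.
fair = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
biased = {"H": Fraction(3, 5), "T": Fraction(2, 5)}
print(total_variation(fair, biased))  # 1/10
```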
As a homework problem from last time, we have the lemma below, which relates the probability of distinguishing between samples from two distributions to their total variation distance:

Lemma 2 (HW problem). Let $\mathcal{D}_1, \mathcal{D}_2$ be two distributions over some set $\Omega$, and let $A$ be any algorithm (possibly randomized) that takes $x \in \Omega$ as input and outputs "Yes" or "No". Then
$$\left| \Pr_{x \sim \mathcal{D}_1}[\, A(x) = \text{Yes} \,] - \Pr_{x \sim \mathcal{D}_2}[\, A(x) = \text{Yes} \,] \right| \le d_{\mathrm{TV}}(\mathcal{D}_1, \mathcal{D}_2),$$
where the probabilities are also taken over the possible randomness of $A$. (This is sometimes referred to as a data processing inequality for the total variation distance.)

To apply this lemma, recall that given a deterministic algorithm's set of queries $Q = \{z^{(1)}, \ldots, z^{(q)}\} \subseteq \{-1,1\}^n$, a distribution $\mathcal{D}$ over Boolean functions induces a distribution $\mathcal{D}^Q$ over $\{-1,1\}^q$: a draw $x \sim \mathcal{D}^Q$ is obtained by drawing $f \sim \mathcal{D}$ and outputting $(f(z^{(1)}), \ldots, f(z^{(q)})) \in \{-1,1\}^q$.

With this observation and Yao's Principle in hand, we can state and prove a key tool for proving lower bounds in property testing:

Lemma 3 (Key Tool). Fix any property $\mathcal{P}$ (a set of Boolean functions). Let $\mathcal{D}_{\mathrm{Yes}}$ be a distribution over Boolean functions that belong to $\mathcal{P}$, and $\mathcal{D}_{\mathrm{No}}$ be a distribution over Boolean functions that all have $\mathrm{dist}(f, \mathcal{P}) > \epsilon$. Suppose that for all $q$-query sets $Q$, one has $d_{\mathrm{TV}}\left(\mathcal{D}_{\mathrm{Yes}}^Q, \mathcal{D}_{\mathrm{No}}^Q\right) \le \frac{1}{4}$. Then any (2-sided) non-adaptive $\epsilon$-tester for $\mathcal{P}$ must use at least $q+1$ queries.

Proof. Let $\mathcal{D}$ be the mixture $\mathcal{D} = \frac{1}{2}\mathcal{D}_{\mathrm{Yes}} + \frac{1}{2}\mathcal{D}_{\mathrm{No}}$ (that is, a draw from $\mathcal{D}$ is obtained by tossing a fair coin, and returning accordingly a sample drawn either from $\mathcal{D}_{\mathrm{Yes}}$ or $\mathcal{D}_{\mathrm{No}}$). Fix a $q$-query deterministic algorithm $A$, and let
$$p_Y = \Pr_{f \sim \mathcal{D}_{\mathrm{Yes}}}[\, A \text{ accepts on } f \,], \qquad p_N = \Pr_{f \sim \mathcal{D}_{\mathrm{No}}}[\, A \text{ accepts on } f \,].$$
That is, $p_Y$ is the probability that a random Yes-function is accepted, while $p_N$ is the probability that a random No-function is accepted. Via the assumption and the previous lemma, $|p_Y - p_N| \le \frac{1}{4}$.
However, this means that $A$ cannot be a successful tester, since
$$\Pr_{f \sim \mathcal{D}}[\, A \text{ gives the wrong answer} \,] = \frac{1}{2}(1 - p_Y) + \frac{1}{2} p_N = \frac{1}{2} - \frac{1}{2}(p_Y - p_N) \ge \frac{1}{2} - \frac{1}{8} = \frac{3}{8} > \frac{1}{3}.$$
So Yao's Principle tells us that any randomized non-adaptive $q$-query algorithm is wrong on some $f$ in the support of $\mathcal{D}$ with probability at least $\frac{3}{8}$; but a legitimate tester can only be wrong on any such $f$ with probability at most $\frac{1}{3}$. ∎

Exercise 4 (Generalization of Lemma 3; HW Problem). Relax the previous lemma slightly: prove that the conclusion still holds even under the weaker assumptions
$$\Pr_{f \sim \mathcal{D}_{\mathrm{Yes}}}[\, f \in \mathcal{P} \,] \ge \frac{99}{100}, \qquad \Pr_{f \sim \mathcal{D}_{\mathrm{No}}}[\, \mathrm{dist}(f, \mathcal{P}) > \epsilon \,] \ge \frac{99}{100}.$$

For our lower bound, we need to come up with $\mathcal{D}_{\mathrm{Yes}}$ (resp. $\mathcal{D}_{\mathrm{No}}$) over monotone functions (resp. functions $\epsilon_0$-far from monotone) such that for all $Q \subseteq \{-1,1\}^n$ with $|Q| = q$,
$$d_{\mathrm{TV}}\left( \mathcal{D}_{\mathrm{Yes}}^Q, \mathcal{D}_{\mathrm{No}}^Q \right) \le \frac{1}{4}.$$
At a high level, we need to argue that both distributions "look the same." One may thus think of the Central Limit Theorem: the sum of many independent, "nice" real-valued random variables converges to a Gaussian in distribution (i.e., in cumulative distribution function). For instance, a binomial distribution $\mathrm{Bin}\left(10^6, \frac{1}{2}\right)$ has the same shape ("bell curve") as the corresponding Gaussian distribution $\mathcal{N}\left(\frac{10^6}{2}, \frac{10^6}{4}\right)$. For our purpose, however, the convergence guarantees stated by the Central Limit Theorem will not be enough, as they do not give explicit bounds on the rate of convergence; we will use a quantitative version of the CLT, the Berry–Esséen Theorem. First, recall the definition of a (real-valued) Gaussian random variable:

Definition 5 (One-dimensional Gaussian distribution). A real-valued random variable is said to be Gaussian with mean $\mu$ and variance $\sigma^2$ if it follows the distribution $\mathcal{N}(\mu, \sigma^2)$, which has probability density function
$$f_{\mu,\sigma}(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad x \in \mathbb{R}.$$
Such a random variable indeed has expectation $\mu$ and variance $\sigma^2$; furthermore, the distribution is fully specified by these two parameters.
[Figure 1: the standard Gaussian $\mathcal{N}(0,1)$: (a) its cumulative distribution function (CDF) $F_{0,1}(x)$; (b) its probability density function (PDF) $f_{0,1}(x)$.]

Extending to higher dimensions, one can similarly define a $d$-dimensional Gaussian random variable:

Definition 6 ($d$-dimensional Gaussian distribution). Fix a vector $\mu \in \mathbb{R}^d$ and a symmetric non-negative definite matrix $\Sigma \in \mathbb{R}^{d \times d}$. A random variable taking values in $\mathbb{R}^d$ is said to be Gaussian with mean $\mu$ and covariance $\Sigma$ if it follows the distribution $\mathcal{N}(\mu, \Sigma)$, which has probability density function
$$f_{\mu,\Sigma}(x) = \frac{1}{\sqrt{(2\pi)^d \det \Sigma}} \, e^{-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)}, \qquad x \in \mathbb{R}^d.$$
As in the univariate case, $\mu$ and $\Sigma$ uniquely define the distribution; further, one has that for $X \sim \mathcal{N}(\mu, \Sigma)$,
$$\Sigma_{i,j} = \mathrm{Cov}(X_i, X_j) = \mathbb{E}[(X_i - \mathbb{E}X_i)(X_j - \mathbb{E}X_j)], \qquad i, j \in [d].$$

Theorem 7 (Berry–Esséen). Let $S = X_1 + \cdots + X_n$ be the sum of $n$ independent (real-valued) random variables $X_1, \ldots, X_n$ satisfying
$$\Pr[\, |X_i - \mathbb{E}[X_i]| \le \tau \,] = 1,$$
that is, every $X_i$ is almost surely bounded. For $i \in [n]$, define $\mu_i = \mathbb{E}[X_i]$ and $\sigma_i^2 = \mathrm{Var}\, X_i$, so that $\mathbb{E}S = \sum_{i=1}^n \mu_i$ and $\mathrm{Var}\, S = \sum_{i=1}^n \sigma_i^2$ (the last equality by independence). Finally, let $G$ be a $\mathcal{N}\left(\sum_{i=1}^n \mu_i, \sum_{i=1}^n \sigma_i^2\right)$ Gaussian random variable, matching the first two moments of $S$. Then, for all $\theta \in \mathbb{R}$,
$$\left| \Pr[\, S \le \theta \,] - \Pr[\, G \le \theta \,] \right| \le \frac{O(\tau)}{\sqrt{\sum_{i=1}^n \sigma_i^2}}.$$
(There exist other versions of this theorem, with weaker assumptions or phrased in terms of the third moments of the $X_i$'s; we only state here one tailored to our needs.) In other terms, letting $F_S$ (resp. $F_G$) denote the CDF of $S$ (resp. $G$), one has
$$\| F_S - F_G \|_\infty \le \frac{O(\tau)}{\sqrt{\sum_{i=1}^n \sigma_i^2}}.$$
(This quantity $\|F_S - F_G\|_\infty$ is also referred to as the Kolmogorov distance between $S$ and $G$.)

Remark. The constant hidden in the $O(\cdot)$ notation is actually very reasonable; one can take it to be equal to $1$.
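As a quick numerical sanity check (mine, not part of the notes), the sketch below estimates the Kolmogorov distance between a sum of independent uniform $\pm 1$ variables and the matching Gaussian $\mathcal{N}(0, n)$; the $O(1/\sqrt{n})$ decay is visible as $n$ grows.

```python
import math
import random

def gaussian_cdf(x, mu, var):
    """CDF of N(mu, var), via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / math.sqrt(2 * var)))

def kolmogorov_distance(n, trials=20000):
    """Estimate sup_theta |Pr[S <= theta] - Pr[G <= theta]| for S a sum of
    n independent uniform +/-1 variables and G ~ N(0, n)."""
    samples = sorted(sum(random.choice((-1, 1)) for _ in range(n))
                     for _ in range(trials))
    worst = 0.0
    for k, s in enumerate(samples):
        g = gaussian_cdf(s, 0, n)
        # compare against the empirical CDF just before and at the sample point
        worst = max(worst, abs(k / trials - g), abs((k + 1) / trials - g))
    return worst

for n in (25, 100, 400):
    print(n, round(kolmogorov_distance(n), 3))  # roughly halves as n quadruples
```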
Application: baby step towards the lower bound. Fix any string $z \in \{-1,1\}^n$, and for $i \in [n]$ let the (independent) random variables $\gamma_i$ be defined as
$$\gamma_i = \begin{cases} +1 & \text{w.p. } \frac{1}{2} \\ -1 & \text{w.p. } \frac{1}{2} \end{cases}$$
Letting $X_i = \gamma_i z_i$, we have $\mu_i = \mathbb{E}X_i = 0$ and $\sigma_i^2 = \mathrm{Var}\, X_i = 1$, and we can take $\tau = 1$ to apply the Berry–Esséen theorem to $X = X_1 + \cdots + X_n$. This allows us to conclude that
$$\forall \theta \in \mathbb{R}, \quad \left| \Pr[\, X \le \theta \,] - \Pr[\, G \le \theta \,] \right| \le \frac{O(1)}{\sqrt{n}}$$
for $G \sim \mathcal{N}(0, n)$. Now, consider a slightly different distribution than that of the $\gamma_i$'s: for the same $z \in \{-1,1\}^n$, define the independent random variables $\nu_i$ by
$$\nu_i = \begin{cases} -3 & \text{w.p. } \frac{1}{10} \\ +\frac{1}{3} & \text{w.p. } \frac{9}{10} \end{cases}$$
and let $Y_i = \nu_i z_i$ for $i \in [n]$, and $Y = Y_1 + \cdots + Y_n$. By our choice of parameters,
$$\mathbb{E}Y_i = \left( \tfrac{1}{10} \cdot (-3) + \tfrac{9}{10} \cdot \tfrac{1}{3} \right) z_i = 0 = \mathbb{E}X_i,$$
$$\mathrm{Var}\, Y_i = \mathbb{E}\left[ Y_i^2 \right] = \tfrac{1}{10} \cdot 9 + \tfrac{9}{10} \cdot \tfrac{1}{9} = 1 = \mathrm{Var}\, X_i.$$
So $\mathbb{E}[Y] = \mathbb{E}[X] = 0$ and $\mathrm{Var}\, Y = \mathrm{Var}\, X = n$; by the Berry–Esséen theorem (with $\tau$ set to $3$, and $G$ as before),
$$\forall \theta \in \mathbb{R}, \quad \left| \Pr[\, Y \le \theta \,] - \Pr[\, G \le \theta \,] \right| \le \frac{O(1)}{\sqrt{n}},$$
and by the triangle inequality
$$\forall \theta \in \mathbb{R}, \quad \left| \Pr[\, X \le \theta \,] - \Pr[\, Y \le \theta \,] \right| \le \frac{O(1)}{\sqrt{n}}. \tag{1}$$
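The moment matching above can also be checked by simulation. Here is an illustrative sketch (mine, not from the notes) that samples both sums, taking $z$ to be the all-ones string for concreteness, and compares $\Pr[X \le \theta]$ with $\Pr[Y \le \theta]$ at a few thresholds.

```python
import random

def sample_X(n):
    """Sum of n i.i.d. uniform +/-1 variables (X_i = gamma_i z_i is uniform
    over {-1, +1} for any fixed z_i in {-1, +1})."""
    return sum(random.choice((-1, 1)) for _ in range(n))

def sample_Y(n):
    """Sum of n i.i.d. nu_i (z = all-ones): -3 w.p. 1/10, +1/3 w.p. 9/10;
    mean 0 and variance 1, matching X_i."""
    return sum(-3 if random.random() < 0.1 else 1/3 for _ in range(n))

n, trials = 200, 10000
xs = [sample_X(n) for _ in range(trials)]
ys = [sample_Y(n) for _ in range(trials)]
for theta in (-10, 0, 10):
    px = sum(x <= theta for x in xs) / trials
    py = sum(y <= theta for y in ys) / trials
    print(theta, round(px, 3), round(py, 3))  # agree up to ~O(1)/sqrt(n)
```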
We can now define $\mathcal{D}_{\mathrm{Yes}}$ and $\mathcal{D}_{\mathrm{No}}$ based on this (that is, based respectively on a random draw of $\gamma, \nu \in \mathbb{R}^n$ distributed as above): a function $f_\gamma \sim \mathcal{D}_{\mathrm{Yes}}$ is given by
$$\forall z \in \{-1,1\}^n, \quad f_\gamma(z) = \mathrm{sign}(\gamma_1 z_1 + \cdots + \gamma_n z_n),$$
and similarly for $f_\nu \sim \mathcal{D}_{\mathrm{No}}$:
$$\forall z \in \{-1,1\}^n, \quad f_\nu(z) = \mathrm{sign}(\nu_1 z_1 + \cdots + \nu_n z_n).$$
With the notations above, $X \le 0$ if and only if $f_\gamma(z) = -1$, and $Y \le 0$ if and only if $f_\nu(z) = -1$. This implies that for any fixed single query $z$,
$$d_{\mathrm{TV}}\left( \mathcal{D}_{\mathrm{Yes}}^{\{z\}}, \mathcal{D}_{\mathrm{No}}^{\{z\}} \right) = \frac{1}{2}\left( \left| \Pr[\, X \le 0 \,] - \Pr[\, Y \le 0 \,] \right| + \left| \Pr[\, X > 0 \,] - \Pr[\, Y > 0 \,] \right| \right) \le \frac{O(1)}{\sqrt{n}}.$$
This almost looks like what we were aiming at; so why aren't we done? There are two problems with what we did above:

1. This only deals with the case $q = 1$; that is, it would provide a lower bound only against one-query algorithms.
   Fix: we will use a multidimensional version of the Berry–Esséen Theorem, for sums of $q$-dimensional independent random variables (converging to a multidimensional Gaussian).

2. $f_\gamma, f_\nu$ are not monotone (indeed, both the $\gamma_i$'s and the $\nu_i$'s can be negative).
   Fix: shift everything by $2$:
   - $\gamma_i \in \{1, 3\}$: $f_\gamma$ is monotone;
   - $\nu_i \in \{-1, \frac{7}{3}\}$: $f_\nu$ will be far from monotone with high probability (we will show this).

A small simulation of the single-query bound appears below.
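For a single query $z$, the two induced distributions live on the two-element set $\{-1,1\}$, so their total variation distance is just the difference of the acceptance probabilities. The following illustrative sketch (mine, not the notes') estimates it by sampling, using the unshifted weights above.

```python
import random

def f_sign(weights, z):
    """Evaluate the LTF f(z) = sign(<weights, z>), with sign(0) taken as -1."""
    return 1 if sum(w * zi for w, zi in zip(weights, z)) > 0 else -1

n, trials = 200, 20000
z = [random.choice((-1, 1)) for _ in range(n)]   # an arbitrary fixed query

def pr_plus_one(weight_sampler):
    """Estimate Pr_f[f(z) = 1] when f's weights are drawn i.i.d. from weight_sampler."""
    hits = 0
    for _ in range(trials):
        weights = [weight_sampler() for _ in range(n)]
        hits += f_sign(weights, z) == 1
    return hits / trials

p_yes = pr_plus_one(lambda: random.choice((-1, 1)))                # gamma_i
p_no = pr_plus_one(lambda: -3 if random.random() < 0.1 else 1/3)   # nu_i
# On a two-element outcome space, d_TV = |p_yes - p_no|; expect O(1)/sqrt(n).
print(abs(p_yes - p_no))
```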
2.2 The lower bound construction

Up until this point, everything has been a warmup; we are now ready to go into more detail.

$\mathcal{D}_{\mathrm{Yes}}$ and $\mathcal{D}_{\mathrm{No}}$. As mentioned in the previous section, we need to (re)define the distributions $\mathcal{D}_{\mathrm{Yes}}$ and $\mathcal{D}_{\mathrm{No}}$ (that is, of $\gamma$ and $\nu$) to solve the second issue.

$\mathcal{D}_{\mathrm{Yes}}$: Draw $f \sim \mathcal{D}_{\mathrm{Yes}}$ by independently drawing, for $i \in [n]$,
$$\gamma_i = \begin{cases} +3 & \text{w.p. } \frac{1}{2} \\ +1 & \text{w.p. } \frac{1}{2} \end{cases}$$
and setting $f\colon x \in \{-1,1\}^n \mapsto \mathrm{sign}\left(\sum_{i=1}^n \gamma_i x_i\right)$. Any such $f$ is monotone, as the weights are all positive.

$\mathcal{D}_{\mathrm{No}}$: Similarly, draw $f \sim \mathcal{D}_{\mathrm{No}}$ by independently drawing, for $i \in [n]$,
$$\nu_i = \begin{cases} -1 & \text{w.p. } \frac{1}{10} \\ +\frac{7}{3} & \text{w.p. } \frac{9}{10} \end{cases}$$
and setting $f\colon x \in \{-1,1\}^n \mapsto \mathrm{sign}\left(\sum_{i=1}^n \nu_i x_i\right)$. Such an $f$ is not always far from monotone; in fact, one of the functions in the support of $\mathcal{D}_{\mathrm{No}}$ (the one with all weights set to $7/3$) is even monotone. However, we shall argue that $f \sim \mathcal{D}_{\mathrm{No}}$ is far from monotone with overwhelming probability, and then apply the relaxation of the key tool (HW Problem 4) to conclude.

The theorem will stem from the following two lemmas, which state respectively that (i) No-functions are almost all far from monotone, and (ii) the two distributions are hard to distinguish:

Lemma 8 (i). There exists a universal constant $\epsilon_0 > 0$ such that
$$\Pr_{f \sim \mathcal{D}_{\mathrm{No}}}[\, \mathrm{dist}(f, \mathcal{M}) > \epsilon_0 \,] \ge 1 - 2^{-\Theta(n)},$$
where $\mathcal{M}$ denotes the class of monotone functions. (Note that this $1 - o(1)$ probability is actually stronger than what the relaxation from Problem 4 requires.)

Lemma 9 (ii). Let $A$ be any deterministic $q$-query algorithm. Then
$$\left| \Pr_{f_{\mathrm{Yes}} \sim \mathcal{D}_{\mathrm{Yes}}}[\, A \text{ accepts} \,] - \Pr_{f_{\mathrm{No}} \sim \mathcal{D}_{\mathrm{No}}}[\, A \text{ accepts} \,] \right| \le O\!\left( \frac{q^{5/4} (\log n)^{1/2}}{n^{1/4}} \right),$$
so that if $q = \tilde{O}\left(n^{1/5}\right)$ the RHS is at most $0.01$, which implies, together with the earlier lemmas and discussion, that at least $q + 1$ queries are needed for any 2-sided, non-adaptive randomized tester.
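As an illustrative check of this construction (mine, not the notes'): sampling $\nu$ confirms that the weights have mean $2$, matching $\gamma$, and that the number $m$ of negative weights concentrates around $n/10$, which is exactly the concentration event used in the proof of Lemma 8 below.

```python
import random

def draw_nu(n):
    """One draw of the D_No weight vector: nu_i = -1 w.p. 1/10, +7/3 w.p. 9/10."""
    return [-1 if random.random() < 0.1 else 7/3 for _ in range(n)]

n, trials = 20000, 200
draws = [draw_nu(n) for _ in range(trials)]

mean = sum(sum(d) for d in draws) / (trials * n)
print(round(mean, 3))   # ~= 2.0, the common mean of the gamma_i and the nu_i

# Fraction of draws whose number m of negative weights lies in [0.09n, 0.11n]:
inside = sum(0.09 * n <= sum(w == -1 for w in d) <= 0.11 * n for d in draws)
print(inside / trials)  # ~= 1, as the additive Chernoff bound predicts
```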
Proof of Lemma 8. By an additive Chernoff bound, with probability at least $1 - 2^{-\Theta(n)}$ the random variables $\nu_i$ satisfy
$$m = \left| \{ i \in [n] : \nu_i = -1 \} \right| \in [0.09n, 0.11n]. \tag{$*$}$$
Say that any linear threshold function for which $(*)$ holds is nice. Fix any nice $f$ in the support of $\mathcal{D}_{\mathrm{No}}$, and rename the variables so that the negative weights correspond to the first $m$ variables:
$$f(x) = \mathrm{sign}\left( -(x_1 + \cdots + x_m) + \frac{7}{3}(x_{m+1} + \cdots + x_n) \right), \qquad x \in \{-1,1\}^n.$$
It is not difficult to show that for this $f$ (remembering that $m = \Theta(n)$), these first variables have high influence, roughly of the same order as for the MAJ function:

Claim 10 (HW Problem). For $i \in [m]$, $\mathrm{Inf}_i[f] = \Omega\left(\frac{1}{\sqrt{n}}\right)$.

Observe further that $f$ is unate (i.e., monotone increasing in some coordinates, and monotone decreasing in the others). Indeed, any LTF $g\colon x \mapsto \mathrm{sign}(w \cdot x)$ is unate:
- non-decreasing in coordinate $x_i$ if and only if $w_i \ge 0$;
- non-increasing in coordinate $x_i$ if and only if $w_i \le 0$.

We saw in previous lectures that, for $g$ monotone, $\hat{g}(i) = \mathrm{Inf}_i[g]$; it turns out the same proof generalizes to unate $g$ (HW Problem), yielding
$$\hat{g}(i) = \pm \mathrm{Inf}_i[g],$$
where the sign depends on whether $g$ is non-decreasing or non-increasing in $x_i$. Back to our function $f$, this means that
$$\hat{f}(i) = \begin{cases} +\mathrm{Inf}_i[f] & \text{if } \nu_i = \frac{7}{3} \\ -\mathrm{Inf}_i[f] & \text{if } \nu_i = -1 \end{cases}$$
and thus, for all $i \in [m]$, $\hat{f}(i) = -\Omega\left(\frac{1}{\sqrt{n}}\right)$.
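Before finishing the proof, a quick empirical look at Claim 10 (an illustrative sketch, mine, not the homework solution): estimate $\mathrm{Inf}_i[f] = \Pr_x[f(x) \ne f(x^{\oplus i})]$ by sampling, for a negative-weight coordinate of a nice LTF, and compare it against $1/\sqrt{n}$.

```python
import math
import random

def influence(weights, i, trials=10000):
    """Estimate Inf_i[f] = Pr_x[f(x) != f(x with coordinate i flipped)]
    for the LTF f(x) = sign(<weights, x>), with sign(0) taken as -1."""
    n = len(weights)
    flips = 0
    for _ in range(trials):
        x = [random.choice((-1, 1)) for _ in range(n)]
        s = sum(w * xi for w, xi in zip(weights, x))
        s_flipped = s - 2 * weights[i] * x[i]   # effect of flipping x_i
        flips += (s > 0) != (s_flipped > 0)
    return flips / trials

n = 1000
m = n // 10
weights = [-1] * m + [7/3] * (n - m)            # a "nice" No-function
print(influence(weights, 0), 1 / math.sqrt(n))  # same order of magnitude
```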
Fix any monotone Boolean function $g$: we will show that $\mathrm{dist}(f, g) \ge \epsilon_0$, for some choice of $\epsilon_0 > 0$ independent of $f$ and $g$. Indeed,
$$4\,\mathrm{dist}(f, g) = \mathbb{E}_{x \sim \mathcal{U}(\{-1,1\}^n)}\left[ (f(x) - g(x))^2 \right] = \sum_{S \subseteq [n]} \left( \hat{f}(S) - \hat{g}(S) \right)^2 \qquad \text{(Parseval)}$$
$$\ge \sum_{i=1}^m \left( \hat{f}(i) - \hat{g}(i) \right)^2 = \sum_{i=1}^m \left( -\mathrm{Inf}_i[f] - \mathrm{Inf}_i[g] \right)^2 \qquad (g \text{ monotone})$$
$$= \sum_{i=1}^m \left( \mathrm{Inf}_i[f] + \mathrm{Inf}_i[g] \right)^2 \ge \sum_{i=1}^m \left( \mathrm{Inf}_i[f] \right)^2 = \sum_{i=1}^m \Omega\!\left( \frac{1}{\sqrt{n}} \right)^2 = \Omega\!\left( \frac{m}{n} \right) = \Omega(1). \qquad \blacksquare$$

Proof (sketch) of Lemma 9. Fix any deterministic, non-adaptive $q$-query algorithm $A$, and view its $q$ queries $z^{(1)}, \ldots, z^{(q)} \in \{-1,1\}^n$ as a $q \times n$ matrix $Q \in \{-1,1\}^{q \times n}$, where $z^{(i)}$ corresponds to the $i$-th row of $Q$:
$$Q = \begin{pmatrix} z^{(1)}_1 & z^{(1)}_2 & z^{(1)}_3 & \cdots & z^{(1)}_n \\ z^{(2)}_1 & z^{(2)}_2 & z^{(2)}_3 & \cdots & z^{(2)}_n \\ \vdots & & & & \vdots \\ z^{(q)}_1 & z^{(q)}_2 & z^{(q)}_3 & \cdots & z^{(q)}_n \end{pmatrix}.$$
Define the Yes-response vector $R^Y$, a random variable over $\{-1,1\}^q$, by the process of (i) drawing $f_{\mathrm{Yes}} \sim \mathcal{D}_{\mathrm{Yes}}$, where $f_{\mathrm{Yes}}(x) = \mathrm{sign}(\gamma_1 x_1 + \cdots + \gamma_n x_n)$; (ii) setting the $i$-th coordinate of $R^Y$ to $f_{\mathrm{Yes}}(Q_{i,\cdot})$ ($f_{\mathrm{Yes}}$ applied to the $i$-th row of $Q$, i.e. $z^{(i)}$). Similarly, define the No-response vector $R^N$ over $\{-1,1\}^q$. Via Lemma 2 (the homework problem on total variation distance),
$$(\text{LHS of Lemma 9}) \le d_{\mathrm{TV}}(R^Y, R^N)$$
(abusing the notation of total variation distance by identifying the random variables with their distributions). Hence, our new goal is to show that
$$d_{\mathrm{TV}}(R^Y, R^N) \stackrel{?}{\le} (\text{RHS of Lemma 9}).$$
Multidimensional Berry–Esséen setup. For fixed $Q$ as above, define two random variables $S, T \in \mathbb{R}^q$ as
$$S = Q\gamma, \quad \text{with } \gamma \sim \mathcal{U}\left(\{1,3\}^n\right); \qquad T = Q\nu, \quad \text{with, for each } i \in [n] \text{ (independently)}, \quad \nu_i = \begin{cases} -1 & \text{w.p. } \frac{1}{10} \\ +\frac{7}{3} & \text{w.p. } \frac{9}{10} \end{cases}$$
We will also need the following geometric notion:

Definition 11. An orthant in $\mathbb{R}^q$ is the analogue in $q$-dimensional Euclidean space of a quadrant in the plane $\mathbb{R}^2$; that is, it is a set of the form
$$O = O_1 \times O_2 \times \cdots \times O_q$$
where each $O_i$ is either $\mathbb{R}_+$ or $\mathbb{R}_-$. There are $2^q$ different orthants in $\mathbb{R}^q$.

The random variable $R^Y$ is fully determined by the orthant $S$ lies in: the $i$-th coordinate of $R^Y$ is the sign of the $i$-th coordinate of $S$, as $S_i = (Q\gamma)_i = \langle Q_{i,\cdot}, \gamma \rangle$. Likewise, $R^N$ is determined by the orthant $T$ lies in. Abusing notation slightly, we will write $R^Y = \mathrm{sign}(S)$ for "$R^Y_i = \mathrm{sign}(S_i)$ for $i \in [q]$" (and similarly, $R^N = \mathrm{sign}(T)$). Now, it is enough to show that for any union $O$ of orthants,
$$\left| \Pr[\, S \in O \,] - \Pr[\, T \in O \,] \right| \le O\!\left( \frac{q^{5/4} (\log n)^{1/2}}{n^{1/4}} \right), \tag{$**$}$$
as this is equivalent to proving that, for any subset $U \subseteq \{-1,1\}^q$,
$$\left| \Pr[\, R^Y \in U \,] - \Pr[\, R^N \in U \,] \right| \le O\!\left( \frac{q^{5/4} (\log n)^{1/2}}{n^{1/4}} \right)$$
(and the maximum of the LHS over all such $U$ is by definition equal to $d_{\mathrm{TV}}(R^Y, R^N)$). Note that for $q = 1$ we get back the regular Berry–Esséen Theorem; for $q > 1$, we will need a multidimensional Berry–Esséen. The key will be to have random variables with matching means and covariances (instead of means and variances, as in the one-dimensional case).

(Rest of the proof during next lecture.)
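To make the setup concrete, here is a small illustrative simulation (mine, not the notes'): for a random $q \times n$ query matrix $Q$, it samples $S = Q\gamma$ and $T = Q\nu$, maps each to its sign (orthant) pattern, and estimates the total variation distance between the resulting response-vector distributions. Note that the empirical estimate carries sampling noise of order $\sqrt{2^q/\text{trials}}$, so it is only a rough illustration.

```python
import random
from collections import Counter

def response_pattern(Q, weights):
    """Orthant of Q @ weights, i.e. the response vector sign(Qw) as a tuple
    (sign(0) taken as -1)."""
    return tuple(1 if sum(qj * wj for qj, wj in zip(row, weights)) > 0 else -1
                 for row in Q)

q, n, trials = 3, 200, 10000
Q = [[random.choice((-1, 1)) for _ in range(n)] for _ in range(q)]

yes = Counter(response_pattern(Q, [random.choice((1, 3)) for _ in range(n)])
              for _ in range(trials))
no = Counter(response_pattern(Q, [-1 if random.random() < 0.1 else 7/3
                                  for _ in range(n)])
             for _ in range(trials))

tv = sum(abs(yes[p] - no[p]) for p in set(yes) | set(no)) / (2 * trials)
print(tv)  # small, consistent with the O(q^{5/4} (log n)^{1/2} / n^{1/4}) bound
```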