Hoeffding, Chernoff, Bennet, and Bernstein Bounds
|
|
- Charlene Nicholson
- 6 years ago
- Views:
Transcription
1 Stat 928: Statistical Learning Theory Lecture: 6 Hoeffding, Chernoff, Bennet, Bernstein Bounds Instructor: Sham Kakade 1 Hoeffding s Bound We say X is a sub-gaussian rom variable if it has quadratically bounded logarithmic moment generating function,e.g. For a sub-gaussian rom variable, we have ln Ee λ(x µ) λ2 2 b. P ( X n µ + ɛ) e nɛ2 /2b. Similarly, P ( X n µ ɛ) e nɛ2 /2b. 2 Chernoff Bound For a binary rom variable, recall the KullbackLeibler divergence is KL(p q) = p ln(p/q) + (1 p) ln((1 p)/(1 q)). Theorem 2.1. (Relative Entropy Chernoff Bound) Assume that X [0, 1] EX = µ. We have the following inequality P ( X n µ + ɛ) e nkl(µ+ɛ µ) First, let us underst the worst case MGF for X. P ( X n µ ɛ) e nkl(µ ɛ µ), Lemma 2.2. Assume that X [0, 1] EX = µ. We have the following inequality Ee λx (1 µ)e 0 + µe λ This shows that the maximum logarithmic moment generating function is achieved with a {0, 1} valued rom variable, i.e. Ee λx E X µ[e λx ] where X is a {0, 1} valued rom variable which takes the value 1 with probability µ. Proof. Let M X (λ) = Ee λx M X (λ) = (1 µ)e 0 + µe λ. Then M X (0) = M X (0). Moreover, which completes the proof. M X(λ) = EXe λx EXe λ 1 = µe λ = M X (λ) 1
2 Now we are ready to provide the proof. Proof. By the previous lemma, we only need to prove the result for binary X {0, 1}, with mean 1. Recall from Lemma 1.4 in the previous lecture that, I(µ + ɛ) = KL(P µ+ɛ P ) where P µ+ɛ was the variational distribution P λ where λ was is set such that E X Pλ [X] = µ + ɛ. Since X is binary, it must be that P µ+ɛ is just distribution which is 1 with probability µ + ɛ. Hence KL(P µ+ɛ P ) is just the KL between two binary distributions with means µ + ɛ µ, which completes the proof. 2.1 Useful Forms of the Chernoff Bound Note that by Hoeffding s lemma (as X is sub-gaussian), we have (from Lecture 5) that KL(µ + ɛ µ) = inf λ>0 [ λ(µ + ɛ) + ln((1 µ)e0 + µe λ )] 2ɛ 2 Define V ar p be the variance of a X which is 1 with probability p 0 with probability 1 p. It is straightforward to show that the second derivative with respect to δ is: Define KL (µ + δ µ) = 1/V ar δ MaxVar[µ, µ + ɛ] = max p [µ,µ+ɛ] V ar p which provides a lower bound on the second derivative for δ between 0 ɛ. Hence, we have that: which leads to a nicer version of the Chernoff bound. KL(µ + ɛ µ) 1 2 ɛ2 /MaxVar[µ, µ + ɛ] Theorem 2.3. (Nicer Form of the Chernoff Bound) Assume that X [0, 1] EX = µ. Fix ɛ. Define: MaxVar[µ, µ + ɛ] = max p [µ,µ+ɛ] V ar p as before (i.e. it is the maximal variance (of {0, 1} variable) between µ µ + ɛ). We have the following inequality P ( X n µ + ɛ) e n ɛ 2 2 MaxVar[µ,µ+ɛ] P ( X n µ ɛ) e n ɛ 2 2 MaxVar[µ ɛ,µ] The following corollary (while always true) is much sharper bound than Hoeffding s bound when µ 0. Corollary 2.4. We have the following bound: P ( X n µ + ɛ) exp[ nɛ 2 /2(µ + ɛ)] thus P ( X n µ ɛ) exp[ nɛ 2 /2µ]. 2
3 This implies a multiplicative form of the Chernoff bound since: δ 2 P ( X n (1 + δ)µ) exp[ nµ 2(1 + δ) ] P ( X n (1 δ)µ) exp[ nµδ 2 /2] Similar results for Bernstein Bennet inequalities are available. 3 Bennet Inequality In Bennet inequality, we assume that the variable is upper bounded, want to estimate its moment generating function using variance information. Lemma 3.1. If X EX 1, then λ 0: where µ = EX ln Ee λ(x µ) (e λ λ 1)V ar(x). Proof. It suffices to prove the lemma when µ = 0. Using ln z z 1, we have ln Ee λx = ln Ee λx Ee λx 1 = λ 2 E eλx λx 1 (λx) 2 (X) 2 λ 2 E eλ λ 1 λ 2 (X) 2, where the second inequality follows from the fact that the function (e z z 1)/z 2 is non-decreasing λx λ. Lemma 3.2. We have inf [ λɛ + λ>0 (eλ λ 1)V ar(x)] = V ar(x)φ(ɛ/v ar(x)) 2(V ar(x) + ɛ/3). where φ(z) = (1 + z) ln(1 + z) z. Proof. Take derivative with respect to λ, we obtain ɛ + (e λ 1)V ar(x) = 0. Therefore λ = ln(1 + ɛ/v ar(x)). Plug in, we obtain the equality. It is easy to verify using Taylor expansion of the exponential function that for λ (0, 3): e λ λ 1 λ2 2 Now by picking λ = ɛ/(v ar(x) + ɛ/3), we have λɛ + (λ/3) m λ 2 = 2(1 λ/3). m=0 λ 2 2(1 λ/3) = ɛ2 /[2V ar(x) + 2ɛ/3]. 3 ɛ 2
4 This proves the desired bound. The above bound implies the following bound: If X EX b, for some b > 0, then P [X EX + ɛ] exp[ nɛ 2 /(2V ar(x) + 2ɛb/3)]. This is similar to the Gaussian result, except for the term 2ɛb/3. Behaves similar to Gaussian tail bound when ɛb V ar(x). 4 Bernstein Inequality In Bernstein inequality, we obtain a result similar to the simplified Bennet bound but with a moment condition. There are different forms. We consider one form. Lemma 4.1. If X satisfies the moment condition with b > 0 for integers m 2: EX m m!b m 2 V/2, then when λ (0, 1/b): thus ln Ee λx λex + 0.5λ 2 V (1 λb) 1, P [ X n EX + ɛ] exp[ nɛ 2 /(2V + 2ɛb)]. Proof. We have the following estimation of logarithmic moment generating function: ln Ee λx Ee λx 1 λex + 0.5V λ 2 m=2 b m 2 λ m 2 = λex + 0.5λ 2 V (1 λb) 1. The last inequality is similar to the proof of Bennet inequality. Exercise: finish the proof. 5 Independent but non-iid rom variables If X 1,..., X n are independent but not iid. Let X n = n 1 n i=1 X i, µ = E X n, then we have In particular, we have the following results: P ( X n µ + ɛ) inf [ λn(µ + ɛ) + n ln Ee λxi ]. λ>0 Lemma 5.1. If X i are sub-gaussians with Ee λxi λex i + 0.5λ 2 V i, then P ( X n µ + ɛ) exp [ n2 ɛ 2 ] 2 n i=1 V. i An example is Radamecher average: let σ i = {±1} be independent rom Bernoulli variables, a i be fixed numbers, then n [ P (n 1 nɛ 2 ] σ i a i ɛ) exp 2n 1 n. i=1 a2 i Similarly one can derive bounds for Bennet Bernstein inequalities. i=1 i=1 4
5 Lemma 5.2. If X i EX i b for all i, then [ P ( X n µ + ɛ) exp n 2 ɛ 2 2 n i=1 V ar(x i) + 2nbɛ/3 ]. 6 Alternative Expression Tail inequality: P (deviation ɛ) δ(ɛ). Equivalent expression: with probability 1 δ: deviation ɛ(δ), where ɛ(δ) is the inverse function of δ(ɛ). For example the Chernoff bound P ( X n µ ɛ) exp( 2nɛ 2 ) = δ, means with probability 1 δ: Xn EX ln(1/δ)/(2n). For Bennet inequality, we set thus using x + y x + y: P [ X n EX + ɛ] exp[ nɛ 2 /(2V ar(x) + 2ɛb/3)], δ = exp[ nɛ 2 /(2V ar(x) + 2ɛb/3)], ɛ = 2V ar(x) ln(1/δ)/n + b 2 ln(1/δ) 2 /(9n 2 ) + b ln(1/δ) That is, with probability at least 1 δ, we have X n EX 2V ar(x) ln(1/δ)/n + 2V ar(x) ln(1/δ)/n + 2b ln(1/δ). 2b ln(1/δ) 5
The Moment Method; Convex Duality; and Large/Medium/Small Deviations
Stat 928: Statistical Learning Theory Lecture: 5 The Moment Method; Convex Duality; and Large/Medium/Small Deviations Instructor: Sham Kakade The Exponential Inequality and Convex Duality The exponential
More informationMarch 1, Florida State University. Concentration Inequalities: Martingale. Approach and Entropy Method. Lizhe Sun and Boning Yang.
Florida State University March 1, 2018 Framework 1. (Lizhe) Basic inequalities Chernoff bounding Review for STA 6448 2. (Lizhe) Discrete-time martingales inequalities via martingale approach 3. (Boning)
More informationLecture 4: Sampling, Tail Inequalities
Lecture 4: Sampling, Tail Inequalities Variance and Covariance Moment and Deviation Concentration and Tail Inequalities Sampling and Estimation c Hung Q. Ngo (SUNY at Buffalo) CSE 694 A Fun Course 1 /
More informationSTAT/MATH 395 A - PROBABILITY II UW Winter Quarter Moment functions. x r p X (x) (1) E[X r ] = x r f X (x) dx (2) (x E[X]) r p X (x) (3)
STAT/MATH 395 A - PROBABILITY II UW Winter Quarter 07 Néhémy Lim Moment functions Moments of a random variable Definition.. Let X be a rrv on probability space (Ω, A, P). For a given r N, E[X r ], if it
More informationSymmetrization and Rademacher Averages
Stat 928: Statistical Learning Theory Lecture: Syetrization and Radeacher Averages Instructor: Sha Kakade Radeacher Averages Recall that we are interested in bounding the difference between epirical and
More informationSTAT 200C: High-dimensional Statistics
STAT 200C: High-dimensional Statistics Arash A. Amini May 30, 2018 1 / 59 Classical case: n d. Asymptotic assumption: d is fixed and n. Basic tools: LLN and CLT. High-dimensional setting: n d, e.g. n/d
More informationTail and Concentration Inequalities
CSE 694: Probabilistic Analysis and Randomized Algorithms Lecturer: Hung Q. Ngo SUNY at Buffalo, Spring 2011 Last update: February 19, 2011 Tail and Concentration Ineualities From here on, we use 1 A to
More informationProbability Background
Probability Background Namrata Vaswani, Iowa State University August 24, 2015 Probability recap 1: EE 322 notes Quick test of concepts: Given random variables X 1, X 2,... X n. Compute the PDF of the second
More informationLecture 4. P r[x > ce[x]] 1/c. = ap r[x = a] + a>ce[x] P r[x = a]
U.C. Berkeley CS273: Parallel and Distributed Theory Lecture 4 Professor Satish Rao September 7, 2010 Lecturer: Satish Rao Last revised September 13, 2010 Lecture 4 1 Deviation bounds. Deviation bounds
More informationIntroduction to Statistical Learning Theory
Introduction to Statistical Learning Theory In the last unit we looked at regularization - adding a w 2 penalty. We add a bias - we prefer classifiers with low norm. How to incorporate more complicated
More informationStat 260/CS Learning in Sequential Decision Problems. Peter Bartlett
Stat 260/CS 294-102. Learning in Sequential Decision Problems. Peter Bartlett 1. Multi-armed bandit algorithms. Concentration inequalities. P(X ǫ) exp( ψ (ǫ))). Cumulant generating function bounds. Hoeffding
More informationLecture 18: March 15
CS71 Randomness & Computation Spring 018 Instructor: Alistair Sinclair Lecture 18: March 15 Disclaimer: These notes have not been subjected to the usual scrutiny accorded to formal publications. They may
More informationStatistical Machine Learning
Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 2: Introduction to statistical learning theory. 1 / 22 Goals of statistical learning theory SLT aims at studying the performance of
More informationExample continued. Math 425 Intro to Probability Lecture 37. Example continued. Example
continued : Coin tossing Math 425 Intro to Probability Lecture 37 Kenneth Harris kaharri@umich.edu Department of Mathematics University of Michigan April 8, 2009 Consider a Bernoulli trials process with
More informationHigh Dimensional Probability
High Dimensional Probability for Mathematicians and Data Scientists Roman Vershynin 1 1 University of Michigan. Webpage: www.umich.edu/~romanv ii Preface Who is this book for? This is a textbook in probability
More informationProving the central limit theorem
SOR3012: Stochastic Processes Proving the central limit theorem Gareth Tribello March 3, 2019 1 Purpose In the lectures and exercises we have learnt about the law of large numbers and the central limit
More informationAssignment 4: Solutions
Math 340: Discrete Structures II Assignment 4: Solutions. Random Walks. Consider a random walk on an connected, non-bipartite, undirected graph G. Show that, in the long run, the walk will traverse each
More informationNovel Bernstein-like Concentration Inequalities for the Missing Mass
Novel Bernstein-like Concentration Inequalities for the Missing Mass Bahman Yari Saeed Khanloo Monash University bahman.khanloo@monash.edu Gholamreza Haffari Monash University gholamreza.haffari@monash.edu
More informationOXPORD UNIVERSITY PRESS
Concentration Inequalities A Nonasymptotic Theory of Independence STEPHANE BOUCHERON GABOR LUGOSI PASCAL MASS ART OXPORD UNIVERSITY PRESS CONTENTS 1 Introduction 1 1.1 Sums of Independent Random Variables
More informationConcentration inequalities and tail bounds
Concentration inequalities and tail bounds John Duchi Outline I Basics and motivation 1 Law of large numbers 2 Markov inequality 3 Cherno bounds II Sub-Gaussian random variables 1 Definitions 2 Examples
More information1 Review of The Learning Setting
COS 5: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #8 Scribe: Changyan Wang February 28, 208 Review of The Learning Setting Last class, we moved beyond the PAC model: in the PAC model we
More informationConcentration Inequalities
Chapter Concentration Inequalities I. Moment generating functions, the Chernoff method, and sub-gaussian and sub-exponential random variables a. Goal for this section: given a random variable X, how does
More informationMatrix concentration inequalities
ELE 538B: Mathematics of High-Dimensional Data Matrix concentration inequalities Yuxin Chen Princeton University, Fall 2018 Recap: matrix Bernstein inequality Consider a sequence of independent random
More informationSTAT 200C: High-dimensional Statistics
STAT 200C: High-dimensional Statistics Arash A. Amini April 27, 2018 1 / 80 Classical case: n d. Asymptotic assumption: d is fixed and n. Basic tools: LLN and CLT. High-dimensional setting: n d, e.g. n/d
More informationTail Inequalities. The Chernoff bound works for random variables that are a sum of indicator variables with the same distribution (Bernoulli trials).
Tail Inequalities William Hunt Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV William.Hunt@mail.wvu.edu Introduction In this chapter, we are interested
More informationOn the Complexity of Best Arm Identification with Fixed Confidence
On the Complexity of Best Arm Identification with Fixed Confidence Discrete Optimization with Noise Aurélien Garivier, joint work with Emilie Kaufmann CNRS, CRIStAL) to be presented at COLT 16, New York
More information1 Review of Probability
1 Review of Probability Random variables are denoted by X, Y, Z, etc. The cumulative distribution function (c.d.f.) of a random variable X is denoted by F (x) = P (X x), < x
More informationEE376A: Homeworks #4 Solutions Due on Thursday, February 22, 2018 Please submit on Gradescope. Start every question on a new page.
EE376A: Homeworks #4 Solutions Due on Thursday, February 22, 28 Please submit on Gradescope. Start every question on a new page.. Maximum Differential Entropy (a) Show that among all distributions supported
More informationarxiv: v1 [math.pr] 11 Feb 2019
A Short Note on Concentration Inequalities for Random Vectors with SubGaussian Norm arxiv:190.03736v1 math.pr] 11 Feb 019 Chi Jin University of California, Berkeley chijin@cs.berkeley.edu Rong Ge Duke
More informationLecture 8. October 22, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University.
Lecture 8 Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University October 22, 2007 1 2 3 4 5 6 1 Define convergent series 2 Define the Law of Large Numbers
More informationLecture 21: Minimax Theory
Lecture : Minimax Theory Akshay Krishnamurthy akshay@cs.umass.edu November 8, 07 Recap In the first part of the course, we spent the majority of our time studying risk minimization. We found many ways
More informationSTAT/MATH 395 PROBABILITY II
STAT/MATH 395 PROBABILITY II Chapter 6 : Moment Functions Néhémy Lim 1 1 Department of Statistics, University of Washington, USA Winter Quarter 2016 of Common Distributions Outline 1 2 3 of Common Distributions
More informationLecture 9: October 25, Lower bounds for minimax rates via multiple hypotheses
Information and Coding Theory Autumn 07 Lecturer: Madhur Tulsiani Lecture 9: October 5, 07 Lower bounds for minimax rates via multiple hypotheses In this lecture, we extend the ideas from the previous
More informationConcentration inequalities and the entropy method
Concentration inequalities and the entropy method Gábor Lugosi ICREA and Pompeu Fabra University Barcelona what is concentration? We are interested in bounding random fluctuations of functions of many
More information21.1 Lower bounds on minimax risk for functional estimation
ECE598: Information-theoretic methods in high-dimensional statistics Spring 016 Lecture 1: Functional estimation & testing Lecturer: Yihong Wu Scribe: Ashok Vardhan, Apr 14, 016 In this chapter, we will
More informationLecture 1 Measure concentration
CSE 29: Learning Theory Fall 2006 Lecture Measure concentration Lecturer: Sanjoy Dasgupta Scribe: Nakul Verma, Aaron Arvey, and Paul Ruvolo. Concentration of measure: examples We start with some examples
More informationIntroduction to Statistical Learning Theory
Introduction to Statistical Learning Theory Definition Reminder: We are given m samples {(x i, y i )} m i=1 Dm and a hypothesis space H and we wish to return h H minimizing L D (h) = E[l(h(x), y)]. Problem
More informationEcon 508B: Lecture 5
Econ 508B: Lecture 5 Expectation, MGF and CGF Hongyi Liu Washington University in St. Louis July 31, 2017 Hongyi Liu (Washington University in St. Louis) Math Camp 2017 Stats July 31, 2017 1 / 23 Outline
More informationECEN 5612, Fall 2007 Noise and Random Processes Prof. Timothy X Brown NAME: CUID:
Midterm ECE ECEN 562, Fall 2007 Noise and Random Processes Prof. Timothy X Brown October 23 CU Boulder NAME: CUID: You have 20 minutes to complete this test. Closed books and notes. No calculators. If
More information6.1 Moment Generating and Characteristic Functions
Chapter 6 Limit Theorems The power statistics can mostly be seen when there is a large collection of data points and we are interested in understanding the macro state of the system, e.g., the average,
More informationSelf-normalized Cramér-Type Large Deviations for Independent Random Variables
Self-normalized Cramér-Type Large Deviations for Independent Random Variables Qi-Man Shao National University of Singapore and University of Oregon qmshao@darkwing.uoregon.edu 1. Introduction Let X, X
More informationStat410 Probability and Statistics II (F16)
Stat4 Probability and Statistics II (F6 Exponential, Poisson and Gamma Suppose on average every /λ hours, a Stochastic train arrives at the Random station. Further we assume the waiting time between two
More informationCombinatorial Proof of the Hot Spot Theorem
Combinatorial Proof of the Hot Spot Theorem Ernie Croot May 30, 2006 1 Introduction A problem which has perplexed mathematicians for a long time, is to decide whether the digits of π are random-looking,
More informationEE/Stats 376A: Homework 7 Solutions Due on Friday March 17, 5 pm
EE/Stats 376A: Homework 7 Solutions Due on Friday March 17, 5 pm 1. Feedback does not increase the capacity. Consider a channel with feedback. We assume that all the recieved outputs are sent back immediately
More informationTail Inequalities Randomized Algorithms. Sariel Har-Peled. December 20, 2002
Tail Inequalities 497 - Randomized Algorithms Sariel Har-Peled December 0, 00 Wir mssen wissen, wir werden wissen (We must know, we shall know) David Hilbert 1 Tail Inequalities 1.1 The Chernoff Bound
More informationMachine Learning Theory Lecture 4
Machine Learning Theory Lecture 4 Nicholas Harvey October 9, 018 1 Basic Probability One of the first concentration bounds that you learn in probability theory is Markov s inequality. It bounds the right-tail
More informationExercises and Answers to Chapter 1
Exercises and Answers to Chapter The continuous type of random variable X has the following density function: a x, if < x < a, f (x), otherwise. Answer the following questions. () Find a. () Obtain mean
More informationChapter 4 : Expectation and Moments
ECE5: Analysis of Random Signals Fall 06 Chapter 4 : Expectation and Moments Dr. Salim El Rouayheb Scribe: Serge Kas Hanna, Lu Liu Expected Value of a Random Variable Definition. The expected or average
More informationLecture 5: Random Energy Model
STAT 206A: Gibbs Measures Invited Speaker: Andrea Montanari Lecture 5: Random Energy Model Lecture date: September 2 Scribe: Sebastien Roch This is a guest lecture by Andrea Montanari (ENS Paris and Stanford)
More informationLecture 3: More on regularization. Bayesian vs maximum likelihood learning
Lecture 3: More on regularization. Bayesian vs maximum likelihood learning L2 and L1 regularization for linear estimators A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting
More information1 Dimension Reduction in Euclidean Space
CSIS0351/8601: Randomized Algorithms Lecture 6: Johnson-Lindenstrauss Lemma: Dimension Reduction Lecturer: Hubert Chan Date: 10 Oct 011 hese lecture notes are supplementary materials for the lectures.
More informationUnderstanding Generalization Error: Bounds and Decompositions
CIS 520: Machine Learning Spring 2018: Lecture 11 Understanding Generalization Error: Bounds and Decompositions Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the
More informationContinuous Distributions
A normal distribution and other density functions involving exponential forms play the most important role in probability and statistics. They are related in a certain way, as summarized in a diagram later
More informationLecture 5: Likelihood ratio tests, Neyman-Pearson detectors, ROC curves, and sufficient statistics. 1 Executive summary
ECE 830 Spring 207 Instructor: R. Willett Lecture 5: Likelihood ratio tests, Neyman-Pearson detectors, ROC curves, and sufficient statistics Executive summary In the last lecture we saw that the likelihood
More informationStat 260/CS Learning in Sequential Decision Problems. Peter Bartlett
Stat 260/CS 294-102. Learning in Sequential Decision Problems. Peter Bartlett 1. Adversarial bandits Definition: sequential game. Lower bounds on regret from the stochastic case. Exp3: exponential weights
More informationSolutions to Problem Set 4
UC Berkeley, CS 174: Combinatorics and Discrete Probability (Fall 010 Solutions to Problem Set 4 1. (MU 5.4 In a lecture hall containing 100 people, you consider whether or not there are three people in
More informationEntropy and Ergodic Theory Lecture 15: A first look at concentration
Entropy and Ergodic Theory Lecture 15: A first look at concentration 1 Introduction to concentration Let X 1, X 2,... be i.i.d. R-valued RVs with common distribution µ, and suppose for simplicity that
More informationParametric Models: from data to models
Parametric Models: from data to models Pradeep Ravikumar Co-instructor: Manuela Veloso Machine Learning 10-701 Jan 22, 2018 Recall: Model-based ML DATA MODEL LEARNING MODEL MODEL INFERENCE KNOWLEDGE Learning:
More information1 MDP Value Iteration Algorithm
CS 0. - Active Learning Problem Set Handed out: 4 Jan 009 Due: 9 Jan 009 MDP Value Iteration Algorithm. Implement the value iteration algorithm given in the lecture. That is, solve Bellman s equation using
More informationExpectation, variance and moments
Expectation, variance and moments John Appleby Contents Expectation and variance Examples 3 Moments and the moment generating function 4 4 Examples of moment generating functions 5 5 Concluding remarks
More informationGeneralization theory
Generalization theory Daniel Hsu Columbia TRIPODS Bootcamp 1 Motivation 2 Support vector machines X = R d, Y = { 1, +1}. Return solution ŵ R d to following optimization problem: λ min w R d 2 w 2 2 + 1
More information1 Exercises for lecture 1
1 Exercises for lecture 1 Exercise 1 a) Show that if F is symmetric with respect to µ, and E( X )
More informationLectures 6: Degree Distributions and Concentration Inequalities
University of Washington Lecturer: Abraham Flaxman/Vahab S Mirrokni CSE599m: Algorithms and Economics of Networks April 13 th, 007 Scribe: Ben Birnbaum Lectures 6: Degree Distributions and Concentration
More informationMATH 117 LECTURE NOTES
MATH 117 LECTURE NOTES XIN ZHOU Abstract. This is the set of lecture notes for Math 117 during Fall quarter of 2017 at UC Santa Barbara. The lectures follow closely the textbook [1]. Contents 1. The set
More informationUC Berkeley Department of Electrical Engineering and Computer Sciences. EECS 126: Probability and Random Processes
UC Berkeley Department of Electrical Engineering and Computer Sciences EECS 6: Probability and Random Processes Problem Set 3 Spring 9 Self-Graded Scores Due: February 8, 9 Submit your self-graded scores
More informationA NOTE ON SUMS OF INDEPENDENT RANDOM MATRICES AFTER AHLSWEDE-WINTER
A NOTE ON SUMS OF INDEPENDENT RANDOM MATRICES AFTER AHLSWEDE-WINTER 1. The method Ashwelde and Winter [1] proposed a new approach to deviation inequalities for sums of independent random matrices. The
More informationFirst and Last Name: 2. Correct The Mistake Determine whether these equations are false, and if so write the correct answer.
. Correct The Mistake Determine whether these equations are false, and if so write the correct answer. ( x ( x (a ln + ln = ln(x (b e x e y = e xy (c (d d dx cos(4x = sin(4x 0 dx xe x = (a This is an incorrect
More informationStatistical Inference
Statistical Inference Robert L. Wolpert Institute of Statistics and Decision Sciences Duke University, Durham, NC, USA Week 12. Testing and Kullback-Leibler Divergence 1. Likelihood Ratios Let 1, 2, 2,...
More informationExpectation, inequalities and laws of large numbers
Chapter 3 Expectation, inequalities and laws of large numbers 3. Expectation and Variance Indicator random variable Let us suppose that the event A partitions the sample space S, i.e. A A S. The indicator
More informationExercises with solutions (Set D)
Exercises with solutions Set D. A fair die is rolled at the same time as a fair coin is tossed. Let A be the number on the upper surface of the die and let B describe the outcome of the coin toss, where
More informationCSE 312 Final Review: Section AA
CSE 312 TAs December 8, 2011 General Information General Information Comprehensive Midterm General Information Comprehensive Midterm Heavily weighted toward material after the midterm Pre-Midterm Material
More informationConsistency of the maximum likelihood estimator for general hidden Markov models
Consistency of the maximum likelihood estimator for general hidden Markov models Jimmy Olsson Centre for Mathematical Sciences Lund University Nordstat 2012 Umeå, Sweden Collaborators Hidden Markov models
More informationGeneralization Bounds and Stability
Generalization Bounds and Stability Lorenzo Rosasco Tomaso Poggio 9.520 Class 9 2009 About this class Goal To recall the notion of generalization bounds and show how they can be derived from a stability
More informationProblem set 1, Real Analysis I, Spring, 2015.
Problem set 1, Real Analysis I, Spring, 015. (1) Let f n : D R be a sequence of functions with domain D R n. Recall that f n f uniformly if and only if for all ɛ > 0, there is an N = N(ɛ) so that if n
More informationThe Multi-Arm Bandit Framework
The Multi-Arm Bandit Framework A. LAZARIC (SequeL Team @INRIA-Lille) ENS Cachan - Master 2 MVA SequeL INRIA Lille MVA-RL Course In This Lecture A. LAZARIC Reinforcement Learning Algorithms Oct 29th, 2013-2/94
More informationEE5319R: Problem Set 3 Assigned: 24/08/16, Due: 31/08/16
EE539R: Problem Set 3 Assigned: 24/08/6, Due: 3/08/6. Cover and Thomas: Problem 2.30 (Maimum Entropy): Solution: We are required to maimize H(P X ) over all distributions P X on the non-negative integers
More informationCSE548, AMS542: Analysis of Algorithms, Spring 2014 Date: May 12. Final In-Class Exam. ( 2:35 PM 3:50 PM : 75 Minutes )
CSE548, AMS54: Analysis of Algorithms, Spring 014 Date: May 1 Final In-Class Exam ( :35 PM 3:50 PM : 75 Minutes ) This exam will account for either 15% or 30% of your overall grade depending on your relative
More informationStatistics - Lecture One. Outline. Charlotte Wickham 1. Basic ideas about estimation
Statistics - Lecture One Charlotte Wickham wickham@stat.berkeley.edu http://www.stat.berkeley.edu/~wickham/ Outline 1. Basic ideas about estimation 2. Method of Moments 3. Maximum Likelihood 4. Confidence
More informationSublinear-Time Algorithms
Lecture 20 Sublinear-Time Algorithms Supplemental reading in CLRS: None If we settle for approximations, we can sometimes get much more efficient algorithms In Lecture 8, we saw polynomial-time approximations
More informationLecture I: Asymptotics for large GUE random matrices
Lecture I: Asymptotics for large GUE random matrices Steen Thorbjørnsen, University of Aarhus andom Matrices Definition. Let (Ω, F, P) be a probability space and let n be a positive integer. Then a random
More informationProbability inequalities 11
Paninski, Intro. Math. Stats., October 5, 2005 29 Probability inequalities 11 There is an adage in probability that says that behind every limit theorem lies a probability inequality (i.e., a bound on
More informationSTA 711: Probability & Measure Theory Robert L. Wolpert
STA 711: Probability & Measure Theory Robert L. Wolpert 6 Independence 6.1 Independent Events A collection of events {A i } F in a probability space (Ω,F,P) is called independent if P[ i I A i ] = P[A
More informationCLASSICAL PROBABILITY MODES OF CONVERGENCE AND INEQUALITIES
CLASSICAL PROBABILITY 2008 2. MODES OF CONVERGENCE AND INEQUALITIES JOHN MORIARTY In many interesting and important situations, the object of interest is influenced by many random factors. If we can construct
More informationChapter 7. Basic Probability Theory
Chapter 7. Basic Probability Theory I-Liang Chern October 20, 2016 1 / 49 What s kind of matrices satisfying RIP Random matrices with iid Gaussian entries iid Bernoulli entries (+/ 1) iid subgaussian entries
More informationThings to remember when learning probability distributions:
SPECIAL DISTRIBUTIONS Some distributions are special because they are useful They include: Poisson, exponential, Normal (Gaussian), Gamma, geometric, negative binomial, Binomial and hypergeometric distributions
More informationChapter 9: Basic of Hypercontractivity
Analysis of Boolean Functions Prof. Ryan O Donnell Chapter 9: Basic of Hypercontractivity Date: May 6, 2017 Student: Chi-Ning Chou Index Problem Progress 1 Exercise 9.3 (Tightness of Bonami Lemma) 2/2
More informationLecture 4: Types of errors. Bayesian regression models. Logistic regression
Lecture 4: Types of errors. Bayesian regression models. Logistic regression A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting more generally COMP-652 and ECSE-68, Lecture
More informationEXPONENTIAL INEQUALITIES IN NONPARAMETRIC ESTIMATION
EXPONENTIAL INEQUALITIES IN NONPARAMETRIC ESTIMATION Luc Devroye Division of Statistics University of California at Davis Davis, CA 95616 ABSTRACT We derive exponential inequalities for the oscillation
More informationCSCI-6971 Lecture Notes: Probability theory
CSCI-6971 Lecture Notes: Probability theory Kristopher R. Beevers Department of Computer Science Rensselaer Polytechnic Institute beevek@cs.rpi.edu January 31, 2006 1 Properties of probabilities Let, A,
More informationCS229T/STATS231: Statistical Learning Theory. Lecturer: Tengyu Ma Lecture 11 Scribe: Jongho Kim, Jamie Kang October 29th, 2018
CS229T/STATS231: Statistical Learning Theory Lecturer: Tengyu Ma Lecture 11 Scribe: Jongho Kim, Jamie Kang October 29th, 2018 1 Overview This lecture mainly covers Recall the statistical theory of GANs
More informationMAS113 Introduction to Probability and Statistics
MAS113 Introduction to Probability and Statistics School of Mathematics and Statistics, University of Sheffield 2018 19 Identically distributed Suppose we have n random variables X 1, X 2,..., X n. Identically
More information8 Laws of large numbers
8 Laws of large numbers 8.1 Introduction We first start with the idea of standardizing a random variable. Let X be a random variable with mean µ and variance σ 2. Then Z = (X µ)/σ will be a random variable
More informationPolicy Gradient. U(θ) = E[ R(s t,a t );π θ ] = E[R(τ);π θ ] (1) 1 + e θ φ(s t) E[R(τ);π θ ] (3) = max. θ P(τ;θ)R(τ) (6) P(τ;θ) θ log P(τ;θ)R(τ) (9)
CS294-40 Learning for Robotics and Control Lecture 16-10/20/2008 Lecturer: Pieter Abbeel Policy Gradient Scribe: Jan Biermeyer 1 Recap Recall: H U() = E[ R(s t,a ;π ] = E[R();π ] (1) Here is a sample path
More informationIntroduction to Empirical Processes and Semiparametric Inference Lecture 12: Glivenko-Cantelli and Donsker Results
Introduction to Empirical Processes and Semiparametric Inference Lecture 12: Glivenko-Cantelli and Donsker Results Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics
More informationProbability Lecture III (August, 2006)
robability Lecture III (August, 2006) 1 Some roperties of Random Vectors and Matrices We generalize univariate notions in this section. Definition 1 Let U = U ij k l, a matrix of random variables. Suppose
More informationLecture Notes 3 Convergence (Chapter 5)
Lecture Notes 3 Convergence (Chapter 5) 1 Convergence of Random Variables Let X 1, X 2,... be a sequence of random variables and let X be another random variable. Let F n denote the cdf of X n and let
More informationRegret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, Part I. Sébastien Bubeck Theory Group
Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, Part I Sébastien Bubeck Theory Group i.i.d. multi-armed bandit, Robbins [1952] i.i.d. multi-armed bandit, Robbins [1952] Known
More informationInformation theoretic perspectives on learning algorithms
Information theoretic perspectives on learning algorithms Varun Jog University of Wisconsin - Madison Departments of ECE and Mathematics Shannon Channel Hangout! May 8, 2018 Jointly with Adrian Tovar-Lopez
More informationDraft. Advanced Probability Theory (Fall 2017) J.P.Kim Dept. of Statistics. Finally modified at November 28, 2017
Fall 207 Dept. of Statistics Finally modified at November 28, 207 Preface & Disclaimer This note is a summary of the lecture Advanced Probability Theory 326.729A held at Seoul National University, Fall
More informationM4L5. Expectation and Moments of Functions of Random Variable
M4L5 Expectation and Moments of Functions of Random Variable 1. Introduction This lecture is a continuation of previous lecture, elaborating expectations, moments and moment generating functions of the
More information