Lecture 1: Measure concentration

CSE 291: Learning Theory, Fall 2006
Lecturer: Sanjoy Dasgupta          Scribes: Nakul Verma, Aaron Arvey, and Paul Ruvolo

1.1 Concentration of measure: examples

We start with some examples of concentration of measure. This phenomenon is very useful in analyzing machine learning algorithms and can be used to bound quantities such as error probabilities. The first and most standard concentration result is for averages. It states that the average of bounded independent random variables is tightly concentrated around its expectation.

1.1.1 Example: coin tosses

Suppose a coin of unknown bias $p$ is tossed $n$ times, giving outcomes $X_1, \ldots, X_n \in \{0, 1\}$. Then the average of the $X_i$ is tightly concentrated around $p$. Specifically,
$$P\left(\left|\frac{X_1 + \cdots + X_n}{n} - p\right| \geq \epsilon\right) \leq 2e^{-2\epsilon^2 n}.$$
Figure 1.1 shows the quick drop-off of the probability that the sample mean deviates from its expectation. So for a large enough $n$, we can estimate $p$ quite accurately.

[Figure 1.1: Shows the exponential decay in the probability of the sample mean deviating from its expectation ($p$) in the coin tossing experiment; the mass is concentrated in a window of width about $1/\sqrt{n}$ around $p$.]

1.1.2 Example: random points in a d-dimensional box

Pick a point $X \in [-1, +1]^d$ uniformly at random. Then it can be shown that $\|X\|$ is tightly concentrated around $\sqrt{d/3}$. To see this, write $X = (X_1, \ldots, X_d)$; then
$$E\|X\|^2 = E\left[X_1^2 + \cdots + X_d^2\right] = \sum_{i=1}^d E X_i^2 = \sum_{i=1}^d \int_{-1}^{+1} \tfrac{1}{2} x^2 \, dx = \frac{d}{3},$$
where the second equality is due to the linearity of expectation. Now, since the $X_i$ are independent (and bounded), we can show the concentration
$$P\left(\left|\|X\|^2 - \frac{d}{3}\right| \geq \epsilon d\right) \leq 2e^{-2\epsilon^2 d}.$$
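A quick numerical sanity check of both examples (our addition, not part of the original notes; the parameter choices are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
trials = 10_000

# Coin tosses: empirical deviation probability vs. the 2*exp(-2 eps^2 n) bound.
p, n, eps = 0.3, 1000, 0.05
X = rng.random((trials, n)) < p                # each row: n tosses of a bias-p coin
print("coin empirical:", (np.abs(X.mean(axis=1) - p) >= eps).mean())
print("coin bound    :", 2 * np.exp(-2 * eps**2 * n))

# Random points in [-1,1]^d: lengths cluster tightly around sqrt(d/3).
d = 1000
Y = rng.uniform(-1, 1, size=(trials, d))
lengths = np.linalg.norm(Y, axis=1)
print("mean length   :", lengths.mean(), "  sqrt(d/3):", np.sqrt(d / 3))
print("length std    :", lengths.std())       # O(1), tiny next to sqrt(d/3)
```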

This provides us with the counter-intuitive result that the volume of the high-dimensional cube tends to lie near its corners, where the points have length approximately $\sqrt{d/3}$. Note that the above examples are special cases of Hoeffding's inequality:

Lemma 1 (Hoeffding's inequality). Suppose $X_1, \ldots, X_n$ are independent and bounded random variables with $a_i \leq X_i \leq b_i$. Then
$$P\left[\left|\frac{X_1 + \cdots + X_n}{n} - E\left(\frac{X_1 + \cdots + X_n}{n}\right)\right| \geq \epsilon\right] \leq 2e^{-2\epsilon^2 n^2 / \sum_i (b_i - a_i)^2}. \tag{1.1}$$

We will soon prove a much more general version of this, which is introduced next.

1.1.3 Concentration of Lipschitz functions

Observing the Hoeffding bound, one might wonder whether such concentration applies only to averages of random variables. After all, what is so special about averages? It turns out that the relevant feature of the average that yields tight concentration is that it is smooth. In fact, any smooth function of bounded independent random variables is tightly concentrated around its expectation. The notion of smoothness we will use is Lipschitz continuity.

Definition 2. $f : \mathbb{R}^n \to \mathbb{R}$ is $\lambda$-Lipschitz w.r.t. the $\ell_p$ metric if, for all $x, y$, $|f(x) - f(y)| \leq \lambda \|x - y\|_p$.

Example. For $x = (x_1, \ldots, x_n)$, define the average $a(x) = \frac{1}{n}(x_1 + \cdots + x_n)$. Then $a(\cdot)$ is $(1/n)$-Lipschitz with respect to the $\ell_1$ metric, since for any $x, x'$,
$$|a(x) - a(x')| = \frac{1}{n}\left|(x_1 - x_1') + \cdots + (x_n - x_n')\right| \leq \frac{1}{n}\left(|x_1 - x_1'| + \cdots + |x_n - x_n'|\right) = \frac{1}{n}\|x - x'\|_1.$$

It turns out that Hoeffding's bound holds for all functions that are Lipschitz with respect to $\ell_1$.

Lemma 3 (Concentration of Lipschitz functions w.r.t. the $\ell_1$ metric). Suppose $X_1, \ldots, X_n$ are independent and bounded with $a_i \leq X_i \leq b_i$. Then, for any $f : \mathbb{R}^n \to \mathbb{R}$ which is $\lambda$-Lipschitz w.r.t. the $\ell_1$ metric,
$$P\left[f \geq Ef + \epsilon\right] \leq e^{-2\epsilon^2 / \lambda^2 \sum_i (b_i - a_i)^2}.$$

Proof. See Section 1.5.

Remark. Since $-f$ is also $\lambda$-Lipschitz, we can bound deviations both above and below:
$$P\left[|f - Ef| \geq \epsilon\right] \leq 2e^{-2\epsilon^2 / \lambda^2 \sum_i (b_i - a_i)^2}. \tag{1.2}$$

We now look at bounds for functions that are Lipschitz with respect to other metrics.

1.1.4 Concentration of Lipschitz functions w.r.t. the $\ell_2$ metric

Let $S^d$ denote the surface of the unit sphere in $\mathbb{R}^d$, and let $\mu$ be the uniform distribution over $S^d$. The following is known (we will prove it later in the course):
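To see the Lipschitz generalization at work on something other than a plain average, here is a small simulation (our addition; the function and parameters are our own choices). The function $f(x) = \frac{1}{n}\sum_i |x_i|$ is $(1/n)$-Lipschitz w.r.t. $\ell_1$, so remark (1.2) applies with $\lambda = 1/n$; the bound is valid here, though far from tight for this particular distribution.

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials, eps = 1000, 10_000, 0.1
lam = 1.0 / n                           # f below is (1/n)-Lipschitz w.r.t. l1

# f(x) = (1/n) sum_i |x_i| for x_i uniform in [-1,1]  (so b_i - a_i = 2).
X = rng.uniform(-1, 1, size=(trials, n))
f = np.abs(X).mean(axis=1)
bound = 2 * np.exp(-2 * eps**2 / (lam**2 * n * 2**2))   # eq. (1.2)
print("empirical:", (np.abs(f - f.mean()) >= eps).mean())
print("bound    :", bound)
```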

[Figure 1.2: The function $f(x) = w \cdot x$ is $-1$ at one pole of the sphere, $+1$ at the other pole, and increases steadily from $-1$ to $+1$ as one moves from one pole to the other. Since $f$ is 1-Lipschitz on $S^d$, all but an $e^{-\Omega(\epsilon^2 d)}$ fraction of the sphere's mass is within $\epsilon$ of the median value of $f$; that is, most of the volume lies in a thin slice near the equator of the sphere (perpendicular to $w$).]

Lemma 4. Let $f : S^d \to \mathbb{R}$ be $\lambda$-Lipschitz w.r.t. the $\ell_2$ metric. Then
$$\mu\left[f \geq \mathrm{med}(f) + \epsilon\right] \leq 4e^{-\epsilon^2 d / 2\lambda^2}, \tag{1.3}$$
where $\mathrm{med}(f)$ is a median value of $f$.

One immediate consequence of (1.3) is that most of the volume of the sphere lies in a thin slice around the equator (for all equators!). To see this, fix any unit vector $w \in S^d$. Then for $X \sim \mu$ (this notation means $X$ is drawn from distribution $\mu$), $E(w \cdot X) = 0$ and also $\mathrm{med}(w \cdot X) = 0$. Moreover, the function $f(x) = w \cdot x$ is 1-Lipschitz w.r.t. the $\ell_2$ norm: for all $x, y \in S^d$,
$$|f(x) - f(y)| = |w \cdot x - w \cdot y| = |w \cdot (x - y)| \leq \|w\|_2 \|x - y\|_2 = \|x - y\|_2,$$
where the inequality uses Cauchy-Schwarz. Thus by (1.3), $f$ is tightly concentrated around its median:
$$\mu\left[x : |w \cdot x| \geq \epsilon\right] \leq 4e^{-\epsilon^2 d / 2}.$$
See Figure 1.2. Moreover, since there is nothing special about this particular $w$, the above bound is true for any equator!

1.1.5 Types of concentration

Types of concentration we'll encounter in this course:

- Concentration of a product measure $X = (X_1, \ldots, X_n)$, where the $X_i$ are independent and bounded, with respect to the $\ell_1$ and Hamming metrics.
- Concentration of the uniform measure over $S^d$, with respect to the $\ell_2$ metric.
- Concentration of the multivariate Gaussian measure, with respect to the $\ell_2$ metric.
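The equator phenomenon is easy to observe numerically. Below is a minimal check (our addition, with arbitrary parameters): we sample uniform points on the sphere by normalizing Gaussian vectors, then measure how often a point lands more than $\epsilon$ away from a fixed equator.

```python
import numpy as np

rng = np.random.default_rng(2)
d, trials, eps = 1000, 20_000, 0.1

# Uniform points on the unit sphere: normalize standard Gaussian vectors.
G = rng.standard_normal((trials, d))
X = G / np.linalg.norm(G, axis=1, keepdims=True)

w = np.zeros(d)
w[0] = 1.0                                   # a fixed pole direction
print("empirical P(|w.X| >= eps):", (np.abs(X @ w) >= eps).mean())
print("Lemma 4 bound            :", 4 * np.exp(-eps**2 * d / 2))
```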

1.2 Probability review

1.2.1 Warm-up problem

Question. Let $\sigma$ be a random permutation of $\{1, \ldots, n\}$. Let $S$ be the number of fixed points of this permutation. What are the expected value and variance of $S$?

Answer. Use $n$ indicator random variables $X_i = \mathbf{1}(\sigma(i) = i)$, so that $S = \sum_i X_i$. By linearity of expectation, we can solve the first problem as follows:
$$ES = E(X_1 + \cdots + X_n) = \sum_{i=1}^n EX_i = \sum_{i=1}^n P(X_i = 1) = n \cdot \frac{1}{n} = 1.$$
For the second problem, we use $\mathrm{var}(S) = E(S^2) - (ES)^2 = E(S^2) - 1$, and
$$E(S^2) = E(X_1 + \cdots + X_n)^2 = E\left[\sum_i X_i^2 + \sum_{i \neq j} X_i X_j\right] = \sum_i EX_i^2 + \sum_{i \neq j} E(X_i X_j) \quad \text{(linearity of expectation)}$$
$$= n \cdot \frac{1}{n} + n(n-1) \cdot \frac{1}{n(n-1)} = 2.$$
Thus $\mathrm{var}(S) = 1$.

1.2.2 Some basics

Property 5 (Linearity of expectation). $E(X + Y) = EX + EY$ (this holds even if $X$ and $Y$ are not independent).

Property 6. $\mathrm{var}(X) = E(X - EX)^2 = EX^2 - (EX)^2$.

Property 7 (Jensen's inequality). If $f$ is a convex function, then $Ef(X) \geq f(EX)$.

[Figure: a picture to help you remember this enormously useful property of convex functions. For $X$ ranging over $[a, b]$, the chord of a convex $f$ lies above its graph, so $Ef(X)$ sits above $f(EX)$.]
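A quick simulation confirming both answers (our addition, not in the original notes; sorting random keys is just a convenient way to draw uniform permutations):

```python
import numpy as np

rng = np.random.default_rng(3)
n, trials = 100, 100_000

# Random permutations via sorting random keys: each row is a uniform permutation.
perms = rng.random((trials, n)).argsort(axis=1)
S = (perms == np.arange(n)).sum(axis=1)          # number of fixed points per trial
print("E[S] ~", S.mean(), "   var(S) ~", S.var())   # both should be close to 1
```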

Lemma 8. If $X_1, \ldots, X_n$ are independent, then $\mathrm{var}(X_1 + \cdots + X_n) = \mathrm{var}(X_1) + \cdots + \mathrm{var}(X_n)$.

Proof. Let $X_1, \ldots, X_n$ be $n$ independent random variables. Set $Y_i = X_i - EX_i$. Thus $Y_1, \ldots, Y_n$ are independent with mean zero, and
$$\mathrm{var}(X_1 + \cdots + X_n) = E\left[(X_1 - EX_1) + \cdots + (X_n - EX_n)\right]^2 = E(Y_1 + \cdots + Y_n)^2 = E\left[\sum_i Y_i^2 + \sum_{i \neq j} Y_i Y_j\right]$$
$$= \sum_i EY_i^2 + \sum_{i \neq j} EY_i \, EY_j = \sum_i EY_i^2 = \sum_i E(X_i - EX_i)^2 = \sum_i \mathrm{var}(X_i).$$

As an example of an incorrect application: had we mistakenly assumed that the $X_i$ in the warm-up problem were independent, we would have found that the variance was $n \cdot \frac{1}{n}\left(1 - \frac{1}{n}\right) = 1 - \frac{1}{n}$ instead of 1. Not too far off, since those $X_i$ are approximately independent (for large $n$).

Lemma 9 (Markov's inequality). $P(|X| \geq a) \leq \frac{E|X|}{a}$.

Proof. Observe that $|X| \geq a \cdot \mathbf{1}(|X| \geq a)$; take expectations of both sides, using $E[\mathbf{1}(|X| \geq a)] = P(|X| \geq a)$.

Example. A simple application of Markov's inequality to the random variable $S$ (the fixed-point count), which is always nonnegative with $ES = 1$, gives $P(S \geq k) \leq 1/k$.

Lemma 10 (Chebyshev's inequality).
$$P(|X - EX| \geq a) \leq \frac{\mathrm{var}(X)}{a^2}.$$

Proof. Apply Markov's inequality to $(X - EX)^2$:
$$P(|X - EX| \geq a) = P\left((X - EX)^2 \geq a^2\right) \leq \frac{E(X - EX)^2}{a^2} = \frac{\mathrm{var}(X)}{a^2}.$$

Example. Again for the nonnegative random variable $S$ (with $ES = \mathrm{var}(S) = 1$), Chebyshev gives $P(S \geq k) \leq P(|S - 1| \geq k - 1) \leq 1/(k-1)^2$. Note that this is generally a better bound than that given by Markov's inequality.
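A numerical comparison of the two tail bounds against the truth for the fixed-point count $S$ (our addition; for large $n$, $S$ is approximately Poisson(1), so the true tail is far below both bounds):

```python
import numpy as np

rng = np.random.default_rng(4)
n, trials, k = 100, 100_000, 4

perms = rng.random((trials, n)).argsort(axis=1)   # uniform random permutations
S = (perms == np.arange(n)).sum(axis=1)           # fixed-point counts
print("empirical P(S >= k):", (S >= k).mean())
print("Markov bound       :", 1 / k)              # uses ES = 1
print("Chebyshev bound    :", 1 / (k - 1) ** 2)   # uses var(S) = 1
```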

1.2.3 Example: symmetric random walk

A symmetric random walk is a stochastic process on the line. One starts at the origin and at each time step moves either one unit to the left or one unit to the right, with equal probability. The move at time $t$ is thus a random variable
$$X_t = \begin{cases} +1 & \text{(right) with probability } 1/2, \\ -1 & \text{(left) with probability } 1/2. \end{cases}$$
Let $S_n = \sum_{i=1}^n X_i$ be the position after $n$ steps of the random walk. What are the expected value and variance of $S_n$?

The expected value of each $X_i$ is 0, since we are equally likely to obtain $+1$ and $-1$, so
$$ES_n = E\sum_{i=1}^n X_i = \sum_i EX_i = 0.$$
Similarly, since the $X_i$ are independent, the variance becomes linear as well. The variance of $X_i$ is $EX_i^2 = 1$; therefore
$$\mathrm{var}(S_n) = \mathrm{var}\left(\sum_{i=1}^n X_i\right) = \sum_i \mathrm{var}(X_i) = n.$$
The standard deviation of $S_n$ is thus $\sqrt{n}$, so we would expect $S_n$ to be $\pm O(\sqrt{n})$. We can make this more precise by using Markov's and Chebyshev's inequalities:
$$\text{(Markov)} \qquad P\left(|S_n| \geq c\sqrt{n}\right) \leq \frac{E|S_n|}{c\sqrt{n}} \leq \frac{\sqrt{ES_n^2}}{c\sqrt{n}} = \frac{\sqrt{\mathrm{var}(S_n)}}{c\sqrt{n}} = \frac{1}{c}$$
$$\text{(Chebyshev)} \qquad P\left(|S_n| \geq c\sqrt{n}\right) \leq \frac{\mathrm{var}(S_n)}{(c\sqrt{n})^2} = \frac{1}{c^2}$$

1.2.4 Moment-generating functions

The Chebyshev inequality is just the Markov inequality applied to $X^2$; this often yields a better bound, as in the case of the symmetric random walk. We could similarly apply Markov's inequality to $X^4$, or $X^6$, or even higher powers of $X$. For the symmetric random walk, the bounds would get better and better (they would look like $O(1/c^k)$ for increasing powers $k$). The natural culmination of all this is to apply Markov's inequality to $e^X$ (or, for a little flexibility, $e^{tX}$, where $t$ is a constant we will optimize).

Lemma 11 (Chernoff's bounding method). For any $t > 0$,
$$P(X \geq c) \leq \frac{Ee^{tX}}{e^{tc}}.$$

Proof. Again, we use Markov's inequality:
$$P(X \geq c) = P\left(e^{tX} \geq e^{tc}\right) \leq \frac{Ee^{tX}}{e^{tc}}.$$

Definition 12. The moment-generating function of a random variable $X$ is the function $\psi(t) = Ee^{tX}$.
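To preview how much the exponential-moment approach buys, here is a simulation comparing the three bounds for the random walk (our addition; the Chernoff line uses the standard bound $Ee^{tX_i} = \cosh t \leq e^{t^2/2}$, optimized at $t = c/\sqrt{n}$, which gives $P(|S_n| \geq c\sqrt{n}) \leq 2e^{-c^2/2}$):

```python
import numpy as np

rng = np.random.default_rng(5)
n, c, trials = 10_000, 3.0, 1_000_000

# S_n = (# of +1 steps) - (# of -1 steps) = 2*Binomial(n, 1/2) - n.
S = 2 * rng.binomial(n, 0.5, size=trials) - n
print("empirical:", (np.abs(S) >= c * np.sqrt(n)).mean())
print("Markov   :", 1 / c)
print("Chebyshev:", 1 / c**2)
print("Chernoff :", 2 * np.exp(-c**2 / 2))    # via cosh(t)^n <= e^{n t^2 / 2}
```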

Example. If $X$ is Gaussian with mean 0 and variance 1,
$$\psi(t) = \int_{-\infty}^{\infty} e^{tx} \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \, dx = e^{t^2/2}.$$

In general, the value $Ee^{tX}$ may not always be defined. However, if $Ee^{t_0 X}$ is defined for some $t_0 > 0$, then:
1. $Ee^{tX}$ is defined for all $0 \leq t \leq t_0$.
2. All moments of $X$ are finite, and $\psi(t)$ has derivatives of all orders at $t = 0$, with
$$EX^k = \left.\frac{\partial^k \psi}{\partial t^k}\right|_{t=0}.$$
3. $\{\psi(t) : |t| \leq t_0\}$ uniquely determines the distribution of $X$.

1.3 Bounding $Ee^{tX}$

We can compute this expectation directly if we know the distribution of $X$ (simply do an integral), but can we get bounds on it given just some coarse statistics of $X$?

Lemma 13. If $X \in [a, b]$ and $X$ has mean 0, then $Ee^{tX} \leq e^{t^2(b-a)^2/8}$.

Proof. As shown in Figure 1.3, $e^{tx}$ is a convex function.

[Figure 1.3: $e^{tx}$ is a convex function, drawn over an interval $[a, b]$ containing 0.]

If we write $x = \lambda a + (1 - \lambda) b$ (where $0 \leq \lambda \leq 1$), convexity tells us that
$$e^{tx} \leq \lambda e^{ta} + (1 - \lambda) e^{tb}.$$
Plugging in $\lambda = (b - x)/(b - a)$ then gives
$$e^{tx} \leq \frac{b - x}{b - a} e^{ta} + \frac{x - a}{b - a} e^{tb}.$$
Take expectations of both sides, using linearity of expectation and the fact that $EX = 0$:
$$Ee^{tX} \leq \frac{b - EX}{b - a} e^{ta} + \frac{EX - a}{b - a} e^{tb} = \frac{b e^{ta} - a e^{tb}}{b - a} \leq e^{t^2(b-a)^2/8},$$
where the last step is just calculus.
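The "just calculus" can be made explicit. Here is the standard argument (our addition): since $EX = 0$ forces $a \leq 0 \leq b$, we may substitute $p = -a/(b-a) \in [0,1]$ and $u = t(b-a)$, under which the middle expression equals $e^{L(u)}$ for the function $L$ below.

```latex
% Our addition: the calculus behind  (b e^{ta} - a e^{tb})/(b-a) <= e^{t^2 (b-a)^2 / 8}.
% With p = -a/(b-a) and u = t(b-a), the left-hand side is e^{L(u)}, where
\begin{align*}
L(u) &= -pu + \log\left(1 - p + p e^u\right), \qquad L(0) = 0, \\
L'(u) &= -p + \frac{p e^u}{1 - p + p e^u}, \qquad L'(0) = 0, \\
L''(u) &= \frac{p e^u}{1 - p + p e^u}\left(1 - \frac{p e^u}{1 - p + p e^u}\right) \leq \frac{1}{4},
\end{align*}
% since q(1-q) <= 1/4 for any q in [0,1]. Taylor's theorem then gives
% L(u) <= u^2 / 8, i.e.  E e^{tX} <= e^{t^2 (b-a)^2 / 8}.
```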

1.4 Hoeffding's Inequality

Theorem 14 (Hoeffding's inequality). Let $X_1, \ldots, X_n$ be independent and bounded with $a_i \leq X_i \leq b_i$. Let $S_n = X_1 + \cdots + X_n$. Then for any $\epsilon > 0$,
$$P(S_n \geq ES_n + \epsilon) \leq e^{-2\epsilon^2 / \sum_i (b_i - a_i)^2}, \qquad P(S_n \leq ES_n - \epsilon) \leq e^{-2\epsilon^2 / \sum_i (b_i - a_i)^2}.$$

Proof. We'll just do the upper bound (the lower bound proof is very similar). Define $Y_i = X_i - EX_i$; then the $\{Y_i\}$ are independent, with mean zero and range $[a_i - EX_i, \, b_i - EX_i]$. For any $t > 0$,
$$P(S_n \geq ES_n + \epsilon) = P(Y_1 + \cdots + Y_n \geq \epsilon) = P\left(e^{t(Y_1 + \cdots + Y_n)} \geq e^{t\epsilon}\right) \leq \frac{Ee^{t(Y_1 + \cdots + Y_n)}}{e^{t\epsilon}}$$
by Chernoff's bounding method. Exploiting the independence of the $Y_i$'s, and using our generic bound (Lemma 13) for each $Y_i$, we get
$$P(S_n \geq ES_n + \epsilon) \leq \frac{Ee^{tY_1} \, Ee^{tY_2} \cdots Ee^{tY_n}}{e^{t\epsilon}} \leq \frac{e^{t^2(b_1-a_1)^2/8} \, e^{t^2(b_2-a_2)^2/8} \cdots e^{t^2(b_n-a_n)^2/8}}{e^{t\epsilon}} \leq e^{-2\epsilon^2 / \sum_i (b_i - a_i)^2}$$
by choosing $t = 4\epsilon / \sum_i (b_i - a_i)^2$.

Next: generalize to Lipschitz functions.

1.5 Concentration in metric spaces

1.5.1 Basic definitions

Definition 15. A metric space $(S, d)$ consists of a set $S$ and a function $d : S \times S \to \mathbb{R}$ which satisfies three properties:
1. $d(x, y) \geq 0$, with equality iff $x = y$;
2. $d(x, y) = d(y, x)$;
3. $d(x, z) \leq d(x, y) + d(y, z)$.

Example. $(\mathbb{R}^n, \ell_p\text{-distance})$ is a metric space for any $p \geq 1$.

Definition 16. $f : S \to \mathbb{R}$ is $\lambda$-Lipschitz if $|f(x) - f(y)| \leq \lambda \, d(x, y)$ for all $x, y \in S$.

Now suppose that $\mu$ is a probability measure on $S$, and that we want to bound
$$\mu\{f \geq Ef + \epsilon\} = P_{X \sim \mu}\left(f(X) \geq Ef + \epsilon\right).$$
Once again, it would be natural to look at the moment-generating function $E_\mu e^{tf} = \int e^{tf(x)} \mu(dx)$. But we want a bound that holds for all Lipschitz functions, so we take the supremum of this quantity.
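The final step of the proof hides a small optimization over $t$; spelled out (our addition), writing $D := \sum_i (b_i - a_i)^2$:

```latex
% Our addition: the optimization behind the last inequality, with D = \sum_i (b_i - a_i)^2.
\begin{align*}
\frac{d}{dt}\left(\frac{t^2 D}{8} - t\epsilon\right) = \frac{tD}{4} - \epsilon = 0
  \quad &\Longrightarrow \quad t = \frac{4\epsilon}{D}, \\
\left.\left(\frac{t^2 D}{8} - t\epsilon\right)\right|_{t = 4\epsilon/D}
  &= \frac{2\epsilon^2}{D} - \frac{4\epsilon^2}{D} = -\frac{2\epsilon^2}{D},
\end{align*}
% which is exactly the exponent in e^{-2 \epsilon^2 / \sum_i (b_i - a_i)^2}.
```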

Definition 17. The Laplace functional of a metric measure space $(S, d, \mu)$ is
$$L_{(S,d,\mu)}(t) = \sup_f E_\mu e^{tf},$$
where the supremum is taken over all 1-Lipschitz functions $f$ with mean 0.

1.5.2 Metric spaces of bounded diameter

We start with an analog of Lemma 13.

Lemma 18. If $(S, d)$ has bounded diameter $D = \sup_{x,y \in S} d(x, y) < \infty$, then for any probability measure $\mu$ on $S$,
$$L_{(S,d,\mu)}(t) \leq e^{t^2 D^2 / 2}.$$

Proof. First some intuition. Pick any function $f : S \to \mathbb{R}$ which is 1-Lipschitz and has mean zero. Then certainly $|f(x)| \leq D$ for all $x$, and so $Ee^{tf} \leq e^{tD}$. The bound we seek is much tighter than this for small values of $t$ (recall that in Hoeffding's proof we chose $t$ proportional to $\epsilon$). To see why it is plausible, let's write out the Taylor expansion of $e^{tf}$ and make an unjustifiable approximation:
$$Ee^{tf} = E\left[1 + tf + \frac{t^2 f^2}{2} + \frac{t^3 f^3}{3!} + \cdots\right] \approx 1 + tEf + \frac{t^2 Ef^2}{2} \leq 1 + \frac{t^2 D^2}{2} \leq e^{t^2 D^2 / 2}.$$
We've exploited the fact that $Ef = 0$ to eliminate the first-order term of the series. However, notice that $e^{t^2 D^2 / 2}$ contains all the even powers of $t$, and so we really need to eliminate all the odd terms in the original Taylor series. When is $Ef^i = 0$ for odd $i$? Answer: when the distribution of $f$ is symmetric around zero. Since this might not be the case, we need to explicitly symmetrize $f$.

Now let's start the real proof. Take any 1-Lipschitz, mean-0 function $f : S \to \mathbb{R}$. First note that by Jensen's inequality, $E_\mu e^{-tf} \geq e^{-t E_\mu f} = 1$. Let $X, Y$ be two independent draws from distribution $\mu$. Then
$$E_\mu e^{tf} \leq E_\mu e^{tf} \cdot E_\mu e^{-tf} = E_{X \sim \mu} e^{tf(X)} \cdot E_{Y \sim \mu} e^{-tf(Y)} = E_{X, Y \sim \mu} \, e^{t(f(X) - f(Y))},$$
which is just what we wanted, because $f(X) - f(Y)$ has a symmetric distribution. Thus its odd powers have zero mean:
$$E_\mu e^{tf} \leq \sum_{i=0}^{\infty} \frac{t^i}{i!} E(f(X) - f(Y))^i = \sum_{i=0}^{\infty} \frac{t^{2i}}{(2i)!} E(f(X) - f(Y))^{2i}.$$
Now we use the fact that $|f(X) - f(Y)| \leq D$, along with the inequality $(2i)! \geq i! \, 2^i$, to get
$$E_\mu e^{tf} \leq \sum_{i=0}^{\infty} \frac{t^{2i} D^{2i}}{(2i)!} \leq \sum_{i=0}^{\infty} \frac{1}{i!}\left(\frac{t^2 D^2}{2}\right)^i = e^{t^2 D^2 / 2},$$
and we're done.

In fact, by being a little more careful and using the same technique as in Lemma 13, we can get a slightly better bound.

Lemma 19. Under the same conditions as Lemma 18, $L_{(S,d,\mu)}(t) \leq e^{t^2 D^2 / 8}$.

We will apply this lemma to individual coordinates, as we did in Hoeffding's proof.
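The combinatorial inequality $(2i)! \geq i! \, 2^i$ used above is quick to verify; here is the one-line argument (our addition):

```latex
% Our addition: why (2i)! >= i! * 2^i.
\frac{(2i)!}{i!} = (i+1)(i+2)\cdots(2i) \;\geq\; \underbrace{2 \cdot 2 \cdots 2}_{i \text{ factors}} = 2^i,
% since each of the i factors is at least 2 (for i >= 1; the case i = 0 is trivial).
```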

1.5.3 Product spaces

Lemma 20. If $(S, d)$ and $(T, \delta)$ are metric spaces, then so is $(S \times T, d + \delta)$.

Example. $S = T = \mathbb{R}$ and $d(x, y) = |x - y| = \delta(x, y)$. In this case, the metric on the product space is the $\ell_1$ distance.

Definition 21. If $\mu$ is a measure on $S$ and $\nu$ is a measure on $T$, let $\mu \times \nu$ denote the product measure on $S \times T$, i.e., the measure which satisfies $(\mu \times \nu)(A \times B) = \mu(A)\nu(B)$ for all measurable $A \subseteq S$, $B \subseteq T$.

Lemma 22. If $(S, d, \mu)$ and $(T, \delta, \nu)$ are metric measure spaces, then
$$L_{(S \times T, \, d + \delta, \, \mu \times \nu)}(t) \leq L_{(S,d,\mu)}(t) \cdot L_{(T,\delta,\nu)}(t).$$

Proof. Pick any 1-Lipschitz $f : S \times T \to \mathbb{R}$ which has mean zero. For any $y \in T$, define $\bar{f}(y) = E_{X \sim \mu} f(X, y)$. Then $\bar{f}$ has mean zero over $Y \sim \nu$. Moreover, it is 1-Lipschitz on $(T, \delta)$, since for any $y, y' \in T$,
$$|\bar{f}(y) - \bar{f}(y')| = \left|E_{X \sim \mu}[f(X, y)] - E_{X \sim \mu}[f(X, y')]\right| = \left|E_{X \sim \mu}[f(X, y) - f(X, y')]\right| \leq \delta(y, y')$$
(the last step uses the fact that $f$ is 1-Lipschitz). Now, for any fixed $y$, the function $f(x, y) - \bar{f}(y)$ is 1-Lipschitz on $(S, d)$ and has mean zero over $X \sim \mu$. Therefore,
$$E_{\mu \times \nu} e^{tf} = E_{Y \sim \nu}\left[e^{t\bar{f}(Y)} \, E_{X \sim \mu} e^{t(f(X,Y) - \bar{f}(Y))}\right] \leq E_{Y \sim \nu}\left[e^{t\bar{f}(Y)}\right] L_{(S,d,\mu)}(t) \leq L_{(S,d,\mu)}(t) \cdot L_{(T,\delta,\nu)}(t).$$

Theorem 23. Let $(S_1, d_1, \mu_1), \ldots, (S_n, d_n, \mu_n)$ be metric measure spaces of bounded diameters $D_i < \infty$. Let $S = (S_1 \times S_2 \times \cdots \times S_n, \, d_1 + d_2 + \cdots + d_n)$ be the product space and $\mu = \mu_1 \times \mu_2 \times \cdots \times \mu_n$ the product measure. Then for any 1-Lipschitz function $f : S \to \mathbb{R}$,
$$\mu\{f \geq Ef + \epsilon\} \leq e^{-2\epsilon^2 / \sum_i D_i^2}.$$

Proof. Combining Lemmas 19 and 22, we see that $L_{(S,d,\mu)}(t) \leq e^{(t^2/8) \sum_i D_i^2}$. Now it is a simple matter of applying Chernoff's bounding method, using the fact that $f - Ef$ is 1-Lipschitz with mean zero:
$$\mu\{f \geq Ef + \epsilon\} = \mu\left\{e^{t(f - Ef)} \geq e^{t\epsilon}\right\} \leq \frac{E_\mu e^{t(f - Ef)}}{e^{t\epsilon}} \leq \frac{L_{(S,d,\mu)}(t)}{e^{t\epsilon}},$$
and the rest is algebra.

Example. Take $S_i = \mathbb{R}$ and $d_i(x, y) = |x - y|$. Then $S = \mathbb{R}^n$ and $d(x, y) = \|x - y\|_1$. This leads to the following corollary.

Corollary 24. Let $X_1, \ldots, X_n$ be independent and bounded with $a_i \leq X_i \leq b_i$. Then for any function $f : \mathbb{R}^n \to \mathbb{R}$ that is 1-Lipschitz with respect to the $\ell_1$ metric,
$$P\left(|f(X_1, \ldots, X_n) - Ef| \geq \epsilon\right) \leq 2e^{-2\epsilon^2 / \sum_i (b_i - a_i)^2}.$$

Remark. Hoeffding's inequality is the special case of this corollary where $f(x_1, \ldots, x_n) = x_1 + \cdots + x_n$.
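To close, here is a small simulation applying Corollary 24 to a Lipschitz function that is not a plain sum or average (our addition; the signed sum $f(x) = \sum_i s_i x_i$ with fixed signs $s_i = \pm 1$ is 1-Lipschitz w.r.t. $\ell_1$, since $|f(x) - f(y)| \leq \sum_i |x_i - y_i|$):

```python
import numpy as np

rng = np.random.default_rng(6)
n, trials, eps = 100, 100_000, 15.0

# f(x) = sum_i s_i x_i with fixed signs s_i = +/-1: 1-Lipschitz w.r.t. l1.
s = rng.choice([-1.0, 1.0], size=n)
X = rng.integers(0, 2, size=(trials, n))       # X_i in {0,1}, so b_i - a_i = 1
f = X @ s
print("empirical          :", (np.abs(f - f.mean()) >= eps).mean())
print("Corollary 24 bound :", 2 * np.exp(-2 * eps**2 / n))
```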
