Convergence in Distribution
- Loreen Campbell
Undergraduate version of the central limit theorem: if $X_1,\dots,X_n$ are iid from a population with mean $\mu$ and standard deviation $\sigma$, then $n^{1/2}(\bar X - \mu)/\sigma$ has approximately a normal distribution. Also, a Binomial$(n,p)$ random variable has approximately a $N(np,\,np(1-p))$ distribution.

What is the precise meaning of statements like "$X$ and $Y$ have approximately the same distribution"?

Desired meaning: $X$ and $Y$ have nearly the same cdf. But care is needed.
Q1) If $n$ is a large number, is the $N(0,1/n)$ distribution close to the distribution of $X \equiv 0$?

Q2) Is $N(0,1/n)$ close to the $N(1/n,\,1/n)$ distribution?

Q3) Is $N(0,1/n)$ close to the $N(1/\sqrt n,\,1/n)$ distribution?

Q4) If $X_n \equiv 2^{-n}$, is the distribution of $X_n$ close to that of $X \equiv 0$?
Answers depend on how close "close" needs to be, so it's a matter of definition. In practice the usual sort of approximation we want to make is to say that some random variable $X$, say, has nearly some continuous distribution, like $N(0,1)$. So: we want to know that probabilities like $P(X > x)$ are nearly $P(N(0,1) > x)$. Real difficulty: the case of discrete random variables, or of infinite dimensions: not done in this course.

Mathematicians' meaning of "close": either they can provide an upper bound on the distance between the two things, or they are talking about taking a limit. In this course we take limits and use metrics.
Definition: A sequence of random variables $X_n$ taking values in a separable metric space $(S,d)$ converges in distribution to a random variable $X$ if
$$E(g(X_n)) \to E(g(X))$$
for every bounded continuous function $g$ mapping $S$ to the real line.

Notation: $X_n \Rightarrow X$.

Remark: This is abusive language. It is the distributions that converge, not the random variables. Example: if $U$ is Uniform$(0,1)$, $X_n = U$ and $X = 1-U$, then $X_n$ converges in distribution to $X$ (because $U$ and $1-U$ have the same distribution), even though $X_n$ never gets close to $X$.

Other jargon: weak convergence, convergence in law.
General properties: If $X_n \Rightarrow X$ and $h$ is continuous from $S_1$ to $S_2$, then
$$Y_n = h(X_n) \Rightarrow Y = h(X).$$

Theorem 1 (Slutsky) If $X_n \Rightarrow X$, $Y_n \to y_0$ and $h$ is continuous from $S_1 \times S_2$ to $S_3$ at $(x, y_0)$ for each $x$, then
$$Z_n = h(X_n, Y_n) \Rightarrow Z = h(X, y_0).$$
We begin by specializing to the simplest case: $S$ is the real line and $d(x,y) = |x-y|$. In the following we suppose that $X_n$, $X$ are real valued random variables.

Theorem 2 The following are equivalent:

1. $X_n$ converges in distribution to $X$.

2. $P(X_n \le x) \to P(X \le x)$ for each $x$ such that $P(X = x) = 0$.

3. The limit of the characteristic functions of $X_n$ is the characteristic function of $X$: $E(e^{itX_n}) \to E(e^{itX})$ for every real $t$.

These are all implied by $M_{X_n}(t) \to M_X(t) < \infty$ for all $|t| \le \epsilon$, for some positive $\epsilon$.
Now let's go back to the questions I asked. $X_n \sim N(0,1/n)$ and $X \equiv 0$. Then
$$P(X_n \le x) \to \begin{cases} 1 & x > 0 \\ 0 & x < 0 \\ 1/2 & x = 0. \end{cases}$$
The limit is the cdf of $X \equiv 0$ except at $x = 0$; the cdf of $X$ is not continuous at $x = 0$. So: $X_n \Rightarrow X$.

Does $X_n \sim N(1/n,1/n)$ have distribution close to that of $Y_n \sim N(0,1/n)$? Find a limit $X$ and prove both $X_n \Rightarrow X$ and $Y_n \Rightarrow X$. Take $X \equiv 0$. Then
$$E(e^{tX_n}) = e^{t/n + t^2/(2n)} \to 1 = E(e^{tX})$$
and
$$E(e^{tY_n}) = e^{t^2/(2n)} \to 1,$$
so both $X_n$ and $Y_n$ have the same limit in distribution.
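The pointwise cdf limit above can be tabulated numerically (my own illustration, not from the slides): $X_n \sim N(0,1/n)$ has cdf $\Phi(\sqrt n\, x)$, which heads to 1 for $x > 0$ and 0 for $x < 0$, while staying exactly $1/2$ at $x = 0$. The helper `norm_cdf` is an assumption of this sketch, built from the standard `math.erf` identity.

```python
# Sketch: cdf of N(0, 1/n) at a few points as n grows.
import math

def norm_cdf(x, mean=0.0, sd=1.0):
    """Normal cdf via the error function: Phi(z) = (1 + erf(z/sqrt(2)))/2."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

for n in (10, 1000, 100000):
    sd = 1.0 / math.sqrt(n)
    # P(X_n <= -0.1), P(X_n <= 0), P(X_n <= 0.1): tends to 0, 1/2, 1.
    print(n, norm_cdf(-0.1, sd=sd), norm_cdf(0.0, sd=sd), norm_cdf(0.1, sd=sd))
```

The middle column never moves off $1/2$, which is exactly why the limiting cdf fails to match the cdf of $X \equiv 0$ at its discontinuity point, and why the definition only demands convergence at continuity points.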
[Figure: density of $N(0,1/n)$ compared with the point mass at $X = 0$, for two values of $n$.]
[Figure: densities of $N(1/n,1/n)$ and $N(0,1/n)$, for two values of $n$.]
Multiply both $X_n$ and $Y_n$ by $n^{1/2}$ and let $X \sim N(0,1)$. Then $\sqrt n\, X_n \sim N(n^{-1/2}, 1)$ and $\sqrt n\, Y_n \sim N(0,1)$. Use characteristic functions to prove that both $\sqrt n\, X_n$ and $\sqrt n\, Y_n$ converge to $N(0,1)$ in distribution.

If you now let $X_n \sim N(n^{-1/2}, 1/n)$ and $Y_n \sim N(0,1/n)$, then again both $X_n$ and $Y_n$ converge to 0 in distribution. But if you multiply these $X_n$ and $Y_n$ by $n^{1/2}$, then $n^{1/2} X_n \sim N(1,1)$ and $n^{1/2} Y_n \sim N(0,1)$, so that $n^{1/2} X_n$ and $n^{1/2} Y_n$ are not close together in distribution.

You can check that $2^{-n} \to 0$ in distribution.
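The rescaling contrast can be made numerical (a sketch under my own conventions; `sup_cdf_gap` is a hypothetical helper approximating the Kolmogorov distance on a grid): the cdfs of $N(n^{-1/2},1)$ and $N(0,1)$ come together as $n$ grows, while $N(1,1)$ and $N(0,1)$ stay a fixed distance apart.

```python
# Sketch: grid approximation to sup_x |F1(x) - F2(x)| for two normal cdfs.
import math

def norm_cdf(x, mean=0.0, sd=1.0):
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def sup_cdf_gap(mean1, mean2, sd=1.0):
    """Approximate Kolmogorov distance between N(mean1, sd^2) and N(mean2, sd^2)."""
    grid = [i / 100.0 for i in range(-500, 501)]
    return max(abs(norm_cdf(x, mean1, sd) - norm_cdf(x, mean2, sd)) for x in grid)

for n in (4, 100, 10000):
    print(n, sup_cdf_gap(1 / math.sqrt(n), 0.0))  # shrinks to 0 with n
print(sup_cdf_gap(1.0, 0.0))                      # a fixed positive gap
```

The fixed gap for means 1 and 0 is $2\Phi(1/2) - 1 \approx 0.38$, which is why $n^{1/2}X_n$ and $n^{1/2}Y_n$ in the second bullet are never close in distribution.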
[Figure: densities of $N(1/\sqrt n,1/n)$ and $N(0,1/n)$, for two values of $n$.]
Summary: to derive approximate distributions:

Show that a sequence of random variables $X_n$ converges weakly to some $X$. The limit distribution (i.e. the distribution of $X$) should be non-trivial, like say $N(0,1)$.

Don't say: $X_n$ is approximately $N(1/n,1/n)$.

Do say: $n^{1/2}(X_n - 1/n)$ converges to $N(0,1)$ in distribution.
The Central Limit Theorem

Theorem 3 If $X_1, X_2, \dots$ are iid with mean 0 and variance 1, then $n^{1/2} \bar X$ converges in distribution to $N(0,1)$. That is,
$$P(n^{1/2} \bar X \le x) \to \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-y^2/2}\, dy.$$

Proof: We will show $E(e^{it n^{1/2} \bar X}) \to e^{-t^2/2}$. This is the characteristic function of $N(0,1)$, so we are done by our theorem.
Some basic facts: If $Z \sim N(0,1)$ then $E(e^{itZ}) = e^{-t^2/2}$.

Theorem 4 If $X$ is a real random variable with $E(|X|^k) < \infty$, then the function $\psi(t) = E(e^{itX})$ has $k$ continuous derivatives as a function of the real variable $t$. (The real part and imaginary part each have that many derivatives.) Moreover, for $1 \le j \le k$,
$$\psi^{(j)}(t) = i^j E(X^j e^{itX}).$$

Theorem 5 (Taylor Expansion) For such an $X$:
$$\psi(t) = 1 + \sum_{j=1}^{k} i^j E(X^j)\, t^j / j! + R(t),$$
where the remainder function $R(t)$ satisfies $\lim_{t \to 0} R(t)/t^k = 0$.
Finish the proof: let $\psi(t) = E(\exp(itX_1))$. Then
$$E(e^{it\sqrt n\, \bar X}) = \psi^n(t/\sqrt n).$$
Since the variance is 1 and the mean is 0:
$$\psi(t) = 1 - t^2/2 + R(t),$$
where $\lim_{t\to 0} R(t)/t^2 = 0$. Fix $t$ and replace $t$ by $t/\sqrt n$:
$$\psi^n(t/\sqrt n) = \left(1 - t^2/(2n) + R(t/\sqrt n)\right)^n.$$
Define $x_n = -t^2/2 + nR(t/\sqrt n)$. Notice $x_n \to -t^2/2$ (by the property of $R$) and use the fact that $x_n \to x$ implies
$$(1 + x_n/n)^n \to e^x,$$
valid for all complex $x$. We get
$$E(e^{itn^{1/2}\bar X}) \to e^{-t^2/2},$$
which finishes the proof.
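The theorem can be seen in a minimal Monte Carlo sketch (my own illustration; the summands are standardized Uniform variables, $(U - 1/2)\sqrt{12}$, which have mean 0 and variance 1, and `norm_cdf` is an assumed helper):

```python
# Sketch: empirical cdf of n^{1/2} * Xbar versus the N(0,1) cdf.
import math, random

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

random.seed(1)
n, reps = 400, 10000
draws = []
for _ in range(reps):
    total = sum((random.random() - 0.5) * math.sqrt(12.0) for _ in range(n))
    draws.append(total / math.sqrt(n))  # n^{1/2} * sample mean

for x in (-1.0, 0.0, 1.0):
    emp = sum(d <= x for d in draws) / reps
    print(x, round(emp, 3), round(norm_cdf(x), 3))  # empirical vs Phi(x)
```

With $n = 400$ the empirical probabilities already sit within Monte Carlo error of $\Phi(x)$, which is the content of Theorem 3 stated through cdfs.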
Proof of Theorem 4: we do the case $k = 1$. We must show
$$\lim_{h\to 0} \frac{\psi(t+h) - \psi(t)}{h} = iE(Xe^{itX}).$$
But
$$\frac{\psi(t+h) - \psi(t)}{h} = E\left(\frac{e^{i(t+h)X} - e^{itX}}{h}\right).$$
Fact:
$$\left|\frac{e^{i(t+h)X} - e^{itX}}{h}\right| \le |X|$$
for any $t$. By the Dominated Convergence Theorem we can take the limit inside the expectation to get
$$\psi'(t) = iE(Xe^{itX}).$$
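The derivative formula can be checked numerically (an assumed example, not from the slides): for $X \sim \text{Exp}(1)$ we have $\psi'(0) = iE(X) = i$, and a symmetric finite difference of a Monte Carlo characteristic function recovers this.

```python
# Sketch: finite-difference estimate of psi'(0) for Exp(1) data.
import cmath, random

random.seed(5)
xs = [random.expovariate(1.0) for _ in range(100000)]

def psi(t):
    """Monte Carlo characteristic function E(e^{itX})."""
    return sum(cmath.exp(1j * t * x) for x in xs) / len(xs)

h = 1e-3
deriv = (psi(h) - psi(-h)) / (2 * h)
print(deriv)  # close to 1j, since E(X) = 1 for Exp(1)
```

Because the same sample is used at $t = \pm h$, the difference quotient collapses to $i\,\overline{\sin(hX)/h} \approx i\bar X$, so the Monte Carlo noise is just that of the sample mean.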
Multivariate convergence in distribution

Definition: $X_n \in R^p$ converges in distribution to $X \in R^p$ if $E(g(X_n)) \to E(g(X))$ for each bounded continuous real valued function $g$ on $R^p$. This is equivalent to either of:

Cramér–Wold device: $a^T X_n$ converges in distribution to $a^T X$ for each $a \in R^p$; or

Convergence of characteristic functions: $E(e^{ia^T X_n}) \to E(e^{ia^T X})$ for each $a \in R^p$.
Extensions of the CLT

1. If $Y_1, Y_2, \dots$ are iid in $R^p$ with mean $\mu$ and variance-covariance $\Sigma$, then $n^{1/2}(\bar Y - \mu) \Rightarrow MVN(0,\Sigma)$.

2. Lyapunov CLT: for each $n$ let $X_{n1},\dots,X_{nn}$ be independent rvs with
$$E(X_{ni}) = 0, \quad (1)$$
$$\sum_i \operatorname{Var}(X_{ni}) = 1, \quad (2)$$
$$\sum_i E(|X_{ni}|^3) \to 0. \quad (3)$$
Then $\sum_i X_{ni} \Rightarrow N(0,1)$.

3. Lindeberg CLT: if conditions (1), (2) and
$$\sum_i E\big(X_{ni}^2\, 1(|X_{ni}| > \epsilon)\big) \to 0 \text{ for each } \epsilon > 0,$$
then $\sum_i X_{ni} \Rightarrow N(0,1)$. (Lyapunov's condition implies Lindeberg's.)

4. Non-independent rvs: $m$-dependent CLT, martingale CLT, CLT for mixing processes.

5. Not sums: Slutsky's theorem, $\delta$ method.
Slutsky's Theorem in $R^p$: If $X_n \Rightarrow X$ and $Y_n$ converges in distribution (or in probability) to $c$, a constant, then $X_n + Y_n \Rightarrow X + c$. More generally, if $f(x,y)$ is continuous, then $f(X_n, Y_n) \Rightarrow f(X, c)$.

Warning: the hypothesis that the limit of $Y_n$ is constant is essential.

Definition: $Y_n \to Y$ in probability if for every $\epsilon > 0$: $P(d(Y_n, Y) > \epsilon) \to 0$.

Fact: for $Y$ constant, convergence in distribution and in probability are the same. In general, convergence in probability implies convergence in distribution. Both are weaker than almost sure convergence:

Definition: $Y_n \to Y$ almost surely if
$$P(\{\omega \in \Omega : \lim_{n} Y_n(\omega) = Y(\omega)\}) = 1.$$
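A standard application of Slutsky's theorem is the studentized mean: $\sqrt n\, \bar X \Rightarrow N(0,\sigma^2)$ while $s \to \sigma$ in probability, so $\sqrt n\, \bar X / s \Rightarrow N(0,1)$. A Monte Carlo sketch (my own example, using centered Exp(1) data with mean 0 and variance 1):

```python
# Sketch: coverage of the Slutsky-based normal approximation for the
# studentized mean of centered exponential data.
import math, random

random.seed(2)
n, reps = 500, 8000
hits = 0
for _ in range(reps):
    xs = [random.expovariate(1.0) - 1.0 for _ in range(n)]  # mean 0, var 1
    xbar = sum(xs) / n
    s2 = sum(x * x for x in xs) / n - xbar ** 2  # plug-in variance estimate
    t = math.sqrt(n) * xbar / math.sqrt(s2)
    hits += (abs(t) <= 1.96)
print(round(hits / reps, 3))  # should sit near 0.95
```

The denominator $s$ is random, but because it converges to the constant $\sigma$, Slutsky's theorem lets us treat it as if it were $\sigma$ in the limit.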
Theorem 6 (The delta method) Suppose:

- The sequence $Y_n \to y$, a constant.
- If $X_n = a_n(Y_n - y)$, then $X_n \Rightarrow X$ for some random variable $X$.
- $f$ is a function defined on a neighbourhood of $y \in R^p$ which is differentiable at $y$.

Then $a_n(f(Y_n) - f(y))$ converges in distribution to $f'(y)X$. If $X_n \in R^p$ and $f: R^p \to R^q$, then $f'(y)$ is the $q \times p$ matrix of first derivatives of the components of $f$.
Proof: The function $f: R^p \to R^q$ is differentiable at $y \in R^p$ if there is a matrix $Df$ such that
$$\lim_{h\to 0} \frac{\|f(y+h) - f(y) - Df\,h\|}{\|h\|} = 0;$$
that is, for each $\epsilon > 0$ there is a $\delta > 0$ such that $\|h\| \le \delta$ implies
$$\|f(y+h) - f(y) - Df\,h\| \le \epsilon \|h\|.$$
Define
$$R_n = a_n(f(Y_n) - f(y)) - a_n Df\,(Y_n - y)$$
and
$$S_n = a_n Df\,(Y_n - y) = Df\,X_n.$$
According to Slutsky's theorem,
$$S_n \Rightarrow Df\,X.$$
If we now prove $R_n \to 0$, then by Slutsky's theorem we find
$$a_n(f(Y_n) - f(y)) = S_n + R_n \Rightarrow Df\,X.$$
Now fix $\epsilon_1, \epsilon_2 > 0$. I claim there is a $K$ so big that for all $n$
$$P(B_n) \equiv P(\|a_n(Y_n - y)\| > K) \le \epsilon_1.$$
Let $\delta > 0$ be the value in the definition of derivative corresponding to $\epsilon_2/K$. Choose $N$ so large that $n \ge N$ implies $K/a_n \le \delta$. For $n \ge N$, on the complement of $B_n$ we have $\|Y_n - y\| \le K/a_n \le \delta$, and hence $\|R_n\| \le a_n (\epsilon_2/K)\|Y_n - y\| \le \epsilon_2$; that is,
$$\{\|R_n\| > \epsilon_2\} \subset B_n,$$
so that $n \ge N$ implies
$$P(\|R_n\| > \epsilon_2) \le \epsilon_1,$$
which means $R_n \to 0$ in probability.
Example: Suppose $X_1,\dots,X_n$ are a sample from a population with mean $\mu$, variance $\sigma^2$, and third and fourth central moments $\mu_3$ and $\mu_4$. Then
$$n^{1/2}(s^2 - \sigma^2) \Rightarrow N(0, \mu_4 - \sigma^4),$$
where $\Rightarrow$ is notation for convergence in distribution. For simplicity I define $s^2 = \overline{X^2} - \bar X^2$.
How to apply the $\delta$ method:

1) Write the statistic as a function of averages. Define
$$W_i = \begin{bmatrix} X_i^2 \\ X_i \end{bmatrix}.$$
See that
$$\bar W_n = \begin{bmatrix} \overline{X^2} \\ \bar X \end{bmatrix}.$$
Define $f(x_1, x_2) = x_1 - x_2^2$ and see that $s^2 = f(\bar W_n)$.

2) Compute the mean of your averages:
$$\mu_W \equiv E(\bar W_n) = \begin{bmatrix} E(X_i^2) \\ E(X_i) \end{bmatrix} = \begin{bmatrix} \mu^2 + \sigma^2 \\ \mu \end{bmatrix}.$$

3) In the $\delta$ method theorem take $Y_n = \bar W_n$ and $y = E(Y_n)$.
4) Take $a_n = n^{1/2}$.

5) Use the central limit theorem:
$$n^{1/2}(Y_n - y) \Rightarrow MVN(0, \Sigma),$$
where $\Sigma = \operatorname{Var}(W_i)$.

6) To compute $\Sigma$, take the expected value of
$$(W - \mu_W)(W - \mu_W)^T.$$
There are 4 entries in this matrix. The top left entry is
$$(X^2 - \mu^2 - \sigma^2)^2,$$
which has expectation
$$E\{(X^2 - \mu^2 - \sigma^2)^2\} = E(X^4) - (\mu^2 + \sigma^2)^2.$$
Using the binomial expansion:
$$E(X^4) = E\{(X - \mu + \mu)^4\} = \mu_4 + 4\mu\mu_3 + 6\mu^2\sigma^2 + 4\mu^3 E(X-\mu) + \mu^4.$$
So
$$\Sigma_{11} = \mu_4 - \sigma^4 + 4\mu\mu_3 + 4\mu^2\sigma^2.$$
The top right entry is the expectation of
$$(X^2 - \mu^2 - \sigma^2)(X - \mu),$$
which is $E(X^3) - \mu E(X^2)$. Similarly to the 4th moment calculation, this works out to $\mu_3 + 2\mu\sigma^2$. The lower right entry is $\sigma^2$. So
$$\Sigma = \begin{bmatrix} \mu_4 - \sigma^4 + 4\mu\mu_3 + 4\mu^2\sigma^2 & \mu_3 + 2\mu\sigma^2 \\ \mu_3 + 2\mu\sigma^2 & \sigma^2 \end{bmatrix}.$$
7) Compute the derivative (gradient) of $f$: it has components $(1, -2x_2)$. Evaluate at $y = (\mu^2 + \sigma^2, \mu)$ to get $a^T = (1, -2\mu)$. This leads to
$$n^{1/2}(s^2 - \sigma^2) \approx [1, -2\mu]\; n^{1/2} \begin{bmatrix} \overline{X^2} - (\mu^2 + \sigma^2) \\ \bar X - \mu \end{bmatrix},$$
which converges in distribution to $(1, -2\mu)\, MVN(0,\Sigma)$. This rv is
$$N(0, a^T \Sigma a) = N(0, \mu_4 - \sigma^4).$$
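The final answer can be checked by simulation (a hypothetical check, not from the slides): for Exp(1) data we have $\sigma^2 = 1$, $\mu_3 = 2$, $\mu_4 = 9$, so the limiting variance $\mu_4 - \sigma^4$ is 8.

```python
# Sketch: Monte Carlo variance of n^{1/2}(s^2 - sigma^2) for Exp(1) data,
# to be compared with mu_4 - sigma^4 = 8.
import math, random, statistics

random.seed(3)
n, reps = 1000, 4000
zs = []
for _ in range(reps):
    xs = [random.expovariate(1.0) for _ in range(n)]
    xbar = sum(xs) / n
    s2 = sum(x * x for x in xs) / n - xbar ** 2  # s^2 as defined above
    zs.append(math.sqrt(n) * (s2 - 1.0))         # sigma^2 = 1 for Exp(1)
print(round(statistics.variance(zs), 2))         # should land near 8
```

Note the simulation needs only the raw data; the $\delta$ method supplies the theoretical variance that the empirical one is converging to.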
An alternative approach is worth pursuing. Suppose $c$ is a constant. Define $X_i^* = X_i - c$. Then: the sample variance of the $X_i^*$ is the same as the sample variance of the $X_i$. Notice that all central moments of the $X_i^*$ are the same as for the $X_i$. Conclusion: there is no loss in assuming $\mu = 0$. In this case $a^T = (1, 0)$ and
$$\Sigma = \begin{bmatrix} \mu_4 - \sigma^4 & \mu_3 \\ \mu_3 & \sigma^2 \end{bmatrix}.$$
Notice that $a^T \Sigma = [\mu_4 - \sigma^4,\; \mu_3]$ and $a^T \Sigma a = \mu_4 - \sigma^4$.
Special case: if the population is $N(\mu, \sigma^2)$, then $\mu_3 = 0$ and $\mu_4 = 3\sigma^4$. Our calculation gives
$$n^{1/2}(s^2 - \sigma^2) \Rightarrow N(0, 2\sigma^4).$$
You can divide through by $\sigma^2$ and get
$$n^{1/2}\left(\frac{s^2}{\sigma^2} - 1\right) \Rightarrow N(0, 2).$$
In fact $ns^2/\sigma^2$ has a $\chi^2_{n-1}$ distribution, and so the usual central limit theorem shows that
$$(n-1)^{-1/2}\left[ns^2/\sigma^2 - (n-1)\right] \Rightarrow N(0,2)$$
(using the fact that a $\chi^2_1$ has mean 1 and variance 2). Factor out $n$ to get
$$\sqrt{\frac{n}{n-1}}\; n^{1/2}(s^2/\sigma^2 - 1) + (n-1)^{-1/2} \Rightarrow N(0,2),$$
which is the $\delta$ method calculation except for some constants. The difference is unimportant: Slutsky's theorem.
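The "factor out $n$" step is pure algebra and can be verified numerically (a small sketch with assumed placeholder values, where `ratio` plays the role of $s^2/\sigma^2$):

```python
# Sketch: check that the two forms of the chi-square standardization agree:
# (n-1)^{-1/2}[n s^2/sigma^2 - (n-1)]
#   = sqrt(n/(n-1)) * n^{1/2}(s^2/sigma^2 - 1) + (n-1)^{-1/2}
import math

n, ratio = 50, 1.23  # ratio stands in for s^2 / sigma^2
lhs = (n * ratio - (n - 1)) / math.sqrt(n - 1)
rhs = math.sqrt(n / (n - 1)) * math.sqrt(n) * (ratio - 1) + 1 / math.sqrt(n - 1)
print(abs(lhs - rhs) < 1e-9)  # the two expressions are identical
```

Both sides reduce to $(n\,\text{ratio} - n + 1)/\sqrt{n-1}$, and Slutsky's theorem disposes of the factor $\sqrt{n/(n-1)} \to 1$ and the additive $(n-1)^{-1/2} \to 0$.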
Extending the ideas to higher dimensions. Let $W_1, W_2, \dots$ be iid with density
$$f(x) = \exp(-(x+1))\,1(x > -1),$$
a mean-0 shifted exponential. Set $S_k = W_1 + \cdots + W_k$. Plot $S_k/\sqrt n$ against $k$ for $k = 1, \dots, n$. Label the $x$ axis to run from 0 to 1 (i.e. plot against $k/n$). Rescale the vertical axes to fit in a square.
[Figure: four panels plotting $S_k/\sqrt n$ against $k/n$ for $n = 100$, 400, 1600 and 10000.]
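The rescaled partial-sum paths in the figure can be generated as follows (a minimal text version of the plot; the shifted exponential is simulated as $\text{Exp}(1) - 1$, which has the stated density and mean 0):

```python
# Sketch: the rescaled random walk (k/n, S_k / sqrt(n)) for shifted
# exponential increments; as n grows this path resembles Brownian motion.
import math, random

random.seed(4)
n = 10000
w = [random.expovariate(1.0) - 1.0 for _ in range(n)]  # density e^{-(x+1)}, x > -1

s, path = 0.0, []
for k, wi in enumerate(w, start=1):
    s += wi
    path.append((k / n, s / math.sqrt(n)))  # (k/n, S_k / sqrt(n))

# Print a few points of the path in place of a plot.
for t in (0.25, 0.5, 1.0):
    print(t, round(path[int(t * n) - 1][1], 3))
```

By the CLT each fixed time point $S_{\lfloor tn \rfloor}/\sqrt n$ is approximately $N(0, t)$, which is the one-dimensional hint of the functional limit the plots are illustrating.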
Proof of Slutsky's Theorem: First, why is it true? If $X_n \Rightarrow X$ and $Y_n \to y$, then we will show $(X_n, Y_n) \Rightarrow (X, y)$. The point is that the joint law of $(X, y)$ is determined by the marginal laws! Once this is done, then
$$E(h(X_n, Y_n)) \to E(h(X, y))$$
by definition.

Note: you don't need continuity of $h$ at all $(x, y)$, but I will do only the easy case.
Definition: A family $\{P_\alpha,\ \alpha \in A\}$ of probability measures on $(S, d)$ is tight if for each $\epsilon > 0$ there is a compact $K$ in $S$ such that for every $\alpha \in A$
$$P_\alpha(K) \ge 1 - \epsilon.$$

Theorem 7 If $S$ is a complete separable metric space, then each probability measure $P$ on the Borel sets in $S$ is tight.

Proof: Let $x_1, x_2, \dots$ be dense in $S$. For each $n$ draw balls $B_{n,1}, B_{n,2}, \dots$ of radius $1/n$ around $x_1, x_2, \dots$. Each point in $S$ is in one of these balls because the $x_j$ sequence is dense. That is:
$$S = \bigcup_{j=1}^{\infty} B_{n,j}.$$
Thus
$$1 = P(S) = \lim_{J\to\infty} P\left(\bigcup_{j=1}^{J} B_{n,j}\right).$$
Pick $J_n$ so large that
$$P\left(\bigcup_{j=1}^{J_n} B_{n,j}\right) \ge 1 - \epsilon/2^n.$$
Let $F_n$ be the closure of $\bigcup_{j=1}^{J_n} B_{n,j}$. Let $K = \bigcap_{n=1}^{\infty} F_n$. I claim $K$ is compact and has probability at least $1 - \epsilon$. First,
$$P(K) = 1 - P(K^c) = 1 - P\left(\bigcup_n F_n^c\right) \ge 1 - \sum_n P(F_n^c) \ge 1 - \sum_n \epsilon/2^n = 1 - \epsilon.$$
(Incidentally, you see that $K$ is not empty!)
Second: $K$ is closed (an intersection of closed sets). Third: $K$ is totally bounded, since each $F_n$ gives a cover of $K$ by (closed) balls of radius $1/n$. So $K$ is compact.

Theorem 8 If $X_n$ converge in distribution to some $X$ in a complete separable metric space $S$, then the sequence $X_n$ is tight.

Conversely:

Theorem 9 If the sequence $X_n$ is tight, then every subsequence is also tight, and there is a subsequence $X_{n_k}$ and a random variable $X$ such that $X_{n_k} \Rightarrow X$ as $k \to \infty$.

Theorem 10 If there is a rv $X$ such that every subsequence of $X_n$ has a further sub-subsequence converging in distribution to $X$, then $X_n \Rightarrow X$.
Proof of Theorem 9: we do $R^p$. The first assertion is obvious. Let $x_1, x_2, \dots$ be dense in $R^p$. Find a sequence $n_{1,1} < n_{1,2} < \cdots$ such that the sequence $F_{n_{1,k}}(x_1)$ has a limit, which we denote $y_1$. This exists because the probabilities are trapped in $[0,1]$ (Bolzano–Weierstrass). Pick $n_{2,1} < n_{2,2} < \cdots$, a subsequence of the $n_{1,k}$, such that $F_{n_{2,k}}(x_2)$ has a limit, which we denote $y_2$. Continue, picking a subsequence $n_{m+1,k}$ from the sequence $n_{m,k}$ so that $F_{n_{m+1,k}}(x_{m+1})$ has a limit, which we denote $y_{m+1}$.
Trick: Diagonalization. Consider the sequence $n_{1,1} < n_{2,2} < \cdots$. After the $k$th entry, all remaining terms are a subsequence of the $k$th subsequence $n_{k,j}$. So
$$\lim_{k\to\infty} F_{n_{k,k}}(x_j) = y_j$$
for each $j$.

Idea: we would like to define $F_X(x_j) = y_j$, but that might not give a cdf. Instead set
$$F_X(x) = \inf\{y_j : x_j > x\}.$$
Next: prove $F_X$ is a cdf; then prove the subsequence converges to $F_X$.
More informationMath 832 Fall University of Wisconsin at Madison. Instructor: David F. Anderson
Math 832 Fall 2013 University of Wisconsin at Madison Instructor: David F. Anderson Pertinent information Instructor: David Anderson Office: Van Vleck 617 email: anderson@math.wisc.edu Office hours: Mondays
More informationsimple if it completely specifies the density of x
3. Hypothesis Testing Pure significance tests Data x = (x 1,..., x n ) from f(x, θ) Hypothesis H 0 : restricts f(x, θ) Are the data consistent with H 0? H 0 is called the null hypothesis simple if it completely
More informationExercises Measure Theoretic Probability
Exercises Measure Theoretic Probability Chapter 1 1. Prove the folloing statements. (a) The intersection of an arbitrary family of d-systems is again a d- system. (b) The intersection of an arbitrary family
More informationChapter 2. Metric Spaces. 2.1 Metric Spaces
Chapter 2 Metric Spaces ddddddddddddddddddddddddd ddddddd dd ddd A metric space is a mathematical object in which the distance between two points is meaningful. Metric spaces constitute an important class
More information1 Exercises for lecture 1
1 Exercises for lecture 1 Exercise 1 a) Show that if F is symmetric with respect to µ, and E( X )
More informationµ X (A) = P ( X 1 (A) )
1 STOCHASTIC PROCESSES This appendix provides a very basic introduction to the language of probability theory and stochastic processes. We assume the reader is familiar with the general measure and integration
More informationChapter 7: Special Distributions
This chater first resents some imortant distributions, and then develos the largesamle distribution theory which is crucial in estimation and statistical inference Discrete distributions The Bernoulli
More informationMany of you got these steps reversed or otherwise out of order.
Problem 1. Let (X, d X ) and (Y, d Y ) be metric spaces. Suppose that there is a bijection f : X Y such that for all x 1, x 2 X. 1 10 d X(x 1, x 2 ) d Y (f(x 1 ), f(x 2 )) 10d X (x 1, x 2 ) Show that if
More informationEcon 508B: Lecture 5
Econ 508B: Lecture 5 Expectation, MGF and CGF Hongyi Liu Washington University in St. Louis July 31, 2017 Hongyi Liu (Washington University in St. Louis) Math Camp 2017 Stats July 31, 2017 1 / 23 Outline
More information6.1 Moment Generating and Characteristic Functions
Chapter 6 Limit Theorems The power statistics can mostly be seen when there is a large collection of data points and we are interested in understanding the macro state of the system, e.g., the average,
More informationP (x). all other X j =x j. If X is a continuous random vector (see p.172), then the marginal distributions of X i are: f(x)dx 1 dx n
JOINT DENSITIES - RANDOM VECTORS - REVIEW Joint densities describe probability distributions of a random vector X: an n-dimensional vector of random variables, ie, X = (X 1,, X n ), where all X is are
More informationIntroduction to Statistical Inference Self-study
Introduction to Statistical Inference Self-study Contents Definition, sample space The fundamental object in probability is a nonempty sample space Ω. An event is a subset A Ω. Definition, σ-algebra A
More informationEE4601 Communication Systems
EE4601 Communication Systems Week 2 Review of Probability, Important Distributions 0 c 2011, Georgia Institute of Technology (lect2 1) Conditional Probability Consider a sample space that consists of two
More informationThe main results about probability measures are the following two facts:
Chapter 2 Probability measures The main results about probability measures are the following two facts: Theorem 2.1 (extension). If P is a (continuous) probability measure on a field F 0 then it has a
More information1 Topology Definition of a topology Basis (Base) of a topology The subspace topology & the product topology on X Y 3
Index Page 1 Topology 2 1.1 Definition of a topology 2 1.2 Basis (Base) of a topology 2 1.3 The subspace topology & the product topology on X Y 3 1.4 Basic topology concepts: limit points, closed sets,
More informationDepartment of Mathematics
Department of Mathematics Ma 3/103 KC Border Introduction to Probability and Statistics Winter 2017 Supplement 2: Review Your Distributions Relevant textbook passages: Pitman [10]: pages 476 487. Larsen
More informationMath Camp II. Calculus. Yiqing Xu. August 27, 2014 MIT
Math Camp II Calculus Yiqing Xu MIT August 27, 2014 1 Sequence and Limit 2 Derivatives 3 OLS Asymptotics 4 Integrals Sequence Definition A sequence {y n } = {y 1, y 2, y 3,..., y n } is an ordered set
More informationReview of Probability Theory
Review of Probability Theory Arian Maleki and Tom Do Stanford University Probability theory is the study of uncertainty Through this class, we will be relying on concepts from probability theory for deriving
More informationNotes on uniform convergence
Notes on uniform convergence Erik Wahlén erik.wahlen@math.lu.se January 17, 2012 1 Numerical sequences We begin by recalling some properties of numerical sequences. By a numerical sequence we simply mean
More informationExam P Review Sheet. for a > 0. ln(a) i=0 ari = a. (1 r) 2. (Note that the A i s form a partition)
Exam P Review Sheet log b (b x ) = x log b (y k ) = k log b (y) log b (y) = ln(y) ln(b) log b (yz) = log b (y) + log b (z) log b (y/z) = log b (y) log b (z) ln(e x ) = x e ln(y) = y for y > 0. d dx ax
More informationChapter 5 Joint Probability Distributions
Applied Statistics and Probability for Engineers Sixth Edition Douglas C. Montgomery George C. Runger Chapter 5 Joint Probability Distributions 5 Joint Probability Distributions CHAPTER OUTLINE 5-1 Two
More informationn! (k 1)!(n k)! = F (X) U(0, 1). (x, y) = n(n 1) ( F (y) F (x) ) n 2
Order statistics Ex. 4.1 (*. Let independent variables X 1,..., X n have U(0, 1 distribution. Show that for every x (0, 1, we have P ( X (1 < x 1 and P ( X (n > x 1 as n. Ex. 4.2 (**. By using induction
More information1 Review of Probability and Distributions
Random variables. A numerically valued function X of an outcome ω from a sample space Ω X : Ω R : ω X(ω) is called a random variable (r.v.), and usually determined by an experiment. We conventionally denote
More informationLecture 5: Moment generating functions
Lecture 5: Moment generating functions Definition 2.3.6. The moment generating function (mgf) of a random variable X is { x e tx f M X (t) = E(e tx X (x) if X has a pmf ) = etx f X (x)dx if X has a pdf
More informationProbability on a Riemannian Manifold
Probability on a Riemannian Manifold Jennifer Pajda-De La O December 2, 2015 1 Introduction We discuss how we can construct probability theory on a Riemannian manifold. We make comparisons to this and
More information