Lecture 25: Review. Statistics 104. Colin Rundel. April 23, 2012.


Joint CDF

F(x, y) = P(X ≤ x, Y ≤ y) = P((X, Y) lies south-west of the point (x, y))

[Figure: the region of the XY-plane south-west of the point (x, y).]

Joint CDF, cont.

The joint cumulative distribution function follows the same rules as the univariate CDF.

Univariate definition:

F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(z) dz

lim_{x → −∞} F(x) = 0        lim_{x → ∞} F(x) = 1        x ≤ y ⇒ F(x) ≤ F(y)

Bivariate definition:

F(x, y) = P(X ≤ x, Y ≤ y) = ∫_{−∞}^{y} ∫_{−∞}^{x} f(x, y) dx dy

lim_{x,y → −∞} F(x, y) = 0        lim_{x,y → ∞} F(x, y) = 1

x ≤ x′, y ≤ y′ ⇒ F(x, y) ≤ F(x′, y′)

Marginal Distributions

We can define marginal distributions based on the CDF by setting one of the values to infinity:

F(x, ∞) = P(X ≤ x, Y ≤ ∞) = ∫_{−∞}^{x} ∫_{−∞}^{∞} f(x, y) dy dx = P(X ≤ x) = F_X(x)

F(∞, y) = P(X ≤ ∞, Y ≤ y) = ∫_{−∞}^{y} ∫_{−∞}^{∞} f(x, y) dx dy = P(Y ≤ y) = F_Y(y)

Joint pdf

Similar to the CDF, the probability density function follows the same general rules, except in two dimensions.

Univariate definition:

f(x) ≥ 0 for all x        f(x) = (d/dx) F(x)        ∫_{−∞}^{∞} f(x) dx = 1

Bivariate definition:

f(x, y) ≥ 0 for all (x, y)        f(x, y) = (∂²/∂x ∂y) F(x, y)        ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = 1

Marginal pdfs

Marginal probability density functions are defined by integrating out one of the random variables:

f_X(x) = ∫_{−∞}^{∞} f(x, y) dy        f_Y(y) = ∫_{−∞}^{∞} f(x, y) dx

Previously we characterized independence through expectations: if X and Y are independent then E(XY) = E(X)E(Y). In the joint-density setting, independence is equivalent to the density factoring:

f(x, y) = f_X(x) f_Y(y) for all (x, y)  ⟺  X and Y are independent.
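
As a quick numerical sanity check, here is a minimal sketch (the example and all names are our own, not from the lecture) that integrates out y from an independent unit-normal joint density and recovers the standard normal marginal:

```python
import numpy as np
from scipy import integrate, stats

# For the independent unit-normal joint density f(x, y) = phi(x) phi(y),
# integrating out y should recover the standard normal marginal f_X(x).
def joint(y, x):
    return stats.norm.pdf(x) * stats.norm.pdf(y)

x0 = 0.7  # an arbitrary evaluation point
marginal, _ = integrate.quad(joint, -np.inf, np.inf, args=(x0,))

print(marginal)            # numerically integrated f_X(x0)
print(stats.norm.pdf(x0))  # phi(x0); the two should agree
```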

Probability and Expectation

Univariate definition:

P(X ∈ A) = ∫_A f(x) dx        E[g(X)] = ∫_{−∞}^{∞} g(x) f(x) dx

Bivariate definition:

P(X ∈ A, Y ∈ B) = ∫_B ∫_A f(x, y) dx dy        E[g(X, Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) f(x, y) dx dy
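
The bivariate expectation also suggests a simple Monte Carlo approximation: sample (X, Y) from the joint distribution and average g(X, Y). A minimal sketch, assuming independent unit normals and our own choice of g(x, y) = x² + y²:

```python
import numpy as np

rng = np.random.default_rng(0)

# Monte Carlo estimate of E[g(X, Y)] with X, Y independent unit normals
# and g(x, y) = x^2 + y^2, so the true value is E[X^2] + E[Y^2] = 2.
n = 1_000_000
x = rng.standard_normal(n)
y = rng.standard_normal(n)

print(np.mean(x**2 + y**2))  # should be close to 2
```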

Bivariate Unit Normal

If X and Y have independent unit normal distributions then their joint density f(x, y) is given by

f(x, y) = φ(x) φ(y) = (1 / 2π) e^{−(x² + y²)/2}

Rayleigh Distribution

The distance from the origin, R = √(X² + Y²), of a point (X, Y) with X, Y ~ N(0, 1) independent follows the Rayleigh distribution:

f_R(r) = r e^{−r²/2},  r ≥ 0

F_R(r) = ∫_0^r s e^{−s²/2} ds = 1 − e^{−r²/2}

E(R) = √(π/2)        Var(R) = (4 − π)/2
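
A quick simulation check of these Rayleigh facts (a sketch of our own, not part of the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Distance from the origin of a point with independent unit-normal
# coordinates; its mean and variance should match the Rayleigh results.
n = 1_000_000
r = np.sqrt(rng.standard_normal(n)**2 + rng.standard_normal(n)**2)

print(r.mean(), np.sqrt(np.pi / 2))  # E(R) = sqrt(pi/2) ~ 1.2533
print(r.var(), (4 - np.pi) / 2)      # Var(R) = (4 - pi)/2 ~ 0.4292
```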

Sum of Independent Normals

Let X ~ N(µ, σ²) and Y ~ N(λ, τ²) be independent; then

M_X(t) = exp(µt + σ²t²/2)

M_Y(t) = exp(λt + τ²t²/2)

M_{X+Y}(t) = M_X(t) M_Y(t) = exp((µ + λ)t + (σ² + τ²)t²/2)

Therefore X + Y ~ N(µ + λ, σ² + τ²), which generalizes to the sum of any number of independent normals.
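
The MGF result is easy to corroborate by simulation; a minimal sketch with parameters of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(0)

# X ~ N(1, 2^2) and Y ~ N(-3, 1.5^2) independent, so the result above
# gives X + Y ~ N(1 - 3, 4 + 2.25) = N(-2, 6.25).
n = 1_000_000
s = rng.normal(1.0, 2.0, n) + rng.normal(-3.0, 1.5, n)

print(s.mean(), s.var())  # close to -2 and 6.25
```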

Operation Distributions

If X, Y have a joint density f(x, y), then Z = g(X, Y) has a density f_{g(X,Y)}(z), found by differentiating the CDF of Z, i.e. f_{g(X,Y)}(z) = (d/dz) ∬_{g(x,y) ≤ z} f(x, y) dx dy, which after a change of variables gives:

Distribution of sums, Z = X + Y:

f_{X+Y}(z) = ∫_{−∞}^{∞} f(x, z − x) dx

Distribution of ratios, Z = Y/X:

f_{Y/X}(z) = ∫_{−∞}^{∞} |x| f(x, zx) dx

Distribution of products, Z = XY:

f_{XY}(z) = ∫_{−∞}^{∞} f(x, z/x) / |x| dx
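
For instance, with X, Y independent unit normals the sum formula is a convolution of two standard normal densities, which should reproduce the N(0, 2) density from the previous slide's result; a numerical sketch (our own example):

```python
import numpy as np
from scipy import stats

# Numerically evaluate f_{X+Y}(z) = integral of f(x) f(z - x) dx for
# independent unit normals at one point z, and compare with N(0, 2).
z = 1.3
x = np.linspace(-10.0, 10.0, 20001)
integrand = stats.norm.pdf(x) * stats.norm.pdf(z - x)
numeric = integrand.sum() * (x[1] - x[0])  # simple Riemann sum

print(numeric)                              # convolution integral
print(stats.norm.pdf(z, scale=np.sqrt(2)))  # N(0, 2) density; should agree
```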

Conditional Expectation

For jointly distributed random variables X and Y, the conditional expectation is defined as

E(X | Y = y) = Σ_{all x} x f(x|y)            (discrete case)

E(X | Y = y) = ∫_{−∞}^{∞} x f(x|y) dx        (continuous case)

We can think of the conditional expectation as being a function of the random variable Y, thereby making E(X | Y) itself a random variable, with

E(E(X | Y)) = E(X)

Conditional Variance

Similar to conditional expectation, we can also define a conditional variance:

Var(Y | X) = E[(Y − E(Y | X))² | X] = E(Y² | X) − E(Y | X)²

There is also an analogue of the law of total expectation for variances, the law of total variance:

Var(Y) = E[Var(Y | X)] + Var[E(Y | X)]
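
The law of total variance is easy to check on a toy hierarchical model (our own, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: X ~ N(0, 1) and Y | X ~ N(2X, 1), so
# Var(Y) = E[Var(Y|X)] + Var[E(Y|X)] = 1 + Var(2X) = 1 + 4 = 5.
n = 1_000_000
x = rng.standard_normal(n)
y = rng.normal(2 * x, 1.0)

print(y.var())  # close to 5
```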

Covariance

We have previously discussed covariance in relation to the variance of the sum of two random variables (review Lecture 8):

Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)

Specifically, covariance is defined as

Cov(X, Y) = E[(X − E(X))(Y − E(Y))] = E(XY) − E(X)E(Y)

This is a generalization of variance to two random variables: it measures the degree to which X and Y tend to be large (or small) at the same time, or the degree to which one tends to be large while the other is small.
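
Both forms of the definition are straightforward to verify on simulated data; a minimal sketch with a dependence structure of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(0)

# Y = X + independent noise with X ~ N(0, 1), so Cov(X, Y) = Var(X) = 1.
n = 1_000_000
x = rng.standard_normal(n)
y = x + rng.standard_normal(n)

print(np.cov(x, y)[0, 1])                    # sample covariance, close to 1
print(np.mean(x * y) - x.mean() * y.mean())  # E(XY) - E(X)E(Y) form
```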

Properties of Covariance

Cov(X, Y) = E[(X − µ_X)(Y − µ_Y)] = E(XY) − µ_X µ_Y
Cov(X, Y) = Cov(Y, X)
Cov(X, Y) = 0 if X and Y are independent
Cov(X, c) = 0
Cov(X, X) = Var(X)
Cov(aX, bY) = ab Cov(X, Y)
Cov(X + a, Y + b) = Cov(X, Y)
Cov(X, Y + Z) = Cov(X, Y) + Cov(X, Z)

Correlation

Since Cov(X, Y) depends on the magnitudes of X and Y, we would prefer a measure of association that is not affected by arbitrary changes in the scales of the random variables. The most common measure of linear association is correlation, defined as

ρ(X, Y) = Cov(X, Y) / (σ_X σ_Y)        −1 ≤ ρ(X, Y) ≤ 1

The magnitude of the correlation measures the strength of the linear association, and the sign determines whether the relationship is positive or negative.

General Bivariate Normal

Let Z₁, Z₂ ~ N(0, 1) be independent, which we will use to build a general bivariate normal distribution:

f(z₁, z₂) = (1 / 2π) exp[−(z₁² + z₂²)/2]

We want to transform these unit normals to have the following arbitrary parameters: µ_X, µ_Y, σ_X, σ_Y, ρ.

X = σ_X Z₁ + µ_X
Y = σ_Y [ρ Z₁ + √(1 − ρ²) Z₂] + µ_Y

f(x, y) = 1 / (2π σ_X σ_Y (1 − ρ²)^{1/2}) × exp[ −1/(2(1 − ρ²)) ( (x − µ_X)²/σ_X² + (y − µ_Y)²/σ_Y² − 2ρ (x − µ_X)(y − µ_Y)/(σ_X σ_Y) ) ]
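
This construction translates directly into a simulation; a sketch with parameter values of our own choosing, checking the means, scales, and correlation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Build (X, Y) from independent unit normals Z1, Z2 as on the slide.
mu_x, mu_y, sd_x, sd_y, rho = 1.0, -2.0, 2.0, 0.5, 0.7
n = 1_000_000
z1 = rng.standard_normal(n)
z2 = rng.standard_normal(n)
x = sd_x * z1 + mu_x
y = sd_y * (rho * z1 + np.sqrt(1 - rho**2) * z2) + mu_y

print(x.mean(), x.std(), y.mean(), y.std())  # ~ (1, 2, -2, 0.5)
print(np.corrcoef(x, y)[0, 1])               # ~ rho = 0.7
```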

General Bivariate Normal - Marginals and Cov

The marginal distributions of X and Y are given by

X = σ_X Z₁ + µ_X = σ_X N(0, 1) + µ_X = N(µ_X, σ_X²)

Y = σ_Y [ρ Z₁ + √(1 − ρ²) Z₂] + µ_Y
  = σ_Y [ρ N(0, 1) + √(1 − ρ²) N(0, 1)] + µ_Y
  = σ_Y [N(0, ρ²) + N(0, 1 − ρ²)] + µ_Y
  = σ_Y N(0, 1) + µ_Y = N(µ_Y, σ_Y²)

Cov(X, Y) = E[(X − E(X))(Y − E(Y))]
          = E[(σ_X Z₁ + µ_X − µ_X)(σ_Y [ρ Z₁ + √(1 − ρ²) Z₂] + µ_Y − µ_Y)]
          = E[(σ_X Z₁)(σ_Y [ρ Z₁ + √(1 − ρ²) Z₂])]
          = σ_X σ_Y E[ρ Z₁² + √(1 − ρ²) Z₁ Z₂]
          = σ_X σ_Y ρ E[Z₁²] = σ_X σ_Y ρ

ρ(X, Y) = Cov(X, Y) / (σ_X σ_Y) = ρ

General Bivariate Normal - Density (Matrix Notation)

Obviously, the density for the bivariate normal is ugly, and it only gets worse when we consider higher-dimensional joint densities of normals. We can write the density in a more compact form using matrix notation:

f(x) = (1 / (2π (det Σ)^{1/2})) exp[ −(1/2) (x − µ)ᵀ Σ⁻¹ (x − µ) ]

In the bivariate case,

x = (x, y)ᵀ        µ = (µ_X, µ_Y)ᵀ        Σ = [ σ_X²         ρ σ_X σ_Y ]
                                              [ ρ σ_X σ_Y    σ_Y²      ]
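
The matrix form agrees with the explicit bivariate density; a quick check with parameter values of our own choosing, comparing a direct evaluation against scipy's multivariate normal:

```python
import numpy as np
from scipy import stats

# Evaluate the matrix-notation density at one point and compare with
# scipy.stats.multivariate_normal.
mu = np.array([1.0, -2.0])
sd_x, sd_y, rho = 2.0, 0.5, 0.7
Sigma = np.array([[sd_x**2,           rho * sd_x * sd_y],
                  [rho * sd_x * sd_y, sd_y**2          ]])

pt = np.array([0.5, -1.5])
diff = pt - mu
quad = diff @ np.linalg.solve(Sigma, diff)  # (x - mu)^T Sigma^{-1} (x - mu)
dens = np.exp(-quad / 2) / (2 * np.pi * np.sqrt(np.linalg.det(Sigma)))

print(dens)
print(stats.multivariate_normal(mu, Sigma).pdf(pt))  # should agree
```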

Conditional Expectation of the Bivariate Normal

Using X = µ_X + σ_X Z₁ and Y = µ_Y + σ_Y [ρ Z₁ + (1 − ρ²)^{1/2} Z₂], where Z₁, Z₂ ~ N(0, 1), we can find E(Y | X). Conditioning on X = x fixes Z₁ = (x − µ_X)/σ_X, while Z₂ remains independent of X, so E[Z₂ | X = x] = 0:

E[Y | X = x] = E[ µ_Y + σ_Y (ρ Z₁ + (1 − ρ²)^{1/2} Z₂) | X = x ]
             = E[ µ_Y + σ_Y (ρ (x − µ_X)/σ_X + (1 − ρ²)^{1/2} Z₂) | X = x ]
             = µ_Y + σ_Y ρ (x − µ_X)/σ_X + σ_Y (1 − ρ²)^{1/2} E[Z₂ | X = x]
             = µ_Y + σ_Y ρ (x − µ_X)/σ_X

By symmetry,

E[X | Y = y] = µ_X + σ_X ρ (y − µ_Y)/σ_Y
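
The regression-line form of E[Y | X = x] can be seen empirically by conditioning on a narrow slice of X; a simulation sketch with parameters of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate the bivariate normal construction, then compare the mean of Y
# over a thin slice X ~ x0 with the conditional-expectation formula.
mu_x, mu_y, sd_x, sd_y, rho = 1.0, -2.0, 2.0, 0.5, 0.7
n = 2_000_000
z1 = rng.standard_normal(n)
z2 = rng.standard_normal(n)
x = mu_x + sd_x * z1
y = mu_y + sd_y * (rho * z1 + np.sqrt(1 - rho**2) * z2)

x0 = 2.0
near = np.abs(x - x0) < 0.05                   # condition on X near x0
print(y[near].mean())                          # empirical E[Y | X ~ 2]
print(mu_y + sd_y * rho * (x0 - mu_x) / sd_x)  # formula: -1.825
```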

Conditional Variance of the Bivariate Normal

Using X = µ_X + σ_X Z₁ and Y = µ_Y + σ_Y [ρ Z₁ + (1 − ρ²)^{1/2} Z₂], where Z₁, Z₂ ~ N(0, 1), we can find Var(Y | X). Given X = x, the Z₁ term is constant, so only the Z₂ term contributes variance:

Var[Y | X = x] = Var[ µ_Y + σ_Y (ρ Z₁ + (1 − ρ²)^{1/2} Z₂) | X = x ]
               = Var[ µ_Y + σ_Y (ρ (x − µ_X)/σ_X + (1 − ρ²)^{1/2} Z₂) | X = x ]
               = Var[ σ_Y (1 − ρ²)^{1/2} Z₂ | X = x ]
               = σ_Y² (1 − ρ²)

By symmetry,

Var[X | Y = y] = σ_X² (1 − ρ²)

Bayesian Inference

Bayes' Theorem states

P(A | B) = P(B | A) P(A) / P(B)

We are interested in estimating the model parameters based on the observed data and any prior belief about the parameters, which we set up as follows:

P(θ | X) = P(X | θ) π(θ) / P(X) ∝ P(X | θ) π(θ)
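
As a concrete illustration (a standard conjugate example of our own, not from the slides): with a Beta prior on a binomial success probability θ, the posterior P(θ | X) is again a Beta and is available in closed form:

```python
from scipy import stats

# Beta(a, b) prior on theta, binomial likelihood with k successes in n
# trials => posterior is Beta(a + k, b + n - k). Numbers are made up.
a, b = 2, 2    # prior "pseudo-counts" (assumed)
n, k = 20, 14  # observed data: 14 successes in 20 trials

posterior = stats.beta(a + k, b + n - k)
print(posterior.mean())          # posterior mean (a + k)/(a + b + n) ~ 0.667
print(posterior.interval(0.95))  # central 95% credible interval
```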

Bayesian Inference - Terminology

Elements of the Bayesian model:

π(θ) - Prior distribution - reflects any preexisting information / belief about the distribution of the parameter(s).

P(X | θ) - Likelihood / sampling distribution - distribution of the data given the parameters; the probability model believed to have generated the data.

P(X) - Marginal distribution of the data - distribution of the observed data marginalized over all possible values of the parameter(s).

P(θ | X) - Posterior distribution - distribution of the parameter(s) after taking the observed data into account.

Maximum Likelihood

Maximum likelihood estimates the parameter(s) by maximizing the likelihood function given the observed data:

L(θ | X) = f(X | θ) = ∏_{i=1}^{n} f(x_i | θ)

l(θ | X) = log L(θ | X) = Σ_{i=1}^{n} log f(x_i | θ)

The maximum likelihood estimate (estimator), the MLE, is therefore given by

θ̂_MLE = argmax_θ L(θ | X) = argmax_θ l(θ | X)
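
When no closed form is available, the MLE can be found numerically by minimizing the negative log-likelihood; a minimal sketch (our own toy example) for a normal sample:

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(0)
data = rng.normal(3.0, 1.5, 500)  # simulated sample, true (mu, sigma) = (3, 1.5)

def neg_log_lik(params):
    mu, log_sigma = params  # optimize log(sigma) so that sigma stays positive
    return -np.sum(stats.norm.logpdf(data, mu, np.exp(log_sigma)))

res = optimize.minimize(neg_log_lik, x0=[0.0, 0.0])
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])

print(mu_hat, sigma_hat)        # numerical MLE
print(data.mean(), data.std())  # closed-form MLE for the normal; should match
```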

Why use the MLE?

The maximum likelihood estimator has the following properties:

Consistency - as the sample size tends to infinity, the MLE tends to the true value of the parameter(s).

Asymptotic normality - as the sample size increases, the distribution of the MLE tends to a normal distribution.

Efficiency - as the sample size tends to infinity, no other unbiased estimator has a lower mean squared error.

Common issues:

Uniqueness - the estimate may not be unique.

Computation - the estimate may not have a closed form.