Preliminary Statistics. Lecture 3: Probability Models and Distributions


Preliminary Statistics Lecture 3: Probability Models and Distributions Rory Macqueen (rm43@soas.ac.uk), September 2015

Outline
- Revision of Lecture 2
- Probability Density Functions
- Cumulative Distribution Functions
- Properties of Random Variables: Expected Value, Variance, Covariance
- Common Continuous Models: Normal, t, Chi-Square and F Distributions
- Sampling Distributions

Probability Density Function f(x)
A formula defining the curve of a continuous probability model. The area under the curve between two points gives the probability that a value between those two points will arise; this area is obtained by integrating the PDF between the two points. The total area under the curve is equal to one.

Probability Density Function f(x)
The PDF of a continuous random variable measures the probability of the random variable falling in a certain range or interval:

P(a ≤ X ≤ b) = ∫_a^b f(x) dx

For example, the probability that the height of an individual lies between 60 and 75 inches is given by the area under the curve between 60 and 75.

Cumulative Distribution Function F(x)

F(x_i) = P(X ≤ x_i) = ∫_{−∞}^{x_i} f(x) dx

Geometrically, the CDF of a continuous random variable is a continuous curve.
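As an illustration (not part of the original lecture), the link between the PDF, the CDF and probabilities can be checked numerically. The exponential distribution with rate 1 is an arbitrary choice here:

```python
import math

# Illustrative example: an exponential PDF f(x) = exp(-x) for x >= 0,
# whose CDF is F(x) = 1 - exp(-x).
def pdf(x):
    return math.exp(-x)

def integrate(f, a, b, n=10_000):
    """Approximate the area under f between a and b (midpoint rule)."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# P(1 <= X <= 2) by integrating the PDF between the two points ...
p_numeric = integrate(pdf, 1.0, 2.0)
# ... and via the CDF: P(1 <= X <= 2) = F(2) - F(1)
p_exact = (1 - math.exp(-2.0)) - (1 - math.exp(-1.0))
print(round(p_numeric, 4), round(p_exact, 4))
```

The two numbers agree: the area under the PDF between two points is exactly the difference of the CDF at those points.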

Expected Values
The expected value of a random variable (often denoted μ), also known as the population mean of the variable, is the sum/integral of each value it can take, x_i, multiplied by the probability of taking that value, f(x_i) (the PDF).

For discrete random variables: E(X) = Σ_{i=1}^N x_i f(x_i) = μ

For continuous random variables: E(X) = ∫ x f(x) dx = μ

E.g. in the example with the dice:
E(X) = 2·(1/36) + 3·(2/36) + 4·(3/36) + … + 12·(1/36) = 7

If all values are equally likely, f(x_i) = 1/N, and the expected value is the arithmetic mean.
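The dice example above can be reproduced exactly, building the PMF of the sum of two fair dice from first principles (a sketch, not part of the lecture):

```python
from fractions import Fraction
from itertools import product

# PMF of the sum of two fair dice: each of the 36 ordered outcomes
# has probability 1/36; outcomes with the same sum are accumulated.
pmf = {}
for d1, d2 in product(range(1, 7), repeat=2):
    s = d1 + d2
    pmf[s] = pmf.get(s, Fraction(0)) + Fraction(1, 36)

# E(X) = sum of each value times its probability
expected = sum(x * p for x, p in pmf.items())
print(expected)  # 7
```

Exact fractions are used so the result is 7 exactly rather than a float approximation.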

Expected Values: Properties
1. E(b) = b, where b is a constant.
2. E(X+Y) = E(X) + E(Y), where X and Y are random variables.
3. E(XY) ≠ E(X)E(Y) in general, when X and Y are not independent random variables.
4. E(XY) = E(X)E(Y), if X and Y are independent random variables.
5. E(aX) = aE(X), where a is a constant.
6. E(aX+b) = aE(X) + E(b) = aE(X) + b, where a and b are constants.
7. E(g(X)) = Σ_x g(x) f(x), where g(x) is a function of X.

Variance
Let X be a random variable with E(X) = μ. The dispersion of X around its mean (expected value) can be described by the (population) variance, denoted σ²:

Var(X) = σ² = E[(X − μ)²] = E(X²) − μ²

Variance
Let X be a discrete random variable with E(X) = μ:

Var(X) = E[(X − μ)²] = Σ_{i=1}^N (x_i − μ)² f(x_i)

If each outcome is equally likely, f(x_i) = 1/N.

Let X be a continuous random variable with E(X) = μ:

Var(X) = E[(X − μ)²] = ∫ (x − μ)² f(x) dx

Variance: Properties
1. Var(X) = E[(X − μ)²] = E(X²) − μ²
2. Var(b) = 0, with b being a constant.
3. Var(X+b) = Var(X), with b being a constant.
4. Var(aX) = a²Var(X), with a being a constant.
5. Var(aX+b) = a²Var(X), with a and b being constants.
6. If X and Y are independent random variables:
   Var(X+Y) = Var(X) + Var(Y)
   Var(X−Y) = Var(X) + Var(Y)
   Var(aX+bY) = a²Var(X) + b²Var(Y)
   Otherwise see the properties of covariance (later).
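Property 6 can be illustrated by a small simulation (a sketch; the sample size and the distributions of X and Y are arbitrary choices):

```python
import random

random.seed(0)
# Check Var(X + Y) = Var(X) + Var(Y) for independent X and Y.
n = 200_000
xs = [random.gauss(0, 1) for _ in range(n)]   # X ~ N(0, 1)
ys = [random.gauss(0, 2) for _ in range(n)]   # Y ~ N(0, 4), independent of X

def var(v):
    """Population-style sample variance (divides by len(v))."""
    m = sum(v) / len(v)
    return sum((u - m) ** 2 for u in v) / len(v)

lhs = var([x + y for x, y in zip(xs, ys)])
rhs = var(xs) + var(ys)
print(round(lhs, 2), round(rhs, 2))  # both close to 5 = 1 + 4
```

With dependent X and Y the two sides would differ by 2Cov(X,Y), as the covariance properties below show.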

Covariance
Let X and Y be two random variables with means E(X) = μ_x and E(Y) = μ_y. Then the covariance between the two is:

Cov(X, Y) = E[(X − μ_x)(Y − μ_y)] = E(XY) − μ_x μ_y

Covariance
For discrete random variables this translates to:

Cov(X, Y) = Σ_i Σ_j x_i y_j f(x_i, y_j) − μ_x μ_y

For continuous random variables:

Cov(X, Y) = ∫∫ xy f(x, y) dx dy − μ_x μ_y

Covariance: Properties
1. Cov(X,Y) = E(XY) − E(X)E(Y), for any two random variables X and Y.
2. Cov(X,X) = Var(X)
3. Cov(X,a) = 0, with a being a constant.
4. Cov(a+bX, c+dY) = bd Cov(X,Y), with a, b, c and d being constants.
5. Cov(X,Y) = 0 if X and Y are independent, since E(XY) = E(X)E(Y) = μ_x μ_y. The converse does not hold: if the covariance between X and Y is zero, we cannot infer that they are independent.
6. For any two random variables X and Y:
   Var(X+Y) = Var(X) + Var(Y) + 2Cov(X,Y)
   Var(X−Y) = Var(X) + Var(Y) − 2Cov(X,Y)

Correlation coefficient
As we saw in Lecture 1:

ρ = Cov(X, Y) / √(Var(X)·Var(Y)) = Cov(X, Y) / (σ(X)σ(Y))
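A sketch of the sample versions of these formulas, using an arbitrary illustrative relationship Y = 2X + noise (none of these names or numbers come from the lecture):

```python
import random

random.seed(1)
# Simulate Y = 2X + e with X ~ N(0,1), e ~ N(0,1) independent.
# Theory: Cov(X,Y) = 2*Var(X) = 2, Var(Y) = 4 + 1 = 5,
# so rho = 2 / sqrt(1 * 5) ~= 0.894.
n = 100_000
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [2 * x + random.gauss(0, 1) for x in xs]

mx, my = sum(xs) / n, sum(ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
var_x = sum((x - mx) ** 2 for x in xs) / n
var_y = sum((y - my) ** 2 for y in ys) / n
rho = cov / (var_x * var_y) ** 0.5

print(round(cov, 2), round(rho, 2))
```

Note that rho rescales the covariance by the two standard deviations, so it is unit-free and always lies in [−1, 1].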

Conditional moments
We can define the conditional expectation of X, given that Y = y_i:

E(X | Y = y_i) = Σ_x x f(x | Y = y_i) for discrete X
E(X | Y = y_i) = ∫ x f(x | Y = y_i) dx for continuous X

Similar definitions apply for the conditional variance and higher moments.

Common Continuous Models
- Normal distribution
- t (or Student's t) distribution
- Chi-squared distribution
- F distribution

Normal Distribution
Normality may arise:
- When a random variable is the result of many independent, random influences, none of which is dominant,
- From a Central Limit Theorem,
- When a random variable is logged (a lognormal variable becomes normal when logged).

Normal Distribution
Probability Density Function of a Normal Distribution:

f(y_i) = (2πσ²)^(−1/2) exp(−(1/2)((y_i − μ)/σ)²)

Two parameters: mean (μ) and variance (σ²). The normal distribution is the exponential of a quadratic.
X ~ N(μ, σ²): the random variable X is normally distributed with mean μ and variance σ².

Normal Distribution Properties: Bell shaped and symmetric. Mean = Median = Mode. Skewness and Excess Kurtosis are equal to zero.

Normal Distribution: Linear Transformations
Any linear transformation of a normally distributed random variable is also normally distributed:

Y ~ N(μ, σ²)  ⇒  W = a + bY ~ N(a + bμ, b²σ²)

A very useful linear transformation is:

Z_i = (y_i − μ) / σ

and Z is distributed Standard Normal.

Standard Normal Distribution The standard normal distribution has mean 0 and variance (and standard deviation) 1. Z ~ N(0,1) The SND is the reference distribution for the normal tables.

Standard Normal Distribution
Areas under a Normal and Standard Normal Distribution:

Z                      Y = μ + σZ                       Probability   What it means
P(−1 < Z < 1)          P(μ − σ < Y < μ + σ)             0.6826        Prob. of being within 1 sd of the mean is 68.26%.
P(−1.96 < Z < 1.96)    P(μ − 1.96σ < Y < μ + 1.96σ)     0.95          95% of the normal distribution lies within 1.96 sds of the mean.
P(−2 < Z < 2)          P(μ − 2σ < Y < μ + 2σ)           0.9544        Prob. of being within 2 sd of the mean is 95.44%.
P(−3 < Z < 3)          P(μ − 3σ < Y < μ + 3σ)           0.997         Prob. of being within 3 sd of the mean is 99.7%.
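These tabulated probabilities can be reproduced from the standard normal CDF, which is expressible through the error function in Python's standard library (a sketch, not part of the lecture):

```python
import math

def phi(z):
    """Standard normal CDF, via the identity Phi(z) = (1 + erf(z/sqrt(2)))/2."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Reproduce the central probabilities from the table above:
for z in (1, 1.96, 2, 3):
    p = phi(z) - phi(-z)  # P(-z < Z < z)
    print(z, round(p, 4))
```

The same function, applied to standardized values Z = (Y − μ)/σ, gives probabilities for any N(μ, σ²) variable, which is why only the standard normal needs to be tabulated.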


Chi-squared Distribution
The sum of n independent squared standard normal variables,

A = Σ_{i=1}^n Z_i² ~ χ²(n),

follows a chi-squared distribution with n degrees of freedom. The chi-squared distribution takes only positive values and ranges from 0 to infinity. Estimates of variance are distributed χ².

Chi-squared Distribution
χ² is skewed. For relatively few degrees of freedom (d.f.) the distribution is highly skewed to the right, but as the d.f. increase, the distribution approaches the normal. The mean of a chi-squared random variable is n and its variance is 2n, where n is the d.f.
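A simulation sketch of these facts (n = 5 and the number of replications are arbitrary choices):

```python
import random

random.seed(2)
# Simulate A = Z_1^2 + ... + Z_n^2 ~ chi-squared(n) and check that the
# sample mean and variance are close to n and 2n.
n, reps = 5, 100_000
draws = [sum(random.gauss(0, 1) ** 2 for _ in range(n)) for _ in range(reps)]

mean = sum(draws) / reps
variance = sum((d - mean) ** 2 for d in draws) / reps
print(round(mean, 2), round(variance, 2))  # close to n = 5 and 2n = 10
```

Every draw is non-negative by construction (a sum of squares), matching the support of the distribution.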

Student's t Distribution
Consider two independent variables: one standard normal (Z ~ N(0,1)) and one chi-squared (X ~ χ²(n)). Then

G = Z / √(X/n) ~ t(n)

i.e. G follows a Student's t distribution with n degrees of freedom. This is used regularly in hypothesis testing (Lecture 5), when we divide an estimate of a coefficient (normal) by its standard error (based on a χ²(n) variable).

Student's t Distribution
The Student's t-distribution is a bell-shaped, symmetric distribution, similar to the standard normal distribution. It has mean 0 and variance n/(n−2) (for n > 2). For low d.f. it is flatter, with fatter tails, than the normal. It is used to compensate for the extra uncertainty when the sample is used to estimate the parameters of a normal distribution.

F Distribution
Suppose we have two independent chi-squared variables, X_1 ~ χ²(n) and X_2 ~ χ²(m). The ratio of these two chi-squared variables, each divided by its degrees of freedom, follows an F-distribution:

B = (X_1/n) / (X_2/m) ~ F(n, m)

Two indexing parameters: the degrees of freedom in the numerator and the degrees of freedom in the denominator.

F Distribution
F resembles the χ²: it is always non-negative and skewed to the right.

F Distribution
E.g. for the left-hand tail of an F-distribution, use the identity:

F_(a,b),1−α = 1 / F_(b,a),α

For F(5,10), the 0.05 right-hand critical value is F_(5,10),0.05 = 3.33.
For F(10,5), the 0.05 right-hand critical value is F_(10,5),0.05 = 4.74.
For F(5,10), the 0.05 left-hand critical value is F_(5,10),0.95 = 1/F_(10,5),0.05 = 1/4.74 = 0.21097.
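Assuming SciPy is available (an assumption, not something the lecture uses), these critical values and the identity can be checked with its F quantile function:

```python
from scipy.stats import f

# f.ppf(q, dfn, dfd) returns the q-quantile of F(dfn, dfd), so the
# 0.05 right-hand critical value is the 0.95 quantile.
right_5_10 = f.ppf(0.95, 5, 10)   # 0.05 right-hand critical value of F(5,10)
right_10_5 = f.ppf(0.95, 10, 5)   # 0.05 right-hand critical value of F(10,5)
left_5_10 = f.ppf(0.05, 5, 10)    # 0.05 left-hand critical value of F(5,10)

print(round(right_5_10, 2), round(right_10_5, 2), round(left_5_10, 3))
# The identity: the left-tail value of F(5,10) is the reciprocal of the
# right-tail value of F(10,5).
print(abs(left_5_10 * right_10_5 - 1) < 1e-6)
```

This is why printed F tables only list right-hand critical values: left-hand ones follow from the identity with the degrees of freedom swapped.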

Sampling Distributions
Given random sampling, statistics (e.g. the sample mean), being calculated from a sample, are random variables: the value of a statistic is the outcome of a random process, the random sampling process. The value of the statistic will differ depending on the sample drawn. As such it will have a spread of values, each with an associated probability, i.e. a probability distribution. Probability distributions for statistics are commonly referred to as sampling distributions.
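A minimal simulation sketch of a sampling distribution, drawing many samples from an arbitrary non-normal population (uniform on [0, 1]; all the numbers below are illustrative choices):

```python
import random

random.seed(3)
# Draw many samples of size 30 and compute the sample mean of each;
# the collection of means traces out the sampling distribution.
sample_size, n_samples = 30, 50_000
means = [sum(random.random() for _ in range(sample_size)) / sample_size
         for _ in range(n_samples)]

grand_mean = sum(means) / n_samples
spread = (sum((m - grand_mean) ** 2 for m in means) / n_samples) ** 0.5
# Theory: E(sample mean) = 0.5; sd = sqrt(1/12) / sqrt(30) ~= 0.0527
print(round(grand_mean, 3), round(spread, 4))
```

The spread of the sampling distribution shrinks with the square root of the sample size, which is what makes larger samples more informative.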