Probability Theory and Statistics Peter Jochumzen April 18, 2016

Contents

1 Probability Theory And Statistics
  1.1 Experiment, Outcome and Event
  1.2 Probability
  1.3 Rules of Probability
  1.4 Conditional Probabilities and Independent Events
  1.5 Random Variables
  1.6 Discrete and Continuous Random Variables
  1.7 Probability Mass Function
  1.8 Cumulative Distribution Function
  1.9 Cumulative Distribution Function, Discrete Random Variables
  1.10 Cumulative Distribution Function, Continuous Random Variable
  1.11 Probability Density Function
  1.12 Function of a Random Variable
  1.13 Expected Value
  1.14 Variance
  1.15 The Constant Random Variable
  1.16 The Discrete Uniform Distribution
  1.17 The Bernoulli Distribution
  1.18 The Binomial Distribution
  1.19 The Continuous Uniform Distribution
  1.20 The Exponential Distribution
  1.21 The Normal Distribution
  1.22 Two Random Variables
  1.23 Probability Mass Function, Two Random Variables
  1.24 Marginal Probability Mass Function
  1.25 Cumulative Distribution Function, Two Random Variables
  1.26 Probability Density Function, Two Continuous Random Variables
  1.27 Marginal Probability Density Function
  1.28 Conditional Distributions, Discrete Random Variables
  1.29 Conditional Distributions, Continuous Random Variables
  1.30 Independent Random Variables
  1.31 Conditional Expectation
  1.32 Function of Two Random Variables
  1.33 Expected Value and Variance of a Linear Function of Two Random Variables
  1.34 Expected Value of a Product of Two Random Variables
  1.35 Covariance of Two Random Variables
  1.36 Covariance of Two Random Variables, Results
  1.37 Correlation
  1.38 Several Random Variables
  1.39 Random Sample
  1.40 Random Sample of Vectors
  1.41 Statistics
  1.42 Sample Mean
  1.43 Sample Variance
  1.44 Sample Covariance
  1.45 Properties of the Sample Mean
  1.46 Properties of the Sample Covariance
  1.47 The Chi-square Distribution
  1.48 Properties of the Sample Variance
  1.49 The Standard Error of the Sample Mean
  1.50 The t-distribution
  1.51 The F-distribution
  1.52 Statistical Model
  1.53 Estimator
  1.54 Properties of Estimators
  1.55 Standard Error
  1.56 The Z-statistics
  1.57 Generalizing the Z-statistics
  1.58 The t-statistics
  1.59 Generalizing the t-statistics
  1.60 Critical Values
  1.61 Hypothesis Testing, Theory
  1.62 Hypothesis Tests Involving the Mean
  1.63 Confidence Intervals
  1.64 Asymptotically Unbiased Estimator
  1.65 Consistency
  1.66 plim Rules
  1.67 Convergence in Distribution
  1.68 Central Limit Theorem

Chapter 1  Probability Theory And Statistics

1.1 Experiment, Outcome and Event

The sample space S of an experiment is the set of all possible outcomes of that experiment. The sample space S is the universal set.

Example. Experiment: toss a die. Sample space S = {1, 2, 3, 4, 5, 6}.

We say that the sample space is finite if it contains a finite number of elements. Elements of the sample space are called outcomes or possible outcomes.

An event is a subset of the sample space S.

Example (continued). A = {1, 3, 5} is the event "toss an odd number".

The sample space is called a certain event and the empty set is called an impossible event. An event containing only one outcome is called an elementary event.

If A, B are two events then A ∪ B, A ∩ B, A^C (the complement of A in S) and so on are events. Two events A, B are called mutually exclusive if A ∩ B = ∅.

1.2 Probability

If A is an event then P(A) denotes the probability that the experiment will result in an outcome in A.

Example. Toss a fair die and let A = {1, 3, 5}, B = {4}. Then P(A) = 1/2, P(B) = 1/6.

If S is a finite sample space with n outcomes then we say that each outcome is equally likely if P(A) = 1/n for each elementary event A.

Example. Each outcome of a fair die is equally likely.

If the outcomes of an experiment are equally likely then P(A) = k/n, where A is an arbitrary event with k outcomes and n is the number of outcomes of the sample space.
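As a quick illustration of the equally-likely rule P(A) = k/n, here is a minimal Python sketch (my own addition, not part of the original text) that enumerates the fair-die example:

```python
from fractions import Fraction

# Sample space of a fair die: six equally likely outcomes.
S = {1, 2, 3, 4, 5, 6}

def prob(event, sample_space=S):
    """P(A) = k/n for equally likely outcomes: k favourable outcomes out of n."""
    return Fraction(len(event & sample_space), len(sample_space))

A = {1, 3, 5}   # "toss an odd number"
B = {4}

print(prob(A))        # 1/2
print(prob(B))        # 1/6
print(prob(A | B))    # union: 2/3
print(prob(A & B))    # intersection: 0 (A and B are mutually exclusive)
```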

1.3 Rules of Probability

In this section, A, B are arbitrary events in the sample space S.

1. Probabilities are between 0 and 1: 0 ≤ P(A) ≤ 1
2. Probabilities of certain and impossible events: P(S) = 1 and P(∅) = 0
3. Probabilities of mutually exclusive events: If A, B are mutually exclusive events then P(A ∪ B) = P(A) + P(B). (This rule can be extended, in a trivial way, to the case when we have many mutually exclusive events.)
4. Probabilities of complements: P(A^C) = 1 − P(A)
5. Probabilities of subsets: If A ⊆ B then P(A) ≤ P(B)
6. Probabilities of unions: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

1.4 Conditional Probabilities and Independent Events

In this section, A, B are arbitrary events in the sample space S.

If P(B) > 0 we define the conditional probability of the event A given B, denoted by P(A | B), as

  P(A | B) = P(A ∩ B) / P(B)

We say that A, B are independent events if

  P(A ∩ B) = P(A)P(B)

If P(B) > 0, then A, B are independent events if and only if P(A | B) = P(A).

1.5 Random Variables

Simplified definition: A random variable on a sample space S is a function or a mapping from S to R. If the random variable is denoted by X then X : S → R.

If c is a constant then X = c is an event, defined as the collection of outcomes that X maps to c. If c is a constant then P(X = c) is the probability of the event X = c.

Similarly, if a, b are constants, then X < a, X ≤ b, a < X ≤ b and so on are events with probabilities P(X < a), P(X ≤ b), P(a < X ≤ b) and so on.

More generally, if A is an arbitrary subset of R then X ∈ A is an event, defined as the collection of outcomes that X maps to some number in A, and P(X ∈ A) is the probability of this event.
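A small sketch (my own illustration, not from the text) that checks the conditional-probability formula and the independence criterion by direct enumeration of two fair dice:

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 ordered outcomes of two fair dice, equally likely.
S = list(product(range(1, 7), repeat=2))

def P(event):
    # Equally likely outcomes: P(A) = k/n.
    return Fraction(sum(1 for s in S if event(s)), len(S))

A = lambda s: s[0] == 6            # first die shows 6
B = lambda s: s[0] + s[1] >= 10    # the sum is at least 10

P_A_given_B = P(lambda s: A(s) and B(s)) / P(B)    # P(A|B) = P(A ∩ B)/P(B)
print(P_A_given_B)                                  # 1/2, not equal to P(A) = 1/6

# Independence check: A and C = "second die is even" satisfy P(A ∩ C) = P(A)P(C).
C = lambda s: s[1] % 2 == 0
print(P(lambda s: A(s) and C(s)) == P(A) * P(C))    # True
```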

1.6 Discrete and Continuous Random Variables

If X is a random variable then we define the range of X as the collection of real numbers that X can take. The range of X is typically denoted by R or R_X.

A finite set is a set with a finite number of elements. The set of natural numbers N = {1, 2, 3, …} is a countably infinite set. The set of integers Z = {0, ±1, ±2, ±3, …} is also a countably infinite set, and so is any infinite subset of Z.

The set of real numbers R is an uncountably infinite set. All intervals such as (a, b), [a, b], (a, b], [a, b), (−∞, b), (a, ∞), where a < b are real numbers, are uncountably infinite sets.

If the range of X is finite or countably infinite then we say that X is a discrete random variable. If the range of X is uncountably infinite and P(X = c) = 0 for all real numbers c then we say that X is a continuous random variable.

1.7 Probability Mass Function

In this section, X is a discrete random variable so that R, the range of X, is finite or countably infinite. If the range of X is finite, we denote the range by R = {x₁, x₂, …, x_n}. If the range of X is countably infinite, we denote the range by R = {x₁, x₂, …}.

For an arbitrary real number x, we define

  f(x) = P(X = x)

as the probability mass function, pmf, of X.

If x is not in the range of X then X = x is an empty set and f(x) = 0. If x is in the range of X then X = x is a non-empty set and 0 ≤ f(x) ≤ 1.

If A is a subset of the range R then the probability that X takes a value in A is given by

  P(X ∈ A) = Σ_{x ∈ A} f(x)

The pmf must satisfy

  Σ_{x ∈ R} f(x) = 1

If the range is finite, the expression can also be written Σ_{i=1}^{n} f(x_i) = 1.

1.8 Cumulative Distribution Function

In this section, X is an arbitrary random variable. For an arbitrary real number x, we define

  F(x) = P(X ≤ x)

as the cumulative distribution function, cdf, of X. The cdf F has the following properties:

1. F is an increasing function.
2. F is continuous from the right (but not necessarily continuous).
3. lim_{x→−∞} F(x) = 0
4. lim_{x→+∞} F(x) = 1
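To make the pmf and cdf definitions concrete, here is a small sketch (my own, using an arbitrary three-point distribution) that checks Σ f(x) = 1 and evaluates F(x) = P(X ≤ x):

```python
# A discrete random variable with range {0, 1, 2} and an arbitrary pmf.
pmf = {0: 0.2, 1: 0.5, 2: 0.3}

assert abs(sum(pmf.values()) - 1.0) < 1e-12   # the pmf must sum to 1

def F(x):
    """Cumulative distribution function F(x) = P(X <= x)."""
    return sum(p for value, p in pmf.items() if value <= x)

print(F(-1))   # 0.0  (below the range)
print(F(0))    # 0.2
print(F(1.5))  # 0.7  (F is a right-continuous step function)
print(F(2))    # 1.0
```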

1.9 Cumulative Distribution Function, Discrete Random Variables

In this section, X is a discrete random variable with pmf f and cdf F. We define

  F(x₀⁻) = lim_{x→x₀⁻} F(x)

(the left-hand limit of F at x₀). We have the following results:

  F(x⁻) = P(X < x)
  f(x) = F(x) − F(x⁻)
  F(x) = Σ_{x_i ≤ x} f(x_i)
  P(a < X ≤ b) = F(b) − F(a)
  P(a ≤ X ≤ b) = F(b) − F(a⁻)
  P(a < X < b) = F(b⁻) − F(a)
  P(a ≤ X < b) = F(b⁻) − F(a⁻)

1.10 Cumulative Distribution Function, Continuous Random Variable

In this section, X is a continuous random variable. If X is a continuous random variable then F is a continuous function, F(x⁻) = F(x), and P(X ≤ x) = P(X < x) (remember, P(X = x) = 0). We have the following result:

  P(a < X ≤ b) = P(a ≤ X ≤ b) = P(a < X < b) = P(a ≤ X < b) = F(b) − F(a)

1.11 Probability Density Function

In this section, X is a continuous random variable with cdf F. The probability density function (pdf) of X is defined as

  f(x) = dF(x)/dx

(assuming that the derivative exists).

f(x) is not a probability. If x is not in the range of X then f(x) = 0. If x is in the range of X then f(x) ≥ 0 (with no upper limit).

If A is a subset of the range R then the probability that X takes a value in A is given by

  P(X ∈ A) = ∫_A f(x) dx

In particular, if A is an interval, A = [a, b], then

  P(a ≤ X ≤ b) = ∫_a^b f(x) dx

The pdf of any continuous random variable must satisfy:

1. Non-negativity: f(x) ≥ 0 for all x.
2. Integration to 1: ∫_R f(x) dx = 1. Since f(x) = 0 when x is outside the range of X, we can also write this as ∫_{−∞}^{∞} f(x) dx = 1.

Calculating the cdf from the pdf:

  F(x) = ∫_{−∞}^{x} f(t) dt

1.12 Function of a Random Variable

If X is a random variable and g is a real-valued function of one real variable with a domain equal to the range of X, then Y = g(X) is a new random variable. The range of Y is given by the range of g.

1.13 Expected Value

If X is a discrete random variable then we define the expected value of X, denoted by E(X), as

  E(X) = Σ_{x ∈ R} x f(x)

where f(x) is the pmf of X. We often use the symbol μ for E(X), or μ_X if we need to clarify the name of the random variable.

If X is a continuous random variable then we define the expected value of X, also denoted by E(X), as

  E(X) = ∫_R x f(x) dx

where f(x) is the pdf of X.

If X is a discrete random variable and Y = g(X) then the expected value of Y is given by

  E(Y) = Σ_{x ∈ R} g(x) f(x)

If X is a continuous random variable and Y = g(X) then the expected value of Y is given by

  E(Y) = ∫_R g(x) f(x) dx

If X is any random variable and g is a linear function, Y = a + bX, then the expected value of Y is given by

  E(Y) = a + bE(X)
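As a sketch of the discrete and continuous expectation formulas (my own example, not the author's), the following computes E(X) and E(g(X)) for a fair die and for the uniform density on [0, 1] by numerical integration:

```python
import numpy as np
from scipy import integrate

# Discrete: fair die, pmf f(x) = 1/6 on {1, ..., 6}.
values = np.arange(1, 7)
pmf = np.full(6, 1 / 6)
E_X = np.sum(values * pmf)            # E(X) = Σ x f(x) = 3.5
E_X2 = np.sum(values ** 2 * pmf)      # E(g(X)) with g(x) = x², here ≈ 15.17

# Continuous: uniform on [0, 1], pdf f(x) = 1 on the range.
E_U, _ = integrate.quad(lambda x: x * 1.0, 0.0, 1.0)        # ∫ x f(x) dx = 0.5
E_U2, _ = integrate.quad(lambda x: x ** 2 * 1.0, 0.0, 1.0)  # E(X²) = 1/3

print(E_X, E_X2, E_U, E_U2)
```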

1.14 Variance

If X is an arbitrary random variable with expected value μ then we define the variance of X as

  Var(X) = E[(X − μ)²] = E(X²) − μ²

We often write the right-hand side as E(X − μ)², with the implicit understanding that this is the expected value of (X − μ)², not the square of E(X − μ).

We often use the symbol σ² for Var(X), or σ_X² if we need to clarify the name of the random variable.

If X is discrete, it follows that

  Var(X) = Σ_{x ∈ R} (x − μ)² f(x)

If X is continuous, it follows that

  Var(X) = ∫_R (x − μ)² f(x) dx

If X is a discrete random variable and Y = g(X) with E(Y) = μ_Y then the variance of Y is given by

  Var(Y) = Σ_{x ∈ R} (g(x) − μ_Y)² f(x)

If X is a continuous random variable and Y = g(X) with E(Y) = μ_Y then the variance of Y is given by

  Var(Y) = ∫_R (g(x) − μ_Y)² f(x) dx

If X is any random variable and g is a linear function, Y = a + bX, then the variance of Y is given by

  Var(Y) = b² Var(X)

The standard deviation of a random variable X is defined as the square root of the variance, σ = √Var(X).

1.15 The Constant Random Variable

A discrete random variable X whose range is a single number c is called a constant random variable, or simply a constant, and we write X = c. The pmf of a constant random variable is given by f(c) = 1 and f(x) = 0 for all x ≠ c. We have E(X) = c and Var(X) = 0; the expected value of a constant is the constant itself and the variance of a constant is zero.

1.16 The Discrete Uniform Distribution

A discrete random variable X that takes n different values with equal probability is said to follow a discrete uniform distribution. Formally, the range of X is R = {x₁, x₂, …, x_n} and the pmf is given by f(x_i) = 1/n for i = 1, …, n. We have

  μ = E(X) = (1/n) Σ_{i=1}^{n} x_i

and

  Var(X) = (1/n) Σ_{i=1}^{n} (x_i − μ)² = (1/n) Σ_{i=1}^{n} x_i² − μ²
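A short check (my own sketch) that the two expressions for the discrete uniform variance agree, using the fair die as the uniform distribution on {1, …, 6}:

```python
import numpy as np

x = np.arange(1, 7)            # discrete uniform on {1, ..., 6}, f(x_i) = 1/6
n = len(x)

mu = x.sum() / n                           # E(X) = (1/n) Σ x_i = 3.5
var1 = ((x - mu) ** 2).sum() / n           # (1/n) Σ (x_i − μ)²
var2 = (x ** 2).sum() / n - mu ** 2        # (1/n) Σ x_i² − μ²

print(mu, var1, var2)          # 3.5  2.916...  2.916...
```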

1.17 The Bernoulli Distribution

A discrete random variable X with range R = {0, 1} is said to follow a Bernoulli distribution. We typically denote f(1) = P(X = 1) by p, so that f(0) = P(X = 0) = 1 − p with 0 ≤ p ≤ 1, and write X ~ Ber(p). We have

  E(X) = p,  Var(X) = p(1 − p)

1.18 The Binomial Distribution

A discrete random variable X with range R = {0, 1, …, n} and pmf

  P(X = k) = f(k) = (n choose k) p^k (1 − p)^(n−k),  where (n choose k) = n! / (k!(n − k)!)

is said to follow a binomial distribution and we write X ~ B(n, p). We have

  E(X) = np,  Var(X) = np(1 − p)

1.19 The Continuous Uniform Distribution

A continuous random variable X with range R = [a, b], b > a, and pdf

  f(x) = 1/(b − a)

is said to follow a continuous uniform distribution and we write X ~ U[a, b]. We have

  E(X) = (a + b)/2,  Var(X) = (b − a)²/12

If X ~ U[a, b] and Y = c + dX is a linear function of X with d ≠ 0 then Y will also follow a uniform distribution.

1.20 The Exponential Distribution

A continuous random variable X with range R = [0, ∞) and pdf

  f(x) = (1/λ) e^(−x/λ)

is said to follow an exponential distribution with parameter λ, where λ > 0, and we write X ~ exp(λ). We have

  E(X) = λ,  Var(X) = λ²

The probability density function of an exponential distribution is sometimes written as

  f(x) = λ e^(−λx)

Written like this, E(X) = λ⁻¹ and Var(X) = λ⁻².
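A hedged sketch using scipy.stats (my own addition; the text itself uses Excel) that confirms the stated means and variances for the Bernoulli, binomial, uniform and exponential distributions, with assumed parameter values:

```python
from scipy import stats

n, p = 10, 0.3
a, b = 2.0, 5.0
lam = 2.0   # rate parameterisation: f(x) = λ exp(−λx)

print(stats.bernoulli.stats(p, moments="mv"))                  # (0.3, 0.21) = (p, p(1−p))
print(stats.binom.stats(n, p, moments="mv"))                   # (3.0, 2.1)  = (np, np(1−p))
print(stats.uniform.stats(loc=a, scale=b - a, moments="mv"))   # ((a+b)/2, (b−a)²/12)
print(stats.expon.stats(scale=1 / lam, moments="mv"))          # (1/λ, 1/λ²)
```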

1.21 The Normal Distribution

A continuous random variable X with range R = (−∞, ∞) and pdf

  f(x) = (1/√(2πσ²)) exp(−(x − μ)²/(2σ²))

is said to follow a normal distribution with parameters μ, σ² and we write X ~ N(μ, σ²). We have

  E(X) = μ,  Var(X) = σ²

If μ = 0 and σ² = 1 we say that X follows a standard normal distribution. A random variable that follows a standard normal distribution is typically denoted by Z, and Z ~ N(0, 1) with E(Z) = 0, Var(Z) = 1 and probability density function

  f(z) = (1/√(2π)) exp(−z²/2)

If X ~ N(μ_X, σ_X²) and Y = a + bX is a linear function of X with b ≠ 0 then

  Y ~ N(μ_Y, σ_Y²)

where μ_Y = a + bμ_X and σ_Y² = b²σ_X².

If X ~ N(μ, σ²) then (X − μ)/σ ~ N(0, 1).

Excel: If X is normal with expected value m and standard deviation s then
  P(X ≤ x): NORM.DIST(x,m,s,TRUE)
  P(X ≥ x): 1 − NORM.DIST(x,m,s,TRUE)
  the value x such that P(X ≤ x) = p: NORM.INV(p,m,s)
If X ~ N(0, 1) then you can use NORM.S.DIST(x,TRUE) and NORM.S.INV(p).

1.22 Two Random Variables

Given a sample space S, we can define two functions, X : S → R and Y : S → R. We then have two random variables.

Given two random variables X, Y and two constants x, y, "X = x, Y = y" is an event. It is the collection of outcomes that X maps to x and Y maps to y. The probability of the event X = x, Y = y is denoted by P(X = x, Y = y). Similarly, expressions such as X ≤ x, Y ≤ y are events whose probability is P(X ≤ x, Y ≤ y).

X, Y are called discrete random variables if the range of X and the range of Y are both finite or countably infinite. X, Y are called continuous random variables if the range of X and the range of Y are both uncountable and P(X = x, Y = y) = 0 for all x, y.

The range of X, Y is the collection of all ordered pairs (x, y) such that there is an outcome in S that is mapped to x by X and to y by Y.
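For readers not using Excel, a small sketch (my addition) mapping the NORM.DIST / NORM.INV calls above to scipy.stats.norm, with an assumed mean m = 10 and standard deviation s = 2:

```python
from scipy.stats import norm

m, s = 10.0, 2.0        # assumed example values

print(norm.cdf(12, loc=m, scale=s))        # P(X ≤ 12), like NORM.DIST(12,10,2,TRUE)
print(1 - norm.cdf(12, loc=m, scale=s))    # P(X ≥ 12)
print(norm.ppf(0.975, loc=m, scale=s))     # x with P(X ≤ x) = 0.975, like NORM.INV
print(norm.cdf(1.96))                      # standard normal, like NORM.S.DIST(1.96,TRUE)
print(norm.ppf(0.975))                     # like NORM.S.INV(0.975) ≈ 1.96
```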

1.23 Probability Mass Function, Two Random Variables

In this section, X, Y are two discrete random variables. For any two real numbers (x, y), we define

  f(x, y) = P(X = x, Y = y)

as the probability mass function, pmf, of X, Y.

If (x, y) is not in the range of X, Y then X = x, Y = y is an empty set and f(x, y) = 0. If (x, y) is in the range of X, Y then X = x, Y = y is a non-empty set and 0 ≤ f(x, y) ≤ 1.

If A is a subset of the range R then the probability that (X, Y) takes a value in A is given by

  P((X, Y) ∈ A) = Σ_{(x,y) ∈ A} f(x, y)

The pmf must satisfy

  Σ_{(x,y) ∈ R} f(x, y) = 1

1.24 Marginal Probability Mass Function

In this section, X, Y are two discrete random variables with range R and pmf f(x, y). The marginal probability mass function of X is defined as

  f_X(x) = Σ_y f(x, y)

where the sum is over all y such that (x, y) ∈ R. The marginal pmf of Y is defined similarly as

  f_Y(y) = Σ_x f(x, y)

To distinguish f(x, y) from the marginal distributions, we sometimes call it the joint probability mass function or joint pmf and denote it by f_{X,Y}(x, y).

1.25 Cumulative Distribution Function, Two Random Variables

In this section, X, Y are two arbitrary random variables. For two real numbers (x, y), we define

  F(x, y) = P(X ≤ x, Y ≤ y)

as the cumulative distribution function, cdf, of X, Y.

1.26 Probability Density Function, Two Continuous Random Variables

In this section, X, Y are two continuous random variables with cdf F(x, y). The probability density function (pdf) of X, Y is defined as

  f(x, y) = ∂²F(x, y) / (∂x ∂y)

(assuming that the derivative exists).

f(x, y) is not a probability. If (x, y) is not in the range of X, Y then f(x, y) = 0. If (x, y) is in the range of X, Y then f(x, y) ≥ 0 (with no upper limit). f(x, y) is sometimes denoted by f_{X,Y}(x, y).

1.27 Marginal Probability Density Function

In this section, X, Y are two continuous random variables with pdf f(x, y). The marginal probability density function of X is defined as

  f_X(x) = ∫ f(x, y) dy

where the integral is over all y such that (x, y) ∈ R. Similarly, the marginal pdf of Y is defined as

  f_Y(y) = ∫ f(x, y) dx

f(x, y) is sometimes called the joint probability density function or joint pdf and denoted by f_{X,Y}(x, y).

1.28 Conditional Distributions, Discrete Random Variables

In this section, X, Y are two discrete random variables with joint pmf f_{X,Y}(x, y) and marginal pmf's f_X(x) and f_Y(y).

X = x and Y = y are two separate events. Therefore, if P(Y = y) ≠ 0 then

  P(X = x | Y = y) = P({X = x} ∩ {Y = y}) / P(Y = y) = P(X = x, Y = y) / P(Y = y)

The probability P(X = x | Y = y) is called the conditional probability of X given Y. Similarly, if P(X = x) ≠ 0 then the conditional probability of Y given X is

  P(Y = y | X = x) = P(X = x, Y = y) / P(X = x)

We denote the conditional probability P(X = x | Y = y) by f(x | y), or by f_{X|Y}(x | y) if we need to specify the names of the random variables. P(Y = y | X = x) is denoted by f(y | x) or by f_{Y|X}(y | x). f(x | y) is called a conditional probability mass function (of x given y) or a conditional pmf.

Since P(X = x, Y = y) = f_{X,Y}(x, y) we have

  f_{X|Y}(x | y) = f_{X,Y}(x, y) / f_Y(y)

and

  f_{Y|X}(y | x) = f_{X,Y}(x, y) / f_X(x)

whenever the denominators are not zero.
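A small sketch (my own numbers) of a joint pmf stored as a table, with the marginal and conditional pmf's computed exactly as in the formulas above:

```python
import numpy as np

# Joint pmf f(x, y) for X in {0, 1} (rows) and Y in {0, 1, 2} (columns).
f = np.array([[0.10, 0.20, 0.10],
              [0.15, 0.25, 0.20]])
assert abs(f.sum() - 1.0) < 1e-12

f_X = f.sum(axis=1)          # marginal of X: f_X(x) = Σ_y f(x, y)
f_Y = f.sum(axis=0)          # marginal of Y: f_Y(y) = Σ_x f(x, y)

# Conditional pmf of X given Y = 1: f(x|y) = f(x, y) / f_Y(y).
f_X_given_Y1 = f[:, 1] / f_Y[1]

print(f_X, f_Y, f_X_given_Y1)   # the conditional pmf sums to 1
```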

1.29 Conditional Distributions, Continuous Random Variables

In this section, X, Y are two continuous random variables with joint pdf f_{X,Y}(x, y) and marginal pdf's f_X(x) and f_Y(y). We define

  f(x | y) = f_{X,Y}(x, y) / f_Y(y)

as a conditional probability density function (of x given y) whenever f_Y(y) ≠ 0. Similarly, we define

  f(y | x) = f_{X,Y}(x, y) / f_X(x)

whenever f_X(x) ≠ 0.

1.30 Independent Random Variables

Two random variables X, Y are said to be independent if and only if

  f(x, y) = f_X(x) f_Y(y)

for all (x, y) in the range of X, Y.

X, Y are independent if and only if f(x | y) = f_X(x) for all x, y. Also, X, Y are independent if and only if f(y | x) = f_Y(y) for all x, y.

1.31 Conditional Expectation

If X, Y are two discrete random variables then we define the conditional expectation of X given Y = y as the number

  E(X | Y = y) = Σ_x x f(x | y)

E(Y | X = x) is defined similarly.

If X, Y are two continuous random variables then we define the conditional expectation of X given Y = y as the number

  E(X | Y = y) = ∫ x f(x | y) dx

E(Y | X = x) is defined similarly.

If X, Y are independent random variables then E(X | Y = y) = E(X) and E(Y | X = x) = E(Y).

Without specifying a specific value for Y, E(X | Y) is a random variable.

Law of iterated expectation: E(E(X | Y)) = E(X)

1.32 Function of Two Random Variables

If X, Y are two random variables and g is a real-valued function of two variables whose domain is equal to the range of X, Y, then Z = g(X, Y) is a new random variable.

If g is a linear function, g(x, y) = a + bx + cy, then we say that Z = a + bX + cY is a linear function of the random variables X and Y.
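Continuing the joint-pmf sketch above (my own example), a quick numerical check of the law of iterated expectation E(E(X | Y)) = E(X):

```python
import numpy as np

f = np.array([[0.10, 0.20, 0.10],      # joint pmf of X in {0, 1}, Y in {0, 1, 2}
              [0.15, 0.25, 0.20]])
x_vals = np.array([0.0, 1.0])
f_Y = f.sum(axis=0)

# E(X | Y = y) for each y, then average over the distribution of Y.
E_X_given_Y = np.array([(x_vals * f[:, j] / f_Y[j]).sum() for j in range(3)])
print(E_X_given_Y @ f_Y)                 # E(E(X|Y)) = 0.6
print((x_vals * f.sum(axis=1)).sum())    # E(X)      = 0.6
```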

1.33 Expected Value and Variance of a Linear Function of Two Random Variables

If X, Y are two random variables and Z = a + bX + cY is a linear function of X and Y then

  E(Z) = E(a + bX + cY) = a + bE(X) + cE(Y)

If X, Y are two independent random variables and Z = a + bX + cY is a linear function of X and Y then

  Var(Z) = Var(a + bX + cY) = b²Var(X) + c²Var(Y)

1.34 Expected Value of a Product of Two Random Variables

If X, Y are two independent random variables and Z = XY is the product of X and Y then

  E(Z) = E(XY) = E(X)E(Y)

If, in addition, f and g are two real-valued functions of a real variable and Z = f(X)g(Y) is the product of f(X) and g(Y), then

  E(Z) = E(f(X)g(Y)) = E(f(X))E(g(Y))

1.35 Covariance of Two Random Variables

If X, Y are two random variables with expected values μ_X and μ_Y respectively, then we define the covariance of X and Y, denoted by Cov(X, Y), as the number

  Cov(X, Y) = E[(X − μ_X)(Y − μ_Y)] = E(XY) − μ_X μ_Y

Cov(X, Y) is sometimes denoted by σ_{X,Y} or σ_{XY}.

1.36 Covariance of Two Random Variables, Results

1. If X, Y are two independent random variables then Cov(X, Y) = 0. The opposite is not necessarily true.
2. Cov(X, X) = Var(X)
3. If c is a constant then Cov(X, c) = 0
4. Covariance of linear functions of random variables (a, b, c, d are constants):
   Cov(a + bX, c + dY) = bd Cov(X, Y)
5. Covariance of linear functions of two random variables (a, b, c, d are constants):
   Cov(aX₁ + bX₂, cY₁ + dY₂) = ac Cov(X₁, Y₁) + ad Cov(X₁, Y₂) + bc Cov(X₂, Y₁) + bd Cov(X₂, Y₂)
6. If X, Y are two arbitrary random variables then
   Var(a + bX + cY) = b²Var(X) + c²Var(Y) + 2bc Cov(X, Y)
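A Monte Carlo sketch (my own, with arbitrary constants) checking result 6, Var(a + bX + cY) = b²Var(X) + c²Var(Y) + 2bc Cov(X, Y), for two correlated normal variables:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
a, b, c = 1.0, 2.0, -3.0

X = rng.normal(0.0, 1.0, n)
Y = 0.5 * X + rng.normal(0.0, 1.0, n)     # correlated with X by construction

Z = a + b * X + c * Y
lhs = Z.var()
rhs = b**2 * X.var() + c**2 * Y.var() + 2 * b * c * np.cov(X, Y)[0, 1]
print(lhs, rhs)                            # the two numbers agree closely
```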

1.37 Correlation

If X, Y are two random variables then we define the correlation between X and Y, denoted by Corr(X, Y), as the number

  Corr(X, Y) = Cov(X, Y) / √(Var(X)Var(Y))

Corr(X, Y) is sometimes denoted by ρ_{X,Y} or ρ_{XY}. For any two random variables X, Y,

  −1 ≤ Corr(X, Y) ≤ 1

We have:
  Corr(X, Y) = 0 if and only if Cov(X, Y) = 0. We then say that X and Y are uncorrelated.
  Corr(X, Y) > 0 if and only if Cov(X, Y) > 0. We then say that X and Y are positively correlated.
  Corr(X, Y) < 0 if and only if Cov(X, Y) < 0. We then say that X and Y are negatively correlated.

1.38 Several Random Variables

Given a sample space S, we can define n functions, X_i : S → R for i = 1, …, n. We then have n random variables X₁, …, X_n. Together, they are called a random vector, denoted by X = (X₁, …, X_n).

Given n random variables X₁, …, X_n and n constants x₁, …, x_n, "X₁ = x₁, …, X_n = x_n" is an event. It is the collection of outcomes that X₁ maps to x₁, X₂ maps to x₂, and so on. The probability of the event X₁ = x₁, …, X_n = x_n is denoted by P(X₁ = x₁, …, X_n = x_n). Similarly, expressions such as X₁ ≤ x₁, …, X_n ≤ x_n are events whose probability is P(X₁ ≤ x₁, …, X_n ≤ x_n).

X₁, …, X_n are called discrete random variables if the ranges of all of the X_i's are finite or countably infinite. X₁, …, X_n are called continuous random variables if the ranges of all of the X_i's are uncountable and P(X₁ = x₁, …, X_n = x_n) = 0 for all x₁, …, x_n.

The range of X₁, …, X_n is the collection of all ordered n-tuples (x₁, …, x_n) such that there is an outcome in S that is mapped to x₁ by X₁, and so on.

In what follows, (x₁, …, x_n) are n real numbers.

If X₁, …, X_n are n arbitrary random variables then we define the cumulative distribution function, cdf, of X₁, …, X_n as

  F(x₁, …, x_n) = P(X₁ ≤ x₁, …, X_n ≤ x_n)

If X₁, …, X_n are n discrete random variables then we define the probability mass function, pmf, of X₁, …, X_n as

  f(x₁, …, x_n) = P(X₁ = x₁, …, X_n = x_n)

If X₁, …, X_n are n continuous random variables then we define the probability density function, pdf, of X₁, …, X_n as

  f(x₁, …, x_n) = ∂ⁿF(x₁, …, x_n) / (∂x₁ ⋯ ∂x_n)

(assuming that the derivative exists).

If X₁, …, X_n are n discrete random variables then we define the marginal probability mass function of X₁ as

  f_{X₁}(x₁) = Σ_{x₂,…,x_n} f(x₁, …, x_n)

where the sum is over all x₂, …, x_n such that (x₁, …, x_n) is in the range of X₁, …, X_n. The marginal pmf's of X₂, …, X_n are defined similarly.

If X₁, …, X_n are n continuous random variables then we define the marginal probability density function of X_i similarly, by integrating away the remaining random variables.

n random variables X₁, …, X_n are said to be independent if and only if

  f(x₁, …, x_n) = f_{X₁}(x₁) ⋯ f_{X_n}(x_n)

for all (x₁, …, x_n) in the range of X₁, …, X_n.

If X₁, …, X_n are n random variables and g is a real-valued function of n variables whose domain is equal to the range of X₁, …, X_n, then Z = g(X₁, …, X_n) is a new random variable.

If g is a linear function, g(x₁, …, x_n) = c₀ + c₁x₁ + ⋯ + c_n x_n, then we say that Z = c₀ + c₁X₁ + ⋯ + c_nX_n is a linear function of the random variables X₁, …, X_n (c₀, c₁, …, c_n are constants).

If X₁, …, X_n are n random variables and Z = c₀ + c₁X₁ + ⋯ + c_nX_n is a linear function of X₁, …, X_n then

  E(Z) = E(c₀ + c₁X₁ + ⋯ + c_nX_n) = c₀ + c₁E(X₁) + ⋯ + c_nE(X_n)

If X₁, …, X_n are n independent random variables and Z = c₀ + c₁X₁ + ⋯ + c_nX_n is a linear function of X₁, …, X_n then

  Var(Z) = Var(c₀ + c₁X₁ + ⋯ + c_nX_n) = c₁²Var(X₁) + ⋯ + c_n²Var(X_n)

If X₁, …, X_n are n arbitrary random variables then

  Var(c₀ + c₁X₁ + ⋯ + c_nX_n) = Σ_{i=1}^{n} c_i²Var(X_i) + 2 Σ_{i=1}^{n} Σ_{j<i} c_i c_j Cov(X_i, X_j)

1.39 Random Sample

A random sample is defined as an experiment on which we have defined a collection of n random variables X₁, …, X_n. After the experiment has been performed, we have n observations x₁, …, x_n, viewed as drawings from X₁, …, X_n.

A random sample is an independent random sample if and only if X₁, …, X_n are independent random variables.

A random sample is an identically distributed random sample if and only if the marginal distributions of X₁, …, X_n are all the same (each random variable in the random sample has the same distribution). In this case, each random variable X_i has the same mean and the same variance for i = 1, …, n.

A random sample is an independent and identically distributed, IID, random sample if and only if it is both independent and identically distributed.

Note: In some texts a random sample is always an IID random sample.

1.40 Random Sample of Vectors

A random sample of vectors is defined as an experiment on which we have defined a collection of n random vectors X₁, …, X_n, where each X_i = (X_{i,1}, …, X_{i,m}) is a random vector with m random variables, i = 1, …, n. In total, the sample consists of nm random variables. After the experiment we have nm observations, denoted by x₁, …, x_n, where each x_i = (x_{i,1}, …, x_{i,m}).

Independence and identical distribution are defined in the same way as for a random sample of random variables. This means that the sample is an IID sample if every X_{i,k} is independent of every X_{j,k} for i ≠ j; random variables within the same random vector may be dependent. Also, the marginal distribution of X_{i,k} must be the same as that of X_{j,k} for all i, j = 1, …, n, but the marginal distributions need not be the same within a random vector.

1.41 Statistics

If X₁, …, X_n is a random sample and g is a real-valued function of n variables with a domain that contains the range of X₁, …, X_n, then Θ = g(X₁, …, X_n) is called a statistic. As a function of random variables, a statistic is itself a random variable.

Once the experiment has been performed, the outcome of the random variables X₁, …, X_n is observed and denoted by the numbers x₁, …, x_n. The outcome of the statistic Θ, the observed value of the statistic, can then be calculated as g(x₁, …, x_n), which is now a number that we view as a drawing from the random variable Θ.

The definition may be extended in a natural way to a random sample of vectors.

1.42 Sample Mean

If X₁, …, X_n is a random sample then we define a statistic called the sample mean, denoted by X̄, as

  X̄ = (1/n) Σ_{i=1}^{n} X_i

The observed value of the sample mean is called the observed sample mean and it is denoted by x̄.

1.43 Sample Variance

If X₁, …, X_n is a random sample then we define the sample variance, denoted by S², as

  S² = (1/(n − 1)) Σ_{i=1}^{n} (X_i − X̄)²

The observed value of the sample variance is called the observed sample variance and it is denoted by s².

S = √S² is called the sample standard deviation (s is the observed sample standard deviation).

Never use the term variance for S² or s². Variance is a property of a random variable, while the sample variance S² is a function of the random variables constituting a random sample. Sample variance is to sample mean as variance is to expected value.

1.44 Sample Covariance

Consider an IID random sample of size n of random vectors of size 2, X₁, …, X_n with X_i = (X_{i,1}, X_{i,2}). We define the sample covariance, denoted by S²_{1,2}, as

  S²_{1,2} = (1/(n − 1)) Σ_{i=1}^{n} (X_{i,1} − X̄₁)(X_{i,2} − X̄₂)

where X̄₁ is the sample average of X_{1,1}, …, X_{n,1} and X̄₂ is the sample average of X_{1,2}, …, X_{n,2}.

Once the experiment has been performed, the outcome of S²_{1,2}, sometimes called the observed sample covariance, is denoted by the number

  s²_{1,2} = (1/(n − 1)) Σ_{i=1}^{n} (x_{i,1} − x̄₁)(x_{i,2} − x̄₂)

where x̄₁ and x̄₂ are the observed sample averages of the corresponding sub-samples.

The matrix

  S = [ S₁²       S²_{1,2} ]
      [ S²_{2,1}  S₂²      ]

is called the sample covariance matrix. Here, S₁² and S₂² are the sample variances of sub-samples 1 and 2. Note that S is symmetric, as S²_{2,1} = S²_{1,2}.

These definitions may be extended to the general case where the random vectors are of size m. The sample covariance matrix will then be an m × m matrix.

1.45 Properties of the Sample Mean

If X₁, …, X_n is a random sample and E(X_i) = μ for i = 1, …, n then

  E(X̄) = μ

If X₁, …, X_n is an IID random sample with E(X_i) = μ and Var(X_i) = σ² for i = 1, …, n then

  E(X̄) = μ,  Var(X̄) = σ²/n

We denote the standard deviation of X̄ by SD(X̄) and we have

  SD(X̄) = √Var(X̄) = σ/√n

If X₁, …, X_n is an IID random sample with X_i ~ N(μ, σ²) for i = 1, …, n then

  X̄ ~ N(μ, σ²/n)

1.46 Properties of the Sample Covariance

If X₁, …, X_n is an IID random sample of vectors of size 2 with E(X_{i,1}) = μ₁, E(X_{i,2}) = μ₂, Var(X_{i,1}) = σ₁², Var(X_{i,2}) = σ₂² and Cov(X_{i,1}, X_{i,2}) = σ²_{1,2} for i = 1, …, n then

  E(S²_{1,2}) = σ²_{1,2}

1.47 The Chi-square Distribution

If Z follows a standard normal distribution then Y = Z² is said to follow a chi-square distribution with one degree of freedom and we write Y ~ χ²₁.

If Z₁, …, Z_k are k independent standard normal random variables, then

  Y = Σ_{i=1}^{k} Z_i²

is said to follow a chi-square distribution with k degrees of freedom and we write Y ~ χ²_k.

If Y ~ χ²_k then the range of Y is [0, ∞).
If Y ~ χ²_k then E(Y) = k and Var(Y) = 2k.
If Y₁ ~ χ²_{k₁} and Y₂ ~ χ²_{k₂} and Y₁ and Y₂ are independent then Y₁ + Y₂ ~ χ²_{k₁+k₂}.

Excel: If Y ~ χ²_k then
  P(Y ≤ y): CHISQ.DIST(y,k,TRUE)
  P(Y ≥ y): CHISQ.DIST.RT(y,k)
  the value y such that P(Y ≤ y) = p: CHISQ.INV(p,k)
  the value y such that P(Y ≥ y) = p: CHISQ.INV.RT(p,k)
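A brief numpy sketch (my own) computing the sample mean, sample variance and sample covariance matrix of a simulated bivariate sample; note that the 1/(n − 1) convention above corresponds to ddof=1:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
# Simulated IID sample of vectors (X_i1, X_i2) with some dependence built in.
x1 = rng.normal(5.0, 2.0, n)
x2 = 1.0 + 0.5 * x1 + rng.normal(0.0, 1.0, n)

xbar1, xbar2 = x1.mean(), x2.mean()                    # observed sample means
s2_1 = x1.var(ddof=1)                                  # sample variance, 1/(n−1)
s2_12 = ((x1 - xbar1) * (x2 - xbar2)).sum() / (n - 1)  # observed sample covariance

S = np.cov(x1, x2, ddof=1)   # 2×2 sample covariance matrix
print(xbar1, xbar2)
print(s2_1, S[0, 0])         # same number
print(s2_12, S[0, 1])        # same number
```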

1.48 Properties of the Sample Variance

If X₁, …, X_n is an IID random sample with E(X_i) = μ and Var(X_i) = σ² for i = 1, …, n then

  E(S²) = σ²

If X₁, …, X_n is an IID random sample with X_i ~ N(μ, σ²) for i = 1, …, n then X̄ and S² are independent random variables.

If X₁, …, X_n is an IID random sample with X_i ~ N(μ, σ²) for i = 1, …, n then

  (n − 1)S²/σ² ~ χ²_{n−1}

1.49 The Standard Error of the Sample Mean

If X₁, …, X_n is an IID random sample with E(X_i) = μ and Var(X_i) = σ² for i = 1, …, n then we define the standard error of the sample mean, denoted by SE(X̄), as

  SE(X̄) = √(S²/n) = S/√n

(Remember, SD(X̄) = σ/√n.)

1.50 The t-distribution

If Z follows a standard normal distribution and Y follows a chi-square distribution with k degrees of freedom, and Z and Y are independent, then

  t = Z / √(Y/k)

is said to follow a t-distribution with k degrees of freedom and we write T ~ t_k.

If T ~ t_k then the range of T is (−∞, ∞).
If T ~ t_k then E(T) = 0.
The probability density function of the t-distribution is symmetric around 0: P(T ≤ −c) = P(T ≥ c) for all constants c. It has fatter tails compared to Z: P(T > c) > P(Z > c) for any constant c > 0.

Excel: If T ~ t_k then
  P(T ≤ t): T.DIST(t,k,TRUE)
  P(T ≥ t): T.DIST.RT(t,k)
  P(|T| ≥ t) = P(T ≤ −t) + P(T ≥ t): T.DIST.2T(t,k)
  the value t such that P(T ≤ t) = p: T.INV(p,k)
  the value t such that P(|T| ≥ t) = p: T.INV.2T(p,k)

1.51 The F-distribution

If X follows a chi-square distribution with k₁ degrees of freedom and Y follows a chi-square distribution with k₂ degrees of freedom, and X and Y are independent random variables, then

  F = (X/k₁) / (Y/k₂)

is said to follow an F-distribution with k₁ and k₂ degrees of freedom and we write F ~ F(k₁, k₂).

If F ~ F(k₁, k₂) then the range of F is [0, ∞).

Excel: If F ~ F(k₁, k₂) then
  P(F ≤ f): F.DIST(f,k1,k2,TRUE)
  P(F ≥ f): F.DIST.RT(f,k1,k2)
  the value f such that P(F ≤ f) = p: F.INV(p,k1,k2)
  the value f such that P(F ≥ f) = p: F.INV.RT(p,k1,k2)

1.52 Statistical Model

If X₁, …, X_n is a random sample with a given probability density/mass function f(x₁, …, x_n) which depends on k unknown parameters θ = (θ₁, …, θ_k), then taken together we have a (parametric) statistical model or a data generating process. With a statistical model and given values for θ₁, …, θ_k we can simulate an outcome x₁, …, x_n for a random sample using a computer.

Four common statistical models:

1. X₁, …, X_n is an independent random sample with E(X_i) = μ and Var(X_i) = σ² for i = 1, …, n. This model has two unknown parameters, μ and σ².
2. X₁, …, X_n is an IID random sample with X_i ~ N(μ, σ²) for i = 1, …, n. This is a special case of 1. We write X_i ~ IIDN(μ, σ²), i = 1, …, n.
3. X₁, …, X_n is an independent random sample with E(X_i) = μ and Var(X_i) = σ₀² for i = 1, …, n, where σ₀² is a known constant. This model has one unknown parameter, μ, and is a special case of 1.
4. X₁, …, X_n is an IID random sample with X_i ~ N(μ, σ₀²) for i = 1, …, n, where σ₀² is a known constant. This is a special case of 2 and of 3. We write X_i ~ IIDN(μ, σ₀²), i = 1, …, n.

1.53 Estimator

If Θ = g(X₁, …, X_n) is a statistic in a statistical model and we use the outcome of Θ to estimate θ, one of the unknown parameters of the statistical model, then the statistic Θ is called an estimator for θ, denoted by θ̂.

The sample mean X̄ is the common estimator of the mean μ in the four common statistical models (see section 1.52). Therefore, X̄ can also be denoted by μ̂. The sample variance S² is the common estimator of the variance σ² in the common statistical models when σ² is unknown (models 1 and 2). Therefore, S² can also be denoted by σ̂².

If g is a linear function then Θ = g(X₁, …, X_n) is called a linear statistic, or a linear estimator if it estimates an unknown parameter. The sample mean is a linear estimator while the sample variance is not.

1.54 Properties of Estimators

If θ is an unknown parameter in a statistical model and θ̂ is an estimator of θ, then we say that the estimator is unbiased if

  E(θ̂) = θ

The sample mean X̄ is an unbiased estimator of the mean μ in the four common statistical models. The sample variance S² is an unbiased estimator of the variance σ² in the common statistical models when σ² is unknown (models 1 and 2).

If θ is an unknown parameter in a statistical model and θ̂₁ and θ̂₂ are two unbiased estimators of the same parameter θ, then we say that θ̂₁ is more efficient than θ̂₂ if

  Var(θ̂₁) ≤ Var(θ̂₂)

If θ is an unknown parameter in a statistical model and θ̂ is an unbiased estimator which is more efficient than any other unbiased estimator, then θ̂ is called a minimum variance unbiased estimator, MVUE, or the best unbiased estimator.

If X_i ~ IIDN(μ, σ²), i = 1, …, n (common statistical model number 2) then X̄ is the MVUE of μ and S² is the MVUE of σ².

If θ is an unknown parameter in a statistical model and θ̂ is a linear unbiased estimator which is more efficient than any other linear unbiased estimator, then θ̂ is called a best linear unbiased estimator, BLUE.

If X₁, …, X_n is an independent random sample with E(X_i) = μ and Var(X_i) = σ² for i = 1, …, n (common statistical model number 1) then X̄ is BLUE of μ.

1.55 Standard Error

Given: θ̂₁, …, θ̂_k, which are estimators of the unknown parameters θ₁, …, θ_k in a statistical model. The variance of each estimator, Var(θ̂_i), typically depends on some or all of the unknown parameters and is unknown. If each unknown parameter is replaced by its corresponding estimator we get the estimated variance of θ̂_i, denoted by Vâr(θ̂_i). The square root of the estimated variance of an estimator θ̂ is called the standard error of θ̂, denoted by SE(θ̂).

In the fundamental statistical model, Var(X̄) = σ²/n, Vâr(X̄) = S²/n and SE(X̄) = S/√n, as defined before.

1.56 The Z-statistics

If X₁, …, X_n is an IID random sample with E(X_i) = μ and Var(X_i) = σ₀² for i = 1, …, n (common statistical model number 3) and μ₀ is an arbitrary constant, then we define a statistic called the Z-statistic by

  Z = (X̄ − μ₀) / SD(X̄) = (X̄ − μ₀) / (σ₀/√n)

If μ₀ = μ then

  E(Z) = 0,  Var(Z) = 1

If X_i ~ N(μ, σ₀²) for i = 1, …, n (common statistical model number 4) and μ₀ = μ then

  Z ~ N(0, 1)

Note that if μ₀ ≠ μ then E(Z) ≠ 0 and Z cannot follow a standard normal distribution.

1.57 Generalizing the Z-statistics

If X₁, …, X_n is a random sample and Θ is a statistic such that E(Θ) = μ_Θ and Var(Θ) = σ_Θ², then for a given constant μ₀ we define the Z-statistic by

  Z = (Θ − μ₀) / σ_Θ

If Θ ~ N(μ_Θ, σ_Θ²) and μ₀ = μ_Θ then Z ~ N(0, 1),

  Z = (Θ − μ_Θ) / σ_Θ ~ N(0, 1)

The Z-statistic defined in section 1.56, when X₁, …, X_n is an IID random sample with E(X_i) = μ and Var(X_i) = σ², is a special case where Θ = X̄, μ_Θ = μ and σ_Θ² = σ²/n.
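A simulation sketch (mine, under model 4 with assumed values μ = μ₀ = 5 and σ₀ = 2) illustrating that the Z-statistic has mean 0, variance 1 and a standard normal distribution when the null value equals the true mean:

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma0, n, reps = 5.0, 2.0, 25, 100_000
mu0 = mu                                  # null value equal to the true mean

samples = rng.normal(mu, sigma0, size=(reps, n))
xbar = samples.mean(axis=1)
z = (xbar - mu0) / (sigma0 / np.sqrt(n))  # Z = (X̄ − μ0)/(σ0/√n)

print(z.mean(), z.var())                  # ≈ 0 and ≈ 1
print((z > 1.96).mean())                  # ≈ 0.025 = P(Z > 1.96) for N(0, 1)
```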

1.58 The t-statistics

If X₁, …, X_n is an IID random sample with E(X_i) = μ and Var(X_i) = σ² for i = 1, …, n (common statistical model number 1) and μ₀ is an arbitrary constant, then we define a statistic called the t-statistic by

  t = (X̄ − μ₀) / SE(X̄) = (X̄ − μ₀) / (S/√n)

where S is the square root of the sample variance, see section 1.43.

If X_i ~ N(μ, σ²) for i = 1, …, n (common statistical model number 2) and μ = μ₀, then the t-statistic follows a t-distribution with n − 1 degrees of freedom,

  t = (X̄ − μ) / (S/√n) ~ t_{n−1}

Note that if μ₀ ≠ μ then the t-statistic will not follow a t-distribution.

1.59 Generalizing the t-statistics

If X₁, …, X_n is a random sample and Θ is a statistic such that E(Θ) = μ_Θ and Var(Θ) = σ_Θ², and if σ̂_Θ² is a non-negative estimator of σ_Θ², then for a given constant μ₀ we define the t-statistic by

  t = (Θ − μ₀) / σ̂_Θ

where σ̂_Θ = √σ̂_Θ² is the standard error of Θ. If

  μ₀ = μ_Θ,
  Θ ~ N(μ_Θ, σ_Θ²),
  p σ̂_Θ²/σ_Θ² ~ χ²_p, and
  Θ and σ̂_Θ² are independent random variables,

then the t-statistic follows a t-distribution with p degrees of freedom, t ~ t_p.

The t-statistic for common statistical model number 2 (see section 1.58) is a special case where Θ = X̄, μ_Θ = μ, σ_Θ² = σ²/n, σ̂_Θ² = S²/n and p = n − 1.

1.60 Critical Values

If Z ~ N(0, 1) then we define the critical value z_α by

  P(Z > z_α) = α

Because of symmetry, P(Z < −z_α) = α as well. In Excel, you can calculate z_α using NORM.S.INV(1 − α), see section 1.21.

If T ~ t_n then we define the critical value t_{α,n} by

  P(T > t_{α,n}) = α

Because of symmetry, P(T < −t_{α,n}) = α as well. In Excel, you can calculate t_{α,n} using T.INV.RT(α, n), see section 1.50. You can calculate the two-tailed critical value t_{α/2,n} using T.INV.2T(α, n).

If X ~ χ²_n then we define the critical value χ²_{α,n} by

  P(X > χ²_{α,n}) = α

In Excel, you can calculate χ²_{α,n} using CHISQ.INV.RT(α, n), see section 1.47.

If F ~ F(n₁, n₂) then we define the critical value f_{α,n₁,n₂} by

  P(F > f_{α,n₁,n₂}) = α

In Excel, you can calculate f_{α,n₁,n₂} using F.INV.RT(α, n₁, n₂), see section 1.51.

1.61 Hypothesis Testing, Theory

Given a statistical model X₁, …, X_n with unknown parameters θ = (θ₁, …, θ_k), a null hypothesis is a subset D₀ of D, where D is the set of all possible values for the unknown parameters. The remaining values, D \ D₀, are called the alternative hypothesis.

We say that a null hypothesis is true if the true value of θ is inside D₀. Otherwise we say that it is false.

If D₀ contains only a single point, then the null hypothesis is called simple. If at least one of the unknown parameters is restricted to a given value in D₀, then the null hypothesis is called sharp.

You test a given hypothesis by
- defining a test statistic Θ = g(X₁, …, X_n),
- defining a rejection region (or a critical region), which is a subset of the real numbers,
- rejecting the null hypothesis if the outcome of the test statistic falls into the rejection region. If it does not, we say that we fail to reject or do not reject the null hypothesis. We never accept a null hypothesis.

Rejection of the null hypothesis when it is true is called a Type I error. The probability of committing a Type I error is denoted by α. This probability is also called the size of the critical region and the level of significance of the test.

Non-rejection of the null hypothesis when it is false is called a Type II error. The probability of committing a Type II error is denoted by β. 1 − β is called the power of the test.

It is common to first select the level of significance α and then to choose the rejection region in such a way that the probability of committing a Type I error is precisely α.

Once the outcome of the test statistic has been calculated, the p-value of the test is the level of significance at which the observed value of the test statistic equals an end-point of the critical region, so that we are indifferent between rejecting and not rejecting the null hypothesis.
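A scipy sketch (my addition) reproducing the critical values z_α, t_{α,n}, χ²_{α,n} and f_{α,n₁,n₂} defined above, with assumed values α = 0.05 and arbitrary degrees of freedom:

```python
from scipy import stats

alpha = 0.05
n, n1, n2 = 20, 3, 30

z_crit = stats.norm.ppf(1 - alpha)                 # z_α: P(Z > z_α) = α, ≈ 1.645
t_crit = stats.t.ppf(1 - alpha, df=n)              # t_{α,n}, like T.INV.RT(α, n)
chi2_crit = stats.chi2.ppf(1 - alpha, df=n)        # χ²_{α,n}, like CHISQ.INV.RT(α, n)
f_crit = stats.f.ppf(1 - alpha, dfn=n1, dfd=n2)    # f_{α,n1,n2}, like F.INV.RT(α, n1, n2)

print(z_crit, t_crit, chi2_crit, f_crit)
```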

1.62 Hypothesis Tests Involving the Mean

Throughout this section, X₁, …, X_n is an IID random sample with X_i ~ N(μ, σ²) for i = 1, …, n, where μ is unknown while σ² may be known or not known. The null hypothesis is always the subset restricted by μ = μ₀, where μ₀ is a given number. The null hypothesis is written as

  H₀: μ = μ₀

If all values for μ are possible then the alternative hypothesis is the subset restricted by μ ≠ μ₀, denoted by

  H₁: μ ≠ μ₀

This is called a two-sided alternative.

If we restrict μ to μ ≤ μ₀ then the alternative hypothesis is the subset restricted by μ < μ₀, denoted by

  H₁: μ < μ₀

This is called a one-sided alternative. Similarly, if we restrict μ to μ ≥ μ₀ then the one-sided alternative is

  H₁: μ > μ₀

All tests have a level of significance equal to α.

1. σ² is known, two-sided alternative H₁: μ ≠ μ₀
   Test statistic: the Z-statistic
   Under the null: Z ~ N(0, 1)
   Critical region: (−∞, −z_{α/2}) and (z_{α/2}, ∞) (see section 1.60)
2. σ² is known, one-sided alternative H₁: μ < μ₀
   Test statistic: the Z-statistic
   Under the null: Z ~ N(0, 1)
   Critical region: (−∞, −z_α)
3. σ² is known, one-sided alternative H₁: μ > μ₀
   Test statistic: the Z-statistic
   Under the null: Z ~ N(0, 1)
   Critical region: (z_α, ∞)
4. σ² is unknown, two-sided alternative H₁: μ ≠ μ₀
   Test statistic: the t-statistic
   Under the null: t ~ t_{n−1}
   Critical region: (−∞, −t_{α/2,n−1}) and (t_{α/2,n−1}, ∞) (see section 1.60)
5. σ² is unknown, one-sided alternative H₁: μ < μ₀
   Test statistic: the t-statistic
   Under the null: t ~ t_{n−1}
   Critical region: (−∞, −t_{α,n−1})
6. σ² is unknown, one-sided alternative H₁: μ > μ₀
   Test statistic: the t-statistic
   Under the null: t ~ t_{n−1}
   Critical region: (t_{α,n−1}, ∞)
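A sketch (my own made-up data) of test 4, the two-sided t-test with unknown σ², done both by hand from the formulas above and with scipy's one-sample t-test for comparison:

```python
import numpy as np
from scipy import stats

x = np.array([5.1, 4.8, 5.6, 5.0, 4.7, 5.3, 5.2, 4.9])   # observed sample
mu0, alpha = 5.0, 0.05
n = len(x)

t_stat = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(n))  # t = (x̄ − μ0)/(s/√n)
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)              # t_{α/2, n−1}
reject = abs(t_stat) > t_crit

p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)            # two-sided p-value
print(t_stat, t_crit, reject, p_value)

# scipy's built-in version gives the same t-statistic and p-value.
print(stats.ttest_1samp(x, popmean=mu0))
```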

1.63 Confidence Intervals

Given a statistical model X₁, …, X_n where θ is one of the unknown parameters, an interval estimate of θ is an interval of the form Θ₁ < θ < Θ₂, where Θ₁ and Θ₂ are two statistics such that Θ₁ < Θ₂ always holds.

If Θ₁ < θ < Θ₂ is an interval estimate of θ such that

  P(Θ₁ < θ < Θ₂) = 1 − α

then Θ₁ < θ < Θ₂ is a (1 − α)·100% confidence interval for θ. In general, the smaller the α, the wider the confidence interval.

If X₁, …, X_n is an IID random sample with X_i ~ N(μ, σ²) for i = 1, …, n, where σ² is known, then

  X̄ − z_{α/2} σ/√n < μ < X̄ + z_{α/2} σ/√n

is a (1 − α)·100% confidence interval for μ.

If X₁, …, X_n is an IID random sample with X_i ~ N(μ, σ²) for i = 1, …, n, where σ² is unknown, then

  X̄ − t_{α/2,n−1} S/√n < μ < X̄ + t_{α/2,n−1} S/√n

is a (1 − α)·100% confidence interval for μ. S² is the sample variance, see section 1.43.

If X₁, …, X_n is an IID random sample with X_i ~ N(μ, σ²) for i = 1, …, n, where σ² is known or unknown, then the null hypothesis H₀: μ = μ₀ against the two-sided alternative H₁: μ ≠ μ₀ will be rejected at the level of significance α if and only if μ₀ is outside the (1 − α)·100% confidence interval for μ.

1.64 Asymptotically Unbiased Estimator

X₁, …, X_n is a random sample of a statistical model with unknown parameters θ = (θ₁, …, θ_k). To investigate how a given statistic Θ or a given estimator θ̂ depends on the sample size, we will add a subscript n and write Θ_n or θ̂_n.

If θ is an unknown parameter in a statistical model of sample size n and θ̂_n is an estimator of θ, then we say that the estimator is asymptotically unbiased if

  E(θ̂_n) → θ as n → ∞

An unbiased estimator is always asymptotically unbiased, but the opposite is not necessarily true.
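A sketch (reusing the same hypothetical data as in the t-test example) of the (1 − α)·100% confidence interval for μ when σ² is unknown:

```python
import numpy as np
from scipy import stats

x = np.array([5.1, 4.8, 5.6, 5.0, 4.7, 5.3, 5.2, 4.9])
alpha = 0.05
n, xbar, s = len(x), x.mean(), x.std(ddof=1)

t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)       # t_{α/2, n−1}
lower = xbar - t_crit * s / np.sqrt(n)
upper = xbar + t_crit * s / np.sqrt(n)
print(lower, upper)     # a 95% confidence interval for μ
```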

1.65 Consistency

If X₁, …, X_n is a random sample and Θ_n is a given statistic, then we say that Θ_n converges in probability to a constant c if, for each ε > 0,

  P(|Θ_n − c| < ε) → 1 as n → ∞

We then write plim Θ_n = c.

If θ is an unknown parameter in a statistical model of sample size n and θ̂_n is an estimator of θ, then we say that the estimator is a consistent estimator of θ if

  plim θ̂_n = θ

(The opposite is not true in general.)

If E(Θ_n) → c and Var(Θ_n) → 0 as n → ∞ then Θ_n converges in probability to c.

If θ̂_n is an asymptotically unbiased estimator of θ and Var(θ̂_n) → 0 as n → ∞ then θ̂_n is a consistent estimator of θ.

If X₁, …, X_n is an IID random sample with E(X_i) = μ and Var(X_i) = σ² for i = 1, …, n then X̄ is a consistent estimator of μ and S² is a consistent estimator of σ².

1.66 plim Rules

If X₁, …, X_n is a random sample and Θ_n is a given statistic such that plim Θ_n = c, then

  plim f(Θ_n) = f(c)

for any continuous function f : R → R.

If X₁, …, X_n is a random sample and Θ₁, …, Θ_k are k statistics (subscript n removed for clarity) such that plim Θ_j = c_j for j = 1, …, k, then

  plim g(Θ₁, …, Θ_k) = g(c₁, …, c_k)

for any continuous function g : R^k → R. In particular,

  plim (Θ₁ Θ₂) = c₁ c₂
  plim (Θ₁ / Θ₂) = c₁ / c₂  (provided c₂ ≠ 0)

Keep in mind that the plim rules do not apply unless the statistics converge to a constant.

1.67 Convergence in Distribution

If X₁, X₂, … is an infinite sequence of random variables then we say that they converge in distribution to a random variable X if

  lim_{n→∞} F_n(x) = F(x)

for every x at which F is continuous. Here, F_n is the cdf of X_n and F is the cdf of X.

1.68 Central Limit Theorem

If X₁, …, X_n is an IID random sample with E(X_i) = μ and Var(X_i) = σ² for i = 1, …, n, then the Z-statistic converges in distribution to the standard normal. Formally, if

  Z_n = (X̄_n − μ) / (σ/√n)

then Z_n → N(0, 1), where the convergence is in distribution.

With the assumptions of this section and with n large enough, X̄ is approximately normally distributed with mean μ and variance σ²/n.

With the assumptions of this section and with n large enough, the t-statistic

  t = (X̄_n − μ) / (S/√n)

is approximately t-distributed with n − 1 degrees of freedom.
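Finally, a Monte Carlo sketch (my own) of the central limit theorem: standardized means of IID exponential draws (a clearly non-normal distribution) behave approximately like N(0, 1) draws for moderately large n:

```python
import numpy as np

rng = np.random.default_rng(3)
lam, n, reps = 1.0, 50, 200_000
mu, sigma = 1 / lam, 1 / lam            # exponential: E(X) = 1/λ, SD(X) = 1/λ

x = rng.exponential(scale=1 / lam, size=(reps, n))
z = (x.mean(axis=1) - mu) / (sigma / np.sqrt(n))   # Z_n = (X̄_n − μ)/(σ/√n)

# Compare moments and tail probabilities with the standard normal values.
print(z.mean(), z.var())            # ≈ 0, ≈ 1
print((z > 1.645).mean())           # ≈ 0.05
print((np.abs(z) > 1.96).mean())    # ≈ 0.05
```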