Stat 704 Data Analysis I: Probability Review


Dr. Yen-Yi Ho, Department of Statistics, University of South Carolina

A.3 Random Variables

Definition: A random variable is a function that maps an outcome from some random phenomenon to a real number. More formally, a random variable is a map or function from the sample space of an experiment, S, to some subset of the real numbers R. Restated: a random variable assigns a measurement to the result of a random phenomenon.

Example 1: The starting salary (in thousands of dollars) Y for a new tenure-track statistics assistant professor.

Example 2: The number of students N who accept offers of admission to USC's graduate statistics program.

cdf, pdf, pmf

Every random variable has a cumulative distribution function (cdf) associated with it: F(y) = P(Y ≤ y).

Discrete random variables have a probability mass function (pmf):

f(y) = P(Y = y) = F(y) − F(y⁻) = F(y) − lim_{x↑y} F(x).

Continuous random variables have a probability density function (pdf) such that for a < b,

P(a ≤ Y ≤ b) = ∫_a^b f(y) dy.

For continuous random variables, f(y) = F′(y).

Question: Are the two examples on the previous slide continuous or discrete?

Example 1

Let X be the result of a coin flip, where X = 0 represents tails and X = 1 represents heads.

p(x) = (0.5)^x (0.5)^(1−x) for x = 0, 1

Suppose that we do not know whether or not the coin is fair. Let θ be the probability of a head:

p(x) = θ^x (1 − θ)^(1−x) for x = 0, 1
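(A quick check, not in the original slides: a Bernoulli(θ) variable is binomial with one trial, so its pmf can be evaluated with base R's dbinom; the value θ = 0.3 is an arbitrary illustration.)

# Bernoulli(theta) pmf via the binomial with size = 1; theta = 0.3 is arbitrary
theta <- 0.3
dbinom(c(0, 1), size = 1, prob = theta)  # (1 - theta) and theta: 0.7 0.3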

Example 2

Assume that the time in years from diagnosis until death of a person with a specific kind of cancer follows a density like

f(x) = e^(−x/5)/5 for x > 0, and f(x) = 0 otherwise.

Is this a valid density?

1. e raised to any power is always positive, so f(x) ≥ 0.
2. ∫_0^∞ f(x) dx = ∫_0^∞ e^(−x/5)/5 dx = −e^(−x/5) |_0^∞ = 1.
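(A numerical sanity check, not in the original slides: R's integrate() confirms the density integrates to 1; dexp with rate = 1/5 is this same density.)

# check that the density integrates to 1
integrate(function(x) exp(-x/5)/5, lower = 0, upper = Inf)
# equivalently, using R's built-in exponential density:
integrate(function(x) dexp(x, rate = 1/5), lower = 0, upper = Inf)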

Example 2 (Continued)

What's the probability that a randomly selected person from this distribution survives more than 6 years?

P(X > 6) = ∫_6^∞ e^(−t/5)/5 dt = −e^(−t/5) |_6^∞ = e^(−6/5) ≈ 0.301

Approximate in R:

pexp(6, 1/5, lower.tail=FALSE)

Quantiles

The αth quantile of a distribution with distribution function F is the point x_α such that F(x_α) = α.

A percentile is simply a quantile with α expressed as a percent.

The median is the 50th percentile.

Example 2 (Continued)

What is the 25th percentile of the exponential distribution considered before? The cumulative distribution function is

F(x) = ∫_0^x e^(−t/5)/5 dt = −e^(−t/5) |_0^x = 1 − e^(−x/5).

We want to solve for x:

0.25 = F(x) = 1 − e^(−x/5)  ⟹  x = −5 log(0.75) ≈ 1.44.

Therefore, 25% of the patients from this population live less than 1.44 years. R can compute this exponential quantile for you:

qexp(0.25, 1/5)

A.3 Expected value

The expected value, or mean, of a random variable is a weighted average according to its probability distribution. In general it is defined as

E{Y} = ∫ y dF(y).

For discrete random variables, this is

E{Y} = Σ_{y: f(y)>0} y f(y).   (A.12)

For continuous random variables, this is

E{Y} = ∫ y f(y) dy.   (A.14)

A.3 E{·} is linear

Note: If a and c are constants,

E{a + cY} = a + cE{Y}.   (A.13)

In particular,

E{a} = a
E{cY} = cE{Y}
E{Y + a} = E{Y} + a

A.3 Variance

The variance of a random variable measures the spread of its probability distribution. It is the expected squared deviation about the mean:

σ²{Y} = E{(Y − E{Y})²}   (A.15)

Equivalently,

σ²{Y} = E{Y²} − (E{Y})²   (A.15a)

Note: If a and c are constants,

σ²{a + cY} = c²σ²{Y}   (A.16)

In particular,

σ²{a} = 0
σ²{cY} = c²σ²{Y}
σ²{Y + a} = σ²{Y}

Note: The standard deviation of Y is σ{Y} = √(σ²{Y}).

A.3 Example

Suppose Y is the high temperature in Celsius of a September day in Seattle. Say E(Y) = 20 and var(Y) = 5. Let W be the high temperature in Fahrenheit. Then

E{W} = E{(9/5)Y + 32} = (9/5)E{Y} + 32 = (9/5)(20) + 32 = 68 degrees.

σ²{W} = σ²{(9/5)Y + 32} = (9/5)² σ²{Y} = 3.24 × 5 = 16.2 degrees².

σ{W} = √(σ²{W}) = √16.2 ≈ 4.02 degrees.
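(A simulation sketch, my addition: the linearity and variance rules can be checked empirically by transforming simulated Celsius temperatures; the normal distribution used below is only for illustration, since the rules hold for any distribution with these moments.)

set.seed(1)
y <- rnorm(1e6, mean = 20, sd = sqrt(5))  # simulated Celsius highs
w <- 9/5 * y + 32                         # convert to Fahrenheit
mean(w)  # approximately 68
var(w)   # approximately 16.2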

A.3 Covariance

For two random variables Y and Z, the covariance of Y and Z is

σ{Y, Z} = E{(Y − E{Y})(Z − E{Z})}.

Note:

σ{Y, Z} = E{YZ} − E{Y}E{Z}   (A.21)

If Y and Z have positive covariance, lower values of Y tend to correspond to lower values of Z (and large values of Y with large values of Z). Example: Y is work experience in years and Z is salary in €.

If Y and Z have negative covariance, lower values of Y tend to correspond to higher values of Z and vice versa. Example: Y is the weight of a car in tons and Z is miles per gallon.

A.3 Covariance is linear

If a_1, c_1, a_2, c_2 are constants,

σ{a_1 + c_1 Y, a_2 + c_2 Z} = c_1 c_2 σ{Y, Z}   (A.22)

Note: by definition, σ{Y, Y} = σ²{Y}.

The correlation coefficient between Y and Z is the covariance scaled to be between −1 and 1:

ρ{Y, Z} = σ{Y, Z} / (σ{Y} σ{Z})   (A.25a)

If ρ{Y, Z} = 0, then Y and Z are uncorrelated.
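(A simulation check, not in the slides: (A.22) says additive constants drop out of covariance and multiplicative constants factor through; the constants and the dependence structure below are arbitrary choices.)

set.seed(2)
y <- rnorm(1e5)
z <- 0.5 * y + rnorm(1e5)        # builds in positive covariance
c1 <- 2; c2 <- -3                # arbitrary scaling constants
cov(10 + c1 * y, -4 + c2 * z)    # approximately c1 * c2 * cov(y, z)
c1 * c2 * cov(y, z)
cor(y, z)                        # the scale-free version of the same dependence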

A.3 Independent random variables

Informally, two random variables Y and Z are independent if knowing the value of one random variable does not affect the probability distribution of the other random variable.

Note: If Y and Z are independent, then Y and Z are uncorrelated; i.e., ρ{Y, Z} = 0. However, ρ{Y, Z} = 0 does not imply independence in general.

If Y and Z have a bivariate normal distribution, then σ{Y, Z} = 0 ⟺ Y, Z independent.

Question: What is the formal definition of independence for (Y, Z)?

A.3 Linear combinations of random variables

Suppose Y_1, Y_2, ..., Y_n are random variables and a_1, a_2, ..., a_n are constants. Then

E{Σ_{i=1}^n a_i Y_i} = Σ_{i=1}^n a_i E{Y_i}.   (A.29a)

That is, E{a_1 Y_1 + a_2 Y_2 + ... + a_n Y_n} = a_1 E{Y_1} + a_2 E{Y_2} + ... + a_n E{Y_n}.

Also,

σ²{Σ_{i=1}^n a_i Y_i} = Σ_{i=1}^n Σ_{j=1}^n a_i a_j σ{Y_i, Y_j}   (A.29b)

Proof of (A.29b):

σ²{Σ_{i=1}^n a_i Y_i}
= E[Σ_{i=1}^n a_i Y_i − E(Σ_{i=1}^n a_i Y_i)]²
= E{Σ_{i=1}^n [a_i Y_i − E(a_i Y_i)]}²
= Σ_{i=1}^n E[a_i Y_i − E(a_i Y_i)]² + Σ_{i≠j} E{[a_i Y_i − E(a_i Y_i)][a_j Y_j − E(a_j Y_j)]}
= Σ_{i=1}^n Σ_{j=1}^n a_i a_j σ{Y_i, Y_j}

A.3 Linear combinations of random variables

For two random variables (A.30a & b):

E{a_1 Y_1 + a_2 Y_2} = a_1 E{Y_1} + a_2 E{Y_2},
σ²{a_1 Y_1 + a_2 Y_2} = a_1² σ²{Y_1} + a_2² σ²{Y_2} + 2 a_1 a_2 σ{Y_1, Y_2}.

Note: if Y_1, ..., Y_n are all independent (or even just uncorrelated), then

σ²{Σ_{i=1}^n a_i Y_i} = Σ_{i=1}^n a_i² σ²{Y_i}.   (A.31)

Also, if Y_1, ..., Y_n are all independent, then

σ{Σ_{i=1}^n a_i Y_i, Σ_{i=1}^n c_i Y_i} = Σ_{i=1}^n a_i c_i σ²{Y_i}.   (A.32)
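(A simulation sketch of (A.31), my addition: for independent Y_1 and Y_2, the variance of a linear combination is the weighted sum of variances; the coefficients and standard deviations below are arbitrary.)

set.seed(3)
y1 <- rnorm(1e6, sd = 2)   # variance 4
y2 <- rnorm(1e6, sd = 3)   # variance 9
var(5 * y1 - 2 * y2)       # approximately 25*4 + 4*9 = 136 by (A.31)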

A.3 Important example

Suppose Y_1, ..., Y_n are independent random variables, each with mean μ and variance σ². Define the sample mean as Ȳ = (1/n) Σ_{i=1}^n Y_i. Then

E{Ȳ} = E{(1/n)Y_1 + ... + (1/n)Y_n} = (1/n)E{Y_1} + ... + (1/n)E{Y_n} = (1/n)μ + ... + (1/n)μ = n(1/n)μ = μ.

σ²{Ȳ} = σ²{(1/n)Y_1 + ... + (1/n)Y_n} = (1/n²)σ²{Y_1} + ... + (1/n²)σ²{Y_n} = n(1/n²)σ² = σ²/n.

(Casella & Berger pp. 212–214)
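(A quick empirical check, not in the slides: simulating many sample means shows E{Ȳ} ≈ μ and σ²{Ȳ} ≈ σ²/n; the normal parent and the values n = 25, μ = 10, σ = 2 are arbitrary choices.)

set.seed(4)
n <- 25; mu <- 10; sigma <- 2
ybars <- replicate(1e4, mean(rnorm(n, mu, sigma)))
mean(ybars)  # approximately mu = 10
var(ybars)   # approximately sigma^2 / n = 0.16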

A.3 Central Limit Theorem

The Central Limit Theorem takes this a step further. When Y_1, ..., Y_n are independent and identically distributed (i.e., a random sample) from any distribution such that E{Y_i} = μ and σ²{Y_i} = σ², and n is reasonably large,

Ȳ ≈ N(μ, σ²/n),

where ≈ is read as "approximately distributed as". Note that E{Ȳ} = μ and σ²{Ȳ} = σ²/n as on the previous slide. The CLT slaps normality onto Ȳ. Formally, the CLT states

√n (Ȳ − μ) →_D N(0, σ²).

(Casella & Berger pp. 236–240)
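(An illustrative sketch, my addition: drawing samples from the skewed exponential distribution of Example 2 and standardizing the sample means shows the approximate normality the CLT promises; n = 40 is an arbitrary "reasonably large" choice.)

set.seed(5)
n <- 40
ybars <- replicate(1e4, mean(rexp(n, rate = 1/5)))  # parent: mu = 5, sigma^2 = 25
hist(ybars, breaks = 50)           # roughly bell-shaped despite the skewed parent
qqnorm(sqrt(n) * (ybars - 5) / 5)  # standardized means vs. N(0, 1) quantiles
abline(0, 1)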

Section A.4 Gaussian & related distributions

Normal distribution (Casella & Berger pp. 102–106): A random variable Y has a normal distribution with mean μ and standard deviation σ, denoted Y ~ N(μ, σ²), if it has the pdf

f(y) = (1/√(2πσ²)) exp{−(1/2)((y − μ)/σ)²},

for −∞ < y < ∞. Here, μ ∈ R and σ > 0.

Note: If Y ~ N(μ, σ²), then Z = (Y − μ)/σ ~ N(0, 1) is said to have a standard normal distribution.

A.4 Sums of independent normals

Note: If a and c are constants and Y ~ N(μ, σ²), then a + cY ~ N(a + cμ, c²σ²).

Note: If Y_1, ..., Y_n are independent normals such that Y_i ~ N(μ_i, σ_i²) and a_1, ..., a_n are constants, then

Σ_{i=1}^n a_i Y_i = a_1 Y_1 + ... + a_n Y_n ~ N(Σ_{i=1}^n a_i μ_i, Σ_{i=1}^n a_i² σ_i²).

Example: Suppose Y_1, ..., Y_n are iid from N(μ, σ²). Then Ȳ ~ N(μ, σ²/n). (Casella & Berger p. 215)

Exercise

Let Y_{11}, ..., Y_{1n_1} iid N(μ_1, σ_1²), independent of Y_{21}, ..., Y_{2n_2} iid N(μ_2, σ_2²), and set Ȳ_i = Σ_{j=1}^{n_i} Y_{ij} / n_i, i = 1, 2.

1. What is E{Ȳ_1 − Ȳ_2}?
2. What is σ²{Ȳ_1 − Ȳ_2}?
3. What is the distribution of Ȳ_1 − Ȳ_2?

Exercise

Consider X ~ N(0, 1) and Z ~ N(0, 1) with X ⊥ Z (independent). Let Y = ρX + √(1 − ρ²) Z. What is

1. σ²{Y}?
2. σ{X, Y}?
3. ρ{X, Y}?

A.4 χ² distribution

Definition: If Z_1, ..., Z_ν iid N(0, 1), then X = Z_1² + ... + Z_ν² ~ χ²_ν, chi-square with ν degrees of freedom.

Note: E(X) = ν and var(X) = 2ν.

[Figure: pdfs of the χ²_1, χ²_2, χ²_3, χ²_4 distributions.]
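(One way to reproduce a plot like this in R, added here as a sketch; the x-range and colors are arbitrary.)

# overlay chi-square pdfs for nu = 1, 2, 3, 4
curve(dchisq(x, df = 1), from = 0.01, to = 10, ylim = c(0, 0.5),
      xlab = "x", ylab = "density")
for (nu in 2:4) curve(dchisq(x, df = nu), add = TRUE, col = nu)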

A.4 t distribution

Definition: If Z ~ N(0, 1) independent of X ~ χ²_ν, then

T = Z / √(X/ν) ~ t_ν,

t with ν degrees of freedom.

Note that E(T) = 0 for ν ≥ 2 and var(T) = ν/(ν − 2) for ν ≥ 3.

[Figure: pdfs of the t_1, t_2, t_3, t_4 distributions.]

A.4 F distribution

Definition: If X_1 ~ χ²_{ν_1} independent of X_2 ~ χ²_{ν_2}, then

F = (X_1/ν_1) / (X_2/ν_2) ~ F_{ν_1, ν_2},

F with ν_1 degrees of freedom in the numerator and ν_2 degrees of freedom in the denominator.

Note: The square of a t_ν random variable is an F_{1,ν} random variable. Proof:

t_ν² = [Z / √(χ²_ν/ν)]² = Z² / (χ²_ν/ν) = (χ²_1/1) / (χ²_ν/ν) = F_{1,ν}.

Note: E(F) = ν_2/(ν_2 − 2) for ν_2 > 2. The variance is a function of ν_1 and ν_2 and a bit more complicated.

Question: If F ~ F(ν_1, ν_2), what is 1/F distributed as?
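(A numerical check of the t² = F relation, my addition: if T ~ t_ν then P(T² ≤ x) = P(−√x ≤ T ≤ √x) = 2P(T ≤ √x) − 1, which should equal the F_{1,ν} cdf at x; ν = 7 and x = 2.5 are arbitrary.)

nu <- 7; x <- 2.5
pf(x, df1 = 1, df2 = nu)      # P(F_{1,nu} <= x)
2 * pt(sqrt(x), df = nu) - 1  # P(T^2 <= x); the two numbers agree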

Relate the plots to E(F) = ν_2/(ν_2 − 2).

[Figure: pdfs of the F_{2,2}, F_{5,5}, F_{5,20}, F_{5,200} distributions.]

A.6 Normal population inference

A model for a single sample: Suppose we have a random sample Y_1, ..., Y_n of observations from a normal distribution with unknown mean μ and unknown variance σ². We can model these data as

Y_i = μ + ε_i, i = 1, ..., n, where ε_i ~ iid N(0, σ²).

Often we wish to obtain inference for the unknown population mean μ, e.g., a confidence interval for μ or a hypothesis test of H_0: μ = μ_0.

A.6 Standardize Ȳ to get a t random variable

Let s² = (1/(n−1)) Σ_{i=1}^n (Y_i − Ȳ)² be the sample variance and s = √(s²) be the sample standard deviation.

Fact: (n−1)s²/σ² = (1/σ²) Σ_{i=1}^n (Y_i − Ȳ)² has a χ²_{n−1} distribution (this can be shown using results from linear models).

Fact: (Ȳ − μ)/(σ/√n) has a N(0, 1) distribution.

Fact: Ȳ is independent of s², so any function of Ȳ is independent of any function of s². Therefore

[(Ȳ − μ)/(σ/√n)] / √[(1/σ²) Σ_{i=1}^n (Y_i − Ȳ)² / (n−1)] = (Ȳ − μ)/(s/√n) ~ t_{n−1}.

(Casella & Berger Theorem 5.3.1, p. 218)
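(A simulation sketch, not in the slides: repeatedly computing (Ȳ − μ)/(s/√n) from normal samples and comparing the results to t_{n−1} quantiles illustrates the fact above; n = 10 and the parent parameters are arbitrary.)

set.seed(6)
n <- 10; mu <- 0; sigma <- 3
tstat <- replicate(1e4, {
  y <- rnorm(n, mu, sigma)
  (mean(y) - mu) / (sd(y) / sqrt(n))
})
qqplot(qt(ppoints(1e4), df = n - 1), tstat)  # should hug the 45-degree line
abline(0, 1)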

A.6 Building a confidence interval

Let 0 < α < 1; typically α = 0.05. Let t_{n−1}(1 − α/2) be such that P(T ≤ t_{n−1}(1 − α/2)) = 1 − α/2 for T ~ t_{n−1}.

[Figure: t_{n−1} density with quantiles marked; the central area between −t_{n−1}(1 − α/2) and t_{n−1}(1 − α/2) is 1 − α, with α/2 in each tail.]

A.6 Confidence interval for μ

Under the model Y_i = μ + ε_i, i = 1, ..., n, where ε_i ~ iid N(0, σ²),

1 − α = P(−t_{n−1}(1 − α/2) ≤ (Ȳ − μ)/(s/√n) ≤ t_{n−1}(1 − α/2))
      = P(−(s/√n) t_{n−1}(1 − α/2) ≤ Ȳ − μ ≤ (s/√n) t_{n−1}(1 − α/2))
      = P(Ȳ − (s/√n) t_{n−1}(1 − α/2) ≤ μ ≤ Ȳ + (s/√n) t_{n−1}(1 − α/2))

A.6 Confidence interval for μ

So a (1 − α)100% random probability interval for μ is

Ȳ ± t_{n−1}(1 − α/2) s/√n,

where t_{n−1}(1 − α/2) is the (1 − α/2)th quantile of a t_{n−1} random variable, i.e., the value such that P(T < t_{n−1}(1 − α/2)) = 1 − α/2 where T ~ t_{n−1}. This, of course, turns into a confidence interval after Ȳ = ȳ and s² are observed, and it is no longer random.

A.6 Standardizing with Ȳ instead of μ

Note: If Y_1, ..., Y_n iid N(μ, σ²), then

Σ_{i=1}^n ((Y_i − μ)/σ)² ~ χ²_n, and Σ_{i=1}^n ((Y_i − Ȳ)/σ)² ~ χ²_{n−1}.

Confidence interval example

Say we collect n = 30 summer daily high temperatures and obtain ȳ = 77.667 and s = 8.872. To obtain a 90% CI we need, with α = 0.10,

t_29(1 − α/2) = t_29(0.95) = 1.699,

yielding

77.667 ± (1.699)(8.872/√30) ≈ (74.91, 80.42).

Interpretation: With 90% confidence, the interval between 74.91 and 80.42 degrees covers the true mean high temperature.
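(These numbers can be reproduced in R, a check added here:)

n <- 30; ybar <- 77.667; s <- 8.872
qt(0.95, df = n - 1)                              # 1.699
ybar + c(-1, 1) * qt(0.95, n - 1) * s / sqrt(n)   # 74.91 80.42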

Example: Page 645

n = 63 faculty voluntarily attended a summer workshop on case teaching methods (out of 110 faculty total). At the end of the following academic year, their teaching was evaluated on a 7-point scale (1 = really bad to 7 = outstanding). Calculate the mean, and a confidence interval for the mean, for only the Attended cases.

R code

##############################
# Example 2, p. 645 (Chapter 15)
##############################
scores <- c(4.8, 6.4, 6.3, 6.0, 5.4, 5.8, 6.1, 6.3, 5.0, 6.2,
            5.6, 5.0, 6.4, 5.8, 5.5, 6.1, 6.0, 6.0, 5.4, 5.8,
            6.5, 6.0, 6.1, 4.7, 5.6, 6.1, 5.8, 4.8, 5.9, 5.4,
            5.3, 6.0, 5.6, 6.3, 5.2, 6.0, 6.4, 5.8, 4.9, 4.1,
            6.0, 6.4, 5.9, 6.6, 6.0, 4.4, 5.9, 6.5, 4.9, 5.4,
            5.8, 5.6, 6.2, 6.3, 5.8, 5.9, 6.5, 5.4, 5.9, 6.1,
            6.6, 4.7, 5.5, 5.0, 5.5, 5.7, 4.3, 4.9, 3.4, 5.1,
            4.8, 5.0, 5.5, 5.7, 5.0, 5.2, 4.2, 5.7, 5.9, 5.8,
            4.2, 5.7, 4.8, 4.6, 5.0, 4.9, 6.3, 5.6, 5.7, 5.1,
            5.8, 3.8, 5.0, 6.1, 4.4, 3.9, 6.3, 6.3, 4.8, 6.1,
            5.3, 5.1, 5.5, 5.9, 5.5, 6.0, 5.4, 5.9, 5.5, 6.0)
status <- c(rep("Attended", 63), rep("NotAttend", 47))
dat <- data.frame(scores, status)
str(dat)

R code

############ make a side-by-side dotplot
keep <- which(dat[,2] == "Attended")
pdf("teaching.pdf")
stripchart(dat[,1] ~ dat[,2], pch=21, method="jitter", jitter=0.2,
           vertical=TRUE, ylab="Teaching Score", xlab="Attendance Status",
           ylim=c(min(dat[,1]), max(dat[,1])))
abline(v=1, col="grey", lty=2)
abline(v=2, col="grey", lty=2)
lines(x=c(0.9, 1.1), y=rep(mean(dat[keep,1]), 2), col=4)
lines(x=c(1.9, 2.1), y=rep(mean(dat[-keep,1]), 2), col=4)
dev.off()

########### calculate CI for scores from Attended cases
dat2 <- dat[keep,]
str(dat2)
t.test(dat2[,1])

[Figure: side-by-side dotplot of teaching scores (roughly 3.5 to 6.5) by attendance status, Attended vs. NotAttend, with group means marked.]