Precept 4: Hypothesis Testing

Precept 4: Hypothesis Testing. Soc 500: Applied Social Statistics. Ian Lundberg, Princeton University. October 6, 2016.

Learning Objectives
1. Introduce vectorized R code
2. Review homework and talk about RMarkdown
3. Review conceptual ideas of hypothesis testing
4. Practice for next homework
5. If time allows... random variables!

Expected value, bias, consistency (relevant to homework)

Suppose we flip a fair coin n times and estimate the proportion of heads as

π̂ = (H_1 + H_2 + ... + H_n + 1) / n

Derive the expected value of this estimator.

E(π̂) = E[(H_1 + H_2 + ... + H_n + 1) / n]
      = (1/n) E(H_1 + H_2 + ... + H_n + 1)
      = (1/n) (E[H_1] + E[H_2] + ... + E[H_n] + E[1])
      = (1/n) (0.5 + 0.5 + ... + 0.5 + 1)
      = (1/n) (n(0.5) + 1)
      = 0.5 + 1/n

Expected value, bias, consistency (relevant to homework)

π̂ = (H_1 + H_2 + ... + H_n + 1) / n,    E(π̂) = 0.5 + 1/n

Is this estimator biased? What is the bias?

Bias = Expected value − Truth = E(π̂) − π = (0.5 + 1/n) − 0.5 = 1/n

Expected value, bias, consistency (relevant to homework)

π̂ = (H_1 + H_2 + ... + H_n + 1) / n

Derive the variance of this estimator. Remember, Var(X_1 + X_2) = Var(X_1) + Var(X_2) + 2 Cov(X_1, X_2) and Var(aX) = a² Var(X)!

V(π̂) = V[(H_1 + H_2 + ... + H_n + 1) / n]
      = (1/n²) V(H_1 + H_2 + ... + H_n + 1)
      = (1/n²) [V(H_1) + V(H_2) + ... + V(H_n) + V(1)]        (since the flips are independent)
      = (1/n²) [0.5(0.5) + 0.5(0.5) + ... + 0.5(0.5)]          (since the variance of a Bernoulli is p(1 − p))
      = (1/n²) [n(0.5)(0.5)]
      = 0.25/n

Expected value, bias, consistency (relevant to homework)

π̂ = (H_1 + H_2 + ... + H_n + 1) / n,    E(π̂) = 0.5 + 1/n,    V(π̂) = 0.25/n

Is this estimator consistent? Yes. In large samples E(π̂) → π and V(π̂) → 0, so we will get arbitrarily close to the truth as the sample size grows.
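
A quick simulation can check these derivations numerically. This sketch is not from the slides; the function name check.pi.hat, the seed, the number of replications, and the sample sizes are arbitrary illustrative choices.

# Simulate the estimator pi.hat = (H_1 + ... + H_n + 1) / n from fair-coin flips
set.seed(123)  # arbitrary seed for reproducibility
check.pi.hat <- function(n, reps = 10000) {
  pi.hat <- replicate(reps, (sum(rbinom(n, size = 1, prob = 0.5)) + 1) / n)
  c(mean = mean(pi.hat), bias = mean(pi.hat) - 0.5, variance = var(pi.hat))
}
check.pi.hat(10)    # bias near 1/10 = 0.1, variance near 0.25/10 = 0.025
check.pi.hat(1000)  # bias near 1/1000, variance near 0.25/1000

As n grows, both the bias and the variance shrink toward zero, matching the consistency argument above.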

Hypothesis Testing: Setup (copied from Brandon Stewart's lecture slides)

Goal: test a hypothesis about the value of a parameter. Statistical decision theory underlies such hypothesis testing.

Trial Example: Suppose we must decide whether to convict or acquit a defendant based on evidence presented at a trial. There are four possible outcomes:

                        Defendant
Decision        Guilty            Innocent
Convict         Correct           Type I Error
Acquit          Type II Error     Correct

We could make two types of errors:
- Convict an innocent defendant (type-I error)
- Acquit a guilty defendant (type-II error)

Our goal is to limit the probability of making these types of errors. However, creating a decision rule which minimizes both types of errors at the same time is impossible. We therefore need to balance them.

Hypothesis Testing: Error Types (copied from Brandon Stewart's lecture slides)

                        Defendant
Decision        Guilty      Innocent
Convict         1 − β       α
Acquit          β           1 − α

Now, suppose that we have a statistical model for the probability of convicting and acquitting, conditional on whether the defendant is actually guilty or innocent. Then, our decision-making rule can be characterized by two probabilities:

α = Pr(type-I error) = Pr(convict | innocent)
β = Pr(type-II error) = Pr(acquit | guilty)

The probability of making a correct decision is therefore 1 − α (if innocent) and 1 − β (if guilty).

Hypothesis testing follows an analogous logic, where we want to decide whether to reject (= convict) or fail to reject (= acquit) a null hypothesis (= defendant) using sample data.

Hypothesis Testing: Steps (copied from Brandon Stewart's lecture slides)

                        Null Hypothesis (H0)
Decision            False       True
Reject              1 − β       α
Fail to Reject      β           1 − α

1. Specify a null hypothesis H0 (e.g. the defendant is innocent).
2. Pick a value of α = Pr(reject H0 | H0 true) (e.g. 0.05). This is the maximum probability of making a type-I error we decide to tolerate, and is called the significance level of the test.
3. Choose a test statistic T, which is a function of the sample data and related to H0 (e.g. the count of testimonies against the defendant).
4. Assuming H0 is true, derive the null distribution of T (e.g. standard normal).

Hypothesis Testing: Steps (copied from Brandon Stewart's lecture slides)

5. Using the critical values from a statistical table, evaluate how unusual the observed value of T is under the null hypothesis:
- If the probability of drawing a T at least as extreme as the observed T is less than α, we reject H0. (e.g. there are too many testimonies against the defendant for her to be innocent, so reject the hypothesis that she was innocent.)
- Otherwise, we fail to reject H0. (e.g. there is not enough evidence against the defendant, so give her the benefit of the doubt.)
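
In R, critical values and tail probabilities can be looked up directly rather than from a table. This small sketch is not part of the slides; the observed statistic T.obs = 2.1 is a made-up value used only for illustration, and α = 0.05 matches the two-sided example that follows.

alpha <- 0.05
# Two-sided critical values under a standard normal null distribution
qnorm(c(alpha / 2, 1 - alpha / 2))   # approximately -1.96 and 1.96
# Two-sided p-value for a hypothetical observed test statistic
T.obs <- 2.1
2 * pnorm(-abs(T.obs))               # reject H0 if this is less than alpha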

Hypothesis testing example: Our ages Suppose we are interested in whether the average age among Princeton students in graduate courses is 25. We assume our class is a representative sample (though this is a HUGE assumption). Go to http://tiny.cc/ygplfy and enter your age in years.

Hypothesis testing example: Our ages

What is the null? H0: µ = 25
What is the alternative? Ha: µ ≠ 25
What is the significance level? α = .05
What is the test statistic? Z = (X̄ − 25) / (σ/√n)
What is the null distribution? Z ~ N(0, 1)
What are the critical values? −1.96 and 1.96
What is the rejection region? We reject if Z < −1.96 or Z > 1.96.
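
The calculation could be carried out in R along the following lines. This is a sketch, not the slides' own code: the ages vector below is invented stand-in data (the real responses came from the poll link), and the sample standard deviation is plugged in for σ, which the Z statistic above treats as known.

# Hypothetical ages standing in for the class responses
ages <- c(23, 24, 22, 26, 27, 25, 24, 28, 23, 29, 25, 26)
n <- length(ages)
sigma <- sd(ages)                    # plug-in value for sigma (an assumption)
Z <- (mean(ages) - 25) / (sigma / sqrt(n))
Z                                    # compare to the critical values -1.96 and 1.96
2 * pnorm(-abs(Z))                   # two-sided p-value; reject H0 if below 0.05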

Joint Distributions

The joint distribution of X and Y is defined by a joint PDF f(x, y), or equivalently by a joint CDF F(x, y).

Joint CDF visualization: F(.5, .25) = P(X < .5, Y < .25). [Figure: plot over the unit square, with X and Y axes running from 0 to 1.]

CDF practice problem 1 (modified from Blitzstein and Morris)

Suppose a < b, where a and b are constants (for concreteness, you could imagine a = 3 and b = 5). For some distribution with PDF f and CDF F, which of the following must be true?
1. f(a) < f(b)
2. F(a) < F(b)
3. F(a) ≤ F(b)

Joint CDF practice problem 2 (modified from Blitzstein and Morris)

Suppose a1 < b1 and a2 < b2. Show that F(b1, b2) − F(a1, b2) + F(b1, a2) − F(a1, a2) ≥ 0.

F(b1, b2) − F(a1, b2) + F(b1, a2) − F(a1, a2)
= [F(b1, b2) − F(a1, b2)] + [F(b1, a2) − F(a1, a2)]
= (something ≥ 0) + (something ≥ 0)
≥ 0

(Each bracketed difference is nonnegative because a CDF is non-decreasing in each argument.)

Marginalizing. [Figure: joint distribution of X and Y, illustrating marginalization over one variable.]

Correlation between sum and difference of dice

Suppose you roll two dice and get numbers X and Y. What is Cov(X + Y, X − Y)? Let's solve by simulation!

Correlation between sum and difference of dice

# Draw the sum and difference from one roll of two dice
draw.sum.diff <- function() {
  x <- sample(1:6, 1)
  y <- sample(1:6, 1)
  return(c(x + y, x - y))
}

# Simulate 5,000 rolls
samples <- matrix(nrow = 5000, ncol = 2)
colnames(samples) <- c("x", "y")
set.seed(08544)
for (i in 1:5000) {
  samples[i, ] <- draw.sum.diff()
}
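
The slides go on to plot these draws. To answer the covariance question directly, one small addition (not in the original code) is to compute the sample covariance of the simulated sums and differences:

# Should be near 0, since Cov(X + Y, X - Y) = Var(X) - Var(Y)
# and the two dice are identically distributed
cov(samples[, "x"], samples[, "y"])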

Sum and difference of dice: Plot

library(ggplot2)
p <- ggplot(data.frame(samples), aes(x = x, y = y)) +
  geom_point() +
  scale_x_continuous(breaks = 2:12, name = "\nSum of dice") +
  scale_y_continuous(breaks = -5:5, name = "Difference of dice\n") +
  ggtitle("Sum and difference\nof two dice") +
  theme(text = element_text(size = 20))
ggsave("sumdiff.pdf", p, height = 4, width = 5)

Sum and difference of dice: Plot. [Figure: scatterplot of the 5,000 simulated draws, x-axis "Sum of dice" (2 to 12), y-axis "Difference of dice" (−5 to 5), titled "Sum and difference of two dice".]

Examples to clarify independence (Blitzstein and Hwang, Ch. 3 Exercises)

1. Give an example of dependent r.v.s X and Y such that P(X < Y) = 1.
   Y ~ N(0, 1), X = Y − 1.
2. Can we have independent r.v.s X and Y such that P(X < Y) = 1?
   X ~ Uniform(0, 1), Y ~ Uniform(2, 3).
3. If X and Y are independent and Y and Z are independent, does this imply X and Z are independent?
   No. Consider X ~ N(0, 1), Y ~ N(0, 1), with X and Y independent, and Z = X.

The power of conditioning: Coins

Suppose your friend has two coins, one which is fair and one which has a probability of heads of 3/4. Your friend picks a coin randomly and flips it. What is the probability of heads?

P(H) = P(H | F) P(F) + P(H | F^C) P(F^C) = 0.5(0.5) + 0.75(0.5) = 0.625

Suppose the flip was heads. What is the probability that the coin chosen was fair?

P(F | H) = P(H | F) P(F) / P(H) = 0.5(0.5) / 0.625 = 0.4
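
A short simulation can confirm both answers. This sketch is not from the slides; the seed and number of simulations are arbitrary choices.

set.seed(1)
n.sims <- 100000
fair <- rbinom(n.sims, 1, 0.5)                 # 1 = the fair coin was chosen
p.heads <- ifelse(fair == 1, 0.5, 0.75)
heads <- rbinom(n.sims, 1, p.heads)
mean(heads)                                    # should be near P(H) = 0.625
mean(fair[heads == 1])                         # should be near P(F | H) = 0.4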

Exponential: −log Uniform (figure credit: Wikipedia)

The Uniform distribution is defined on the interval (0, 1). Suppose we wanted a distribution defined on all positive numbers.

Definition: X follows an Exponential distribution with rate parameter λ if

X ~ −(1/λ) log(U),  where U ~ Uniform(0, 1)

Exponential: log Uniform The exponential is often used for wait times. For instance, if you re waiting for shooting stars, the time until a star comes might be exponentially distributed. Key properties: Memorylessness: Expected remaining wait time does not depend on the time that has passed E(X ) = 1 λ V (X ) = 1 λ 2

Exponential-uniform connection

Suppose X1, X2 iid ~ Expo(λ). What is the distribution of X1 / (X1 + X2)?

The proportion of the total wait time that is represented by X1 is uniformly distributed over the interval, so

X1 / (X1 + X2) ~ Uniform(0, 1)
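
This claim can also be checked by simulation. A minimal sketch, with an arbitrary seed and rate:

set.seed(3)
x1 <- rexp(100000, rate = 2)
x2 <- rexp(100000, rate = 2)
ratio <- x1 / (x1 + x2)
quantile(ratio, c(0.1, 0.5, 0.9))   # should be near 0.1, 0.5, 0.9 if Uniform(0, 1)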

Gamma: Sum of independent Exponentials (figure credit: Wikipedia)

Definition: Suppose we are waiting for shooting stars, with the times between stars X1, ..., Xa iid ~ Expo(λ). The distribution of the time until the a-th shooting star is

G = Σ_{i=1}^{a} X_i ~ Gamma(a, λ)

Gamma: Properties

Properties of the Gamma(a, λ) distribution include:
E(G) = a/λ
V(G) = a/λ²
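
A simulation sketch of the "sum of a Exponentials" construction, with arbitrary seed and parameters, that also checks these moments:

set.seed(4)
a <- 5; lambda <- 2
# Each row sums a independent Expo(lambda) wait times
g <- rowSums(matrix(rexp(100000 * a, rate = lambda), ncol = a))
c(mean(g), a / lambda)               # compare to E(G) = a/lambda
c(var(g), a / lambda^2)              # compare to V(G) = a/lambda^2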

Beta: Uniform order statistics

Suppose we draw U1, ..., Uk ~ Uniform(0, 1), and we want to know the distribution of the jth order statistic, U(j). Using the Uniform-Exponential connection, we could also think of U(j) as the location of the jth Exponential in a series of k + 1 Exponentials. Thus,

U(j) ~ (Σ_{i=1}^{j} X_i) / (Σ_{i=1}^{j} X_i + Σ_{i=j+1}^{k+1} X_i) ~ Beta(j, k − j + 1)

This defines the Beta distribution. Can we name the distribution at the top of the fraction? It is a Gamma: the ratio can be written as

G_j / (G_j + G_{k−j+1})

where G_j ~ Gamma(j, λ) and G_{k−j+1} ~ Gamma(k − j + 1, λ) are independent.
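
One can verify the order-statistic claim by simulation. A sketch with arbitrary seed and arbitrary choices of k and j:

set.seed(5)
k <- 7; j <- 3
# jth order statistic of k Uniform(0, 1) draws, repeated many times
u.j <- replicate(50000, sort(runif(k))[j])
c(mean(u.j), j / (k + 1))                             # E[Beta(j, k - j + 1)] = j/(k + 1)
ks.test(u.j, pbeta, shape1 = j, shape2 = k - j + 1)   # compare to Beta(j, k - j + 1)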

What do Betas look like? Figure credit: Wikipedia

Poisson: Number of Exponential events in a time interval (figure credit: Wikipedia)

Definition: Suppose the time between shooting stars is distributed X ~ Expo(λ). Then, the number of shooting stars in an interval of time t is distributed

Y_t ~ Poisson(λt)

Poisson: Number of Exponential events in a time interval

Properties of the Poisson:
- If Y ~ Pois(λt), then V(Y) = E(Y) = λt
- The numbers of events in disjoint intervals are independent
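
A simulation sketch of counting Exponential arrivals in a fixed interval, with arbitrary seed, rate, and interval length (and the helper name count.events made up here):

set.seed(6)
lambda <- 3; t <- 2
count.events <- function() {
  # Count Expo(lambda) inter-arrival times that land in [0, t]
  times <- cumsum(rexp(100, rate = lambda))   # 100 arrivals is plenty when lambda * t = 6
  sum(times <= t)
}
counts <- replicate(20000, count.events())
c(mean(counts), var(counts), lambda * t)      # mean and variance should both be near lambda * t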

χ²_n: A particular Gamma

Definition: We define the chi-squared distribution with n degrees of freedom as

χ²_n ~ Gamma(n/2, 1/2)

More commonly, we think of it as the sum of a series of independent squared Normals, Z1, ..., Zn iid ~ Normal(0, 1):

χ²_n ~ Σ_{i=1}^{n} Z_i²
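
The two characterizations can be compared directly by simulation. A sketch with an arbitrary seed and n = 4 degrees of freedom:

set.seed(7)
n <- 4
z.sq.sums <- colSums(matrix(rnorm(n * 50000), nrow = n)^2)   # sums of n squared standard Normals
gamma.draws <- rgamma(50000, shape = n / 2, rate = 1 / 2)    # Gamma(n/2, 1/2) draws
c(mean(z.sq.sums), mean(gamma.draws), n)                     # both should be near the chi-squared mean, n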

χ²_n: A particular Gamma (figure credit: Wikipedia)

Normal: Square root of χ²_1 with a random sign (figure credit: Wikipedia)

Definition: Z follows a Normal distribution if Z ~ S √(χ²_1), where S is a random sign with equal probability of being 1 or −1.

Normal: An alternate construction

Note: This is far above and beyond what you need to understand for the course!

Box-Muller representation of the Normal: Let U1, U2 iid ~ Uniform(0, 1). Then

Z1 = √(−2 log U2) cos(2πU1)
Z2 = √(−2 log U2) sin(2πU1)

so that Z1, Z2 iid ~ N(0, 1). What is this?

Inside the square root: −2 log U2 ~ Expo(1/2) ~ Gamma(2/2, 1/2) ~ χ²_2

Inside the cosine and sine: 2πU1 is a uniformly distributed angle in polar coordinates, and the cos and sin convert this to the Cartesian x and y components, respectively.

Altogether in polar coordinates: the square root part is a radius distributed as √(χ²_2), and the second part is an angle.
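
For completeness, a minimal R sketch of this construction (the function name box.muller and the seed are made up here), checking that the output looks standard Normal:

box.muller <- function(n.pairs) {
  u1 <- runif(n.pairs)
  u2 <- runif(n.pairs)
  r <- sqrt(-2 * log(u2))               # radius: square root of a chi-squared_2 draw
  theta <- 2 * pi * u1                  # uniformly distributed angle
  c(r * cos(theta), r * sin(theta))     # Cartesian coordinates are iid N(0, 1)
}
set.seed(8)
z <- box.muller(50000)
c(mean(z), sd(z))                       # should be near 0 and 1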