GOV 2001 / 1002 / E-2001 Section 3: Theories of Inference

GOV 2001 / 1002 / E-2001 Section 3: Theories of Inference
Solé Prillaman
Harvard University
February 11, 2015

LOGISTICS

Reading assignment: Unifying Political Methodology, chapters 2 and 4.
Problem Set 3: due by 6pm next Wednesday on Canvas.
Assessment question: due by 6pm on Wednesday the 25th on Canvas. You must work alone, and you get only one attempt.

CANVAS

What to do when you think you have the right answer, but Canvas marks it wrong:

- Double-check your code and make sure you think your answer is right.
- Email us: we are happy to tell you if something went wrong in Canvas.
- Do nothing: we will double-check all of your answers in Canvas.

We are going to do everything we can to minimize these issues.

REPLICATION PAPER

1. Read "Publication, Publication."
2. Find a coauthor. See the Canvas discussion board for help with this.
3. Choose a paper based on the criteria in "Publication, Publication."
4. Have a classmate sign off on your paper choice.
5. Upload the paper or papers to the Canvas quiz by 2/25.

REPLICATION PAPER

Everyone will select a paper to replicate and will replicate the tables and figures from this paper. Due March 25.

Extension School students may work with a coauthor for this replication, but the coauthor must be from the Extension School (since you are not going to write a paper based on this).

You will not write an original research paper. Instead you will take the final. The final will be a take-home exam, structured identically to the problem sets. You will have one week to complete it, during the first week of May.

If you would rather write the paper than take the final, email Gary, Stephen, and me. If you want to do this, you will have to find a coauthor and commit completely to the paper.

DISCUSSION FORUM

Everyone should change their notification preferences in Canvas! This ensures that when updates are made on the discussion board, you know about them.

Go to Settings in the top right, then under Ways to Contact, make sure your correct email address is there.
Go to Settings in the top right, then Notifications, and choose "Notify me right away" for Discussion and Discussion Post.

MODEL SET-UP

$Y_i \sim f_N(\mu_i, \sigma^2)$ with $\mu_i = \beta x_i$

$Y_i \sim f_N(\mu, \sigma^2_i)$ with $\sigma^2_i = \beta x_i$

$Y_i \sim f_N(\mu_i, \sigma^2_i)$ with $\mu_i = \beta x_{1i}$ and $\sigma^2_i = \alpha x_{2i}$
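To make the three specifications concrete, here is a minimal R sketch simulating data from the first model; the values of $\beta$, $\sigma$, and $x$ below are made-up assumptions for illustration, not from the slides.

# Simulate Y_i ~ N(mu_i, sigma^2) with mu_i = beta * x_i
# beta, sigma, and x are hypothetical values chosen for illustration
set.seed(2138)
n <- 100
x <- runif(n, 0, 10)                  # one covariate
beta <- 2                             # assumed systematic-component parameter
sigma <- 1.5                          # assumed constant standard deviation
mu <- beta * x                        # systematic component
y <- rnorm(n, mean = mu, sd = sigma)  # stochastic component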

QUIZ BREAK!

If $P(Y_i = y_i) = \binom{n}{y_i} \pi_i^{y_i} (1 - \pi_i)^{n - y_i}$ and $\pi_i$ is a function of $x_i$ and $\beta$, how do we write out the systematic and stochastic components of this model?
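One possible way to write this out: the binomial stochastic component follows from the pmf above, while the logit link below is only one common choice for the systematic component, not one the slide specifies.

\begin{align*}
Y_i &\sim f_B(y_i \mid n, \pi_i) && \text{(stochastic component)} \\
\pi_i &= \frac{1}{1 + e^{-x_i \beta}} && \text{(systematic component; e.g., a logit link)}
\end{align*}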

BOOTSTRAPPED STANDARD ERRORS

We can estimate the sampling distribution of our parameters using bootstrapping (a sketch in R follows this list):

1. Start with the observed sample and model.
2. Determine the quantity of interest (the estimand).
3. Choose an estimator.
4. Take a re-sample of size n (with replacement) from the sample and calculate the estimate of the parameter using the estimator.
5. Repeat step 4 many times, say 1000, each time storing the estimate of the parameter.
6. Calculate the standard deviation of your 1000 estimates of the parameter.
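A minimal sketch of these steps in R, assuming the estimand is the population mean and using a made-up sample for illustration:

# Hypothetical observed sample (step 1); the estimand is the
# population mean and the estimator is the sample mean (steps 2-3)
set.seed(2138)
my.sample <- rexp(50, rate = 0.25)
n <- length(my.sample)

# Steps 4-5: re-sample with replacement 1000 times,
# storing the estimate each time
boot.estimates <- replicate(1000, {
  resample <- sample(my.sample, size = n, replace = TRUE)
  mean(resample)
})

# Step 6: the bootstrapped standard error
sd(boot.estimates)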

OUTLINE

Likelihood Inference
Bayesian Inference
Neyman-Pearson Inference

INFERENCE

For today we will only think about the univariate case:

$Y \sim f(\theta)$

We want to use the data we have observed, values of $y$, to infer what $\theta$ is. How do we do this?

LIKELIHOOD: AN EXAMPLE

Ex. Waiting for the Red Line: how long will it take for the next T to get here?

LIKELIHOOD: AN EXAMPLE

$Y$ is an Exponential random variable with parameter $\lambda$:

$f(y) = \lambda e^{-\lambda y}$

[Figure: the Exponential density $f(y)$ plotted over $y$ from 0 to 15.]

WHAT IS LIKELIHOOD?

Last week we assumed $\lambda = .25$ to get the probability of waiting for the Red Line for $Y$ minutes, i.e., we set $\lambda$ to look at the probability of drawing some data: $\lambda \rightarrow$ data.

$p(y \mid \lambda = .25) = .25 e^{-.25 y}$

$p(2 < y < 10 \mid \lambda) = .525$

So we can calculate $p(y \mid \lambda)$.
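As a quick check, that probability can be computed with the exponential CDF in R:

# P(2 < Y < 10) when Y ~ Exponential(rate = .25)
pexp(10, rate = 0.25) - pexp(2, rate = 0.25)
## [1] 0.5244829, roughly the .525 above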

WHAT IS LIKELIHOOD?

BUT, what we care about estimating is $\lambda$! Given our observed data, what is the probability of getting a particular $\lambda$: data $\rightarrow \lambda$.

$p(\lambda \mid y) = ?$

[Figure: three densities $f(y)$ over $y$ from 0 to 10, under different values of $\lambda$.]

LIKELIHOOD AND BAYES RULE

Why is Bayes rule important? Often we want to know $P(\theta \mid \text{data})$. But what we do know is $P(\text{data} \mid \theta)$. We'll be able to infer $\theta$ by using a variant of Bayes rule.

Bayes Rule:

$P(\theta \mid y) = \frac{P(y \mid \theta) P(\theta)}{P(y)}$
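To see the mechanics of Bayes rule numerically, here is a toy sketch over a discrete grid of candidate $\lambda$ values; the flat prior and the single data point $y = 12$ are assumptions for illustration:

# Bayes rule on a discrete grid of candidate lambdas
lambda.grid <- seq(0.01, 1, by = 0.01)
prior <- rep(1 / length(lambda.grid), length(lambda.grid))  # flat prior p(lambda)
lik <- dexp(12, rate = lambda.grid)                         # p(y | lambda) at y = 12
posterior <- lik * prior / sum(lik * prior)                 # p(lambda | y), normalized
lambda.grid[which.max(posterior)]                           # most probable lambda on the grid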

LIKELIHOOD

The whole point of likelihood is to leverage information about the data generating process into our inferences. Here are the basic steps:

1. Think about your data generating process. (What do the data look like? Use your substantive knowledge.)
2. Find a distribution that you think explains the data. (Poisson, Binomial, Normal, something else?)
3. Derive the likelihood.
4. Plot the likelihood.
5. Maximize the likelihood to get the MLE.

Note: This is the case in the univariate context. We'll be introducing covariates later on in the term.

DATA GENERATING PROCESS AND DISTRIBUTION OF Y

$Y$ is an Exponential random variable with parameter $\lambda$:

$f(y \mid \lambda) = \lambda e^{-\lambda y}$

[Figure: the Exponential density $f(y)$ plotted over $y$ from 0 to 15.]

So our model is just: $Y \sim f_{exp}(\lambda)$

DERIVE THE LIKELIHOOD

From Bayes Rule:

$p(\lambda \mid y) = \frac{p(y \mid \lambda) p(\lambda)}{p(y)}$

Let

$k(y) = \frac{p(\lambda)}{p(y)}$

Likelihoodists assume that $\theta$ is fixed and $Y$ is random, so $k(y)$ is an unknown constant: the $\lambda$ in $k(y)$ is the true $\lambda$, a constant that doesn't vary. So $k(y)$ is just a function of $y$, and therefore a constant, since we know $y$ (it's our data).

DERIVE THE LIKELIHOOD

$p(\lambda \mid y) = \frac{p(y \mid \lambda) p(\lambda)}{p(y)}$

$L(\lambda \mid y) = p(y \mid \lambda) k(y) \propto p(y \mid \lambda)$

On Monday it took 12 minutes for the train to come.

$L(\lambda \mid y_1) \propto p(y_1 \mid \lambda) = \lambda e^{-\lambda y_1} = \lambda e^{-12\lambda}$

THE LIKELIHOOD THEORY OF INFERENCE

$P(\theta \mid y) = \frac{p(\theta) p(y \mid \theta)}{p(y)} = k(y) p(y \mid \theta)$

$L(\theta \mid y) = k(y) p(y \mid \theta) \propto p(y \mid \theta)$

$L(\theta \mid y) \propto p(y \mid \theta)$ is the Likelihood Axiom.

What is $p(y \mid \theta)$?

QUIZ BREAK!

Why is the likelihood only a measure of relative uncertainty and not absolute uncertainty?
How do we interpret $L(\theta \mid y)$?
Comparing values of $L(\theta \mid y)$ across data sets is meaningless. Why?

PLOT THE LIKELIHOOD

First, note that we can take advantage of a lot of pre-packaged R functions:

rbinom, rpois, rnorm, runif: give random draws from that distribution
pbinom, ppois, pnorm, punif: give the cumulative distribution function (the probability of that value or less)
dbinom, dpois, dnorm, dunif: give the density (i.e., the height of the PDF; useful for drawing)
qbinom, qpois, qnorm, qunif: give the quantile function (given a quantile, tells you the value)
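The exponential distribution used in this example has the same family (dexp, pexp, qexp, rexp); a quick illustration with an assumed rate of .25:

dexp(12, rate = 0.25)   # density (height of the PDF) at y = 12
pexp(12, rate = 0.25)   # P(Y <= 12)
qexp(0.5, rate = 0.25)  # median waiting time
rexp(5, rate = 0.25)    # five random draws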

PLOT THE LIKELIHOOD

We want to plot $L(\lambda \mid y) \propto \lambda e^{-12\lambda}$.

1. We can do this easily since we know that $y$ is exponentially distributed.

# dexp provides the PDF of the exponential distribution.
# If we set x to be 12 and lambda to be .25,
# then p(y | lambda) =
dexp(x = 12, rate = .25, log = FALSE)
## [1] 0.01244677

# To plot the likelihood, we can plot the PDF for different
# values of lambda
curve(dexp(12, rate = x), xlim = c(0, 1),
      xlab = "lambda", ylab = "likelihood")

PLOT THE LIKELIHOOD

2. Or we can write a function for our likelihood and plot that function.

# Write a function for L(lambda | y)
# The inputs are lambda and y (our data)
likelihood <- function(lambda, y){
  lambda * exp(-lambda * y)
}

# We can use curve to plot this likelihood function
curve(likelihood(lambda = x, y = 12), xlim = c(0, 1),
      xlab = "lambda", ylab = "likelihood")

MAXIMIZE THE LIKELIHOOD

[Figure: the likelihood $L(\lambda \mid y = 12)$ plotted over $\lambda$ from 0 to 1, rising steeply to a single peak and then decaying.]

What do you think the maximum likelihood estimate will be?
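As a numerical check (not from the slide), R's optimize can maximize the likelihood function written earlier; analytically, the maximizer of $\lambda e^{-12\lambda}$ is $\lambda = 1/12 \approx 0.083$.

likelihood <- function(lambda, y){
  lambda * exp(-lambda * y)
}
# Search for the maximum over the plotted range of lambda
optimize(likelihood, interval = c(0, 1), y = 12, maximum = TRUE)
## $maximum is approximately 0.0833 = 1/12, the analytic MLE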

SOME PERSPECTIVE

What does this likelihood represent? This is the likelihood of $\lambda$ given one data point of $Y$ and our assumed model ($Y \sim f_{exp}(\lambda)$).

We do not have a systematic component, because we do not have any covariates ($x$).

Next week we will discuss likelihood inference when you have (a) covariates and (b) multiple observations! Stay tuned!

OUTLINE

Likelihood Inference
Bayesian Inference
Neyman-Pearson Inference

LIKELIHOODS AND BAYESIAN POSTERIORS

Likelihood:

$p(\lambda \mid y) = \frac{p(\lambda) p(y \mid \lambda)}{p(y)}$
$L(\lambda \mid y) = k(y) p(y \mid \lambda) \propto p(y \mid \lambda)$

There is a fixed, true value of $\lambda$. We use the likelihood to estimate $\lambda$ with the MLE.

Bayesian Posterior Density:

$p(\lambda \mid y) = \frac{p(\lambda) p(y \mid \lambda)}{p(y)} \propto p(\lambda) p(y \mid \lambda)$

$\lambda$ is a random variable and therefore has fundamental uncertainty. We use the posterior density to make probability statements about $\lambda$.

UNDERSTANDING THE POSTERIOR DENSITY

In Bayesian inference, we have a prior subjective belief about $\lambda$, which we update with the data to form posterior beliefs about $\lambda$:

$p(\lambda \mid y) \propto p(\lambda) p(y \mid \lambda)$

$p(\lambda \mid y)$ is the posterior density
$p(\lambda)$ is the prior density
$p(y \mid \lambda)$ is proportional to the likelihood

BAYESIAN INFERENCE

The whole point of Bayesian inference is to leverage information about the data generating process, along with subjective beliefs about our parameters, into our inference. Here are the basic steps:

1. Think about your subjective beliefs about the parameters you want to estimate.
2. Find a distribution that you think explains your prior beliefs about the parameter.
3. Think about your data generating process.
4. Find a distribution that you think explains the data.
5. Derive the posterior distribution.
6. Plot the posterior distribution.
7. Summarize the posterior distribution (posterior mean, posterior standard deviation, posterior probabilities).

SETTING THE PRIOR

Continuing from our example about the Red Line, we believe that the rate at which trains arrive is distributed according to the Gamma distribution with known parameters $\alpha$ and $\beta$. That is:

$p(\lambda) = \frac{\beta^\alpha}{\Gamma(\alpha)} \lambda^{\alpha - 1} e^{-\beta\lambda}$

[Figure: two Gamma prior densities $p(\lambda)$ over $\lambda$ from 0 to 5; left panel $\alpha = 1$ and $\beta = 2$, right panel $\alpha = 3$ and $\beta = 2$.]
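A small sketch plotting these two priors in R; dgamma's shape/rate parameterization matches the density above:

# The two Gamma priors from the figure
par(mfrow = c(1, 2))
curve(dgamma(x, shape = 1, rate = 2), xlim = c(0, 5),
      xlab = "lambda", ylab = "p(lambda)", main = "alpha = 1, beta = 2")
curve(dgamma(x, shape = 3, rate = 2), xlim = c(0, 5),
      xlab = "lambda", ylab = "p(lambda)", main = "alpha = 3, beta = 2")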

DERIVE THE POSTERIOR

$p(\lambda \mid y) \propto p(\lambda) p(y \mid \lambda)$
$= \frac{\beta^\alpha}{\Gamma(\alpha)} \lambda^{\alpha - 1} e^{-\beta\lambda} \cdot \lambda e^{-12\lambda}$
$\propto \lambda^{\alpha - 1} e^{-\beta\lambda} \cdot \lambda e^{-12\lambda}$  (dropping constants)
$= \lambda^{\alpha} e^{-\beta\lambda} e^{-12\lambda}$  (combining terms)
$= \lambda^{\alpha} e^{-\lambda(\beta + 12)}$

So $p(\lambda \mid y) \sim \text{Gamma}(\alpha + 1, \beta + 12)$.

PLOT THE POSTERIOR

We want to plot $p(\lambda \mid y)$.

1. We can do this easily since we know that $\lambda \mid y$ is distributed Gamma.

# dgamma provides the PDF of the gamma distribution
# To plot the posterior, we can plot the PDF for different
# values of lambda
# Let's set alpha = 1 and beta = 2
curve(dgamma(x, shape = 2, rate = 14), xlim = c(0, 1),
      xlab = "lambda", ylab = "posterior")

PLOT THE POSTERIOR

2. Or we can do this without knowing the exact distribution of our posterior and plot a proportional distribution.

# Write a function for p(lambda | y)
# The inputs are lambda, y (our data), beta, and alpha
posterior.propto <- function(lambda, y, beta, alpha){
  lambda^alpha * exp(-lambda * (beta + y))
}

# We can use curve to plot this proportional posterior
# Let's set alpha = 1 and beta = 2
curve(posterior.propto(lambda = x, y = 12, alpha = 1, beta = 2),
      xlim = c(0, 1), xlab = "lambda", ylab = "posterior")

PLOT THE POSTERIOR

[Figure: the posterior density $p(\lambda \mid y)$ plotted over $\lambda$ from 0 to 1.]

SUMMARIZE THE POSTERIOR DISTRIBUTION

What is the mean of the posterior distribution?

1. We can simulate to get the mean of the posterior.

# Again let's assume that alpha is 1 and beta is 2
sim.posterior <- rgamma(10000, shape = 2, rate = 14)
mean(sim.posterior)

2. Or we can calculate this analytically, because the mean of a Gamma($\alpha$, $\beta$) is $\alpha / \beta$.

mean <- (1 + 1)/(2 + 12)
mean

SUMMARIZE THE POSTERIOR DISTRIBUTION

What is the probability that $\lambda$ is between 0.2 and 0.4?

1. We can integrate our proportional function and divide by the total area under that function.

density1 <- integrate(posterior.propto, alpha = 1, beta = 2, y = 12,
                      lower = .2, upper = .4)
total.density <- integrate(posterior.propto, alpha = 1, beta = 2, y = 12,
                           lower = 0, upper = 100)
density1$value / total.density$value

2. Or we can calculate this directly, because we know the functional form of the posterior:

pgamma(.4, shape = 2, rate = 14) - pgamma(.2, shape = 2, rate = 14)

OUTLINE

Likelihood Inference
Bayesian Inference
Neyman-Pearson Inference

HYPOTHESIS TESTING

1. Set your null hypothesis: the assumed null state of the world.
   ex. $H_0: \beta = c$

2. Set your alternative hypothesis: the claim to be tested.
   ex. $H_A: \beta \neq c$

HYPOTHESIS TESTING

3. Choose your test statistic (estimator): a function of the sample and the null hypothesis used to test your hypothesis.
   ex. $\frac{\hat{\beta} - c}{\widehat{SE}[\hat{\beta}]}$

4. Determine the null distribution: the sampling distribution of the test statistic assuming that the null hypothesis is true.
   ex. $\frac{\hat{\beta} - c}{\widehat{SE}[\hat{\beta}]} \sim t_{n-k-1}$

HYPOTHESIS TESTING

How do we choose our estimator? By the CLT, if $n$ is large, then

$\hat{\beta} \sim N\!\left(\beta, \frac{\sigma^2}{\sum_{i=1}^n (x_i - \bar{x})^2}\right)$

If we standardize, we get that

$Z = \frac{\hat{\beta} - \beta}{SE[\hat{\beta}]} \sim N(0, 1)$

If we estimate $\sigma^2$, we get that

$Z = \frac{\hat{\beta} - \beta}{\widehat{SE}[\hat{\beta}]} \sim t_{n-k-1}$

We can plug in $c$ for $\beta$, since we are assuming the null hypothesis is true.

HYPOTHESIS TESTING

5. Choose your significance level, $\alpha$. This is the accepted amount of Type I error.

6. Set your decision rule and critical value: the function that specifies where we accept or reject the null hypothesis for a given value of the test statistic; a function of $\alpha$.
   ex. reject the null if $\left|\frac{\hat{\beta} - c}{\widehat{SE}[\hat{\beta}]}\right| \geq t_{1-\alpha/2,\, n-k-1}$

HYPOTHESIS TESTING

How do we get $t_{1-\alpha/2,\, n-k-1}$?

## alpha = .05
## 9 degrees of freedom
qt(1 - .025, df = 9)

[Figure: the PDF of the t distribution with 9 degrees of freedom, with the central 95% region marked.]

HYPOTHESIS TESTING

7. Calculate the rejection region: the set of values for which we will reject the null hypothesis.
   ex. $(-\infty, -t_{1-\alpha/2,\, n-k-1}]$ and $[t_{1-\alpha/2,\, n-k-1}, \infty)$

8. Using your sample, calculate your observed estimate.

9. Determine whether your estimate is within the rejection region and draw a conclusion about the hypothesis.
   ex. Reject the null if $\frac{\hat{\beta} - c}{\widehat{SE}[\hat{\beta}]} \in (-\infty, -t_{1-\alpha/2,\, n-k-1}] \cup [t_{1-\alpha/2,\, n-k-1}, \infty)$; fail to reject the null if it is not.
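Putting steps 5 through 9 together, a minimal sketch in R; the estimate, standard error, null value, and degrees of freedom below are made-up numbers for illustration:

beta.hat <- 1.8                        # hypothetical estimate
se.hat <- 0.7                          # hypothetical standard error
c0 <- 0                                # null value c
df <- 9                                # hypothetical degrees of freedom
alpha <- 0.05

test.stat <- (beta.hat - c0) / se.hat  # the test statistic (step 3)
critical <- qt(1 - alpha/2, df = df)   # the critical value (step 6)
abs(test.stat) >= critical             # TRUE means the estimate falls in
                                       # the rejection region: reject the null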

P-VALUES

p-value: Assuming that the null hypothesis is true, the probability of getting something at least as extreme as our observed test statistic, where extreme is defined in terms of the alternative hypothesis.

How do we find p-values?

# create a population with mean 12, sd 4
pop <- rnorm(n = 1e6, mean = 12, sd = 4)

# draw a single sample of size 100 from the population
my.sample <- sample(pop, size = 100, replace = TRUE)

# calculate our test statistic
# this is a test statistic for the sample mean as an estimator
# of the true mean (which we set to be 12), so we subtract the
# hypothesized mean and divide by sd/sqrt(100) = sd/10
test.statistic <- (mean(my.sample) - 12) / (sd(my.sample) / 10)

# find the two-sided p-value
p.value <- 2 * (1 - pnorm(abs(test.statistic)))


QUESTIONS

Questions?