ECO 312 Fall 2013 Chris Sims LIKELIHOOD, POSTERIORS, DIAGNOSING NON-NORMALITY


(1) A distribution that allows asymmetry (different probabilities for negative and positive outliers) is the asymmetric double exponential, with pdf

p(x | α, β) ∝ e^{-αx} if x ≥ 0, e^{βx} if x < 0.

The symbol ∝ stands for "is proportional to". To make this distribution integrate to one, we have to multiply the expressions above by αβ/(α + β). (And this is important for generating the likelihood function from the pdf.) Suppose we have an iid sample {x_1, ..., x_n} from this distribution.

(a) Write down the likelihood for the sample. Show that the sum of the positive x_j's (Σ x_j^+) and the sum of the negative x_j's (Σ x_j^-), as a pair, form a sufficient statistic.

Because the sample is iid, the pdf is the product of the individual observation pdf's, i.e.

p(x) = Π_{j=1}^{n} (αβ/(α + β)) e^{-α x_j^+ + β x_j^-} = (αβ/(α + β))^n e^{-α Σ x_j^+ + β Σ x_j^-}.

Since this sample pdf depends on the data x only via the two numbers Σ x_j^+ and Σ x_j^-, these are sufficient statistics. [Note that in general the pdf could depend on the data in other ways as well, so long as the pdf factors into one piece that depends on the sufficient statistics and the unknown parameters, and another that does not depend on the unknown parameters. The part that does not depend on the unknown parameters drops out when we consider the pdf as a function of the parameters (i.e. as a likelihood) and normalize it, or its product with a prior pdf, to integrate to one.]

(b) The mode of this distribution is obviously x = 0. Calculate the mean of the distribution as a function of α and β. [Hint: The mean of an exponentially distributed variable (with pdf αe^{-αx} on (0, ∞)) is 1/α. The mean of x is the mean of the positive part of x times the probability of x > 0 plus the mean of the negative part of x times the probability of x < 0.]

The distribution of x conditional on x > 0 is the pdf over that region, normalized to integrate to one, i.e. the standard exponential with parameter α. So the mean conditional on x > 0 is 1/α. The mean conditional on x < 0 is, by the same argument with sign reversed, -1/β. The integral of e^{-αx} over x > 0 is 1/α, and the integral of e^{βx} over x < 0 is 1/β. Therefore P[x > 0]/P[x < 0] = β/α, which in turn implies P[x > 0] = β/(α + β). The unconditional mean of x is

P[x > 0] E[x | x > 0] + P[x < 0] E[x | x < 0] = β/(α(α + β)) - α/(β(α + β)) = (β - α)/(αβ).

(c) Calculate the maximum likelihood estimates of α and β as a function of the sufficient statistics. This involves solving a pair of nonlinear equations in two unknowns, but they can be solved by hand.

You get the equations to solve by finding first order conditions for a maximum of the likelihood (or the log likelihood, which is a little more convenient). The log likelihood is

n log(αβ) - n log(α + β) - α Σ x_j^+ + β Σ x_j^-.

The first order conditions with respect to α and β are

α: n/α - n/(α + β) - Σ x_j^+ = 0
β: n/β - n/(α + β) + Σ x_j^- = 0.

If my algebra is right, the solution is

α = n/(S^+ + √(S^+ S^-)),  β = n/(S^- + √(S^+ S^-)),

where S^+ = Σ x_j^+, S^- = -Σ x_j^- (so that both sums are positive), and n is the number of observations.

(d) For the "Bad day on Wall Street" daily per cent change in the DJIA data of Stock and Watson, the sum of the positive values is 3007.06 and the sum of the negative values is -2755.26. The number of observations is 7561. Find the corresponding maximum likelihood α and β, as well as the implied mean return. Are these estimates consistent with the idea that large negative values of the variable are more likely than large positive ones?

Plugging the given values into the equations for the MLE's gives us α = 1.284515, β = 1.342048. They imply more rapid decay toward zero in the left (negative) tail of the distribution than in the right. So they contradict the notion that large negative values are more likely than large positive values. The sample data of largest absolute value are negative, but there are only a few of these in the distant tail. Of course this result might also reflect the fact that the sample mean of the data is positive, and if the model is to imply a positive mean return, it must have α < β.

(e) Suppose we had a prior pdf on α and β that made them independent, and with the identical pdf's 10e^{-10α}, 10e^{-10β}. What would the posterior pdf be? What would be the values of α and β that maximize the posterior pdf? [Hint: The posterior pdf can be put into a form that looks just like the likelihood, but with altered values for Σ x_j^+ and Σ x_j^-, so the same formulas you used to maximize the likelihood can be re-used.]

With this prior pdf, the sample pdf times the prior pdf becomes proportional to

(αβ/(α + β))^n e^{-α(Σ x_j^+ + 10) + β(Σ x_j^- - 10)}.

As a function of α and β, this is just the original likelihood with S^+ and S^- both increased in absolute value by 10. Plugging these modified values of S^+ and S^- into the formulas for the MLE's, we get α = 1.280164, β = 1.337298. Since the prior density is monotone decreasing away from zero, it is expected that the MLE's get pulled toward zero, but despite the rather strong priors, the estimates are changed only slightly. The large sample makes the likelihood dominate the prior even though the prior probability of, e.g., a β as large as the MLE is less than 0.000002.

© 2013 by Christopher A. Sims. This document is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
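The closed-form solution in part (c) is easy to sanity-check numerically. Below is a short sketch (in Python rather than the R used elsewhere in this problem set, purely as an arithmetic cross-check): it plugs the part (d) sums into the formulas, verifies the first order conditions, and reproduces the posterior-mode shift from part (e). Because the printed sums are rounded to two decimals, the results match the reported MLEs only to about three decimal places.

```python
import math

# Sums from part (d), rounded as printed in the text; the published MLEs
# (1.284515, 1.342048) were presumably computed from unrounded data, so we
# only expect agreement to ~3 decimal places.
n = 7561
S_pos = 3007.06    # S^+ = sum of positive observations
S_neg = 2755.26    # S^- = minus the sum of negative observations

# Part (c): alpha = n/(S^+ + sqrt(S^+ S^-)), beta = n/(S^- + sqrt(S^+ S^-))
g = math.sqrt(S_pos * S_neg)
alpha = n / (S_pos + g)
beta = n / (S_neg + g)

# The first order conditions should hold exactly; a nice identity from the
# algebra is that n/(alpha + beta) = sqrt(S^+ S^-) at the solution.
foc_alpha = n / alpha - n / (alpha + beta) - S_pos
foc_beta = n / beta - n / (alpha + beta) - S_neg

# Part (b): implied unconditional mean (beta - alpha)/(alpha beta)
mean = (beta - alpha) / (alpha * beta)

# Part (e): posterior mode = same formulas with both sums increased by 10.
g10 = math.sqrt((S_pos + 10) * (S_neg + 10))
alpha_post = n / (S_pos + 10 + g10)
beta_post = n / (S_neg + 10 + g10)
```

As expected, the implied mean is positive (α < β) and the exponential priors pull both estimates slightly toward zero.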

(2) We might also consider a t distribution for the "Bad Day on Wall Street" data. These data are available as an R data file bdws.Rdata that can be loaded into R with the load() function, or as a csv file that can be loaded into R with read.csv() or used in another program. If it is loaded by bdws <- load("bdws.Rdata"), bdws$pctchg will be the percentage change time series we are interested in. As displayed in class, the t pdf with ν degrees of freedom, location parameter µ, and scale parameter σ, is given by

p(x | µ, σ, ν) = Γ((ν + 1)/2) / (Γ(ν/2) σ √(νπ)) · (1 + (1/ν)((x - µ)/σ)²)^{-(ν+1)/2}.

[Note: If you use the R built-in facilities for computing densities, cdf's, and quantiles of distributions, you don't need to make any use of the expression above for this problem set.] The pdf of an iid sample from this distribution does not have a sufficient statistic. The maximum likelihood values for the parameters of this distribution for the bdws$pctchg data are µ = 0.04784383, σ = 0.72188439, ν = 3.35478224.

(a) Make a normal quantile-quantile plot (qqnorm() does this in R) for the data, including a reference line showing what the slope would be if the data were normal.

The reference line can be produced with qqline() in R. Its default settings simply calculate and plot a straight line through the two points defined by the .25 and .75 quantiles of the sample distribution and reference distribution (which is by default normal).

(b) Make a quantile-quantile plot of the data against the maximum likelihood t distribution. Comment on whether it looks better than the normal q-q plot, and explain your conclusion.

This plot requires using qqplot(), which requires that you specify the theoretical quantiles to be used in the plot. The R qt() function produces quantiles of the t, but its only parameter is the degrees of freedom ν. To account for the location and scale parameters, you need to give as the distribution argument in qqline

function(p) qt(p, df=3.35478224)*sig + mu

(with sig and mu holding the maximum likelihood scale and location estimates). The call to qqplot() should have x = qt(ppoints(N), df=3.35478224)*sig + mu, where N is the number of observations, length(bdws$pctchg). The examples at the bottom of the R help page on qqplot may be helpful. Here's the normal qq plot, from qqnorm(bdws$pctchg) followed by qqline(bdws$pctchg).
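As a language-agnostic illustration of what qqnorm() and qqline() are doing under the hood, here is a stdlib-only Python sketch. The data are a simulated fat-tailed mixture invented as a stand-in for the DJIA series (it is NOT the actual bdws$pctchg data), and the crude order-statistic quantiles differ slightly from R's interpolated ones; qqline's default of running a line through the .25/.75 quantile pairs is mimicked directly.

```python
import random
from statistics import NormalDist

# Hypothetical stand-in for a fat-tailed return series: a 90/10 mixture of
# N(0,1) and N(0,5). Purely illustrative.
random.seed(1)
x = [random.gauss(0, 1) if random.random() < 0.9 else random.gauss(0, 5)
     for _ in range(2000)]

xs = sorted(x)
n = len(xs)
std_normal = NormalDist()

# Plotting positions in the spirit of R's ppoints(), and the theoretical
# normal quantiles; qqnorm() would plot the pairs (theo[i], xs[i]).
probs = [(i + 0.5) / n for i in range(n)]
theo = [std_normal.inv_cdf(p) for p in probs]

# qqline() default: line through the .25 and .75 quantiles of sample vs
# reference (crude order-statistic quantiles, not R's interpolation).
q1s, q3s = xs[int(0.25 * n)], xs[int(0.75 * n)]
q1t, q3t = std_normal.inv_cdf(0.25), std_normal.inv_cdf(0.75)
slope = (q3s - q1s) / (q3t - q1t)
intercept = q1s - slope * q1t

# With fat tails, the extreme sample quantiles fall well outside the line,
# which is exactly the visual signature discussed in parts (a)-(b).
tail_gap = xs[-1] - (intercept + slope * theo[-1])
```

The positive tail_gap is the q-q plot's way of saying the sample's tails are heavier than the normal reference.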

[Figure: normal Q-Q plot of the data. Sample quantiles run from about -20 to 10 against theoretical quantiles from -4 to 4; the extreme sample quantiles fall far off the qqline reference line.]

Here's the t qqplot. The formula at the bottom shows how it was computed. The default qqnorm() and qqplot() plot individual points, not lines. I used type="l" to make it a line. But this requires using sort(bdws$pctchg) as the argument rather than bdws$pctchg itself, as otherwise the lines zigzag all over the place.

[Figure: t Q-Q plot of sort(bdws$pctchg) against qt(seq(0.5/N, 1, by = 1/N), df = 3.35478) * 0.721884 + 0.047843.]

(c) [extra credit] Create a plot containing the estimated smoothed pdf for the data obtained with density(), the t density with maximum likelihood parameters for the sample, and the normal density with maximum likelihood parameters for the data (i.e., mean(bdws$pctchg) and sd(bdws$pctchg)). Use different colors for the three lines. This is extra credit because getting these functions scaled and shifted so they all integrate to one requires some care and thought. Note that if p(x) is the density of x, p(y/σ)/σ is the density of y = xσ.

Turns out this is not that hard to set up. I used this R code:

plot(density(bdws$pctchg))
lines(seq(-10, 10, length=500),
      dt((seq(-10, 10, length=500) - 0.047843)/0.72188439, df=3.35478224)/0.72188439,
      col="red", lwd=3)

The lwd=3 is to make the t density plot a slightly fatter line, so one can see that it lies almost perfectly on top of the smoothed sample pdf for the data.
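The scaling rule used above (if p(x) is a density, then p((y - µ)/σ)/σ is again a density in y) is easy to verify numerically. The sketch below checks it by direct integration in Python, using the standard logistic density as a stand-in for the t (evaluating the t density outside R would need a third-party library, which this stdlib-only sketch avoids); µ and σ are set to values like the ML estimates in the text.

```python
import math

def p(x):
    # Standard logistic pdf, a convenient stand-in for a t-like density.
    e = math.exp(-x)
    return e / (1 + e) ** 2

mu, sigma = 0.0478, 0.7219  # location/scale in the spirit of the MLEs above

def q(y):
    # The shifted-and-scaled density: p((y - mu)/sigma)/sigma.
    return p((y - mu) / sigma) / sigma

def integrate(f, lo=-60.0, hi=60.0, steps=200_000):
    # Simple trapezoidal rule over a grid wide enough that the tails
    # contribute essentially nothing.
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    total += sum(f(lo + i * h) for i in range(1, steps))
    return total * h

area_p = integrate(p)  # should be ~1
area_q = integrate(q)  # should also be ~1, thanks to the 1/sigma factor
```

Dropping the 1/σ factor would leave area_q near σ instead of 1, which is exactly the scaling mistake the problem warns about.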

[Figure: smoothed density of bdws$pctchg from density() (N = 7571, bandwidth = 0.1204), with the t density overlaid almost exactly on top of it.]

One final note. The largest negative value of the variable is -25.63, or a deviation from µ of 25.68. The probability of this big a deviation from µ in the negative direction is less than 1 in 100,000 under our estimated t distribution. But we have over 7,000 observations. It turns out that the probability of seeing a deviation this big in a sample this size is about .07. So the occurrence of this big negative value is slightly surprising, but, in contrast with what would be implied by a normal distribution, not at all impossible.
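That closing back-of-the-envelope can be reproduced directly: if a single observation exceeds the observed deviation with probability p, the chance of at least one such exceedance in n iid draws is 1 - (1 - p)^n ≈ np. A minimal Python check, taking the text's round figures (1 in 100,000, and 7,000 observations) as assumed inputs:

```python
# Chance of at least one extreme deviation in n iid observations, given a
# per-observation tail probability p_one. Inputs are the rounded figures
# quoted in the text, taken here as assumptions.
p_one = 1e-5   # "less than 1 in 100,000" for a single observation
n = 7000       # "over 7,000 observations"

p_any = 1 - (1 - p_one) ** n   # exact under independence
approx = n * p_one             # first-order approximation for small p
```

Both the exact and approximate calculations land near .07, matching the figure in the text.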