Inference and Regression


Assignment 3
Department of IOMS
Professor William Greene
Phone: 212.998.0876; Office: KMC 7-90
Home page: www.stern.nyu.edu/~wgreene
Email: wgreene@stern.nyu.edu
Course web page: www.stern.nyu.edu/~wgreene/econometrics/econometrics.htm

1. The random variable X has mean µx and standard deviation σx. The random variable Y has mean µy and standard deviation σy. If X and Y are independent, find E[XY], Var[XY] and the standard deviation of XY.

If they are independent, E[XY] = E[X]E[Y] = µxµy.
Var[XY] = E[(XY)²] − {E[XY]}²
        = E[X²]E[Y²] − (E[X])²(E[Y])²
        = (σx² + µx²)(σy² + µy²) − µx²µy²
        = σx²σy² + µx²µy² + σx²µy² + σy²µx² − µx²µy²
        = σx²σy² + σx²µy² + σy²µx².
The standard deviation is the square root of this.

2. The number of auto vandalism claims reported per month at Sunny Daze Insurance Company (SDIC) has mean 100 and standard deviation 20. The individual losses have mean $1,200/claim and standard deviation $80/claim. The number of claims and the amounts of individual losses are independent. Using the normal approximation, calculate the probability that SDIC's aggregate auto vandalism losses reported for a month will be less than $110,000.

Aggregate losses are claims times loss per claim. The expected value is the product of the means: 100 claims × $1,200/claim = $120,000. The variance, based on question 1's result, is
Var = 20²·80² + 20²·1,200² + 80²·100² = 642,560,000,
so the standard deviation is 25,349. Then
Prob[losses < 110,000] = Prob[z < (110,000 − 120,000)/25,349] = Prob[z < −0.394] = 0.347.

3. Let x̄ be the average of a sample of 16 independent normal random variables with mean 0 and variance 1. Determine c such that Prob(|x̄| < c) = .5.

Prob(|x̄| < c) = .5 implies that the probability that x̄ is less than −c is .25 (and the probability that it is greater than c is .25). x̄ is normally distributed with mean 0 and standard deviation 1/sqrt(16) = 1/4 = .25. So, Prob(x̄ < −c) = Prob[(x̄ − 0)/.25 < −c/.25] = .25. From the normal table, −c/.25 = −.675, so 4c = .675 and c = .16875.
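The arithmetic in question 2 can be checked directly. This is a small sketch (not part of the original solution) that plugs the question's moments into question 1's product-variance formula and evaluates the normal approximation with `math.erf`:

```python
import math

# Question 2: number of claims N (mean 100, sd 20) and loss per claim X
# (mean 1200, sd 80) are independent; aggregate losses are the product N*X.
mu_n, sd_n = 100.0, 20.0
mu_x, sd_x = 1200.0, 80.0

mean_agg = mu_n * mu_x  # product of the means: 120,000

# Variance of a product of independent variables (question 1's result)
var_agg = sd_n**2 * sd_x**2 + sd_n**2 * mu_x**2 + sd_x**2 * mu_n**2
sd_agg = math.sqrt(var_agg)

# Normal approximation to P(aggregate < 110,000)
z = (110_000 - mean_agg) / sd_agg
p = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF

print(mean_agg, var_agg, round(sd_agg), round(p, 3))
```

Running it reproduces the mean of 120,000, the variance of 642,560,000, the standard deviation of about 25,349, and the probability 0.347.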

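The quantile calculation in question 3 can be confirmed the same way. A minimal sketch, using the table value .675 for the 75th percentile of the standard normal:

```python
import math

# Question 3: x-bar is the mean of 16 iid N(0,1) draws, so x-bar ~ N(0, 1/16).
sd_xbar = 1.0 / math.sqrt(16)  # = .25

def norm_cdf(z):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

c = 0.675 * sd_xbar  # c = .16875, from -c/.25 = -.675

# Check that P(|x-bar| < c) is (approximately) .5
prob_inside = norm_cdf(c / sd_xbar) - norm_cdf(-c / sd_xbar)
print(round(c, 5), round(prob_inside, 3))
```

The interval probability comes out at .5 (to the accuracy of the rounded table value .675).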
4. Show that if X ~ F(n,m), then Y = 1/X ~ F(m,n).

It is possible to do this problem by brute force, using a change of variable and the density of F(n,m). But the result follows trivially from the definition of F:
F(n,m) = [chi-squared(n)/n] / [chi-squared(m)/m].
Then
1/F(n,m) = [chi-squared(m)/m] / [chi-squared(n)/n],
which is exactly the definition of an F(m,n) random variable.

5. Show that if T ~ t(n), then T² ~ F(1,n).

T ~ t(n) means T = Normal(0,1) / sqrt[chi-squared(n)/n], so T² = [Normal(0,1)]² / [chi-squared(n)/n]. The key now is to note that the square of a standard normal is chi-squared with 1 degree of freedom. The definition of F(1,n) then follows immediately.

6. Suppose that X1, X2, ..., Xn are i.i.d. random variables on the interval [0,1] with density
f(x|α) = [Γ(3α) / (Γ(α)Γ(2α))] x^(α−1) (1 − x)^(2α−1), α > 0, 0 ≤ x ≤ 1.
The parameter α is to be estimated from the sample. It can be shown that E[X] = 1/3 and Var[X] = 2/[9(3α+1)].
a. How could the method of moments be used to estimate α?
b. What equation does the MLE of α satisfy? (I.e., what is the likelihood equation?)
c. What is the asymptotic variance of the MLE of α?
d. Find a sufficient statistic for α.

a. You could not use the mean in the method of moments, since E[X] does not depend on α. So, use the variance. Equate s² to 2/[9(3α+1)]. Solving, 2/s² = 9(3α+1), or (2/9)/s² = 3α + 1, or α̂ = [2/(9s²) − 1]/3.
b. The log likelihood is the sum of the logs of the density:
logL = n logΓ(3α) − n logΓ(α) − n logΓ(2α) + (α−1) Σᵢ log xᵢ + (2α−1) Σᵢ log(1−xᵢ).
The likelihood equation is
∂logL/∂α = 3nΨ(3α) − nΨ(α) − 2nΨ(2α) + Σᵢ log xᵢ + 2 Σᵢ log(1−xᵢ) = 0.
c. (Ouch!) Differentiate it again:
∂²logL/∂α² = 9nΨ′(3α) − nΨ′(α) − 4nΨ′(2α) = H.
The asymptotic variance is −1/H. Since H does not involve xᵢ, we don't need to take the expected value.
d. There are two statistics that are jointly sufficient for α. The density is an exponential family, so we can see immediately that the sufficient statistics are Σᵢ log xᵢ and Σᵢ log(1−xᵢ).
The reason there are two statistics and one parameter is that this is a version of the beta distribution, which generally has two parameters, α and β, with the two sufficient statistics shown. This distribution forces β to equal 2α.

7. Suppose that X1, X2, ..., Xn are an i.i.d. random sample from a Rayleigh distribution with parameter θ > 0:
f(x|θ) = (x/θ²) exp[−x²/(2θ²)], x ≥ 0.
a. Find a method of moments estimator of θ.
b. Find the MLE of θ.
c. Find the asymptotic variance of the MLE of θ.

a. First find the expected value:
E[x] = ∫₀^∞ (x²/θ²) exp(−x²/(2θ²)) dx = (1/θ²) ∫₀^∞ x² exp(−ax²) dx, where a = 1/(2θ²).
This is a gamma integral. There is a formula for this integral, but it is revealing to work through it. Make the change of variable z = x², so x = z^(1/2) and x² dx = (1/2) z^(1/2) dz. Then
∫₀^∞ x² exp(−ax²) dx = (1/2) ∫₀^∞ z^(1/2) exp(−az) dz = (1/2) Γ(3/2) a^(−3/2) = (1/2)(√π/2) a^(−3/2) = (√π/4) a^(−3/2).
With a = 1/(2θ²), E[x] = (1/θ²)(√π/4)(2θ²)^(3/2) = θ √(π/2).
Since this is the mean, the method of moments estimator is θ̂ = x̄ √(2/π).
b. The log likelihood is the sum of the logs of the densities:
logL = Σᵢ [log xᵢ − 2 log θ − xᵢ²/(2θ²)].
The likelihood equation is
∂logL/∂θ = −2n/θ + Σᵢ xᵢ²/θ³ = 0.
The solution is θ̂ = sqrt[Σᵢ xᵢ²/(2n)].
c. Differentiate the log likelihood function again:
∂²logL/∂θ² = 2n/θ² − 3 Σᵢ xᵢ²/θ⁴.
We need the expected value of this. Superficially, this needs a solution for the expected square of x. But we have a trick available to us. The expected value of the first derivative is zero. So, solving the moment equation, we find E[Σᵢ xᵢ²] = 2nθ². Use this to find the expected value of the second derivative:
2n/θ² − 3(2nθ²)/θ⁴ = −4n/θ².
The negative of the inverse of this gives the asymptotic variance, θ²/(4n).

8. In the application of Bayesian estimation of a proportion of defectives that we examined in class and that is detailed in Notes-3, we used a uniform prior on [0,1] for the distribution of θ. Suppose we assume, instead, an informative beta prior with parameters α = 3 and β = 7. Repeat the analysis of the problem to find the Bayesian estimator of θ given this prior. (Hint: the beta distribution is a conjugate prior for the likelihood function given in the problem.)

Consider estimation of the probability θ that a production process will produce a defective product. In case 1, suppose the sampling design is to choose N = 25 items from the production line and count the number of defectives. If the probability that any item is defective is a constant θ between zero and one, then the likelihood for the sample of data is

L(θ|data) = θ^D (1 − θ)^(25−D),
where D is the number of defectives, say, 8. The maximum likelihood estimator of θ will be p = D/25 = 0.32, and the asymptotic variance of the maximum likelihood estimator is estimated by p(1−p)/25 = 0.008704.
Now, consider a Bayesian approach to the same analysis. The posterior density is obtained by the following reasoning:
p(θ|data) = p(θ, data)/p(data) = p(data|θ) p(θ) / p(data), where p(data) = ∫ p(θ, data) dθ,
so
p(θ|data) = Likelihood(data|θ) × p(θ) / p(data),
where p(θ) is the prior density assumed for θ. [We have taken some license with the terminology, since the likelihood function is conventionally defined as L(θ|data).] Inserting the results of the sample first drawn, we have the posterior density:
p(θ|data) = θ^D (1−θ)^(N−D) p(θ) / ∫ θ^D (1−θ)^(N−D) p(θ) dθ.
With the beta(3,7) prior, p(θ) = [Γ(3+7)/(Γ(3)Γ(7))] θ^(3−1) (1−θ)^(7−1). The posterior is then
p(θ|data) = θ^(D+3−1) (1−θ)^(N−D+7−1) / ∫ θ^(D+3−1) (1−θ)^(N−D+7−1) dθ
          = [Γ(D+3 + N−D+7) / (Γ(D+3) Γ(N−D+7))] θ^(D+3−1) (1−θ)^(N−D+7−1).
This is the density of a random variable with a beta distribution with parameters (α*, β*) = (D+3, N−D+7). The mean of this random variable is (D+3)/(N+10) = 11/35 = 0.3143 (as opposed to 0.32, the MLE). The posterior variance is
[(D+3)(N−D+7)] / [(N+10)²(N+11)] = (11 × 24)/(35² × 36) = 0.005986.
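The posterior moments in question 8 can be reproduced in a few lines. This sketch assumes the values used in the solution (N = 25 trials, D = 8 defectives, beta(3,7) prior) and applies the standard beta-distribution mean and variance formulas:

```python
# Question 8: a beta(3,7) prior with N = 25 trials and D = 8 defectives
# gives a beta(D+3, N-D+7) = beta(11, 24) posterior.
N, D = 25, 8
a, b = D + 3, N - D + 7

post_mean = a / (a + b)                          # (D+3)/(N+10) = 11/35
post_var = a * b / ((a + b) ** 2 * (a + b + 1))  # beta-distribution variance

mle = D / N                                      # 0.32, for comparison
print(round(post_mean, 4), round(post_var, 6), mle)
```

This prints the posterior mean 0.3143 and posterior variance 0.005986 alongside the MLE 0.32, matching the solution: the informative prior pulls the estimate slightly toward the prior mean of 0.3.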

9. The angle at which electrons are emitted in muon decay has a distribution with density
f(x|α) = (1 + αx)/2, −1 ≤ x ≤ 1 and −1 ≤ α ≤ 1,
where x = cos θ. The parameter α is related to polarization. Physical considerations dictate that |α| ≤ 1/3, but f(x|α) is a density for |α| ≤ 1. The method of moments may be used to estimate α from a sample of experimental measurements, x1, ..., xn. The mean of the random variable x is
E[x] = µ = ∫₋₁¹ x (1 + αx)/2 dx = α/3.
Thus, the method of moments estimator of α is α̂ = 3x̄.

a. Show that E[α̂] = α; that is, the estimator is unbiased.
If E[x] = µ = α/3, then by random sampling results, E[x̄] = E[x] = µ, so E[3x̄] = 3µ = α.

b. Show that Var[α̂] = (3 − α²)/n. (Hint: what is Var[x̄]?)
Use integration to find E[x²] = ∫₋₁¹ x²(1 + αx)/2 dx = 1/3, so Var[x] = E[x²] − µ² = 1/3 − α²/9. The variance of 3x̄ is 9 Var[x]/n = (3 − α²)/n.

c. Use the central limit theorem to deduce a normal approximation to the sampling distribution of α̂.
α̂ will be asymptotically normally distributed with mean α and variance (3 − α²)/n.

d. According to this approximation, if n = 25 and α = 0, what is Prob(|α̂| > .5)?
If n = 25 and α = 0, then the mean of the asymptotic distribution will be 0 and the standard deviation will be sqrt(3)/5 ≈ .35. The probability that a normal variable with mean 0 and standard deviation .35 is greater than .5 or less than −.5 is twice the probability that a standard normal variable is greater than (.5 − 0)/.35 = 1.43, which is 2 × .07636 = .1527.

e. Show how you would obtain the MLE of α. (Note: there is no simple direct solution for the MLE.)
ln L = Σᵢ ln[(1 + αxᵢ)/2] = −n ln 2 + Σᵢ ln(1 + αxᵢ).
The likelihood equation to be solved is
∂lnL/∂α = Σᵢ xᵢ/(1 + αxᵢ) = 0.
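Parts a–d of question 9 lend themselves to a quick simulation check. This sketch takes α = 0 (as in part d), in which case f(x|0) is simply Uniform(−1, 1); the seed, replication count, and tolerances are arbitrary choices, not part of the problem:

```python
import random

random.seed(7)
n, reps = 25, 20_000

# With alpha = 0, f(x|0) = 1/2 on (-1, 1), i.e. Uniform(-1, 1).
ests = []
for _ in range(reps):
    xbar = sum(random.uniform(-1.0, 1.0) for _ in range(n)) / n
    ests.append(3.0 * xbar)  # method-of-moments estimator alpha-hat = 3 x-bar

mean_est = sum(ests) / reps                                # should be near 0 (unbiased)
var_est = sum((e - mean_est) ** 2 for e in ests) / reps    # should be near (3 - 0)/25 = .12
tail = sum(abs(e) > 0.5 for e in ests) / reps              # compare with part d's .15

print(round(mean_est, 3), round(var_est, 3), round(tail, 3))
```

The simulated mean, variance, and tail probability land close to 0, .12, and .15 respectively, confirming the unbiasedness, the variance formula (3 − α²)/n, and the normal approximation.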