Inference and Regression


Name: ____________________

Inference and Regression
Final Examination, 2015
Department of IOMS

This course and this examination are governed by the Stern Honor Code.

Instructions
Please write your name at the top of this page. Please answer all questions in this question book. Do not turn in a blue book. Please do not separate the pages of this exam booklet. Where a computation is required to answer a question, please show your work. (I cannot give partial credit for an incorrect numerical answer unless the work provided shows a partially correct computation.)

Grading: There are 9 questions in this exam, worth 185 points in total. The point values for the questions are:
1. 40
2. 40
3. 15
4. 15
5. 15
6. 15
7. 15
8. 15
9. 15
Total 185

[40] Part I. Labor Market

The regressions below are based on data from the National Longitudinal Survey, which has been carried out on a yearly basis by the Bureau of Labor Statistics of the Department of Labor. The dependent variable in this regression model is the log of the monthly wage of the individuals in the sample. The variables in the equations are:

EXP = labor market experience in years
EXPSQ = EXP^2
WKS = number of weeks worked this year
SOUTH = 1 if the individual lives in the Southern part of the U.S., 0 otherwise
FEM = 1 if the person is female, 0 if male

Three regressions are computed below: (1) using the full sample of 4165 observations, (2) using the 3392 observations with MARRIED = 1, and (3) using the 773 observations for which the head of the household is not married.

1. Using an F test, test the hypothesis that the five slope coefficients (not the constant) in the first regression are all equal to zero.
Answer: Model test F[5, 4159] = 204.07561, Prob F > F* = .00000, so the hypothesis is rejected.

2. Using an F test, test the hypothesis that the same model applies to both married and not married people.
Answer: F = [(712.178 - 574.331 - 134.917)/6] / [(574.331 + 134.917)/(4165 - 6 - 6)] ≈ 2.86. This exceeds the 5% critical value of F[6, 4153] (about 2.10), so the hypothesis is rejected.

3. Show in detail how the R-bar squared = 0.19604 (in the first regression) is computed.
Answer: $\bar{R}^2 = 1 - \frac{n-1}{n-K}(1 - R^2)$. Plugging in the values from the first regression, n = 4165, K = 6, R² = .19701: 1 - (4164/4159)(.80299) = .19604.

4. The first regression uses the whole sample; this is called the pooled regression. Using the pooled regression results, test the hypothesis that the coefficient on WKS equals 0.0.
Answer: t = 2.65 > 1.96, so the hypothesis is rejected.

5. Test the hypothesis that the coefficients on FEM in the MARRIED and Not Married equations are the same. (Hint: The two subsamples are independent.)
Answer: t = [-.25154 - (-.37540)] / sqrt(.11968² + .03111²) ≈ 1.00 < 1.96, so the hypothesis of equal coefficients is not rejected.

ALL
Ordinary least squares regression
LHS = LWAGE     Mean = 6.67635
No. of observations = 4165
                Sum of Squares   DegFreedom   Mean square
Regression          174.727             5      34.94544
Residual            712.178          4159        .17124
Total               886.905          4164        .21299
Standard error of e = .41381     Root MSE = .41351
Fit: R-squared = .19701          R-bar squared = .19604
Model test F[5, 4159] = 204.07561     Prob F > F* = .00000

LWAGE       Coefficient   Std. Error        z    Prob |z|>Z*   95% Confidence Interval
Constant     6.20560***     .06364      97.51      .0000       6.08086     6.33034
EXP           .03952***     .00253      15.60      .0000        .03455      .04449
EXPSQ        -.00073***   .5590D-04    -13.04      .0000       -.00084     -.00062
WKS           .00333***     .00126       2.65      .0080        .00087      .00580
SOUTH        -.16040***     .01417     -11.32      .0000       -.18817     -.13262
FEM          -.42897***     .02048     -20.95      .0000       -.46910     -.38884
***, **, * ==> Significance at 1%, 5%, 10% level.

MARRIED
Ordinary least squares regression
LHS = LWAGE     Mean = 6.73969     Standard deviation = .43122
No. of observations = 3392
                Sum of Squares   DegFreedom   Mean square
Regression          56.2205             5      11.24409
Residual           574.331           3386        .16962
Total              630.551           3391        .18595
Standard error of e = .41185     Root MSE = .41148
Fit: R-squared = .08916          R-bar squared = .08782
Model test F[5, 3386] = 66.29017     Prob F > F* = .00000

LWAGE       Coefficient   Std. Error        z    Prob |z|>Z*   95% Confidence Interval
Constant     6.28777***     .07373      85.28      .0000       6.14326     6.43229
EXP           .03531***     .00290      12.16      .0000        .02962      .04100
EXPSQ        -.00064***   .6361D-04    -10.04      .0000       -.00076     -.00051
WKS           .00249*       .00145       1.72      .0855       -.00035      .00534
SOUTH        -.15925***     .01584     -10.05      .0000       -.19030     -.12820
FEM          -.25154**      .11968      -2.10      .0356       -.48610     -.01697

Not Married
Ordinary least squares regression
LHS = LWAGE     Mean = 6.39839     Standard deviation = .48690
No. of observations = 773
                Sum of Squares   DegFreedom   Mean square
Regression          48.1044             5       9.62087
Residual           134.917            767        .17590
Total              183.021            772        .23707
Standard error of e = .41941     Root MSE = .41778
Fit: R-squared = .26283          R-bar squared = .25803
Model test F[5, 767] = 54.69445     Prob F > F* = .00000

LWAGE       Coefficient   Std. Error        z    Prob |z|>Z*   95% Confidence Interval
Constant     5.98198***     .12792      46.76      .0000       5.73126     6.23270
EXP           .04981***     .00531       9.39      .0000        .03942      .06021
EXPSQ        -.00098***     .00012      -8.24      .0000       -.00122     -.00075
WKS           .00534**      .00254       2.10      .0358        .00035      .01033
SOUTH        -.16276***     .03224      -5.05      .0000       -.22595     -.09957
FEM          -.37540***     .03111     -12.07      .0000       -.43637     -.31443
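The Part I answers reduce to a few lines of arithmetic. The sketch below is a minimal Python version that uses only figures printed in the three regressions above; small differences from the printed statistics (for example 2.64 here versus the reported 2.65 for WKS) come from the rounding of the reported coefficients.

```python
import math

# Figures copied from the ALL, MARRIED and Not Married regressions.
n, k = 4165, 6                                      # pooled observations, coefficients incl. constant
r2 = 0.19701                                        # pooled R-squared
ssr_all, ssr_m, ssr_u = 712.178, 574.331, 134.917   # residual sums of squares

# Q1: F test that the 5 slopes are zero, computed from R-squared.
f_overall = (r2 / (k - 1)) / ((1 - r2) / (n - k))

# Q2: Chow test; numerator df = k, denominator df = n - 2k.
f_chow = ((ssr_all - ssr_m - ssr_u) / k) / ((ssr_m + ssr_u) / (n - 2 * k))

# Q3: adjusted R-squared.
rbar2 = 1 - (n - 1) / (n - k) * (1 - r2)

# Q4: t ratio for WKS in the pooled regression.
t_wks = 0.00333 / 0.00126

# Q5: difference of FEM coefficients from two independent subsamples.
t_fem = (-0.25154 - (-0.37540)) / math.sqrt(0.11968**2 + 0.03111**2)

print(f"F = {f_overall:.2f}, Chow F = {f_chow:.3f}, R-bar^2 = {rbar2:.5f}")
print(f"t(WKS) = {t_wks:.2f}, t(FEM difference) = {t_fem:.2f}")
```

Against the usual 5% critical values, the overall F of about 204 is far above roughly 2.21 for F[5, 4159], the Chow statistic of about 2.86 exceeds roughly 2.10 for F[6, 4153], and the FEM difference t of about 1.00 is below 1.96.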

[40] Part II. Moral Hazard

This part of our study deals with a phenomenon called moral hazard. The theory of moral hazard holds that people act differently when they have insurance. In the health care world, this means that people who have health insurance use the health care system more. The model below is called a Poisson regression; we studied it in class. The dependent variable in the model is DOCVIS = the number of visits to the doctor by the person in the survey year. (This variable ranges from 0 to about 15, with a handful of outliers that range from 15 to 80. These are individuals who are chronically sick, or perhaps require a weekly treatment.) The insurance variable is PUBLIC: PUBLIC = 1 for people who have the insurance and PUBLIC = 0 for those who do not. The Poisson regression model states that

$$\text{Prob}(\text{DocVis}_i = y_i) = \frac{\exp(-\lambda_i)\,\lambda_i^{y_i}}{y_i!}, \qquad y_i = 0, 1, \ldots;\quad i = 1, \ldots, N.$$

In this model, λ_i is the mean of the random variable (λ_i is the regression function). To make the model into a regression, we form

$$E[y_i \mid x_i] = \lambda_i = \exp(\beta' x_i).$$

Maximum likelihood estimates of three models based on the survey data are as follows. Model 1 contains my full theory about doctor visits (Theory A). Model 2 contains only a constant term (Theory Z). Model 3 contains only the constant term and AGE (Theory B).

Model 1 (Theory A)
Poisson Regression
Dependent variable DOCVIS
Log likelihood function -13417.03363
Restricted log likelihood -14260.88282
Estimation based on N = 3377, K = 6

DOCVIS      Coefficient   Std. Error        z    Prob |z|>Z*   95% Confidence Interval
Constant      .49959***     .08009       6.24      .0000        .34261      .65656
AGE           .02059***     .00077      26.80      .0000        .01909      .02210
EDUC         -.01796***     .00447      -4.01      .0001       -.02673     -.00919
PUBLIC        .34568***     .03500       9.88      .0000        .27709      .41426
INCOME       -.82770***     .05359     -15.45      .0000       -.93273     -.72267
Intrct01      .76829***     .03926      19.57      .0000        .69133      .84524
(Intrct01 = interaction FEMALE*INCOME)
***, **, * ==> Significance at 1%, 5%, 10% level.

Model 2 (Theory Z)
Poisson Regression
Dependent variable DOCVIS
Log likelihood function -14260.88282

DOCVIS      Coefficient   Std. Error        z    Prob |z|>Z*   95% Confidence Interval
Constant     1.33050***     .00885     150.38      .0000       1.31316     1.34784

Model 3 (Theory B)
Poisson Regression
Dependent variable DOCVIS
Log likelihood function -13814.22155

DOCVIS      Coefficient   Std. Error        z    Prob |z|>Z*   95% Confidence Interval
Constant      .32994***     .03581       9.21      .0000        .25975      .40014
AGE           .02266***     .00076      29.84      .0000        .02117      .02414

1. Do the regression results provide significant evidence of moral hazard? Explain.
Answer: Yes. The coefficient on PUBLIC is large, positive, and significant (z = 9.88).

2. Form the log likelihood function (log L) for estimation of the parameters β.
Answer: log L is the sum of the logs of the probabilities:
$$\log L = \sum_{i=1}^{N} \left[-\lambda_i + y_i \log \lambda_i - \log y_i!\right].$$

3. Obtain the first order (necessary) conditions for maximizing log L with respect to β.
Answer: Since ∂λ_i/∂β = λ_i x_i,
$$\frac{\partial \log L}{\partial \beta} = \sum_{i=1}^{N} \left(-1 + \frac{y_i}{\lambda_i}\right)\lambda_i x_i = \sum_{i=1}^{N} (y_i - \lambda_i)\, x_i = 0.$$
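The log likelihood in question 2 and the first-order condition in question 3 can be written directly as functions. This is a minimal pure-Python sketch, not the estimation code behind the output above; the small data vector is hypothetical and only serves to show that the score vanishes at the maximum.

```python
import math

def poisson_loglik(beta, X, y):
    """log L = sum_i [-lam_i + y_i*log(lam_i) - log(y_i!)], with lam_i = exp(beta'x_i)."""
    total = 0.0
    for xi, yi in zip(X, y):
        lam = math.exp(sum(b * x for b, x in zip(beta, xi)))
        total += -lam + yi * math.log(lam) - math.lgamma(yi + 1)  # lgamma(y+1) = log(y!)
    return total

def poisson_score(beta, X, y):
    """Gradient of log L: sum_i (y_i - lam_i) * x_i."""
    grad = [0.0] * len(beta)
    for xi, yi in zip(X, y):
        lam = math.exp(sum(b * x for b, x in zip(beta, xi)))
        for j, xij in enumerate(xi):
            grad[j] += (yi - lam) * xij
    return grad

# Hypothetical toy counts (not the survey data); constant-only model, x_i = [1].
y = [0, 2, 1, 5, 3, 0, 4]
X = [[1.0] for _ in y]

# With only a constant, sum(y_i - lam) = 0 gives lam = ybar, so beta = log(ybar).
beta_hat = [math.log(sum(y) / len(y))]
print(poisson_score(beta_hat, X, y))   # essentially zero
```

In the constant-only case the condition Σ(y_i - λ) = 0 forces λ to equal the sample mean, which is why the Theory Z constant, 1.33050, is the log of the mean of DOCVIS (exp(1.33050) ≈ 3.78).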

4. We can use the first two sets of results to test the hypothesis that the variables in the model are collectively significant (like, but not the same as, the F test for the linear model). Use the likelihood ratio to test the hypothesis that all 5 slope coefficients are zero. Then use a likelihood ratio test to test Theory B (which removes all the variables except AGE from the model) against Theory A.
Answer: For all 5 slopes, LR = 2[-13417.03363 - (-14260.88282)] ≈ 1687.70, compared with chi-squared(5), whose 5% critical value is 11.07. For Theory B against Theory A, LR = 2[-13417.03363 - (-13814.22155)] ≈ 794.38, compared with chi-squared(4), since Theory B drops 4 of the 5 slopes (5% critical value 9.49). Both hypotheses are rejected.

5. The way the model is constructed,
$$\beta' x = \alpha + \beta_1 \text{AGE} + \beta_2 \text{EDUC} + \beta_3 \text{PUBLIC} + \beta_4 \text{INCOME} + \beta_5 \text{FEMALE} \times \text{INCOME}.$$
Notice, then, that for women (FEMALE = 1),
$$\beta' x = \alpha + \beta_1 \text{AGE} + \beta_2 \text{EDUC} + \beta_3 \text{PUBLIC} + (\beta_4 + \beta_5)\,\text{INCOME},$$
while for men (FEMALE = 0),
$$\beta' x = \alpha + \beta_1 \text{AGE} + \beta_2 \text{EDUC} + \beta_3 \text{PUBLIC} + \beta_4 \text{INCOME}.$$
We are interested in how the expected value differs for men and women. Consider someone who is 35 years old, has 12 years of education, has public insurance (PUBLIC = 1), and has INCOME = 0.5. Compute the expected values for men and for women, and comment on the difference that you find. (This is called the partial effect of gender.)
Answer: E[DocVis | woman] - E[DocVis | man] = exp(.49959 + .02059×35 - .01796×12 + .34568×1 - .82770×.5 + .76829×1×.5) - exp(.49959 + .02059×35 - .01796×12 + .34568×1 - .82770×.5 + .76829×0×.5) ≈ 3.746 - 2.551 ≈ 1.19. A woman with this profile is expected to make about 1.2 more doctor visits per year than a man.

6. How does the expected number of doctor visits respond to years of education?
(a) Obtain the (mathematical) derivative of E[DocVis_i] = λ_i with respect to EDUC. Tip: d(e^t)/dt = e^t. Use the chain rule as well.
Answer: ∂λ_i/∂EDUC = λ_i × (-.01796).
(b) Compute the value for the person in part 5: AGE = 35 years old, EDUC = 12 years of education, INCOME = .5, and public insurance (PUBLIC = 1).
Answer: Plug in the values to compute λ_i, then multiply by -.01796.
(c) Compute λ_i using these values but now with EDUC = 13 instead of 12. What do you find?
Answer: Compute λ_i with EDUC = 13 and with EDUC = 12 and take the difference; you should get essentially the same answer as in part (b).
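The arithmetic in questions 4 through 6 is sketched below in Python, using only the log likelihoods and Model 1 coefficients printed above. The helper expected_visits is a hypothetical name introduced for this illustration.

```python
import math

# Log likelihoods: Theory A (full), Theory Z (constant only), Theory B (constant + AGE).
ll_a, ll_z, ll_b = -13417.03363, -14260.88282, -13814.22155

lr_a_vs_z = 2 * (ll_a - ll_z)   # 5 restrictions -> chi-squared(5)
lr_a_vs_b = 2 * (ll_a - ll_b)   # 6 - 2 = 4 restrictions -> chi-squared(4)

# Model 1 coefficients: constant, AGE, EDUC, PUBLIC, INCOME, FEMALE*INCOME.
b = [0.49959, 0.02059, -0.01796, 0.34568, -0.82770, 0.76829]

def expected_visits(age, educ, public, income, female):
    """Hypothetical helper: E[DOCVIS | x] = exp(b'x) for the Model 1 specification."""
    xb = (b[0] + b[1] * age + b[2] * educ + b[3] * public
          + b[4] * income + b[5] * female * income)
    return math.exp(xb)

# Q5: partial effect of gender for AGE 35, EDUC 12, PUBLIC 1, INCOME 0.5.
lam_w = expected_visits(35, 12, 1, 0.5, female=1)
lam_m = expected_visits(35, 12, 1, 0.5, female=0)
gender_effect = lam_w - lam_m

# Q6: derivative lambda * b_EDUC versus the discrete change from 12 to 13 years (men).
deriv_m = lam_m * b[2]
change_m = expected_visits(35, 13, 1, 0.5, female=0) - lam_m

print(f"LR(A vs Z) = {lr_a_vs_z:.2f}, LR(A vs B) = {lr_a_vs_b:.2f}")
print(f"women {lam_w:.3f}, men {lam_m:.3f}, difference {gender_effect:.3f}")
print(f"derivative {deriv_m:.4f}, discrete change {change_m:.4f}")
```

Both LR statistics are far above the 5% critical values (11.07 for chi-squared(5), 9.49 for chi-squared(4)). For a man with this profile the derivative and the discrete change are both about -0.045 visits; for a woman both are about -0.067, larger in magnitude because λ_i is larger.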

[15] Part III. Regression Basics

Forbes Magazine reported a survey in which citizens of 150+ countries reported how happy they were with their lives, using some kind of survey scale. In a linear regression of this variable (HAPPY, obtained from the Forbes website) on the disability-adjusted life expectancy (DALE) in the country, Minitab reported the following regression results. Answer T (true) or F (false) to each of the following, and explain your answer in one short sentence.

1. The reported statistics provide evidence of a significant regression (i.e., people who live longer are happier).
Answer: True. The F statistic is huge.
2. 90% of the residuals are between -7 and +7.
Answer: False. The residuals are more dispersed than that; the answer key puts the band at about -14 to +14.
3. Increasing life expectancy by one year causes a significant increase of about 1 happiness unit.
Answer: False. The regression shows association, not causation.
4. The correlation between the variables HAPPY and DALE is +0.675.
Answer: True. It is the square root of R² = .455, and it is positive because the slope is positive.
5. The regression slope estimator would be regarded as statistically significant.
Answer: True. The t ratio is very large.
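In a regression with a single regressor, the correlation is the square root of R², carrying the sign of the slope. A short Python check of item 4, taking R² = .455 from the answer above:

```python
import math

r_squared = 0.455   # R-squared quoted in the answer to item 4
slope_sign = 1      # the slope on DALE is positive

r = slope_sign * math.sqrt(r_squared)
print(round(r, 3))   # 0.675
```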

[15] Part IV. Analyzing Descriptive Statistics

It is often found that, on average, women tend to give lower answers to the health satisfaction question in a survey such as the one we are analyzing in this exam.

1. The histogram below shows the relative frequencies (proportions) of the answers for men and women. Do the results in the histogram agree with the suggested comparison of men and women? Explain.
Answer: Yes. Men have taller bars at the high values and shorter bars at the low values.

2. The following statistics were gathered for the men and women in a sample of 2039 observations. Test the hypothesis that the means for men and women are equal.

Descriptive Statistics for 1 variable
Variable    Mean       Std.Dev.   Minimum   Maximum   Cases
Subsample is FEMALE = 0 (Men)
HLTHSAT     2.587426   1.100412   0.0       4.0        1084
Subsample is FEMALE = 1 (Women)
HLTHSAT     2.174369   1.197662   0.0       4.0         955

Answer: Use the standard two-sample test: z = (2.587426 - 2.174369)/sqrt(1.100412²/1084 + 1.197662²/955) ≈ 8.07. This is far above 1.96, so the hypothesis of equal means is rejected.
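The two-sample test in question 2 treats men and women as independent samples, so the variance of the difference in means is the sum of the variances of the two means. A minimal Python sketch using the figures in the table:

```python
import math

# HLTHSAT descriptive statistics from the table above.
mean_m, sd_m, n_m = 2.587426, 1.100412, 1084   # men (FEMALE = 0)
mean_w, sd_w, n_w = 2.174369, 1.197662, 955    # women (FEMALE = 1)

# Standard error of the difference of two independent sample means.
se_diff = math.sqrt(sd_m**2 / n_m + sd_w**2 / n_w)
z = (mean_m - mean_w) / se_diff
print(f"z = {z:.2f}")
```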

[15] Part V. Multiple Regression

My model for the auction prices of Monet paintings was

ln $ = Constant + β1 ln SurfaceArea + β2 ln AspectRatio + β3 ln Height + β4 ln Width + β5 Signed.

(Surface area = Height × Width; Aspect ratio = Height/Width.) It looks like Minitab didn't like my model as much as I did. Explain in detail why Minitab insisted on dropping two variables from my equation.
Answer: This is perfect multicollinearity. ln AspectRatio = ln Height - ln Width, which is an exact linear combination of variables in the equation. The same holds for surface area: ln SurfaceArea = ln Height + ln Width.
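Minitab drops variables when the columns of the data matrix are linearly dependent. The sketch below builds the design matrix for the Monet specification from hypothetical heights, widths, and signature indicators (made-up values, not the auction data) and computes its rank with a small pure-Python Gaussian elimination; matrix_rank is a helper written for this example, not a Minitab routine.

```python
import math

def matrix_rank(rows, tol=1e-9):
    """Rank via Gaussian elimination with partial pivoting (illustrative helper)."""
    m = [list(r) for r in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank, col = 0, 0
    while rank < n_rows and col < n_cols:
        pivot = max(range(rank, n_rows), key=lambda i: abs(m[i][col]))
        if abs(m[pivot][col]) < tol:   # no usable pivot: column depends on earlier ones
            col += 1
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(rank + 1, n_rows):
            factor = m[i][col] / m[rank][col]
            for j in range(col, n_cols):
                m[i][j] -= factor * m[rank][j]
        rank += 1
        col += 1
    return rank

# Hypothetical painting dimensions and signature dummies (illustration only).
heights = [25.0, 40.0, 60.0, 73.0, 81.0, 92.0, 100.0]
widths = [30.0, 55.0, 50.0, 92.0, 65.0, 73.0, 120.0]
signed = [1, 0, 1, 1, 0, 1, 0]

X = []
for h, w, s in zip(heights, widths, signed):
    ln_h, ln_w = math.log(h), math.log(w)
    ln_area = math.log(h * w)     # equals ln_h + ln_w
    ln_aspect = math.log(h / w)   # equals ln_h - ln_w
    X.append([1.0, ln_area, ln_aspect, ln_h, ln_w, s])

print(len(X[0]), matrix_rank(X))   # 6 columns, rank 4
```

Six columns but only rank 4: two of the four log-dimension columns are redundant, so two variables must be dropped before least squares can be computed.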

[15] Part VI. Statistical Theory

Suppose the density of x is

$$f(x) = \frac{1}{x\sqrt{2\pi}} \exp\left[-\frac{(\ln x)^2}{2}\right], \qquad 0 < x < +\infty.$$

What is the density of z = x²?
Answer: x = z^{1/2}, so dx = (1/2) z^{-1/2} dz. Then

$$f_z(z) = \frac{1}{z^{1/2}\sqrt{2\pi}} \exp\left[-\frac{(\ln z^{1/2})^2}{2}\right] \cdot \frac{1}{2} z^{-1/2} = \frac{1}{2z\sqrt{2\pi}} \exp\left[-\frac{(\ln z)^2}{2(4)}\right], \qquad 0 < z < +\infty.$$

This is also a lognormal density, with μ = 0 and σ² = 4.

[15] Part VII. Very Basic Statistics

The histogram above describes the 2039 observations on the variable income used in Part IV.
1. Provide a guess of the sample mean, and explain how you obtained it.
Answer: About .5, the middle of the distribution.
2. Provide a guess of the sample median, and explain how you obtained it.
Answer: About .45, less than the mean because the distribution is skewed to the right.
3. Provide a guess of the sample standard deviation, and explain precisely how you obtained it.
Answer: About .16. The range, 0 to 1, should cover about 6 standard deviations.
4. Are these data skewed to the left, to the right, or not at all?
Answer: To the right.
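The change-of-variables result in Part VI can be checked numerically. This Python sketch evaluates f_x(√z)·|dx/dz| and the lognormal density with μ = 0 and σ = 2 at a few points; the two should agree to rounding error.

```python
import math

def f_x(x):
    """Density of x in Part VI: standard lognormal on 0 < x < infinity."""
    return math.exp(-math.log(x) ** 2 / 2) / (x * math.sqrt(2 * math.pi))

def f_z_change_of_variables(z):
    """f_z(z) = f_x(x(z)) * |dx/dz|, with x = z**0.5 and dx/dz = 0.5 * z**-0.5."""
    return f_x(math.sqrt(z)) * 0.5 / math.sqrt(z)

def f_z_lognormal(z, sigma=2.0):
    """Lognormal density with mu = 0 and sigma = 2 (the derived answer)."""
    return math.exp(-math.log(z) ** 2 / (2 * sigma**2)) / (z * sigma * math.sqrt(2 * math.pi))

for z in (0.1, 0.5, 1.0, 3.0, 10.0):
    print(z, f_z_change_of_variables(z), f_z_lognormal(z))   # the two columns agree
```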

[15] Part VIII. Bivariate Outcomes

An important variable in the analysis of German health outcomes is whether the individual takes up the public health insurance. The table below shows the takeup rates for men and women.

Cross Tabulation of FEMALE by PUBLIC
            NO_INS     INS    Total
MALE           145     939     1084
FEMALE          74     881      955
Total          219    1820     2039

1. What proportion of women take the insurance? What proportion of men take the insurance?
Answer: Women: 881/955 ≈ .923. Men: 939/1084 ≈ .866.

2. The chi-squared value for the test of independence of gender and insurance takeup is 16.77. (See class notes 10, pages 37-43.) Should I conclude that insurance takeup and gender are independent based on these results? Explain in detail.
Answer: No, they are not independent. With 1 degree of freedom the 5% critical chi-squared value is 3.84 < 16.77, so independence is rejected.

3. The following are the results of a logistic regression of PUBLIC in which the only variable that explains whether the individual takes public insurance is whether the applicant is female. Are these results consistent with the chi-squared test in question 2? Explain.

Binary Logit Model for Binary Choice
Dependent variable PUBLIC
Log likelihood function -686.85351

PUBLIC      Coefficient   Std. Error        z    Prob |z|>Z*   95% Confidence Interval
Constant     1.86808***     .08923      20.94      .0000       1.69320     2.04296
FEMALE        .60891***     .15037       4.05      .0001        .31420      .90362
***, **, * ==> Significance at 1%, 5%, 10% level.

Answer: Yes, they are consistent: the FEMALE coefficient is significant, so takeup depends on gender. The logit also gives the direction of the effect: women take up public insurance more than men. The chi-squared result only indicates dependence.
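Every Part VIII number can be recomputed from the four cell counts. The Python sketch below computes the Pearson chi-squared statistic, the two takeup proportions, and the logit coefficients. Because the logit has a single binary regressor, its maximum likelihood estimates reproduce the sample log-odds exactly, which is why they match the output.

```python
import math

# Cross tabulation: rows = (male, female), columns = (NO_INS, INS).
table = [[145, 939],
         [74, 881]]

row_tot = [sum(r) for r in table]
col_tot = [sum(c) for c in zip(*table)]
n = sum(row_tot)

# Pearson chi-squared for independence; df = (2 - 1) * (2 - 1) = 1.
chi2 = sum((table[i][j] - row_tot[i] * col_tot[j] / n) ** 2
           / (row_tot[i] * col_tot[j] / n)
           for i in range(2) for j in range(2))

p_men = table[0][1] / row_tot[0]
p_women = table[1][1] / row_tot[1]

# Saturated logit: constant = log-odds for men, FEMALE = log odds ratio.
const = math.log(table[0][1] / table[0][0])
female = math.log(table[1][1] / table[1][0]) - const

print(f"chi2 = {chi2:.2f}, P(ins|men) = {p_men:.3f}, P(ins|women) = {p_women:.3f}")
print(f"logit constant = {const:.5f}, FEMALE = {female:.5f}")
```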

[15] Part IX. Function of a Random Estimator

1. For the income data used earlier, I defined HHINCOME = household income = 100 × income. The sample variance of HHINCOME is 275.919. This estimates the variance, σ². The variance of an estimator of a variance is approximately 2σ⁴/N. N is 2039, so the estimated variance of s², the estimator, is 72.89. The precision of a random variable is defined as φ = 1/σ. Estimate the precision of the income variable.
Answer: Precision = 1/s = 1/sqrt(275.919) ≈ .0602.

2. The standard error for the estimator of σ² in part 1 above is sqrt(72.89) = 8.54. How would you compute the standard error for the precision, φ = (σ²)^{-1/2}? What is the value? Show in detail how you obtain the result.
Answer: Use the delta method. The variance of s² is 72.89. The derivative of φ with respect to σ² is -(1/2)(σ²)^{-3/2} = -1/(2σ³), so the squared derivative is (1/4)/σ⁶, whose value is (1/4)/275.919³. The variance of φ is therefore [(1/4)/275.919³] × 72.89 ≈ .000000867, and the standard error is about .00093. For comparison, the estimate 1/s is .0602.
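The delta-method calculation in question 2 is short in Python. This sketch takes the sample variance and the given variance of s² as inputs and applies Var(g(s²)) ≈ [g'(σ²)]² Var(s²) with g(v) = v^(-1/2):

```python
import math

s2 = 275.919        # sample variance of HHINCOME
var_s2 = 72.89      # given estimate of Var(s^2)

phi = s2 ** -0.5                  # precision estimate, 1/s
g_prime = -0.5 * s2 ** -1.5       # d phi / d sigma^2 = -1 / (2 sigma^3)
var_phi = g_prime ** 2 * var_s2   # delta method
se_phi = math.sqrt(var_phi)

print(f"phi = {phi:.4f}, Var(phi) = {var_phi:.3e}, SE(phi) = {se_phi:.5f}")
```

Equivalently, SE(φ) = SE(s²)/(2s³) = 8.54/(2 × 16.611³) ≈ .00093.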