Eco517 Fall 2013 C. Sims EXERCISE DUE FRIDAY NOON, 10/4

© 2013 by Christopher A. Sims. This document is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

(1) Data on a sample of California schools is available on the course web site, either as an R data file (caschool.rdata) or an Excel file (caschool.xlsx). In R, after load("caschool.rdata"), a data frame named caschool will be in your workspace. A pdf file describing the variables is available as californiatestscores.docx.

The principal components of a vector of random variables with covariance matrix Σ are the uncorrelated vector of variables z = W'y, where Σ = WΛW' is the eigenvalue decomposition of Σ. (W's columns are Σ's eigenvectors.) The rows of W determine how much each variable in y = Wz loads on each of the principal components z.

(a) Find the eigenvector decomposition of the correlation matrix of the numeric variables in the data. (The first few columns are non-numeric.) [In R, these commands may be helpful: cor(), or cov() followed by cov2cor(), and eigen().]

(b) Which principal components are most important in determining testscr, according to this decomposition? (I think two stand out.) Looking at the columns of W corresponding to these two components, which variables are strongly related to testscr?

Here's a sequence of R commands that check which principal components are most important in determining testscr.

> ev <- eigen(cor(caschool[, -(1:5)]))
> dimnames(ev$vectors)[[1]] <- names(caschool)[-(1:5)]
> ev$vectors["testscr", ]
 [1]  0.401332321 -0.120991993  0.032842203 -0.031406307 -0.10573
 [6] -0.110538803 -0.217087897 -0.283478808 -0.050556731  0.02193
[11] -0.103401123  0.001055974 -0.810932894

The coefficients on testscr for the first and 13th are considerably bigger than any of the others. These two columns of the eigenvector matrix are

enrl_tot  -0.1509110  -7.740888e-07
teachers  -0.1465162   1.055164e-06
calw_pct  -0.2875774  -4.499499e-09
meal_pct  -0.3790833  -1.524986e-07
computer  -0.1114922  -3.692731e-07
testscr    0.4013323  -8.109329e-01
comp_stu   0.1563686   9.388346e-08
expn_stu   0.1133054  -7.164313e-10
str       -0.1474874   4.695266e-08
avginc     0.3076185   1.650080e-08
el_pct    -0.3046274   1.808937e-07
read_scr   0.4029943   4.279097e-01
math_scr   0.3833833   3.991004e-01

The second has positive, nearly equal weights on read_scr and math_scr, and a negative weight, about twice the size, on testscr. Its associated eigenvalue is tiny, zero roughly to machine precision. This is just picking up the fact that testscr is the average of math_scr and read_scr. The only reason the weights are not exactly proportional to (1, -.5, -.5) is that the scores are scaled slightly differently in the conversion to a correlation matrix.

The first column picks up substantial weights on many variables, including positive, similar weights on all three score variables. The other large positive weight is on avginc. Large negative weights appear on meal_pct and el_pct. str gets a negative weight, but it's small, fourth to last in absolute value.
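As a quick numerical check of the identity interpretation (this sketch is not part of the original solution; it assumes the ev object computed above):

## Sketch: verify that the 13th principal component reflects the identity
## testscr = (read_scr + math_scr)/2.  Uses 'ev' from eigen(cor(...)) above.
ev$values[13]                                        # essentially zero
v13 <- ev$vectors[, 13]
v13[c("testscr", "read_scr", "math_scr")] / v13["testscr"]
## roughly (1, -0.5, -0.5), up to the slight rescaling noted above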

(c) Using the same caschool data, estimate a linear regression of testscr (school average test scores) on all the other numeric variables except the last two (math_scr and read_scr). [In R, the command lm(testscr ~ x + y + z, data=caschool) will work, with x + y + z replaced by a list of all the other variable names, separated by plus signs.]

Here's the part of summary(lmout) from R that gives coefficients and posterior standard errors.

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  6.588e+02  9.748e+00  67.581  < 2e-16 ***
enrl_tot    -5.647e-04  1.648e-03  -0.343   0.7321
teachers     8.203e-03  3.636e-02   0.226   0.8216
calw_pct    -8.290e-02  5.834e-02  -1.421   0.1561
meal_pct    -3.739e-01  3.634e-02 -10.288  < 2e-16 ***
computer     1.570e-03  3.112e-03   0.505   0.6141
comp_stu     1.018e+01  7.767e+00   1.310   0.1908
expn_stu     1.597e-03  9.035e-04   1.767   0.0779 .
str         -1.525e-01  3.262e-01  -0.467   0.6404
avginc       6.147e-01  8.972e-02   6.851  2.71e-11 ***
el_pct      -1.995e-01  3.508e-02  -5.686  2.47e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
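Here is a sketch (not the original code) of one way to produce a summary like the one above; the object name lmout matches the summary() call mentioned in the text, but the formula shorthand is an assumption:

## Sketch: the numeric columns start at column 6; drop math_scr and read_scr,
## which are functions of testscr, and regress testscr on everything else.
lmout <- lm(testscr ~ . - read_scr - math_scr, data = caschool[, -(1:5)])
summary(lmout)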

(d) Explain why the last two are left out; the principal components decomposition may help.

As already noted, the last two are related to testscr trivially via an identity, and our regression would just recover the identity if we included those variables.

(e) If you were interested in what determines testscr, would you reach different conclusions from the principal components decomposition and the regression?

Note that the principal components decomposition, being based on the correlation matrix, is scale-free, while the sizes of the regression coefficients are not. The t-statistics, as found easily with the summary() function applied to the regression output in R, are more comparable to the sizes of the coefficients in W. The t statistics that exceed 2 in absolute value are for variables that enter the first principal component strongly: meal_pct, avginc, and el_pct. The signs of the coefficients on variables other than testscr in the principal component vector match those in the regression.

Note that a tightly fitting regression generally does not match this pattern. The 13th principal component, for example, has opposite-signed coefficients on testscr and the two component scores. The same-signed coefficient on the dependent variable that is found here arises when all the variables that get substantial weight are acting like noisy measures of a single underlying source of variation, which here might be "good school" or, maybe more plausibly, "student body from supportive backgrounds".

(2) (Stopping rule paradox) Suppose one has data on a single family from an isolated and unusual subculture where a) each family continues to have children until it has a boy, and then stops having children, and b) the probability of a male birth is not the usual .5 and is unknown. The data are just the family size n, which of course consists of n - 1 female births and one male birth.

(a) What is the likelihood function for this single sample, as a function of the unknown probability p of a male birth?

    (1 - p)^(n-1) p

(b) Show that a flat-prior Bayesian posterior mean for p is biased for most values of p. You can do this analytically, but quicker, and OK for the exercise, is probably just to check it numerically. There will be one value of p for which the posterior mean is frequentist-unbiased. What is it? (Accuracy to 3 significant figures is OK.)

The flat-prior Bayesian posterior mean, since the likelihood is proportional to a Beta(2, n) pdf, is 2/(n + 2). The mean of this estimator is

    ∑_{n=1}^∞ [2/(n + 2)] (1 - p)^(n-1) p.

The point at which it is unbiased is p = .568, approximately. For p's below that it is biased up; for p's above that it is biased down. The largest absolute bias is at p = 1, where it is a downward bias of one third. The largest upward bias is at around p = .16, where the bias is about .18.
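The numerical check suggested in the question could be done along these lines. This is a sketch, not the original code; the truncation point of the infinite sum and the grid of p values are arbitrary choices made here:

## Sketch: bias of the per-family posterior mean 2/(n+2) when n ~ Geometric(p)
## on {1, 2, ...}, truncating the infinite sum at a large nmax.
post.mean.bias <- function(p, nmax = 2e4) {
  n <- 1:nmax
  sum((2 / (n + 2)) * (1 - p)^(n - 1) * p) - p
}
p.grid <- seq(0.01, 0.99, by = 0.001)
bias <- sapply(p.grid, post.mean.bias)
p.grid[which.min(abs(bias))]   # approximately 0.568, as stated above
max(bias)                      # largest upward bias, about .18 near p = .16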

(c) If instead of one family, data on a fairly large number N of families is available, taking an arithmetic average of the Bayesian posterior means for each family as calculated above is not a good way to estimate p. Why?

In a large sample, the average of the biased estimates will be biased by the same amount as the individual estimates, so the estimates will not converge to the truth as the sample size increases.

(d) The likelihood for the full set of observations on N families implies a posterior mean that is different from the average of the family-by-family posterior means. What is the posterior mean for the full likelihood?

If we multiply the likelihoods for all the individual families together, as is appropriate if the family data are i.i.d. across families sampled, the result is just the same likelihood we would get if we ignored the family groupings and just counted boys and girls:

    p^N (1 - p)^(∑_j n_j - N),

where N is the number of families (also the number of boys) and m = ∑_j n_j - N is the number of girls, with j indexing families. With a flat prior, since this is a Beta(N + 1, m + 1) distribution, the posterior mean is (N + 1)/(N + m + 2). This does converge in probability to the true probability, because N/(N + m) is the average over the sample of the i.i.d. variable that is 1 for boys and 0 for girls. Since the ratio of (N + 1)/(N + m + 2) to N/(N + m) goes to 1 as N and m go to infinity, the posterior mean converges to the true value with probability one.
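A small simulation sketch contrasting (c) and (d); the true p, the number of families, and the seed are arbitrary choices, not from the original:

## Sketch: compare the average of per-family posterior means 2/(n+2) with the
## pooled posterior mean (N+1)/(N+m+2).
set.seed(1)
p.true <- 0.3
N <- 1e5
n <- rgeom(N, p.true) + 1    # family sizes: girls until the first boy, plus the boy
m <- sum(n) - N              # total number of girls
mean(2 / (n + 2))            # stays biased (upward for this p, per (b)); does not approach p.true
(N + 1) / (N + m + 2)        # close to p.true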

(e) There is a frequentist-unbiased estimate of 1/p for the case of data on a single family. What is it? Is there an unbiased estimate of p itself for this case? What about a procedure that takes an arithmetic average of the per-family unbiased estimates of 1/p when there are N families, then estimates p as the inverse of this average? Would it converge in probability to p as N increased? Can it be improved upon?

This may have involved a bit more calculus cleverness than I intended. It seems intuitively plausible that n, the number of children in the family, is an unbiased estimate of 1/p. (This does involve treating n as +∞ when p = 0.) Proving this is not hard if you've seen this kind of argument before, but possibly hard otherwise. The expected value of n is

    ∑_{n=1}^∞ n p (1 - p)^(n-1) = -p (d/dp) ∑_{n=1}^∞ (1 - p)^n = -p (d/dp) [(1 - p)/p] = 1/p.

So n is an unbiased estimator of 1/p. Averaging it across a large number of randomly selected families will therefore give a result that converges in probability to 1/p. Notice that the average of n across the N families is just (N + m)/N, which is the inverse of the maximum likelihood estimator of p.

The MLE, being an average of N + m i.i.d. variables, satisfies the CLT and therefore converges to the true p at the rate (N + m)^(-1/2). One over this estimator of 1/p differs from the Bayesian posterior mean by a factor that converges to one at the rate 1/(N + m), so the two estimators are nearly the same in large samples. The frequentist unbiased estimator of 1/p, because it is the MLE for that function of the parameter, is a function of the sufficient statistics. Thus there is no quick way to improve on it by taking its expectation conditional on the sufficient statistics (i.e. applying the Rao-Blackwell theorem, which we stated in class). Since it is the maximum likelihood estimator of 1/p, it is also fully efficient as an estimator of that. So it can't be improved upon.

Note that, though n is unbiased in a frequentist sense, it gives an estimate of 1/p equal to 1 when n = 1. We know that 1/p ≥ 1, so any reasonable distribution for 1/p given the data in that case must have expectation above one. For characterizing postsample beliefs, calling the 1/p = 1 estimate unbiased is misleading, though nonetheless technically correct, since unbiasedness refers only to behavior of the estimator across hypothetical repeated samples, not to reasonable beliefs given the actual sample at hand.
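A quick simulation sketch of the E[n] = 1/p claim; again the value of p, the number of families, and the seed are arbitrary choices, not from the original:

## Sketch: check E[n] = 1/p by simulation, and that inverting the average
## family size recovers p.
set.seed(2)
p.true <- 0.4
N <- 1e6
n <- rgeom(N, p.true) + 1    # family sizes
mean(n)                      # close to 1/p.true = 2.5
1 / mean(n)                  # close to p.true; this is the MLE N/(N + m)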