Internal vs. external validity. External validity. Internal validity


Section 7 Model Assessment

Internal vs. external validity

Internal validity refers to whether the analysis is valid for the population and sample being studied. External validity refers to whether these results can be generalized to other populations: is the population from which the sample is drawn representative of a larger population about which inference is sought?

External validity

External validity is related to Assumption #0. But in this case, the question is not whether all sample observations follow the same model but rather: do the sample observations follow the same model as the more general population? Or, alternatively, are they drawn from a sub-population that has characteristics that would make the coefficients (or specification) different?

All populations have sub-populations that vary in their characteristics. If our sampling process is based on a particular sub-population, we must worry about the generalizability of our results, which is external validity: one can perform an internally valid analysis of an idiosyncratic sub-population that would not generalize to others.

Example: Noel's work measuring the value of tree canopy or walkability in Portland. Do results generalize to other cities, or do Portlanders value these characteristics more (or less) than people in other cities?

There are no direct statistical tests for external validity (unless you have data drawn from a broader population, in which case you probably should have used it to begin with). It is usually a matter of judgment.

One way that some people try to assess external validity is to split the sample in half, estimate over one half, then assess the predictions for the other half. If predictions are good, then both halves of the sample may follow the same model. This is useless if both halves of the sample are drawn from a sub-population that is idiosyncratic, though.

Internal validity

Given the population from which the sample is drawn, are the assumptions underlying the estimators valid?

Omitted variables

They are always there.

Omitted variables bias the coefficient estimators for any included variables that are correlated with them. In a strict sense, nearly every econometric regression is biased because of this. What variables are most obviously omitted? What variables in the equation would be correlated with them? How does this omission bias the included coefficients?

Proxy variables are observable variables that are correlated with unobserved variables that should be included. Proxy variables are legitimate if we are not particularly interested in the effect of the variable for which they proxy. We can't interpret the coefficient on the proxy directly as the coefficient on the omitted variable. Using a proxy is OK if the difference between the true variable and the proxy is uncorrelated with the included variables.

Panel data can help if unobserved variables vary across units but not over time, or over time but not across units.

Misspecification of functional form

Can use the RESET test to explore whether quadratics are useful. If you know what alternative functional forms might be more appropriate, you can test them.

Measurement error (errors-in-variables bias)

Measurement error in dependent variable

Suppose that the true dependent variable is $Y$ but that we instead observe $\tilde{Y}_i = Y_i + \varepsilon_i$, where $\varepsilon_i$ is a random measurement error. The estimated model, then, is

$$\tilde{Y}_i = \beta_0 + \beta_1 X_i + (u_i + \varepsilon_i).$$

As long as the measurement error in $Y$ ($\varepsilon$) is uncorrelated with $X$, there is no bias in the estimator of $\beta_1$. The SER will be an estimate of the standard deviation of the composite error term $u + \varepsilon$, but otherwise OLS is fine.

Measurement error in regressor

Suppose that the dependent variable is measured accurately but that we measure $X$ with error: $\tilde{X}_i = X_i + \eta_i$. Substituting $X_i = \tilde{X}_i - \eta_i$ into the true model, the estimated model is

$$Y_i = \beta_0 + \beta_1 \tilde{X}_i + (u_i - \beta_1 \eta_i).$$

Because $\eta$ is part of $\tilde{X}$ and therefore correlated with it, the composite error term is now correlated with the actual regressor, meaning that $\hat{\beta}_1$ is biased and inconsistent.
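The errors-in-variables setup above can be illustrated with a small simulation. This is only a sketch under assumed values: the parameter choices (true slope 2, equal variances for the true regressor and the measurement error), the function name `simulate_attenuation`, and the use of NumPy are illustrative assumptions, not part of the original notes. It compares the OLS slope from the true regressor with the slope from the noisy one.

```python
import numpy as np

def simulate_attenuation(n=200_000, beta0=1.0, beta1=2.0,
                         sd_x=1.0, sd_eta=1.0, sd_u=1.0, seed=0):
    """Return OLS slopes using the true X and the mismeasured X (hypothetical setup)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, sd_x, n)        # true regressor X
    eta = rng.normal(0.0, sd_eta, n)    # measurement error, independent of X
    u = rng.normal(0.0, sd_u, n)        # regression error
    y = beta0 + beta1 * x + u           # Y is measured accurately
    x_noisy = x + eta                   # observed regressor: X plus noise
    # np.polyfit with degree 1 returns [slope, intercept]
    slope_true = np.polyfit(x, y, 1)[0]
    slope_noisy = np.polyfit(x_noisy, y, 1)[0]
    return slope_true, slope_noisy

slope_true, slope_noisy = simulate_attenuation()
print(f"slope using true X:  {slope_true:.3f}")
print(f"slope using noisy X: {slope_noisy:.3f}")
```

With these assumed settings the measurement error has the same variance as the true regressor, so the slope on the noisy regressor should come out at roughly half the true slope, while the slope on the true regressor stays close to 2.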

If $X$ and $\eta$ are independent and normal, then

$$\operatorname{plim} \hat{\beta}_1 = \beta_1 \frac{\sigma_X^2}{\sigma_X^2 + \sigma_\eta^2}.$$

The estimator is biased toward zero. If most of the variation in $\tilde{X}$ comes from $X$, then the bias will be small. As the variance of the measurement error grows in relation to the variation in the true variable, the magnitude of the bias increases. As a worst-case limit, if the true $X$ does not vary across our sample of observations and all of the variation in our measure $\tilde{X}$ is random noise, then the expected value of our coefficient is zero.

The best solution is getting a better measure. Alternatives are instrumental variables or direct measurement of the degree of measurement error. For example, if an alternative, precise measure is available for some arguably random sub-sample of observations, then we can calculate the variance of the true variable and the variance of the measurement error and correct the estimate.

Sample selection bias

Few samples are truly random draws from the full population. Instead, they are draws (random or not) from some sub-population:
Many homeless are uncounted in the census.
No wage data on those who do not work.
Polls miss people with unlisted phone numbers.
Cross-country regressions are often limited to the countries for which good data are available (which is not a random sample of countries).

If sample selection is related to $X$, then we have issues of external validity (do estimates apply to the missed sub-population?) but not internal validity. Results may be valid for the sub-population for which they are estimated.

If sample selection is related to $Y$ (or, specifically, to $u$), then we are not drawing randomly from the population distribution of the error term (as we assume) and our results will be biased.

There are methods of coping with sample-selection bias, such as imputing values for missing wage data to allow inclusion of the full sample.

Simultaneity bias (reverse or bidirectional causality)

If changes in $Y$ (presumably due to changes in $u$) cause $X$ to change, then $X$ and $u$ will be correlated and OLS estimates will be biased and inconsistent. For example, for many years macroeconomists estimated Keynesian consumption functions by OLS:

$$C_t = \beta_0 + \beta_1 GDP_t + u_t.$$

(There are time-series problems with this regression that we will study later.) For now, note that if aggregate demand affects output, then GDP in each year is $C + I + G + NX$, so a positive shock to consumption (a positive $u$) increases GDP. Because the regressor is correlated with the error term, OLS estimates of $\beta_1$ were biased and inconsistent. (But they looked good and had ridiculously high $R^2$ values, so they persisted for many years despite the protests of econometricians.)

The usual correction is to use an instrumental-variables (two-stage least squares) estimator.

Heteroskedasticity and autocorrelation

Recall that heteroskedasticity causes OLS to be inefficient (relative to WLS), but it is still unbiased and consistent. The classical standard errors will be biased under heteroskedasticity, but we can use White's robust covariance matrix estimator, which we've talked about earlier. Using robust standard errors is the most common correction for heteroskedasticity.

If error terms of different observations are correlated, then OLS is also inefficient (relative to a corrected GLS estimator), but is unbiased and consistent.

Autocorrelation can be spatial: unmeasured neighborhood characteristics (omitted variables) cause houses that are close together to be more or less valuable.

Autocorrelation is ubiquitous in time-series data: this period's error term is nearly always related to last period's. (Unmeasured omitted variables are themselves correlated over time.)

Again, standard errors are biased, but White's heteroskedasticity-consistent standard errors do not help here. There are estimated standard errors that are robust to autocorrelation. (Use the hac option in Stata.)

Alternatively, one can try to model the autocorrelation and transform the model into one that has no autocorrelation (GLS). Examples include AR(1) models in time series and modeling spatially correlated errors in cross-section models.

Validity in forecasting/prediction

Regression models may be valid for forecasting even if their coefficients are not unbiased or consistent. Suppose that we know that $X$ is measured with error.

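We can still use a regression of $Y$ on $\tilde{X}$ to predict the outcome of a particular measured $\tilde{X}$ even though the estimated coefficient is a biased estimator for the effect of $X$. That is because we have correctly estimated the relationship between the noisy $\tilde{X}$ and $Y$. We would not get reliable estimates if our prediction question relied on the true $X$ rather than the noisy $\tilde{X}$. We often build models with noisy data or proxy variables to get predictions of another variable.

The biggest question in forecasting is external validity: does the model that applies to the sample you used for estimation also apply to the observation for which you want a forecast?

Measuring prediction error: What is the variance of $Y - \hat{Y}$?

$$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$$

$$Y = \beta_0 + \beta_1 X + u$$

$$Y - \hat{Y} = (\beta_0 - \hat{\beta}_0) + (\beta_1 - \hat{\beta}_1) X + u$$

$$\operatorname{var}(Y - \hat{Y}) = E(Y - \hat{Y})^2 = \operatorname{var}(\hat{\beta}_0) + X^2 \operatorname{var}(\hat{\beta}_1) + 2X \operatorname{cov}(\hat{\beta}_0, \hat{\beta}_1) + \operatorname{var}(u).$$

For simple regression under homoskedasticity,

$$\operatorname{var}(\hat{\beta}_0) = \sigma^2 \frac{\sum_{i=1}^n X_i^2}{n \sum_{i=1}^n (X_i - \bar{X})^2}, \qquad \operatorname{var}(\hat{\beta}_1) = \frac{\sigma^2}{\sum_{i=1}^n (X_i - \bar{X})^2}, \qquad \operatorname{cov}(\hat{\beta}_0, \hat{\beta}_1) = -\sigma^2 \frac{\bar{X}}{\sum_{i=1}^n (X_i - \bar{X})^2}.$$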

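So

$$\operatorname{var}(Y - \hat{Y}) = \sigma^2 \left[ 1 + \frac{\frac{1}{n}\sum_i X_i^2 + X^2 - 2X\bar{X}}{\sum_i (X_i - \bar{X})^2} \right]$$

$$= \sigma^2 \left[ 1 + \frac{\frac{1}{n}\sum_i (X_i - \bar{X})^2 + \bar{X}^2 + X^2 - 2X\bar{X}}{\sum_i (X_i - \bar{X})^2} \right]$$

$$= \sigma^2 \left[ 1 + \frac{1}{n} + \frac{(X - \bar{X})^2}{\sum_i (X_i - \bar{X})^2} \right].$$

Prediction error is smaller for:
Smaller error variance
Larger sample size (through both second and third terms)
Greater sample variation in $X$
Observations ($X$) closer to the mean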
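As a quick numerical check of the prediction-error variance derived above, the following sketch evaluates the final closed form, $\sigma^2 [1 + 1/n + (X - \bar{X})^2 / \sum_i (X_i - \bar{X})^2]$, and compares it with the four-term expression built from the variances and covariance of the coefficient estimators. The sample values, the error variance, and the function names are made up for illustration; the error variance $\sigma^2$ is treated as known.

```python
import numpy as np

def prediction_error_variance(x_sample, x_new, sigma2):
    """Closed form: sigma^2 * [1 + 1/n + (X - Xbar)^2 / sum (X_i - Xbar)^2]."""
    x_sample = np.asarray(x_sample, dtype=float)
    n = x_sample.size
    x_bar = x_sample.mean()
    sxx = np.sum((x_sample - x_bar) ** 2)
    return sigma2 * (1.0 + 1.0 / n + (x_new - x_bar) ** 2 / sxx)

def prediction_error_variance_terms(x_sample, x_new, sigma2):
    """Same quantity as var(b0) + X^2 var(b1) + 2 X cov(b0, b1) + var(u)."""
    x_sample = np.asarray(x_sample, dtype=float)
    n = x_sample.size
    x_bar = x_sample.mean()
    sxx = np.sum((x_sample - x_bar) ** 2)
    var_b0 = sigma2 * np.sum(x_sample ** 2) / (n * sxx)
    var_b1 = sigma2 / sxx
    cov_b0_b1 = -sigma2 * x_bar / sxx
    return var_b0 + x_new ** 2 * var_b1 + 2.0 * x_new * cov_b0_b1 + sigma2

# Hypothetical sample of regressor values and an assumed error variance.
x_sample = [1.0, 2.0, 3.0, 4.0, 5.0]
sigma2 = 4.0

at_mean = prediction_error_variance(x_sample, 3.0, sigma2)   # X equal to the sample mean
far = prediction_error_variance(x_sample, 6.0, sigma2)       # X far from the sample mean
far_terms = prediction_error_variance_terms(x_sample, 6.0, sigma2)

print(at_mean, far, far_terms)
```

Forecasting at the sample mean leaves only the $1 + 1/n$ part, so the variance is smallest there, and it grows as the forecast point moves away from the mean. Both functions should agree up to floating-point rounding.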