Business 41903
Instructor: Christian Hansen
Problem Set 7

1. Use the data in MROZ.raw to answer this question. The data consist of 753 observations. Before answering any of parts a.-b., remove 253 observations which will be used for an out-of-sample comparison in part c. Ideally, these would be the same observations left out when you answered problem 1 on Problem Set 5.

a. Estimate the model $E[\text{inlf} \mid X = \{\text{kidslt6}, \text{kidsge6}, \text{age}, \text{educ}, \text{repwage}, \text{faminc}, \text{exper}\}] = p^K(X)'\beta$ by lasso with penalty parameter chosen by cross-validation. Carefully explain how you construct the dictionary of approximating functions $p^K(X)$.

b. Estimate the model $E[\text{inlf} \mid X = \{\text{kidslt6}, \text{kidsge6}, \text{age}, \text{educ}, \text{repwage}, \text{faminc}, \text{exper}\}] = \Lambda(p^K(X)'\beta)$, where $\Lambda(\cdot)$ is the logistic cdf, by $\ell_1$-penalized logistic regression with penalty parameter chosen by cross-validation.

c. Use the 253 observations you held out to compare the estimates obtained in parts a.-b. and the estimates obtained in problem 1 on Problem Set 5. Calculate the mean square forecast error as
$$\frac{1}{253}\sum_{i \in \text{hold out}} (\hat g_j(x_i) - y_i)^2$$
and the misclassification rate
$$\frac{1}{253}\sum_{i \in \text{hold out}} 1(\hat y_{j,i} \ne y_i)$$
where $\hat y_{j,i}$ is the Bayes classifier based on model $j$, i.e. $\hat y_{j,i} = 1(\hat g_j(x_i) > .5)$, and $\hat g_j(x_i)$ are the fitted values obtained from each of the competing models. Which procedure performs best according to each metric? Do the performance discrepancies seem large? [Note: Assuming independent sampling, you can compute a standard error for the mean square forecast error and for the misclassification rate conditioning on the estimated model.]

2. Use the data in CreditCardDefault.xls to answer this question. The data consist of observations. Before answering any of parts a.-b., remove 10000 observations which will be used for an out-of-sample comparison in part c. Ideally, these would be the same observations left out when you answered problem 2 on Problem Set 5.

a. Estimate the model $E[\text{default} \mid x_1, \ldots, x_{23}] = p^K(x_1, \ldots, x_{23})'\beta$ for $x_1, \ldots, x_{23}$ defined in CreditCardDefault.des by lasso with penalty parameter chosen by cross-validation. Carefully explain how you construct the dictionary of approximating functions $p^K(x_1, \ldots, x_{23})$.

b. Estimate the model $E[\text{default} \mid x_1, \ldots, x_{23}] = \Lambda(p^K(x_1, \ldots, x_{23})'\beta)$ for $x_1, \ldots, x_{23}$ defined in CreditCardDefault.des and $\Lambda(\cdot)$ the logistic cdf by $\ell_1$-penalized logistic regression with penalty parameter chosen by cross-validation.

c. Use the 10000 observations you held out to compare the estimates obtained in parts a.-b. and the estimates obtained in problem 2 on Problem Set 5. Calculate the mean square forecast error as
$$\frac{1}{10000}\sum_{i \in \text{hold out}} (\hat g_j(x_i) - y_i)^2$$
and the misclassification rate
$$\frac{1}{10000}\sum_{i \in \text{hold out}} 1(\hat y_{j,i} \ne y_i)$$
where $\hat y_{j,i}$ is the Bayes classifier based on model $j$, i.e. $\hat y_{j,i} = 1(\hat g_j(x_i) > .5)$, and $\hat g_j(x_i)$ are the fitted values obtained from each of the competing models. Which procedure performs best according to each metric? Do the performance discrepancies seem large? [Note: Assuming independent sampling, you can compute a standard error for the mean square forecast error and for the misclassification rate conditioning on the estimated model.]

3. [Post-selection inference example] Consider a linear regression model
$$Y_i = \beta_1 X_{1,i} + \beta_{2,n} X_{2,i} + \varepsilon_i$$
$$X_{1,i} = \pi_X X_{2,i} + v_i$$
where $(\varepsilon_i, v_i) \sim N(0, \mathrm{diag}(\sigma^2, \kappa^2))$ are iid across $i$ and independent of the $n \times 2$ design matrix $X$, and the second equation simply parameterizes the covariance between $X_1$ and $X_2$. Suppose that the parameter of interest is $\beta_1$ and that you are unsure of whether $X_2$ should be included in the model (i.e. you are unsure about whether $\beta_{2,n} = 0$). Let $\hat\beta_1$ and $\hat\beta_2$ be the conventional OLS estimators of $\beta_1$ and $\beta_{2,n}$ obtained by regressing $Y$ on $X_1$ and $X_2$, and let $s_{\hat\beta_1}$ and $s_{\hat\beta_2}$ denote the corresponding standard error estimators. Let $\check\beta_1$ be the conventional OLS estimator of $\beta_1$ obtained by regressing $Y$ on $X_1$ (i.e. excluding $X_2$), and let $s_{\check\beta_1}$ be the corresponding standard error estimator.

a. Under the assumption that $\beta_{2,n} = 0$, show that $\check\beta_1$ is consistent, asymptotically normal, and has variance less than or equal to that of $\hat\beta_1$.

b. Let $t_{\beta_2} = \hat\beta_2 / s_{\hat\beta_2}$. Show that $\Pr(|t_{\beta_2}| > c_n) \to 1$ for $c_n = \log(n)\, t_{n-2}^{-1}(.975) \approx 2\log(n)$ when $\beta_{2,n} = \delta$ with $\delta > 0$, where $t_{n-2}$ denotes the cdf of a $t_{n-2}$ random variable. Similarly show that $\Pr(|t_{\beta_2}| \le c_n) \to 1$ when $\beta_{2,n} = 0$.

c. Consider the estimator
$$\tilde\beta_1 = 1(|t_{\beta_2}| > c_n)\,\hat\beta_1 + 1(|t_{\beta_2}| \le c_n)\,\check\beta_1.$$
Derive the asymptotic properties of $\tilde\beta_1$ when (i) $\beta_{2,n} = \delta$ with $\delta > 0$ and when (ii) $\beta_{2,n} = 0$. Show that
$$t_{\tilde\beta_1} = \frac{\tilde\beta_1 - \beta_1^*}{1(|t_{\beta_2}| > c_n)\, s_{\hat\beta_1} + 1(|t_{\beta_2}| \le c_n)\, s_{\check\beta_1}} \to_d N(0, 1)$$
when the null hypothesis $H_0: \beta_1 = \beta_1^*$ is true, both when (i) $\beta_{2,n} = \delta$ with $\delta > 0$ and when (ii) $\beta_{2,n} = 0$. Conclude that $\tilde\beta_1$ is as efficient as the oracle estimator that knows whether $\beta_{2,n} = 0$ despite having to learn $\beta_{2,n}$ from the data.

d. Now consider a sequence of models where $\beta_{2,n} = b a_n / \sqrt{n}$ for some $b$ with $b > 0$ and $a_n > 0$ with $a_n \to \infty$ and $a_n / \log(n) \to 0$. Show that $\tilde\beta_1$ is consistent. Show that $|\sqrt{n}(\tilde\beta_1 - \beta_1^*)| \to \infty$, where $\beta_1^*$ is the true value of $\beta_1$, when $E[X_1 X_2] \ne 0$ (when $\pi_X \ne 0$). Explain in words what this sequence of models captures and why this is an appropriate thought experiment for understanding the finite-sample properties of the estimator $\tilde\beta_1$. What do the results thus far suggest about the desirability of using $\tilde\beta_1$ in finite samples?

e. Note that the moment condition underlying the definition of $\tilde\beta_1$ is
$$E[(Y - X_1\beta_1 - X_2\beta_2) X_1] = 0.$$
Show that this moment condition is satisfied at the true values of $\beta_1$ and $\beta_2$. Show that this moment condition does not have the orthogonality property discussed in Section 7 of the notes, in that the derivative with respect to $\beta_2$ evaluated at the true parameter values is not 0.

f. Let $\hat\pi_Y$ denote the least squares coefficient obtained from regressing $Y$ on $X_2$ with associated standard error estimator $s_{\pi_Y}$ and t-statistic for testing $\pi_Y = 0$ of $t_Y = \hat\pi_Y / s_{\pi_Y}$. Similarly, let $\hat\pi_X$ denote the least squares coefficient obtained from regressing $X_1$ on $X_2$ with associated standard error estimator $s_{\pi_X}$ and t-statistic for testing $\pi_X = 0$ of $t_X = \hat\pi_X / s_{\pi_X}$. Define a new estimator
$$\bar\beta_1 = 1(|t_Y| > c_n \text{ or } |t_X| > c_n)\,\hat\beta_1 + 1(|t_Y| \le c_n \text{ and } |t_X| \le c_n)\,\check\beta_1.$$
Show that $\sqrt{n}(\bar\beta_1 - \beta_1^*) \to_d N(0, V)$ when $\beta_{2,n} = 0$, $\beta_{2,n} = \delta$, or $\beta_{2,n} = a_n b / \sqrt{n}$, regardless of the value of $E[X_1 X_2]$ (and the sequence $a_n$). Suggest a consistent estimator of $V$.

g. Note that the moment condition underlying the definition of $\bar\beta_1$ is
$$E[((Y - E[Y \mid X_2]) - (X_1 - E[X_1 \mid X_2])\beta_1)(X_1 - E[X_1 \mid X_2])] = 0$$
where $E[Y \mid X_2] = \pi_Y X_2$ and $E[X_1 \mid X_2] = \pi_X X_2$. Show that this moment condition is satisfied at the true values of $\beta_1$, $\pi_Y$, and $\pi_X$. Show that this moment condition has the orthogonality property discussed in Section 7 of the notes, in that the derivative with respect to the nuisance parameters $\pi_Y$ and $\pi_X$ evaluated at the true parameter values is 0.

h. Design a simulation experiment to illustrate the potential consequences of the lack of uniformity of the estimator $\tilde\beta_1$ on inference. Specifically, you should be able to design a simulation experiment where the distribution of $\tilde\beta_1$ is strongly bimodal and the size of tests based on $t_{\tilde\beta_1}$ is far from the nominal level. Show the robustness of $\bar\beta_1$ within this design, in that the normal approximation provides a sensible approximation to the distribution of $\bar\beta_1$ across simulation replications and tests based on the t-statistic formed using $\bar\beta_1$ and the suggested estimator of $V$ from part f. have approximately correct size. [A test is not uniform with respect to a class of models if there are sequences of models within this class that lead to the test being size distorted even in large samples. Uniformity of inference with respect to sensible sequences of models is very important in practice, as well-designed sequences are much better able to capture the actual finite-sample performance of estimators.]

4. Recall that a doubly robust estimator of an average treatment effect is given by
$$\widehat{ATE}_{robust} = \frac{1}{n}\sum_{i=1}^n \left[ \frac{D_i (Y_i - \hat g_1(X_i))}{\hat e(X_i)} - \frac{(1 - D_i)(Y_i - \hat g_0(X_i))}{1 - \hat e(X_i)} + \hat g_1(X_i) - \hat g_0(X_i) \right] = \frac{1}{n}\sum_{i=1}^n \psi_i.$$
Belloni, Chernozhukov, and Hansen (2014) show that $\sqrt{n}(\widehat{ATE}_{robust} - ATE) \to_d N(0, V)$ where $\hat V = \frac{1}{n}\sum_{i=1}^n (\psi_i - \widehat{ATE}_{robust})^2 \to_p V$ when $\ell_1$-penalized estimation is used to form $\hat g_1(\cdot)$, $\hat g_0(\cdot)$, and $\hat e(\cdot)$ under regularity conditions including having iid data and the assumption that these functions are approximately sparse.
By approximately sparse, we mean that $g_1(X_i) = X_i'\beta_1 + r_1$, $g_0(X_i) = X_i'\beta_0 + r_0$, and $e(X_i) = \Lambda(X_i'\gamma) + r_e$ with $\max\{\|\beta_1\|_0, \|\beta_0\|_0, \|\gamma\|_0\} \le s$, where $r_1$, $r_0$, and $r_e$ are approximation errors that satisfy $\max\{E[r_1^2], E[r_0^2], E[r_e^2]\} = O(s/n)$ and $s^2 \log(p)^3 / n \to 0$.

Consider the data in restatw.dat, which contains the data used in the 401(k) example in the first two lectures. Use e401 (eligibility for a 401(k) plan) as the treatment variable and net_tfa (net total financial assets) as the dependent variable. The argument for exogeneity of eligibility for a 401(k) plan relies on conditioning on characteristics that might be associated with a person's decision to take a job and with saving preferences. Potential control variables are age, inc (income), fsize (family size), educ (years of education), marr (marital status), male, twoearn (part of a two-earner household), db (has a defined benefit pension), and pira (has an IRA).

a. Construct an approximating dictionary to use in $\ell_1$-penalized estimation using the control variables above. Carefully explain your choices of which functions to put in the dictionary. Explain intuitively what the sparsity assumption requires in terms of this example and within the context of the dictionary you have chosen. Does it seem plausible that this assumption would be satisfied?

b. Estimate $g_1(\cdot)$ and $g_0(\cdot)$ using lasso with penalty weights that are appropriate under heteroscedasticity and penalty parameter $\lambda = 2.2\sqrt{n}\,\Phi^{-1}(1 - (.1/\log(n))/2p)$, where $p$ is the number of elements in your dictionary and $n$ is the appropriate sample size. (Note that $n$ will not be the same for estimating $g_0$ and $g_1$. Note that the $\lambda$ given is appropriate for solving
$$\hat\beta_P \in \arg\min_b \frac{1}{n}\sum_{i=1}^n (y_i - x_i'b)^2 + \frac{\lambda}{n}\sum_{j=1}^p |\hat\phi_j b_j|.$$
Some lasso implementations use different scalings such as
$$\hat\beta_P \in \arg\min_b \frac{1}{2n}\sum_{i=1}^n (y_i - x_i'b)^2 + \lambda\sum_{j=1}^p |\hat\phi_j b_j|$$
which would require alteration of the penalty parameter so that you are solving the same problem.) Which variables are selected to approximate each function? Do these variables make sense? Explain. Should we conclude that the selected variables are the true variables in the sense that we have captured the correct model for $E[Y_1 \mid X]$ and $E[Y_0 \mid X]$? Explain.

c. Estimate $e(\cdot)$ using $\ell_1$-penalized logistic regression with $\lambda = 1.1\sqrt{n}\,\Phi^{-1}(1 - (.1/\log(n))/2p)$, where $p$ is the number of elements in your dictionary and $n$ is the appropriate sample size. (Be careful about scaling again. This $\lambda$ is appropriate for solving
$$\hat\gamma \in \arg\min_g -\frac{1}{n}\sum_{i=1}^n \text{log-likelihood}(y_i, x_i, g) + \frac{\lambda}{n}\sum_{j=1}^p |\hat\phi_j g_j|.$$
If the $\ell_1$-penalized logistic regression function you are using uses a different scaling, you will need to adjust $\lambda$ appropriately.) Which variables are selected? Do these variables make sense? Explain. Should we conclude that the selected variables are the true variables in the sense that we have captured the correct model for $E[D \mid X]$? Explain.

d. Take the selected variables for estimating $g_1$ and estimate $\hat g_1(X)$ by unpenalized least squares regression of $Y$ on these selected variables in the subsample of observations with $D = 1$. Take the selected variables for estimating $g_0$ and estimate $\hat g_0(X)$ by unpenalized least squares regression of $Y$ on these selected variables in the subsample of observations with $D = 0$. Take the selected variables for estimating $e(X)$ and estimate $\hat e(X)$. Form fitted values for $g_1$, $g_0$, and $e$ for each observation in the data. Using these fitted values, obtain $\widehat{ATE}_{robust}$ and estimate its standard error.

e. Estimate $g_1(\cdot)$ and $g_0(\cdot)$ using lasso with penalty weights that are appropriate under homoscedasticity and with penalty parameter chosen by cross-validation. Which variables are selected to approximate each function? Do these variables make sense? Explain. Do these results differ appreciably from those in part b.? Should we conclude that the selected variables are the true variables in the sense that we have captured the correct model for $E[Y_1 \mid X]$ and $E[Y_0 \mid X]$? Explain.

f. Estimate $e(\cdot)$ using $\ell_1$-penalized logistic regression with $\lambda$ chosen by cross-validation. Which variables are selected? Do these variables make sense? Explain. Do these results differ appreciably from those in part c.? Should we conclude that the selected variables are the true variables in the sense that we have captured the correct model for $E[D \mid X]$? Explain.

g. Take the estimated models from e. and f. for $g_0$, $g_1$, and $e$ (i.e. just use the coefficient estimates that come directly out of the estimation) to form fitted values for $g_1$, $g_0$, and $e$ for each observation in the data. Using these fitted values, obtain $\widehat{ATE}_{robust}$ and estimate its standard error. Are these results appreciably different from those obtained in part d.? Explain the significance of the similarity or difference.
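As an illustration of the last step in parts d. and g. of Problem 4, the following is a minimal sketch of how the doubly robust estimate and its standard error might be computed once fitted values are in hand. It assumes NumPy is available; the function name `doubly_robust_ate` and the synthetic design (true effect equal to 2) are hypothetical and not part of the problem set, and the true regression functions and propensity score stand in for the fitted values the earlier parts would supply.

```python
import numpy as np


def doubly_robust_ate(y, d, g1_hat, g0_hat, e_hat):
    """Doubly robust ATE and its standard error from fitted values.

    All arguments are 1-D arrays of equal length: outcome y, treatment
    indicator d (0/1), fitted outcome regressions g1_hat and g0_hat, and
    fitted propensity scores e_hat (assumed strictly between 0 and 1).
    """
    y, d, g1_hat, g0_hat, e_hat = (
        np.asarray(a, dtype=float) for a in (y, d, g1_hat, g0_hat, e_hat)
    )
    # psi_i: inverse-probability-weighted residuals plus the regression contrast.
    psi = (
        d * (y - g1_hat) / e_hat
        - (1.0 - d) * (y - g0_hat) / (1.0 - e_hat)
        + g1_hat
        - g0_hat
    )
    ate = psi.mean()
    v_hat = np.mean((psi - ate) ** 2)  # V-hat = (1/n) sum (psi_i - ATE-hat)^2
    se = np.sqrt(v_hat / psi.size)     # from sqrt(n)(ATE-hat - ATE) -> N(0, V)
    return ate, se


# Synthetic check (hypothetical design, true ATE = 2). The true g1, g0, and e
# are used in place of fitted values purely for illustration.
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
e = 1.0 / (1.0 + np.exp(-0.5 * x))
d = (rng.uniform(size=n) < e).astype(float)
y = 1.0 + 2.0 * d + x + rng.normal(size=n)
ate_hat, se_hat = doubly_robust_ate(y, d, 3.0 + x, 1.0 + x, e)
print(f"ATE estimate: {ate_hat:.3f} (s.e. {se_hat:.3f})")
```

The standard error divides the estimated variance by n because the limit result is stated for the root-n scaled estimator. In practice, fitted propensity scores very close to 0 or 1 make the weights unstable, so it is worth inspecting their range before forming the estimate.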

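The scaling remark in parts b. and c. of Problem 4 can be made concrete with a small sketch. Dividing the first lasso objective by 2 shows that a penalty $\lambda$ on the $(1/n)$ scale corresponds to a penalty $\lambda/(2n)$ on the $(1/2n)$ scale. scikit-learn's `Lasso`, for example, minimizes $(1/2n)\|y - Xw\|_2^2 + \alpha\|w\|_1$, so under that convention one would pass $\alpha = \lambda/(2n)$. The helper names and the values of n and p below are hypothetical, and the penalty loadings $\hat\phi_j$ are assumed to be handled separately (for instance by rescaling the columns of the dictionary).

```python
import math
from statistics import NormalDist


def penalty_level(n, p, c=2.2):
    """lambda = c * sqrt(n) * Phi^{-1}(1 - (0.1 / log(n)) / (2p)).

    c = 2.2 matches the lasso rule in part b.; c = 1.1 matches the
    logistic rule in part c.
    """
    gamma = 0.1 / math.log(n)
    return c * math.sqrt(n) * NormalDist().inv_cdf(1.0 - gamma / (2.0 * p))


def to_half_mse_scale(lam, n):
    """Penalty for (1/(2n)) * RSS + alpha * L1 that matches
    (1/n) * RSS + (lam/n) * L1."""
    return lam / (2.0 * n)


def objective_mse_scale(rss, l1, lam, n):
    # (1/n) * sum of squared residuals + (lambda/n) * weighted L1 norm
    return rss / n + (lam / n) * l1


def objective_half_mse_scale(rss, l1, alpha, n):
    # (1/(2n)) * sum of squared residuals + alpha * weighted L1 norm
    return rss / (2.0 * n) + alpha * l1


n, p = 1000, 50  # hypothetical sample size and dictionary size
lam = penalty_level(n, p)
alpha = to_half_mse_scale(lam, n)

# The two objectives differ only by a factor of 2 at every candidate b,
# so they share the same minimizer.
rss, l1 = 3.0, 2.0  # arbitrary values of RSS(b) and sum_j |phi_j b_j|
ratio = objective_mse_scale(rss, l1, lam, n) / objective_half_mse_scale(rss, l1, alpha, n)
print(f"lambda = {lam:.2f}, alpha = {alpha:.4f}, objective ratio = {ratio:.1f}")
```

Because n differs between the treated and control subsamples, the penalty has to be recomputed for each of $g_1$, $g_0$, and $e$ rather than shared across them.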

More information

Chapter 6. Panel Data. Joan Llull. Quantitative Statistical Methods II Barcelona GSE

Chapter 6. Panel Data. Joan Llull. Quantitative Statistical Methods II Barcelona GSE Chapter 6. Panel Data Joan Llull Quantitative Statistical Methods II Barcelona GSE Introduction Chapter 6. Panel Data 2 Panel data The term panel data refers to data sets with repeated observations over

More information

ECONOMETRICS II (ECO 2401S) University of Toronto. Department of Economics. Spring 2013 Instructor: Victor Aguirregabiria

ECONOMETRICS II (ECO 2401S) University of Toronto. Department of Economics. Spring 2013 Instructor: Victor Aguirregabiria ECONOMETRICS II (ECO 2401S) University of Toronto. Department of Economics. Spring 2013 Instructor: Victor Aguirregabiria SOLUTION TO FINAL EXAM Friday, April 12, 2013. From 9:00-12:00 (3 hours) INSTRUCTIONS:

More information

Regression: Ordinary Least Squares

Regression: Ordinary Least Squares Regression: Ordinary Least Squares Mark Hendricks Autumn 2017 FINM Intro: Regression Outline Regression OLS Mathematics Linear Projection Hendricks, Autumn 2017 FINM Intro: Regression: Lecture 2/32 Regression

More information

Second Order Cone Programming, Missing or Uncertain Data, and Sparse SVMs

Second Order Cone Programming, Missing or Uncertain Data, and Sparse SVMs Second Order Cone Programming, Missing or Uncertain Data, and Sparse SVMs Ammon Washburn University of Arizona September 25, 2015 1 / 28 Introduction We will begin with basic Support Vector Machines (SVMs)

More information



More information

ECON 4160, Autumn term Lecture 1

ECON 4160, Autumn term Lecture 1 ECON 4160, Autumn term 2017. Lecture 1 a) Maximum Likelihood based inference. b) The bivariate normal model Ragnar Nymoen University of Oslo 24 August 2017 1 / 54 Principles of inference I Ordinary least

More information

Selective Inference for Effect Modification

Selective Inference for Effect Modification Inference for Modification (Joint work with Dylan Small and Ashkan Ertefaie) Department of Statistics, University of Pennsylvania May 24, ACIC 2017 Manuscript and slides are available at

More information

Lecture 3: Statistical Decision Theory (Part II)

Lecture 3: Statistical Decision Theory (Part II) Lecture 3: Statistical Decision Theory (Part II) Hao Helen Zhang Hao Helen Zhang Lecture 3: Statistical Decision Theory (Part II) 1 / 27 Outline of This Note Part I: Statistics Decision Theory (Classical

More information

The Illusion of Independence: High Dimensional Data, Shrinkage Methods and Model Selection

The Illusion of Independence: High Dimensional Data, Shrinkage Methods and Model Selection The Illusion of Independence: High Dimensional Data, Shrinkage Methods and Model Selection Daniel Coutinho Pedro Souza (Orientador) Marcelo Medeiros (Co-orientador) November 30, 2017 Daniel Martins Coutinho

More information

GMM - Generalized method of moments

GMM - Generalized method of moments GMM - Generalized method of moments GMM Intuition: Matching moments You want to estimate properties of a data set {x t } T t=1. You assume that x t has a constant mean and variance. x t (µ 0, σ 2 ) Consider

More information

Dealing With Endogeneity

Dealing With Endogeneity Dealing With Endogeneity Junhui Qian December 22, 2014 Outline Introduction Instrumental Variable Instrumental Variable Estimation Two-Stage Least Square Estimation Panel Data Endogeneity in Econometrics

More information

Cross-Validation with Confidence

Cross-Validation with Confidence Cross-Validation with Confidence Jing Lei Department of Statistics, Carnegie Mellon University UMN Statistics Seminar, Mar 30, 2017 Overview Parameter est. Model selection Point est. MLE, M-est.,... Cross-validation

More information

A Measure of Robustness to Misspecification

A Measure of Robustness to Misspecification A Measure of Robustness to Misspecification Susan Athey Guido W. Imbens December 2014 Graduate School of Business, Stanford University, and NBER. Electronic correspondence: Graduate

More information

A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables

A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables Niharika Gauraha and Swapan Parui Indian Statistical Institute Abstract. We consider the problem of

More information

CSC 576: Variants of Sparse Learning

CSC 576: Variants of Sparse Learning CSC 576: Variants of Sparse Learning Ji Liu Department of Computer Science, University of Rochester October 27, 205 Introduction Our previous note basically suggests using l norm to enforce sparsity in

More information

WISE MA/PhD Programs Econometrics Instructor: Brett Graham Spring Semester, Academic Year Exam Version: A

WISE MA/PhD Programs Econometrics Instructor: Brett Graham Spring Semester, Academic Year Exam Version: A WISE MA/PhD Programs Econometrics Instructor: Brett Graham Spring Semester, 2016-17 Academic Year Exam Version: A INSTRUCTIONS TO STUDENTS 1 The time allowed for this examination paper is 2 hours. 2 This

More information
