Cross-Sectional Regression after Factor Analysis: Two Applications


1 Cross-Sectional Regression after Factor Analysis: Two Applications. Joint work with Jingshu, Trevor, Art; Yang Song (GSB). May 7, 2016

2 Overview

3 Outline

4 Data matrix
Data matrix $Y \in \mathbb{R}^{n \times p}$. Panel data. Transposable data.
Modern datasets are usually high-dimensional (both $n, p \gg 1$).
Two examples:
1. Gene expressions (row: cell; column: gene).
2. Mutual fund monthly returns (row: month; column: fund).

5 Gene discovery
Which genes are associated with a treatment/condition?
Let $X \in \mathbb{R}^{n \times 1}$ be the treatment vector.
Simple linear regression: $\mathrm{col}_j(Y) = \alpha_j X + \epsilon_j$.
Equivalently, $Y_{n \times p} = X_{n \times 1} \alpha^T_{p \times 1} + \epsilon_{n \times p}$.
Statistical significance of $\alpha_j$, multiple testing, ...
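As a rough illustration of the column-by-column regression above, here is a minimal NumPy sketch on synthetic data. The function name `columnwise_ols`, the toy sizes, and the choice of five non-null columns are assumptions made for this example only; they do not come from the talk.

```python
import numpy as np

def columnwise_ols(X, Y):
    """Regress every column of Y (n x p) on the single vector X (n,), no intercept.

    Returns the slope estimates alpha_hat (p,) and their t-statistics (p,).
    """
    x = np.asarray(X, dtype=float).ravel()
    Y = np.asarray(Y, dtype=float)
    n = x.shape[0]
    xtx = x @ x
    alpha_hat = (x @ Y) / xtx                     # one slope per column
    resid = Y - np.outer(x, alpha_hat)            # n x p residual matrix
    sigma2 = (resid ** 2).sum(axis=0) / (n - 1)   # one fitted parameter per column
    t_stat = alpha_hat / np.sqrt(sigma2 / xtx)
    return alpha_hat, t_stat

# Synthetic data (hypothetical): only the first 5 columns respond to X.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.normal(size=n)
alpha = np.zeros(p)
alpha[:5] = 1.0
Y = np.outer(X, alpha) + rng.normal(size=(n, p))
alpha_hat, t_stat = columnwise_ols(X, Y)
```

The resulting `t_stat` vector is what a multiple-testing procedure would then take as input.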

6 Mutual fund selection
How skillful is a mutual fund manager?
Let $Z \in \mathbb{R}^{n \times d}$ be the well-known systematic risk factors.
The Fama-French-Carhart four-factor model:
1. Market Minus Risk Free;
2. Small [market capitalization] Minus Big;
3. High [book-to-market ratio] Minus Low;
4. Momentum.
Linear regression for mutual fund $j$: $\mathrm{col}_j(Y) = \alpha_j \mathbf{1}_n + Z \beta_j + \epsilon_j$, where $\beta_j$ is the $j$-th row of $\beta$ written as a column vector.
Equivalently, $Y_{n \times p} = \mathbf{1}_{n \times 1} \alpha^T_{p \times 1} + Z_{n \times d} \beta^T_{p \times d} + \epsilon_{n \times p}$.
$\alpha_j$ is usually regarded as the skill of manager $j$.
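The fund-by-fund regression can be fitted for all funds at once with a single least-squares call. The sketch below uses made-up factor returns and toy sizes (all hypothetical); real Fama-French-Carhart factor data would replace `Z`.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, p = 120, 4, 30                       # months, factors, funds (toy sizes)
Z = rng.normal(scale=0.04, size=(n, d))    # stand-in for the four factor returns
alpha = np.full(p, 0.002)                  # hypothetical true skill
beta = rng.normal(size=(p, d))             # hypothetical factor loadings
Y = alpha + Z @ beta.T + rng.normal(scale=0.01, size=(n, p))

# Design matrix [1, Z]; lstsq solves all p regressions in one call.
design = np.column_stack([np.ones(n), Z])            # n x (1 + d)
coef, *_ = np.linalg.lstsq(design, Y, rcond=None)    # (1 + d) x p
alpha_hat = coef[0]                                  # intercepts, one per fund
beta_hat = coef[1:].T                                # p x d loadings
```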

7 A common model
In both examples, we can model the data matrix by $Y_{n \times p} = X_{n \times 1} \alpha^T_{p \times 1} + Z_{n \times d} \beta^T_{p \times d} + \epsilon$.
$\alpha$ is the parameter of interest.
$\beta$ is a nuisance parameter (not always included).
$\epsilon$ is noise, assumed Gaussian with independent columns.
In genomics testing, $X$ is the treatment and $Z$ contains other factors affecting $Y$.
In mutual fund selection, $X$ is the intercept and $Z$ contains the systematic risk factors.
Standard statistical method: linear regression for each column.

8 Unmeasured variables
The adjustment covariates $Z$ are not always fully measured.
In the biology example, $Z$ can be gender, age, microarray platform, batch, ...
In the finance example, $Z$ can be other systematic risk factors (hundreds are documented).

9 Is this a problem?
$Y_{n \times p} = X_{n \times 1} \alpha^T_{p \times 1} + Z_{n \times d} \beta^T_{p \times d} + \epsilon$.
NO if $Z \perp X$ ($\alpha$ is unconfounded).
YES if $Z$ and $X$ are dependent ($\alpha$ is confounded).

10 Unconfounded case
$Y_{n \times p} = X_{n \times 1} \alpha^T_{p \times 1} + Z_{n \times d} \beta^T_{p \times d} + \epsilon$.
The least squares estimator $\hat{\alpha}$ is still unbiased, but its entries are dependent.
This is troublesome for multiple testing (FDR control) if the latent variables are ignored.
Solution: estimate $Z$ and $\beta$ by factor analysis, which gives the dependency structure.
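To see where the dependence comes from, here is a short worked derivation. It relies on extra assumptions that are not stated on the slide: the rows of $Z$ are i.i.d. with mean zero and identity covariance, $Z$ and $\epsilon$ are independent of each other and of $X$, and the rows of $\epsilon$ have a diagonal covariance matrix $\Sigma$.

```latex
\hat{\alpha} = \frac{Y^T X}{X^T X}
             = \alpha + \beta \, \frac{Z^T X}{X^T X} + \frac{\epsilon^T X}{X^T X},
\qquad
\mathrm{E}[\hat{\alpha} \mid X] = \alpha,
\qquad
\mathrm{Cov}(\hat{\alpha} \mid X) = \frac{\beta \beta^T + \Sigma}{X^T X}.
```

Under these assumptions the off-diagonal entries of $\beta \beta^T$ are what correlate the estimates across columns, which is why an estimate of $\beta$ describes the dependency structure.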

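11 Confounded case
$Y_{n \times p} = X_{n \times 1} \alpha^T_{p \times 1} + Z_{n \times d} \beta^T_{p \times d} + \epsilon$.
The least squares estimator $\hat{\alpha}$ is biased (by how much?).
Assume $Z_{n \times d} = X_{n \times 1} \gamma^T_{d \times 1} + W$ with $W \perp X$. Then $Y_{n \times p} = X_{n \times 1} \tau^T_{p \times 1} + W_{n \times d} \beta^T_{p \times d} + \epsilon$, where $\tau = \alpha + \beta\gamma$.
The OLS estimator $\hat{\alpha}$ is unbiased for $\tau$ (the marginal effects), but not for $\alpha$.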
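The substitution behind $\tau = \alpha + \beta\gamma$ can be written out step by step. The expectation line additionally assumes that $W$ and $\epsilon$ have mean zero, which the slide leaves implicit.

```latex
\begin{aligned}
Y &= X\alpha^T + Z\beta^T + \epsilon \\
  &= X\alpha^T + (X\gamma^T + W)\beta^T + \epsilon \\
  &= X(\alpha + \beta\gamma)^T + W\beta^T + \epsilon
   = X\tau^T + W\beta^T + \epsilon, \\
\mathrm{E}[\hat{\alpha} \mid X] &= \tau = \alpha + \beta\gamma
  \quad\Longrightarrow\quad
  \mathrm{E}[\hat{\alpha} \mid X] - \alpha = \beta\gamma .
\end{aligned}
```

So the bias of the naive estimator is $\beta\gamma$: it vanishes only when the latent factors do not load on the responses ($\beta = 0$) or are not linearly related to $X$ ($\gamma = 0$).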

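12 Factor analysis
$Y_{n \times p} = X_{n \times 1} \alpha^T_{p \times 1} + Z_{n \times d} \beta^T_{p \times d} + \epsilon = X_{n \times 1} \tau^T_{p \times 1} + W_{n \times d} \beta^T_{p \times d} + \epsilon$.
$\beta$ can be estimated by factor analysis:
1. Regress $X$ out of $Y$;
2. Run factor analysis (e.g. PCA) on the residual matrix.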
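The two steps above can be sketched with a truncated SVD standing in for the factor analysis. This is one possible choice (plain PCA), not necessarily the estimator used in the talk; the function name, the toy data, and the assumption that $d$ is known are all hypothetical.

```python
import numpy as np

def factor_analysis_after_regression(X, Y, d):
    """Step 1: regress X out of Y. Step 2: PCA (truncated SVD) on the residuals.

    Returns tau_hat (p,), beta_hat (p x d) and W_hat (n x d). The loadings are
    only determined up to an invertible d x d transformation.
    """
    x = np.asarray(X, dtype=float).ravel()
    Y = np.asarray(Y, dtype=float)
    n = Y.shape[0]
    tau_hat = (x @ Y) / (x @ x)                 # marginal effects, one per column
    resid = Y - np.outer(x, tau_hat)            # X regressed out of Y
    U, s, Vt = np.linalg.svd(resid, full_matrices=False)
    W_hat = np.sqrt(n) * U[:, :d]               # estimated factor scores
    beta_hat = Vt[:d].T * s[:d] / np.sqrt(n)    # estimated loadings
    return tau_hat, beta_hat, W_hat

# Toy data following the confounded model Z = X gamma^T + W (all values made up).
rng = np.random.default_rng(2)
n, p, d = 300, 100, 2
X = rng.normal(size=n)
gamma = np.array([0.5, -0.5])
W = rng.normal(size=(n, d))
Z = np.outer(X, gamma) + W
alpha = np.zeros(p)
alpha[:5] = 2.0
beta = rng.normal(size=(p, d))
Y = np.outer(X, alpha) + Z @ beta.T + rng.normal(size=(n, p))
tau_hat, beta_hat, W_hat = factor_analysis_after_regression(X, Y, d)
```

Because of the rotation ambiguity, `beta_hat` should be compared with the true `beta` through their column spaces rather than entry by entry.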

13 Cross-sectional regression
Back to the decomposition of marginal effects: $\tau_{p \times 1} = \alpha_{p \times 1} + \beta_{p \times d} \gamma_{d \times 1}$.
Now that we have good estimates of $\tau$ and $\beta$, can we estimate $\alpha$ from this formula?
There are $p + d$ parameters but only $p$ equations, so NO...?
We need additional assumptions for identifiability, such as sparsity.
Proposition: if $\|\alpha\|_0 \le (p - d)/2$ and $\beta$ is good, then $\alpha$ is identifiable.
Regress $\hat{\tau}$ on $\hat{\beta}$ with a robust loss function (sparsity penalty on $\alpha$).
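One way to realize "robust loss function (sparsity penalty on $\alpha$)" is an $\ell_1$ penalty on $\alpha$, which is equivalent to regressing $\hat{\tau}$ on $\hat{\beta}$ with the Huber loss. The sketch below alternates between a least-squares update of $\gamma$ and a soft-thresholding update of $\alpha$. The penalty level `lam`, the function names, and the synthetic inputs are assumptions for illustration, and the talk's exact estimator may differ.

```python
import numpy as np

def soft_threshold(v, lam):
    """Shrink each entry of v toward zero by lam."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def sparse_cross_sectional_regression(tau_hat, beta_hat, lam, n_iter=200):
    """Minimize 0.5 * ||tau - beta @ gamma - alpha||^2 + lam * ||alpha||_1.

    Block coordinate descent: gamma by least squares, alpha by soft-thresholding.
    """
    tau_hat = np.asarray(tau_hat, dtype=float)
    beta_hat = np.asarray(beta_hat, dtype=float)
    alpha = np.zeros_like(tau_hat)
    for _ in range(n_iter):
        gamma, *_ = np.linalg.lstsq(beta_hat, tau_hat - alpha, rcond=None)
        alpha = soft_threshold(tau_hat - beta_hat @ gamma, lam)
    return alpha, gamma

# Synthetic check (hypothetical values): 10 of 200 entries of alpha are non-zero.
rng = np.random.default_rng(3)
p, d = 200, 2
beta = rng.normal(size=(p, d))
gamma_true = np.array([1.0, -0.5])
alpha_true = np.zeros(p)
alpha_true[:10] = 3.0
tau = alpha_true + beta @ gamma_true + 0.05 * rng.normal(size=p)
alpha_hat, gamma_hat = sparse_cross_sectional_regression(tau, beta, lam=0.5)
```

The non-zero entries of `alpha_hat` are shrunk toward zero by roughly `lam`, so in practice one might refit them without the penalty after selection.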

14 Does sparsity make sense?
Not always. It is reasonable in our examples:
1. Most genes are most likely unrelated to the treatment.
2. Most mutual funds have no skill, according to economic game theory [Berk and Green, 2004].

15 Entire procedure
$Y_{n \times p} = X_{n \times 1} \alpha^T_{p \times 1} + Z_{n \times d} \beta^T_{p \times d} + \epsilon$.
Three steps:
1. Row regression (also called regular regression, time-series regression, longitudinal regression, ...).
2. Factor analysis on residuals.
3. Column regression (cross-sectional regression).

16 Outline

17 A biology example: COPD study
COPD = chronic obstructive pulmonary disease.
Singh et al. [2011] tried to find genes associated with the severity of COPD (moderate or severe).
[Figure: density of t-statistics compared with $N(0.024, 2.6^2)$.]
Distribution of t-statistics: overdispersed and skewed.

18 COPD data: severity as primary variable
[Figure: density of t-statistics compared with $N(0, 1)$. (a) Naive linear regression. (b) After adjustment.]
$\hat{d} = 1$ [Onatski, 2010].
$\hat{\gamma} \approx 0.98$; the confounded variance of $X$ is approximately 22%.
Test of confounding: p-value [value missing].

19 COPD data: gender as primary variable
Genes associated with gender should come from the X/Y chromosomes (positive controls).
[Figure: density of t-statistics compared with $N(0, 1)$. (a) Naive linear regression. (b) After adjustment.]
$\hat{\gamma} \approx 0.27$; the variance explained is approximately 3%.
Test of confounding: p-value [value missing].

20 COPD data: gender as primary variable
Can we control FDR?
[Figure: FDP against nominal FDR for LEAPP(RR), Naive, Limma, and SVA.]

21 Outline

22 Mutual fund selection
Two definitions of mutual fund skill:
1. The $\alpha$ in the Capital Asset Pricing Model (CAPM), which uses just one market factor;
2. The $\alpha$ adjusted for known and unknown factors. I will call it $\tilde{\alpha}$.
Surprisingly, finance researchers find that most investors are chasing the CAPM-$\alpha$, even though the CAPM, a Nobel prize winner, was introduced 50 years ago.

23 A simulation experiment
At the beginning of every year from 1996 to 2014, find all the mutual funds that existed over the previous 5 years.
Estimate their CAPM-$\alpha$ and $\tilde{\alpha}$ using the 5-year data.
Form decile groups based on the estimated CAPM-$\alpha$ and $\tilde{\alpha}$, and compare their monthly returns in the next year.
Note: I'm actually using the Treynor index $\tilde{\alpha} / \mathrm{sd}(\gamma_j^T Z)$.
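The decile-forming step of this experiment can be sketched as follows. The skill scores and next-year returns here are simulated placeholders, not fund data, and both function names are hypothetical.

```python
import numpy as np

def decile_labels(scores):
    """Assign each fund a decile label 0..9 by rank (0 = lowest estimated skill)."""
    scores = np.asarray(scores, dtype=float)
    ranks = scores.argsort().argsort()           # rank 0 = smallest score
    return (ranks * 10) // len(scores)

def mean_return_by_decile(scores, next_year_returns):
    """Average next-year return within each decile; assumes at least 10 funds."""
    labels = decile_labels(scores)
    r = np.asarray(next_year_returns, dtype=float)
    return np.array([r[labels == k].mean() for k in range(10)])

# Simulated placeholder data: returns are weakly driven by the skill score.
rng = np.random.default_rng(4)
skill = rng.normal(size=500)
next_year = 0.01 * skill + rng.normal(scale=0.005, size=500)
labels = decile_labels(skill)
by_decile = mean_return_by_decile(skill, next_year)
```

Running the same two functions once with the CAPM-$\alpha$ scores and once with the $\tilde{\alpha}$ (or Treynor) scores gives the two sets of decile returns that the experiment compares.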

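24 Top 10% Funds
Table: Performance of the top funds. Columns: ER, SR, CAPM-$\alpha$, FFC-$\alpha$, AUM, Monthly Flow. ER is excess return, SR is Sharpe ratio ($\mu/\sigma$), AUM is assets under management.
Rows (only the parenthesized values survive):
$\alpha \setminus \tilde{\alpha}$: (-1.96) (-2.06)
$\alpha \cap \tilde{\alpha}$: (-0.69) (-1.00)
$\tilde{\alpha} \setminus \alpha$: (2.41) (1.97)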

25 [Figure: Cumulative log-return over time for four strategies: all funds, $\alpha$ only, $\tilde{\alpha}$ only, $\alpha$ and $\tilde{\alpha}$.]

26 [Figure: Return in the next 5 years of 10 deciles. Panels: largest $\alpha$; largest Treynor($\tilde{\alpha}$). Vertical axis: average monthly return * 12; horizontal axis: year; one curve per decile.]

27 Outline

28 Confounding is a common problem across domains.
Sometimes it's helpful to think of rows and columns in a similar way.
Be wise when investing.


More information

1 Description of variables

1 Description of variables 1 Description of variables We have three possible instruments/state variables: dividend yield d t+1, default spread y t+1, and realized market volatility v t+1 d t is the continuously compounded 12 month

More information

Forecasting the term structure interest rate of government bond yields

Forecasting the term structure interest rate of government bond yields Forecasting the term structure interest rate of government bond yields Bachelor Thesis Econometrics & Operational Research Joost van Esch (419617) Erasmus School of Economics, Erasmus University Rotterdam

More information

Manual: R package HTSmix

Manual: R package HTSmix Manual: R package HTSmix Olga Vitek and Danni Yu May 2, 2011 1 Overview High-throughput screens (HTS) measure phenotypes of thousands of biological samples under various conditions. The phenotypes are

More information

Regression diagnostics

Regression diagnostics Regression diagnostics Kerby Shedden Department of Statistics, University of Michigan November 5, 018 1 / 6 Motivation When working with a linear model with design matrix X, the conventional linear model

More information

Factor Investing using Penalized Principal Components

Factor Investing using Penalized Principal Components Factor Investing using Penalized Principal Components Markus Pelger Martin Lettau 2 Stanford University 2 UC Berkeley February 8th 28 AI in Fintech Forum 28 Motivation Motivation: Asset Pricing with Risk

More information

Heteroscedasticity and Autocorrelation

Heteroscedasticity and Autocorrelation Heteroscedasticity and Autocorrelation Carlo Favero Favero () Heteroscedasticity and Autocorrelation 1 / 17 Heteroscedasticity, Autocorrelation, and the GLS estimator Let us reconsider the single equation

More information

A Multiple Testing Approach to the Regularisation of Large Sample Correlation Matrices

A Multiple Testing Approach to the Regularisation of Large Sample Correlation Matrices A Multiple Testing Approach to the Regularisation of Large Sample Correlation Matrices Natalia Bailey 1 M. Hashem Pesaran 2 L. Vanessa Smith 3 1 Department of Econometrics & Business Statistics, Monash

More information

Ordinary Least Squares Regression

Ordinary Least Squares Regression Ordinary Least Squares Regression Goals for this unit More on notation and terminology OLS scalar versus matrix derivation Some Preliminaries In this class we will be learning to analyze Cross Section

More information

Generalized Elastic Net Regression

Generalized Elastic Net Regression Abstract Generalized Elastic Net Regression Geoffroy MOURET Jean-Jules BRAULT Vahid PARTOVINIA This work presents a variation of the elastic net penalization method. We propose applying a combined l 1

More information

Causal inference in biomedical sciences: causal models involving genotypes. Mendelian randomization genes as Instrumental Variables

Causal inference in biomedical sciences: causal models involving genotypes. Mendelian randomization genes as Instrumental Variables Causal inference in biomedical sciences: causal models involving genotypes Causal models for observational data Instrumental variables estimation and Mendelian randomization Krista Fischer Estonian Genome

More information

Ch3. TRENDS. Time Series Analysis

Ch3. TRENDS. Time Series Analysis 3.1 Deterministic Versus Stochastic Trends The simulated random walk in Exhibit 2.1 shows a upward trend. However, it is caused by a strong correlation between the series at nearby time points. The true

More information

Deep Learning in Asset Pricing

Deep Learning in Asset Pricing Deep Learning in Asset Pricing Luyang Chen 1 Markus Pelger 1 Jason Zhu 1 1 Stanford University November 17th 2018 Western Mathematical Finance Conference 2018 Motivation Hype: Machine Learning in Investment

More information

STA 2201/442 Assignment 2

STA 2201/442 Assignment 2 STA 2201/442 Assignment 2 1. This is about how to simulate from a continuous univariate distribution. Let the random variable X have a continuous distribution with density f X (x) and cumulative distribution

More information

The Simple Linear Regression Model

The Simple Linear Regression Model The Simple Linear Regression Model Lesson 3 Ryan Safner 1 1 Department of Economics Hood College ECON 480 - Econometrics Fall 2017 Ryan Safner (Hood College) ECON 480 - Lesson 3 Fall 2017 1 / 77 Bivariate

More information

13. Parameter Estimation. ECE 830, Spring 2014

13. Parameter Estimation. ECE 830, Spring 2014 13. Parameter Estimation ECE 830, Spring 2014 1 / 18 Primary Goal General problem statement: We observe X p(x θ), θ Θ and the goal is to determine the θ that produced X. Given a collection of observations

More information

Linear Regression (9/11/13)

Linear Regression (9/11/13) STA561: Probabilistic machine learning Linear Regression (9/11/13) Lecturer: Barbara Engelhardt Scribes: Zachary Abzug, Mike Gloudemans, Zhuosheng Gu, Zhao Song 1 Why use linear regression? Figure 1: Scatter

More information

11.433J / J Real Estate Economics Fall 2008

11.433J / J Real Estate Economics Fall 2008 MIT OpenCourseWare http://ocw.mit.edu 11.433J / 15.021J Real Estate Economics Fall 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms. Recitation 3 Real

More information

Miloš Kopa. Decision problems with stochastic dominance constraints

Miloš Kopa. Decision problems with stochastic dominance constraints Decision problems with stochastic dominance constraints Motivation Portfolio selection model Mean risk models max λ Λ m(λ r) νr(λ r) or min λ Λ r(λ r) s.t. m(λ r) µ r is a random vector of assets returns

More information

Lecture 7: Interaction Analysis. Summer Institute in Statistical Genetics 2017

Lecture 7: Interaction Analysis. Summer Institute in Statistical Genetics 2017 Lecture 7: Interaction Analysis Timothy Thornton and Michael Wu Summer Institute in Statistical Genetics 2017 1 / 39 Lecture Outline Beyond main SNP effects Introduction to Concept of Statistical Interaction

More information

1 Introduction to Generalized Least Squares

1 Introduction to Generalized Least Squares ECONOMICS 7344, Spring 2017 Bent E. Sørensen April 12, 2017 1 Introduction to Generalized Least Squares Consider the model Y = Xβ + ɛ, where the N K matrix of regressors X is fixed, independent of the

More information