Envelopes: Methods for Efficient Estimation in Multivariate Statistics

Size: px
Start display at page:

Download "Envelopes: Methods for Efficient Estimation in Multivariate Statistics"

Transcription

1 Envelopes: Methods for Efficient Estimation in Multivariate Statistics Dennis Cook School of Statistics University of Minnesota Collaborating at times with Bing Li, Francesca Chiaromonte, Zhihua Su, Inge Helland & Henry Zhang

2 Response envelopes in multivariate linear regression, (Cook, et al. 2010) Y i = α + βx i + ε i, i = 1,..., n Y R r : multivariate response X R p : non-stochastic predictors centered at 0 ε R r : normal errors, mean 0 and covariance Σ > 0 α R r : unknown intercept β R r p : unknown coefficients Goal: estimate β, prediction. MLE β Std of β is obtained by doing r univariate linear regressions, one for each response.

3 Rationale for envelopes Envelopes arise by asking if there are linear combinations of Y whose distribution is invariant under changes in X. Parameterize the MLM in terms of the smallest subspace E R r so that (P E = projection onto E, Q E = I P E ) Q E Y (X = x 1 ) Q E Y (X = x 2 ), (x 1, x 2 ) P E Y Q E Y X This implies that the impact of X on Y is concentrated in P E Y. We refer to P E Y and Q E Y informality as the material and immaterial parts of Y. (Li & Zhang, 2015: Generalized sparsity principle)

4 The conditions Q E Y X Q E Y and P E Y only if B := span(β) E, so E envelops B Q E Y X hold if and Σ = P E ΣP E + Q E ΣQ E, so E is a reducing subspace of Σ Formally, the intersection of all subspaces E with these properties is called the Σ-envelope of B and represented as E Σ (B) with u = dim(e Σ (B)).

5 Let the columns of the semi-orthogonal matrices Γ R r u and Γ 0 R r (r u) be bases for E Σ (B) and E Σ (B). Envelope Model: Y = α + ΓηX + ε, Σ = ΓΩΓ + Γ 0 Ω 0 Γ T 0. 0 u r. u = 0 means Y does not depend on X. u = r means all linear comb of Y depend on X and the model reduces to the standard model. Estimation via maximum likelihood with u determined by AIC, BIC,..., likelihood ratio testing, cross validation or a holdout sample. We are still interested in β and Σ, which depend on the envelope E Σ (B), but not on the particular basis Γ selected to represent it.

6 How do envelopes work? Multivariate regression with two responses, Y 1 and Y 2, and a single predictor, X = 0 or 1, to indicate two populations. Y = ( Y1 Y 2 ) = ( α1 α 2 ) ( β1 + β 2 ) ( ε1 X + ε 2 α 1 = E(Y 1 X = 0), β 1 = E(Y 1 X = 1) E(Y 1 X = 0), α 2 = E(Y 2 X = 0), β 2 = E(Y 2 X = 1) E(Y 2 X = 0). Standard estimators are obtained by substituting sample moments. )

7 Possible configurations for uncomplicated study Y 1 Y Y 2 0 Y 2

8 Inference for β 2 = E(Y 2 X = 1) E(Y 2 X = 0)

9 Inference for β 2 = E(Y 2 X = 1) E(Y 2 X = 0)

10 Inference for β 2 = E(Y 2 X = 1) E(Y 2 X = 0)

11 Schematic representation of an envelope analysis E Σ (B) EΣ(B)

12 Schematic representation of an envelope analysis E Σ (B) EΣ(B)

13 Schematic representation of an envelope analysis E Σ (B) EΣ(B)

14 Wheat protein data (Y 1, Y 2 ) are spectral intensities at two wave lengths for high (red) and low protein wheat. n = 50. SE s for 2 elements in β: Standard estimator: 8.6, 9.5 Envelope estimator: 0.4, 0.6 n 20, 000

15 Wheat protein data Y2 (2nd response) A Y2 (2nd response) EΣ(B) B2 B1 E Σ (B) Y1 (1st response) Standard analysis Y1 (1st response) Envelope analysis

16 Maximum Likelihood Estimation The estimated envelope ÊΣ(B) can be represented as Ê Σ (B) = arg min S (log P S S Y X P S 0 + log Q S S Y Q S 0 ), where 0 means the product of the non-zero eigenvalues and S is a u-dim subspace of R r. Let Γ be a basis for ÊΣ(B). Estimators of other parameters: β = P Γ βstd, Σ = P Γ ΣStd P Γ + Q Γ ΣStd Q Γ avar{ nvec[ β]} avar( nvec[ β Std ]) massive gains when Γ T ΣΓ Γ T 0 Σ 0Γ 0.

17 Cattle data Experiment: Two treatments, each assigned randomly to 30 cows. Weight measured at weeks 2, 4, 6,...,16, 18, 19. Do the treatment have a differential effect; if so, about when it is first apparent?

18 Ï Ø ½ ¼ ¾½¼ ¾ ¼ ¾ ¼ ¾ ¼ ¾ ¼ ½¼ ¼ ¼ ¼ Profile plot of cattle data ¼ ¾ ½¼ ½¾ ½ ½ ½ ¾¼ Ï Y i = α + βx i + ε i, X = 0, 1 β Std = Ȳ trt1 Ȳ trt2

19 Mean profile plot of cattle data max i ( β Std ) i /SE{( β Std ) i } 1.3. LRT stat. for β = 0 is about 27 on 10 df.

20 Fitted profile plots, after inferring that u = 5. From envelope fit, β i /SE( β i ) > 4.1 for i 10

21 Cattle weight, week 12 vs week Envelope Γ T Y 340 Weight on week Γ 0 T Y E S E Weight on week 12

22 Brain Volumes for Alzheimer s patients Y holds log volumes of 93 different regions of the brains of n = 309 female Alzheimer s patients of varying age X 1 and log intracerebroventricular volumes X 2, (Zhu, et al. 2015) We inferred that u = 1 so the envelope model becomes Y = α + Γ{η 1 X 1 + η 2 X 2 } + ε, Since û = 1, we can easily plot Γ T Y vs the two predictors.

23 Γ T Y vs. Age T Y Age

24 Other applications in linear models Partial response envelopes Partial response envelopes for prediction Predictor envelopes Response-predictor envelopes Envelope version of reduced rank regression Tensor envelopes Bayesian envelopes

25 Beyond linear models, (Cook & Zhang, 2015) Suppose we have an an asymptotically normal estimator θ of θ R p, n( θ θ) N(0, V(θ)). The estimator can often be improved by projecting it onto a root-n consistent estimator of the V(θ)-envelope of span(θ). 1 Reproduces known envelope methods, and applicable to GLMs. 2 Links envelopes to a pre-specified estimator, MLE, robust estimator, OLS,... Likelihood not required. 3 V(θ) can now depend on the parameter being estimated, plus perhaps nuisance parameters. 4 Envelopes and Aster models (Eck, et al. poster session)

26 Computing for linear model applications: MatLab toolbox: Papers: dennis/. Thank you!

27 General Context

28 Other envelope application in MLMs Partial response envelopes for part of β = (β 1, β 2 ). (Su and Cook, Biometrika, 2011) Y = α + β 1 X 1 + β 2 X 2 + ε = α + ΓηX 1 + β 2 X 2 + ε Σ = ΓΩΓ T + Γ 0 Ω 0 Γ T 0 Simultaneous envelopes for reducing X and Y (Cook and Zhang, Technometrics, to appear) Y = α + βx + ε = α + ΓηΦ T X + ε Σ = ΓΩΓ T + Γ 0 Ω 0 Γ T 0 Σ X = Φ Φ T + Φ 0 0 Φ T 0

29 Scaled predictor envelopes, when predictors are in different scales. (Su and Cook, submitted) Y = α + η T Φ T Λ 1 X + ε, Σ X = ΛΦ Φ T Λ + ΛΦ 0 0 Φ T 0 Λ, Λ = diag(1, λ 2,..., λ p ) Scaled response envelopes, when responses are in different scales. (Su and Cook, Biometrika, 2013) Inner envelopes, when envelopes don t offer improvement. (Su and Cook, Biometrika, 2012) based on the largest reducing subspace of Σ that is contained within span(β). Heteroscedastic envelopes for comparing multivariate means in populations with different covariance matrices. (Su and Cook, Statistica Sinica, 2013).

30 Egyptian Skulls 4 measurements Y in cm on 30 male skulls in each of 5 epochs, 4000, 3300, 1850, 200 BC & 150 AD, included as indicators X. (Thomson, A. and Randall-Maciver, R., 1905, Ancient Races of the Thebaid, Oxford University Press.) Y = α + β 3300 X 1 + β 1850 X 2 + β 200 X 3 + β 150 X 4 + ε We inferred that u = 1 so the envelope model becomes Y = α + Γ{η 3300 X 1 + η 1850 X 2 + η 200 X 3 + η 150 X 4 } + ε, Since û = 1, we can easily plot Γ T Y vs epoch.

31 ½¼ ¾¼ ¼ ¼ ¼ Skull Boxplots vs Epoch Ì ¹ ¼¼¼ ¹ ¼¼ ¹¾¼¼¼ ¹ ¼¼ ½¼¼¼ Ê

32 Table: Estimated coefficients from cattle data. Week B B/se(B) β β/se( β) se(b)/se( β) We would need n 1500 for OLS to match the envelope results.

33 The standard MLE is β Std = (5.5, 4.8) T with bootstrap standard errors (4.2, 4.4) T, while the envelope estimate is β = (5.4, 5.1) T with bootstrap standard errors (1.12, 1.07) T. About 1500 observations would be needed for a likelihood analysis to yield the standard errors from an envelope analysis with 60 observations.

34 Intro. MLR 2Pop. MLE Cattle BrVol Other Heights of Boys and Girls Finis Extras Skulls MSE

35 Heights of Boys and Girls Ages 13 and EΣ(B) E Σ (B) Age Age13 Ω = 1.57 and Ω 0 = SE ratios for sm/em: 8.49 and 8.61.

36 Heights of Boys and Girls Ages 17 and EΣ(B) 190 Age E Σ (B) Age17 Ω = and Ω 0 = SE ratios for sm/em: 1.01 and 0.99

37 Heights of Boys and Girls: Bootstrap SEs Table: Bootstrap and estimated asymptotic standard errors of the two elements in β under the standard model (SM) and envelope model (EM). Response SM BSM EM BEM SM/EM BSM/BEM Age Age Age Age

38 SIMPLS algorithm for Φ (de Jong, 1993) Set w 0 = 0 and let φ k = (w 0,..., w k ) R p k. Then given φ k, the next vector w k+1 is constructed as S k = span(s X Φk ) w k+1 = l max (Q Sk S XY S T XY Q S k ) Φ k+1 = (w 0,..., w k, w k+1 ) for k = 1,..., m 1. m, the number of components, is chosen by cross-validation or a hold-out sample. Then Φ = Φ m. Envelope connection: With known m, span( φ m ) is a n-consistent estimator of the ΣX -envelope of span(β T ), E ΣX (B ), where B = span(β T ) and m = dim(e ΣX (B )).

39 Alternatively, we can use an envelope estimator for the same tasks: Y = α + η T {φ T X} + ε Σ X = φ φ T + φ 0 0 φ T 0 Σ = Σ β = β Std P Ṱ φ(s X ) where φ = arg min S {log P S S X Y P S 0 + log Q S S X Q S 0 } and S is an m-dim subspace of R p.

40 Estimation Let (B, B 0 ) denote an orthogonal basis of R p, where B R p q, B 0 R p (p q) and span(b) E M (B). Then v E B T 0 MB 0 (B T 0 B) implies that B 0 v E M (B).

41 1 Set initial value g 0 = G 0 = 0. 2 For k = 0,..., u 1, g k+1 R q is obtained direction in the envelope E V(θ) (span(θ)) as follows, 1 Let G k = (g 1,..., g k ) if k 1 and let (G k, G 0k ) be an orthogonal basis for R q. 2 Define the stepwise objective function L k (w) = log(w T A k w) + log(w T B 1 k w), (1) where A k = G T 0k V(θ)G 0k, B k = G T 0k V(θ)G 0k + G T 0k θ θ T G 0k and w R q k. 3 Solve w k+1 = arg min w L k (w) subject to a length constraint w T w = 1. 4 Define g k+1 = G 0k w k+1 to be the unit length (k + 1)-th stepwise direction.

42 Envelopes and MSE Y 1 y 0 0 Y 2

43 Y Y 2

44 E Σ (B) EΣ(B)

Algorithms for Envelope Estimation II

Algorithms for Envelope Estimation II Algorithms for Envelope Estimation II arxiv:1509.03767v1 [stat.me] 12 Sep 2015 R. Dennis Cook, Liliana Forzani and Zhihua Su September 15, 2015 Abstract We propose a new algorithm for envelope estimation,

More information

Foundations for Envelope Models and Methods

Foundations for Envelope Models and Methods 1 2 3 Foundations for Envelope Models and Methods R. Dennis Cook and Xin Zhang October 6, 2014 4 5 6 7 8 9 10 11 12 13 14 Abstract Envelopes were recently proposed by Cook, Li and Chiaromonte (2010) as

More information

Envelopes for Efficient Multivariate Parameter Estimation

Envelopes for Efficient Multivariate Parameter Estimation Envelopes for Efficient Multivariate Parameter Estimation A DISSERTATION SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL OF THE UNIVERSITY OF MINNESOTA BY Xin Zhang IN PARTIAL FULFILLMENT OF THE REQUIREMENTS

More information

Supplementary Materials for Tensor Envelope Partial Least Squares Regression

Supplementary Materials for Tensor Envelope Partial Least Squares Regression Supplementary Materials for Tensor Envelope Partial Least Squares Regression Xin Zhang and Lexin Li Florida State University and University of California, Bereley 1 Proofs and Technical Details Proof of

More information

Renvlp: An R Package for Efficient Estimation in Multivariate Analysis Using Envelope Models

Renvlp: An R Package for Efficient Estimation in Multivariate Analysis Using Envelope Models Renvlp: An R Package for Efficient Estimation in Multivariate Analysis Using Envelope Models Minji Lee University of Florida Zhihua Su University of Florida Abstract The envelope models can achieve substantial

More information

Statistica Sinica Preprint No: SS R2

Statistica Sinica Preprint No: SS R2 Statistica Sinica Preprint No: SS-2016-0037R2 Title Fast envelope algorithms Manuscript ID SS-2016.0037 URL http://www.stat.sinica.edu.tw/statistica/ DOI 10.5705/ss.202016.0037 Complete List of Authors

More information

Modeling Mutagenicity Status of a Diverse Set of Chemical Compounds by Envelope Methods

Modeling Mutagenicity Status of a Diverse Set of Chemical Compounds by Envelope Methods Modeling Mutagenicity Status of a Diverse Set of Chemical Compounds by Envelope Methods Subho Majumdar School of Statistics, University of Minnesota Envelopes in Chemometrics August 4, 2014 1 / 23 Motivation

More information

Fast envelope algorithms

Fast envelope algorithms 1 2 Fast envelope algorithms R. Dennis Cook and Xin Zhang 3 4 5 6 7 8 9 10 11 12 13 14 Abstract In this paper, we develop new fast algorithms for envelope estimation that are stable and can be used in

More information

ENVELOPE MODELS FOR PARSIMONIOUS AND EFFICIENT MULTIVARIATE LINEAR REGRESSION

ENVELOPE MODELS FOR PARSIMONIOUS AND EFFICIENT MULTIVARIATE LINEAR REGRESSION Statistica Sinica 20 (2010), 927-1010 ENVELOPE MODELS FOR PARSIMONIOUS AND EFFICIENT MULTIVARIATE LINEAR REGRESSION R. Dennis Cook 1, Bing Li 2 and Francesca Chiaromonte 2 1 University of Minnesota and

More information

Model-free Envelope Dimension Selection

Model-free Envelope Dimension Selection Model-free Envelope Dimension Selection Xin Zhang and Qing Mai arxiv:1709.03945v2 [stat.me] 15 Sep 2017 Abstract An envelope is a targeted dimension reduction subspace for simultaneously achieving dimension

More information

Estimation of Multivariate Means with. Heteroscedastic Errors Using Envelope Models

Estimation of Multivariate Means with. Heteroscedastic Errors Using Envelope Models 1 Estimation of Multivariate Means with Heteroscedastic Errors Using Envelope Models Zhihua Su and R. Dennis Cook University of Minnesota Abstract: In this article, we propose envelope models that accommodate

More information

Package Renvlp. R topics documented: January 18, 2018

Package Renvlp. R topics documented: January 18, 2018 Type Package Title Computing Envelope Estimators Version 2.5 Date 2018-01-18 Author Minji Lee, Zhihua Su Maintainer Minji Lee Package Renvlp January 18, 2018 Provides a general routine,

More information

Supplementary Materials for Fast envelope algorithms

Supplementary Materials for Fast envelope algorithms 1 Supplementary Materials for Fast envelope algorithms R. Dennis Cook and Xin Zhang 1 Proof for Proposition We first prove the following: F(A 0 ) = log A T 0 MA 0 + log A T 0 (M + U) 1 A 0 (A1) log A T

More information

ESTIMATION OF MULTIVARIATE MEANS WITH HETEROSCEDASTIC ERRORS USING ENVELOPE MODELS

ESTIMATION OF MULTIVARIATE MEANS WITH HETEROSCEDASTIC ERRORS USING ENVELOPE MODELS Statistica Sinica 23 (2013), 213-230 doi:http://dx.doi.org/10.5705/ss.2010.240 ESTIMATION OF MULTIVARIATE MEANS WITH HETEROSCEDASTIC ERRORS USING ENVELOPE MODELS Zhihua Su and R. Dennis Cook University

More information

(Received April 2008; accepted June 2009) COMMENT. Jinzhu Jia, Yuval Benjamini, Chinghway Lim, Garvesh Raskutti and Bin Yu.

(Received April 2008; accepted June 2009) COMMENT. Jinzhu Jia, Yuval Benjamini, Chinghway Lim, Garvesh Raskutti and Bin Yu. 960 R. DENNIS COOK, BING LI AND FRANCESCA CHIAROMONTE Johnson, R. A. and Wichern, D. W. (2007). Applied Multivariate Statistical Analysis. Sixth Edition. Pearson Prentice Hall. Jolliffe, I. T. (2002).

More information

Groupwise Envelope Models for Imaging Genetic Analysis

Groupwise Envelope Models for Imaging Genetic Analysis BIOMETRICS xxx, 1 21 DOI: xxx xxx xxx Groupwise Envelope Models for Imaging Genetic Analysis Yeonhee Park 1,, Zhihua Su 2, and Hongtu Zhu 1, 1 Department of Biostatistics, The University of Texas MD Anderson

More information

Again consider the multivariate linear model (1.1), but now allowing the predictors to be stochastic. Restating it for ease of reference,

Again consider the multivariate linear model (1.1), but now allowing the predictors to be stochastic. Restating it for ease of reference, Chapter 4 Predictor Envelopes In Chapter 1 we considered reduction of Y, relying on the notion of material and immaterial information for motivation and using E Σ (B) with B =span(β) as the reduction construct.

More information

Simultaneous envelopes for multivariate linear regression

Simultaneous envelopes for multivariate linear regression 1 2 3 4 Simultaneous envelopes for multivariate linear regression R. Dennis Cook and Xin Zhang December 2, 2013 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 Abstract We introduce envelopes for simultaneously

More information

Regression Models - Introduction

Regression Models - Introduction Regression Models - Introduction In regression models, two types of variables that are studied: A dependent variable, Y, also called response variable. It is modeled as random. An independent variable,

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1

MA 575 Linear Models: Cedric E. Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1 MA 575 Linear Models: Cedric E Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1 1 Within-group Correlation Let us recall the simple two-level hierarchical

More information

Dimension Reduction in Abundant High Dimensional Regressions

Dimension Reduction in Abundant High Dimensional Regressions Dimension Reduction in Abundant High Dimensional Regressions Dennis Cook University of Minnesota 8th Purdue Symposium June 2012 In collaboration with Liliana Forzani & Adam Rothman, Annals of Statistics,

More information

Parsimonious Tensor Response Regression

Parsimonious Tensor Response Regression Parsimonious Tensor Response Regression Lexin Li and Xin Zhang University of California at Bereley; and Florida State University Abstract Aiming at abundant scientific and engineering data with not only

More information

Matrix-Variate Regressions and Envelope Models

Matrix-Variate Regressions and Envelope Models Matrix-Variate Regressions and Envelope Models arxiv:1605.01485v2 [stat.me] 30 Jul 2017 Shanshan Ding Department of Applied Economics and Statistics, University of Delaware and R. Dennis Cook School of

More information

STAT 540: Data Analysis and Regression

STAT 540: Data Analysis and Regression STAT 540: Data Analysis and Regression Wen Zhou http://www.stat.colostate.edu/~riczw/ Email: riczw@stat.colostate.edu Department of Statistics Colorado State University Fall 205 W. Zhou (Colorado State

More information

Regression Models - Introduction

Regression Models - Introduction Regression Models - Introduction In regression models there are two types of variables that are studied: A dependent variable, Y, also called response variable. It is modeled as random. An independent

More information

arxiv: v1 [stat.me] 30 Jan 2015

arxiv: v1 [stat.me] 30 Jan 2015 Parsimonious Tensor Response Regression Lexin Li and Xin Zhang arxiv:1501.07815v1 [stat.me] 30 Jan 2015 University of California, Bereley; and Florida State University Abstract Aiming at abundant scientific

More information

Regression Graphics. 1 Introduction. 2 The Central Subspace. R. D. Cook Department of Applied Statistics University of Minnesota St.

Regression Graphics. 1 Introduction. 2 The Central Subspace. R. D. Cook Department of Applied Statistics University of Minnesota St. Regression Graphics R. D. Cook Department of Applied Statistics University of Minnesota St. Paul, MN 55108 Abstract This article, which is based on an Interface tutorial, presents an overview of regression

More information

Model Selection for Semiparametric Bayesian Models with Application to Overdispersion

Model Selection for Semiparametric Bayesian Models with Application to Overdispersion Proceedings 59th ISI World Statistics Congress, 25-30 August 2013, Hong Kong (Session CPS020) p.3863 Model Selection for Semiparametric Bayesian Models with Application to Overdispersion Jinfang Wang and

More information

PLS. theoretical results for the chemometrics use of PLS. Liliana Forzani. joint work with R. Dennis Cook

PLS. theoretical results for the chemometrics use of PLS. Liliana Forzani. joint work with R. Dennis Cook PLS theoretical results for the chemometrics use of PLS Liliana Forzani Facultad de Ingeniería Química, UNL, Argentina joint work with R. Dennis Cook Example in chemometrics A concrete situation could

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

Statistics 203: Introduction to Regression and Analysis of Variance Course review

Statistics 203: Introduction to Regression and Analysis of Variance Course review Statistics 203: Introduction to Regression and Analysis of Variance Course review Jonathan Taylor - p. 1/?? Today Review / overview of what we learned. - p. 2/?? General themes in regression models Specifying

More information

BIO5312 Biostatistics Lecture 13: Maximum Likelihood Estimation

BIO5312 Biostatistics Lecture 13: Maximum Likelihood Estimation BIO5312 Biostatistics Lecture 13: Maximum Likelihood Estimation Yujin Chung November 29th, 2016 Fall 2016 Yujin Chung Lec13: MLE Fall 2016 1/24 Previous Parametric tests Mean comparisons (normality assumption)

More information

Statistics 910, #5 1. Regression Methods

Statistics 910, #5 1. Regression Methods Statistics 910, #5 1 Overview Regression Methods 1. Idea: effects of dependence 2. Examples of estimation (in R) 3. Review of regression 4. Comparisons and relative efficiencies Idea Decomposition Well-known

More information

Regression Graphics. R. D. Cook Department of Applied Statistics University of Minnesota St. Paul, MN 55108

Regression Graphics. R. D. Cook Department of Applied Statistics University of Minnesota St. Paul, MN 55108 Regression Graphics R. D. Cook Department of Applied Statistics University of Minnesota St. Paul, MN 55108 Abstract This article, which is based on an Interface tutorial, presents an overview of regression

More information

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018 Econometrics I KS Module 2: Multivariate Linear Regression Alexander Ahammer Department of Economics Johannes Kepler University of Linz This version: April 16, 2018 Alexander Ahammer (JKU) Module 2: Multivariate

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

1. (Rao example 11.15) A study measures oxygen demand (y) (on a log scale) and five explanatory variables (see below). Data are available as

1. (Rao example 11.15) A study measures oxygen demand (y) (on a log scale) and five explanatory variables (see below). Data are available as ST 51, Summer, Dr. Jason A. Osborne Homework assignment # - Solutions 1. (Rao example 11.15) A study measures oxygen demand (y) (on a log scale) and five explanatory variables (see below). Data are available

More information

Tensor Envelope Partial Least Squares Regression

Tensor Envelope Partial Least Squares Regression Tensor Envelope Partial Least Squares Regression Xin Zhang and Lexin Li Florida State University and University of California Berkeley Abstract Partial least squares (PLS) is a prominent solution for dimension

More information

UNIVERSITY OF TORONTO Faculty of Arts and Science

UNIVERSITY OF TORONTO Faculty of Arts and Science UNIVERSITY OF TORONTO Faculty of Arts and Science December 2013 Final Examination STA442H1F/2101HF Methods of Applied Statistics Jerry Brunner Duration - 3 hours Aids: Calculator Model(s): Any calculator

More information

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout

More information

Topic 17 - Single Factor Analysis of Variance. Outline. One-way ANOVA. The Data / Notation. One way ANOVA Cell means model Factor effects model

Topic 17 - Single Factor Analysis of Variance. Outline. One-way ANOVA. The Data / Notation. One way ANOVA Cell means model Factor effects model Topic 17 - Single Factor Analysis of Variance - Fall 2013 One way ANOVA Cell means model Factor effects model Outline Topic 17 2 One-way ANOVA Response variable Y is continuous Explanatory variable is

More information

Peter Hoff Linear and multilinear models April 3, GLS for multivariate regression 5. 3 Covariance estimation for the GLM 8

Peter Hoff Linear and multilinear models April 3, GLS for multivariate regression 5. 3 Covariance estimation for the GLM 8 Contents 1 Linear model 1 2 GLS for multivariate regression 5 3 Covariance estimation for the GLM 8 4 Testing the GLH 11 A reference for some of this material can be found somewhere. 1 Linear model Recall

More information

A Significance Test for the Lasso

A Significance Test for the Lasso A Significance Test for the Lasso Lockhart R, Taylor J, Tibshirani R, and Tibshirani R Ashley Petersen May 14, 2013 1 Last time Problem: Many clinical covariates which are important to a certain medical

More information

Multivariate Regression (Chapter 10)

Multivariate Regression (Chapter 10) Multivariate Regression (Chapter 10) This week we ll cover multivariate regression and maybe a bit of canonical correlation. Today we ll mostly review univariate multivariate regression. With multivariate

More information

A BAYESIAN APPROACH FOR ENVELOPE MODELS. BY KSHITIJ KHARE 1,SUBHADIP PAL AND ZHIHUA SU 2 University of Florida

A BAYESIAN APPROACH FOR ENVELOPE MODELS. BY KSHITIJ KHARE 1,SUBHADIP PAL AND ZHIHUA SU 2 University of Florida The Annals of Statistics 2017, Vol. 45, No. 1, 196 222 DOI: 10.1214/16-AOS1449 Institute of Mathematical Statistics, 2017 A BAYESIAN APPROACH FOR ENVELOPE MODELS BY KSHITIJ KHARE 1,SUBHADIP PAL AND ZHIHUA

More information

Well-developed and understood properties

Well-developed and understood properties 1 INTRODUCTION TO LINEAR MODELS 1 THE CLASSICAL LINEAR MODEL Most commonly used statistical models Flexible models Well-developed and understood properties Ease of interpretation Building block for more

More information

Fused estimators of the central subspace in sufficient dimension reduction

Fused estimators of the central subspace in sufficient dimension reduction Fused estimators of the central subspace in sufficient dimension reduction R. Dennis Cook and Xin Zhang Abstract When studying the regression of a univariate variable Y on a vector x of predictors, most

More information

Regression: Ordinary Least Squares

Regression: Ordinary Least Squares Regression: Ordinary Least Squares Mark Hendricks Autumn 2017 FINM Intro: Regression Outline Regression OLS Mathematics Linear Projection Hendricks, Autumn 2017 FINM Intro: Regression: Lecture 2/32 Regression

More information

Master s Written Examination

Master s Written Examination Master s Written Examination Option: Statistics and Probability Spring 05 Full points may be obtained for correct answers to eight questions Each numbered question (which may have several parts) is worth

More information

Analysis Methods for Supersaturated Design: Some Comparisons

Analysis Methods for Supersaturated Design: Some Comparisons Journal of Data Science 1(2003), 249-260 Analysis Methods for Supersaturated Design: Some Comparisons Runze Li 1 and Dennis K. J. Lin 2 The Pennsylvania State University Abstract: Supersaturated designs

More information

Sufficient Dimension Reduction for Longitudinally Measured Predictors

Sufficient Dimension Reduction for Longitudinally Measured Predictors Sufficient Dimension Reduction for Longitudinally Measured Predictors Ruth Pfeiffer National Cancer Institute, NIH, HHS joint work with Efstathia Bura and Wei Wang TU Wien and GWU University JSM Vancouver

More information

Linear Regression and Its Applications

Linear Regression and Its Applications Linear Regression and Its Applications Predrag Radivojac October 13, 2014 Given a data set D = {(x i, y i )} n the objective is to learn the relationship between features and the target. We usually start

More information

Distributed Estimation, Information Loss and Exponential Families. Qiang Liu Department of Computer Science Dartmouth College

Distributed Estimation, Information Loss and Exponential Families. Qiang Liu Department of Computer Science Dartmouth College Distributed Estimation, Information Loss and Exponential Families Qiang Liu Department of Computer Science Dartmouth College Statistical Learning / Estimation Learning generative models from data Topic

More information

x3,..., Multiple Regression β q α, β 1, β 2, β 3,..., β q in the model can all be estimated by least square estimators

x3,..., Multiple Regression β q α, β 1, β 2, β 3,..., β q in the model can all be estimated by least square estimators Multiple Regression Relating a response (dependent, input) y to a set of explanatory (independent, output, predictor) variables x, x 2, x 3,, x q. A technique for modeling the relationship between variables.

More information

Next is material on matrix rank. Please see the handout

Next is material on matrix rank. Please see the handout B90.330 / C.005 NOTES for Wednesday 0.APR.7 Suppose that the model is β + ε, but ε does not have the desired variance matrix. Say that ε is normal, but Var(ε) σ W. The form of W is W w 0 0 0 0 0 0 w 0

More information

Total Least Squares Approach in Regression Methods

Total Least Squares Approach in Regression Methods WDS'08 Proceedings of Contributed Papers, Part I, 88 93, 2008. ISBN 978-80-7378-065-4 MATFYZPRESS Total Least Squares Approach in Regression Methods M. Pešta Charles University, Faculty of Mathematics

More information

STAT 730 Chapter 5: Hypothesis Testing

STAT 730 Chapter 5: Hypothesis Testing STAT 730 Chapter 5: Hypothesis Testing Timothy Hanson Department of Statistics, University of South Carolina Stat 730: Multivariate Analysis 1 / 28 Likelihood ratio test def n: Data X depend on θ. The

More information

4 Multiple Linear Regression

4 Multiple Linear Regression 4 Multiple Linear Regression 4. The Model Definition 4.. random variable Y fits a Multiple Linear Regression Model, iff there exist β, β,..., β k R so that for all (x, x 2,..., x k ) R k where ε N (, σ

More information

Lecture 2: Linear Models. Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011

Lecture 2: Linear Models. Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011 Lecture 2: Linear Models Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011 1 Quick Review of the Major Points The general linear model can be written as y = X! + e y = vector

More information

Linear Regression. Junhui Qian. October 27, 2014

Linear Regression. Junhui Qian. October 27, 2014 Linear Regression Junhui Qian October 27, 2014 Outline The Model Estimation Ordinary Least Square Method of Moments Maximum Likelihood Estimation Properties of OLS Estimator Unbiasedness Consistency Efficiency

More information

Kneib, Fahrmeir: Supplement to "Structured additive regression for categorical space-time data: A mixed model approach"

Kneib, Fahrmeir: Supplement to Structured additive regression for categorical space-time data: A mixed model approach Kneib, Fahrmeir: Supplement to "Structured additive regression for categorical space-time data: A mixed model approach" Sonderforschungsbereich 386, Paper 43 (25) Online unter: http://epub.ub.uni-muenchen.de/

More information

Principal Fitted Components for Dimension Reduction in Regression

Principal Fitted Components for Dimension Reduction in Regression Statistical Science 2008, Vol. 23, No. 4, 485 501 DOI: 10.1214/08-STS275 c Institute of Mathematical Statistics, 2008 Principal Fitted Components for Dimension Reduction in Regression R. Dennis Cook and

More information

Bayesian Inference. Chapter 4: Regression and Hierarchical Models

Bayesian Inference. Chapter 4: Regression and Hierarchical Models Bayesian Inference Chapter 4: Regression and Hierarchical Models Conchi Ausín and Mike Wiper Department of Statistics Universidad Carlos III de Madrid Master in Business Administration and Quantitative

More information

Chapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression

Chapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression BSTT523: Kutner et al., Chapter 1 1 Chapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression Introduction: Functional relation between

More information

Interactions between Binary & Quantitative Predictors

Interactions between Binary & Quantitative Predictors Interactions between Binary & Quantitative Predictors The purpose of the study was to examine the possible joint effects of the difficulty of the practice task and the amount of practice, upon the performance

More information

Linear Regression. September 27, Chapter 3. Chapter 3 September 27, / 77

Linear Regression. September 27, Chapter 3. Chapter 3 September 27, / 77 Linear Regression Chapter 3 September 27, 2016 Chapter 3 September 27, 2016 1 / 77 1 3.1. Simple linear regression 2 3.2 Multiple linear regression 3 3.3. The least squares estimation 4 3.4. The statistical

More information

STAT 100C: Linear models

STAT 100C: Linear models STAT 100C: Linear models Arash A. Amini April 27, 2018 1 / 1 Table of Contents 2 / 1 Linear Algebra Review Read 3.1 and 3.2 from text. 1. Fundamental subspace (rank-nullity, etc.) Im(X ) = ker(x T ) R

More information

Least Squares Estimation-Finite-Sample Properties

Least Squares Estimation-Finite-Sample Properties Least Squares Estimation-Finite-Sample Properties Ping Yu School of Economics and Finance The University of Hong Kong Ping Yu (HKU) Finite-Sample 1 / 29 Terminology and Assumptions 1 Terminology and Assumptions

More information

Assignment 1 Math 5341 Linear Algebra Review. Give complete answers to each of the following questions. Show all of your work.

Assignment 1 Math 5341 Linear Algebra Review. Give complete answers to each of the following questions. Show all of your work. Assignment 1 Math 5341 Linear Algebra Review Give complete answers to each of the following questions Show all of your work Note: You might struggle with some of these questions, either because it has

More information

A Selective Review of Sufficient Dimension Reduction

A Selective Review of Sufficient Dimension Reduction A Selective Review of Sufficient Dimension Reduction Lexin Li Department of Statistics North Carolina State University Lexin Li (NCSU) Sufficient Dimension Reduction 1 / 19 Outline 1 General Framework

More information

Bayesian Inference. Chapter 4: Regression and Hierarchical Models

Bayesian Inference. Chapter 4: Regression and Hierarchical Models Bayesian Inference Chapter 4: Regression and Hierarchical Models Conchi Ausín and Mike Wiper Department of Statistics Universidad Carlos III de Madrid Advanced Statistics and Data Mining Summer School

More information

Towards a Regression using Tensors

Towards a Regression using Tensors February 27, 2014 Outline Background 1 Background Linear Regression Tensorial Data Analysis 2 Definition Tensor Operation Tensor Decomposition 3 Model Attention Deficit Hyperactivity Disorder Data Analysis

More information

Indirect multivariate response linear regression

Indirect multivariate response linear regression Biometrika (2016), xx, x, pp. 1 22 1 2 3 4 5 6 C 2007 Biometrika Trust Printed in Great Britain Indirect multivariate response linear regression BY AARON J. MOLSTAD AND ADAM J. ROTHMAN School of Statistics,

More information

Generalized, Linear, and Mixed Models

Generalized, Linear, and Mixed Models Generalized, Linear, and Mixed Models CHARLES E. McCULLOCH SHAYLER.SEARLE Departments of Statistical Science and Biometrics Cornell University A WILEY-INTERSCIENCE PUBLICATION JOHN WILEY & SONS, INC. New

More information

Standard Errors & Confidence Intervals. N(0, I( β) 1 ), I( β) = [ 2 l(β, φ; y) β i β β= β j

Standard Errors & Confidence Intervals. N(0, I( β) 1 ), I( β) = [ 2 l(β, φ; y) β i β β= β j Standard Errors & Confidence Intervals β β asy N(0, I( β) 1 ), where I( β) = [ 2 l(β, φ; y) ] β i β β= β j We can obtain asymptotic 100(1 α)% confidence intervals for β j using: β j ± Z 1 α/2 se( β j )

More information

Linear Regression. Aarti Singh. Machine Learning / Sept 27, 2010

Linear Regression. Aarti Singh. Machine Learning / Sept 27, 2010 Linear Regression Aarti Singh Machine Learning 10-701/15-781 Sept 27, 2010 Discrete to Continuous Labels Classification Sports Science News Anemic cell Healthy cell Regression X = Document Y = Topic X

More information

Multivariate Survival Analysis

Multivariate Survival Analysis Multivariate Survival Analysis Previously we have assumed that either (X i, δ i ) or (X i, δ i, Z i ), i = 1,..., n, are i.i.d.. This may not always be the case. Multivariate survival data can arise in

More information

Linear regression methods

Linear regression methods Linear regression methods Most of our intuition about statistical methods stem from linear regression. For observations i = 1,..., n, the model is Y i = p X ij β j + ε i, j=1 where Y i is the response

More information

ECE521 week 3: 23/26 January 2017

ECE521 week 3: 23/26 January 2017 ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear

More information

Course topics (tentative) The role of random effects

Course topics (tentative) The role of random effects Course topics (tentative) random effects linear mixed models analysis of variance frequentist likelihood-based inference (MLE and REML) prediction Bayesian inference The role of random effects Rasmus Waagepetersen

More information

Linear Regression (9/11/13)

Linear Regression (9/11/13) STA561: Probabilistic machine learning Linear Regression (9/11/13) Lecturer: Barbara Engelhardt Scribes: Zachary Abzug, Mike Gloudemans, Zhuosheng Gu, Zhao Song 1 Why use linear regression? Figure 1: Scatter

More information

Approximate Likelihoods

Approximate Likelihoods Approximate Likelihoods Nancy Reid July 28, 2015 Why likelihood? makes probability modelling central l(θ; y) = log f (y; θ) emphasizes the inverse problem of reasoning y θ converts a prior probability

More information

Final Review. Yang Feng. Yang Feng (Columbia University) Final Review 1 / 58

Final Review. Yang Feng.   Yang Feng (Columbia University) Final Review 1 / 58 Final Review Yang Feng http://www.stat.columbia.edu/~yangfeng Yang Feng (Columbia University) Final Review 1 / 58 Outline 1 Multiple Linear Regression (Estimation, Inference) 2 Special Topics for Multiple

More information

Parametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1

Parametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1 Parametric Modelling of Over-dispersed Count Data Part III / MMath (Applied Statistics) 1 Introduction Poisson regression is the de facto approach for handling count data What happens then when Poisson

More information

An Introduction to Multilevel Models. PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 25: December 7, 2012

An Introduction to Multilevel Models. PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 25: December 7, 2012 An Introduction to Multilevel Models PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 25: December 7, 2012 Today s Class Concepts in Longitudinal Modeling Between-Person vs. +Within-Person

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7

MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7 MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7 1 Random Vectors Let a 0 and y be n 1 vectors, and let A be an n n matrix. Here, a 0 and A are non-random, whereas y is

More information

An R # Statistic for Fixed Effects in the Linear Mixed Model and Extension to the GLMM

An R # Statistic for Fixed Effects in the Linear Mixed Model and Extension to the GLMM An R Statistic for Fixed Effects in the Linear Mixed Model and Extension to the GLMM Lloyd J. Edwards, Ph.D. UNC-CH Department of Biostatistics email: Lloyd_Edwards@unc.edu Presented to the Department

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression September 24, 2008 Reading HH 8, GIll 4 Simple Linear Regression p.1/20 Problem Data: Observe pairs (Y i,x i ),i = 1,...n Response or dependent variable Y Predictor or independent

More information

Stat 579: Generalized Linear Models and Extensions

Stat 579: Generalized Linear Models and Extensions Stat 579: Generalized Linear Models and Extensions Mixed models Yan Lu March, 2018, week 8 1 / 32 Restricted Maximum Likelihood (REML) REML: uses a likelihood function calculated from the transformed set

More information

Forecasting 1 to h steps ahead using partial least squares

Forecasting 1 to h steps ahead using partial least squares Forecasting 1 to h steps ahead using partial least squares Philip Hans Franses Econometric Institute, Erasmus University Rotterdam November 10, 2006 Econometric Institute Report 2006-47 I thank Dick van

More information

Variable Selection for Generalized Additive Mixed Models by Likelihood-based Boosting

Variable Selection for Generalized Additive Mixed Models by Likelihood-based Boosting Variable Selection for Generalized Additive Mixed Models by Likelihood-based Boosting Andreas Groll 1 and Gerhard Tutz 2 1 Department of Statistics, University of Munich, Akademiestrasse 1, D-80799, Munich,

More information

Variable Selection in Restricted Linear Regression Models. Y. Tuaç 1 and O. Arslan 1

Variable Selection in Restricted Linear Regression Models. Y. Tuaç 1 and O. Arslan 1 Variable Selection in Restricted Linear Regression Models Y. Tuaç 1 and O. Arslan 1 Ankara University, Faculty of Science, Department of Statistics, 06100 Ankara/Turkey ytuac@ankara.edu.tr, oarslan@ankara.edu.tr

More information

STATS306B STATS306B. Discriminant Analysis. Jonathan Taylor Department of Statistics Stanford University. June 3, 2010

STATS306B STATS306B. Discriminant Analysis. Jonathan Taylor Department of Statistics Stanford University. June 3, 2010 STATS306B Discriminant Analysis Jonathan Taylor Department of Statistics Stanford University June 3, 2010 Spring 2010 Classification Given K classes in R p, represented as densities f i (x), 1 i K classify

More information

Linear models Analysis of Covariance

Linear models Analysis of Covariance Esben Budtz-Jørgensen April 22, 2008 Linear models Analysis of Covariance Confounding Interactions Parameterizations Analysis of Covariance group comparisons can become biased if an important predictor

More information

Random and Mixed Effects Models - Part III

Random and Mixed Effects Models - Part III Random and Mixed Effects Models - Part III Statistics 149 Spring 2006 Copyright 2006 by Mark E. Irwin Quasi-F Tests When we get to more than two categorical factors, some times there are not nice F tests

More information

ECON 4160, Lecture 11 and 12

ECON 4160, Lecture 11 and 12 ECON 4160, 2016. Lecture 11 and 12 Co-integration Ragnar Nymoen Department of Economics 9 November 2017 1 / 43 Introduction I So far we have considered: Stationary VAR ( no unit roots ) Standard inference

More information

Vector Auto-Regressive Models

Vector Auto-Regressive Models Vector Auto-Regressive Models Laurent Ferrara 1 1 University of Paris Nanterre M2 Oct. 2018 Overview of the presentation 1. Vector Auto-Regressions Definition Estimation Testing 2. Impulse responses functions

More information

Elements of Multivariate Time Series Analysis

Elements of Multivariate Time Series Analysis Gregory C. Reinsel Elements of Multivariate Time Series Analysis Second Edition With 14 Figures Springer Contents Preface to the Second Edition Preface to the First Edition vii ix 1. Vector Time Series

More information

Part 6: Multivariate Normal and Linear Models

Part 6: Multivariate Normal and Linear Models Part 6: Multivariate Normal and Linear Models 1 Multiple measurements Up until now all of our statistical models have been univariate models models for a single measurement on each member of a sample of

More information

Linear models Analysis of Covariance

Linear models Analysis of Covariance Esben Budtz-Jørgensen November 20, 2007 Linear models Analysis of Covariance Confounding Interactions Parameterizations Analysis of Covariance group comparisons can become biased if an important predictor

More information