PLS: theoretical results for the chemometrics use of PLS. Liliana Forzani. Joint work with R. Dennis Cook.

1 PLS: theoretical results for the chemometrics use of PLS. Liliana Forzani, Facultad de Ingeniería Química, UNL, Argentina. Joint work with R. Dennis Cook.

2 Example in chemometrics. A concrete situation could be that y is a chemical variable (protein content or fat content) and x = (x_1, ..., x_p) are absorptions or reflectances measured at p different wavelengths using some kind of spectroscopic instrument. We will have available simultaneous measurements of x and y on n chemical samples (the calibration set), and we want to use these measurements to predict y from x measurements on new specimens.

4 Setting in chemometrics. Goal: predict a random variable y from p random variables x = (x_1, ..., x_p). Statistical model: linear regression
    y = μ_y + β^T (x - μ_x) + ε,   (1)
with ε ~ N(0, σ_ε^2), σ_ε^2 = σ_y^2 - σ^T Σ^{-1} σ, where σ_y^2 = var(y), Σ = cov(x), σ = cov(x, y). Least-squares solution: β = Σ^{-1} σ.
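A minimal numerical sketch of this setting (Python with numpy; the synthetic data and dimensions are assumptions made only for illustration). It checks that the covariance form β = Σ^{-1} σ, computed from sample covariances, matches ordinary least squares on centered data.

```python
# Sketch: the covariance-based solution Sigma^{-1} sigma equals OLS on centered data.
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 5
x = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = x @ beta_true + rng.normal(scale=0.3, size=n)

xc = x - x.mean(axis=0)            # centered predictors
yc = y - y.mean()                  # centered response
Sigma_hat = xc.T @ xc / n          # sample cov(x)
sigma_hat = xc.T @ yc / n          # sample cov(x, y)

beta_cov = np.linalg.solve(Sigma_hat, sigma_hat)    # Sigma_hat^{-1} sigma_hat
beta_ols = np.linalg.lstsq(xc, yc, rcond=None)[0]   # ordinary least squares
print(np.allclose(beta_cov, beta_ols))              # True
```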

6 Algorithm. PLS started as an algorithm to avoid (when n < p) the problem of inverting the covariance matrix Σ to get the estimator of β = Σ^{-1} σ. Let us call that estimator β̂_PLS. It was set in motion by Herman Wold in the late 1960s to address problems in path modeling, and was adapted in 1977 by Svante Wold for prediction in chemometrics. It was an algorithm, easy to compute, that worked quite well in chemometrics even when p > n.

11 PLS algorithm (Martens and Naes, 1989). The algorithm in this version is as follows: (1) choose a d (there are ways to choose d); (2) compute Ŝ = (σ̂, Σ̂σ̂, ..., Σ̂^{d-1}σ̂), with σ̂ and Σ̂ the sample versions of σ and Σ; (3) choose β̂ ∈ span(Ŝ) that gives the minimum squared error for Y - Xβ̂. Open questions: how should d be chosen? What happens when d = p? Does the algorithm converge, and if so, to what? (A sketch of this fit appears below.)
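A hedged sketch of this version of the algorithm (Python with numpy; centered data assumed). It is a didactic Krylov-basis implementation, and the helper name pls_krylov_beta is ours; it is not the NIPALS code used in chemometrics software.

```python
# Sketch of the Martens-Naes-style fit: build S_hat = [sigma_hat, Sigma_hat sigma_hat, ...,
# Sigma_hat^{d-1} sigma_hat] and take the least-squares coefficient restricted to span(S_hat).
import numpy as np

def pls_krylov_beta(X, y, d):
    """PLS coefficient via the Krylov-subspace formulation (X, y assumed centered)."""
    n = X.shape[0]
    Sigma_hat = X.T @ X / n
    sigma_hat = X.T @ y / n
    cols = [sigma_hat]
    for _ in range(d - 1):
        cols.append(Sigma_hat @ cols[-1])      # next Krylov direction
    S_hat = np.column_stack(cols)              # p x d basis
    scores = X @ S_hat                         # n x d scores
    gamma = np.linalg.lstsq(scores, y, rcond=None)[0]   # regress y on the scores
    return S_hat @ gamma                       # map back to the predictor scale

# Toy use: the fit is well defined even when p > n and Sigma_hat is singular.
rng = np.random.default_rng(1)
n, p, d = 40, 100, 3
X = rng.normal(size=(n, p)); X -= X.mean(axis=0)
y = X[:, :5].sum(axis=1) + rng.normal(scale=0.1, size=n); y -= y.mean()
print(pls_krylov_beta(X, y, d).shape)          # (100,)
```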

16 PLS works. The chemometrics community has been using PLS for calibration ever since. Chemometricians tend not to address population PLS models or regression coefficients, but instead deal directly with predictions resulting from PLS algorithms. The method works even for n < p, but there was no consistent theory explaining why it works or what it converges to. The statistics community did not pay attention to PLS (maybe this is why there were no asymptotics). The PLS tradition is perhaps more akin to conventions in machine learning or data science than it is to statistical customs. There is now a vast literature on PLS within chemometrics, some of it refining and extending the methodology and some of it affirming the methodology, like the paper "PLS works" by Bro and Eldén (2009).

17 Constraint in the parameters. Helland. A statistician appears. Helland (1990) realized that the algorithm is a plug-in estimator of a population quantity of the form
    β_PLS = S(S^T Σ S)^{-1} S^T σ,   (2)
with S = (σ, Σσ, ..., Σ^{d-1}σ), i.e.,
    β̂_PLS = Ŝ(Ŝ^T Σ̂ Ŝ)^{-1} Ŝ^T σ̂.   (3)
As a consequence, for p fixed, β̂_PLS is a consistent estimator of β_PLS. There was still a mystery about the shape of β_PLS in (2).
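A small numerical check (Python with numpy, synthetic data chosen only for illustration) that the closed form (3) coincides with the least-squares fit restricted to span(Ŝ) from the algorithm above; both quantities are computed directly from their definitions.

```python
# Check: S_hat (S_hat^T Sigma_hat S_hat)^{-1} S_hat^T sigma_hat equals the restricted LS fit.
import numpy as np

rng = np.random.default_rng(2)
n, p, d = 60, 20, 3
X = rng.normal(size=(n, p)); X -= X.mean(axis=0)
y = X @ rng.normal(size=p) + rng.normal(size=n); y -= y.mean()

Sigma_hat = X.T @ X / n
sigma_hat = X.T @ y / n
S_hat = np.column_stack([np.linalg.matrix_power(Sigma_hat, k) @ sigma_hat for k in range(d)])

beta_closed = S_hat @ np.linalg.solve(S_hat.T @ Sigma_hat @ S_hat, S_hat.T @ sigma_hat)
beta_restricted = S_hat @ np.linalg.lstsq(X @ S_hat, y, rcond=None)[0]
print(np.allclose(beta_closed, beta_restricted))   # True
```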

20 But the statisticians did show up. The mystery disappears: a model in the population (Cook, Helland and Su, 2013). Idea: when d = 1, PLS in the population gives β_PLS = σ(σ^T Σ σ)^{-1} σ^T σ.
1. Hence β = cσ for some scalar c.
2. Let us recall that β = Σ^{-1} σ.
3. If the population PLS vector is to equal β, (1) and (2) together give Σ^{-1} σ = cσ, equivalently Σσ = (1/c) σ.
4. Then σ is one of the eigenvectors of Σ.
Moreover, if d > 1, PLS in the population means that β cuts (has a nonzero projection on) only d eigenvectors of Σ, and therefore β can be enveloped by the span of d eigenvectors of Σ. (A small numerical check of the d = 1 case follows below.)
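A tiny numerical illustration (Python with numpy; the 3x3 Σ below is an arbitrary choice made for the example) of the d = 1 statement: the population PLS vector σ(σ^T Σ σ)^{-1} σ^T σ equals β = Σ^{-1} σ exactly when σ is an eigenvector of Σ.

```python
# d = 1 population PLS: sigma (sigma^T Sigma sigma)^{-1} sigma^T sigma vs. beta = Sigma^{-1} sigma.
import numpy as np

def pls_d1(Sigma, sigma):
    return sigma * float(sigma @ sigma) / float(sigma @ Sigma @ sigma)

Sigma = np.diag([4.0, 2.0, 1.0])
sigma_eig = np.array([1.0, 0.0, 0.0])   # an eigenvector of Sigma
sigma_gen = np.array([1.0, 1.0, 0.0])   # not an eigenvector

print(np.allclose(pls_d1(Sigma, sigma_eig), np.linalg.solve(Sigma, sigma_eig)))  # True
print(np.allclose(pls_d1(Sigma, sigma_gen), np.linalg.solve(Sigma, sigma_gen)))  # False
```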

22 More about constraints. MLE. Cook, Helland and Su (2013): informally, β cuts only a few eigenvectors of Σ. Formally, there exists Γ ∈ R^{p×u} with u ≤ p such that the columns of Γ are u eigenvectors of Σ (not necessarily the first ones) and β = ΓU for some U ∈ R^{u×1}. Since β = Σ^{-1} σ and the columns of Γ are eigenvectors of Σ, we have
    Σ = Γ Λ_Γ Γ^T + Γ_0 Λ_{Γ_0} Γ_0^T   and   β = Γ(Γ^T Σ Γ)^{-1} Γ^T σ.
Is Γ = S? Remember that β_PLS = S(S^T Σ S)^{-1} S^T σ. They found the MLE of β under this model and established its asymptotic properties, including efficiency, for p fixed as n → ∞.

24 But the chemometrics community used it for p increasing! Setting: p > n. The MLE does not exist; no hope there. The β̂_PLS algorithm works if d < min{p, n}, and works (pretty well) when n < p. Recall β̂_PLS = Ŝ(Ŝ^T Σ̂ Ŝ)^{-1} Ŝ^T σ̂. In view of the apparent success that PLS has had in chemometrics and elsewhere, we might anticipate that it has reasonable statistical properties in high-dimensional regression.

28 But the statisticians did show up again, this time with bad news. Chun and Keles (2010) provided a piece of the puzzle by showing that, within a certain modeling framework, the PLS estimator of the coefficient vector in linear regression is inconsistent unless p/n → 0. They then used this as motivation for their development of a sparse version of PLS.

32 A dilemma. The Chun-Keles result poses a dilemma. On the one hand, decades of experience support PLS as a useful method, but its inconsistency when p/n → c > 0 casts doubt on its usefulness in high-dimensional regression, which is one of the contexts in which PLS undeniably stands out by virtue of its widespread application. There are several possible explanations for this conflict, including: consistency does not always signal the value of a method in practice; the literature is largely wrong about the value of PLS; and the modeling construct used by Chun and Keles does not adequately reflect the range of applications in which PLS is employed.

35 Model in Chun and Keles's paper. The model for x (the predictor) is given by
    x | y = μ_x + Θ ν_y + ω,   (4)
where ν_y ∈ R^d with ν ~ N(0, I_d), Θ ∈ R^{p×d}, and ω ∈ R^p with ω ~ N(0, π^2 I_p). As a consequence, y is independent of x given Θ^T x, and thus the d linear combinations Θ^T x carry all of the information that x has about y. The variance of x can be expressed as
    Σ = ΘΘ^T + π^2 I_p = H(Θ^T Θ + π^2 I_d) H^T + π^2 Q_H,
where H = Θ(Θ^T Θ)^{-1/2} is a semi-orthogonal basis matrix for span(Θ) and Q_H = I_p - HH^T.
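For reference, the displayed decomposition of Σ follows in one line from the definitions on this slide once Q_H = I_p - HH^T is written out; a short worked step in LaTeX:

```latex
% Worked step for the variance decomposition, using \Theta = H(\Theta^T\Theta)^{1/2}
% and Q_H = I_p - HH^T:
\Sigma = \Theta\Theta^{T} + \pi^{2} I_p
       = H(\Theta^{T}\Theta)H^{T} + \pi^{2}\bigl(HH^{T} + Q_H\bigr)
       = H\bigl(\Theta^{T}\Theta + \pi^{2} I_d\bigr)H^{T} + \pi^{2} Q_H .
```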

38 Assumptions in Chun and Keles's paper. They require the columns of Θ to be orthogonal, with bounded norms that converge as sequences. As a consequence, Σ is bounded:
    Σ = ΘΘ^T + π^2 I_p = H(Θ^T Θ + π^2 I_d) H^T + π^2 Q_H.
But in spectroscopy data it seems entirely plausible that notable signal comes from many wavelengths, not just a few. When this happens, many rows θ_i of Θ are non-zero in such a way that Σ_{i=1}^p ||θ_i||^2 diverges, and we are outside Chun and Keles's assumptions for non-consistency. In conclusion, Chun and Keles's paper effectively imposes sparsity to obtain non-consistency.

41 Wold, remember who he was? Sparsity vs abundance. A quote from "Hierarchical multiblock PLS and PC models for easier model interpretation and as an alternative to variable selection" by Wold, Kettaneh and Tjessem. (Who is Wold?) "In situations with many variables, more than say 50 or 100, there is a strong temptation to drastically reduce the number of variables in the model. This temptation is further strengthened by the regression tradition to reduce the variables as far as possible to get the X matrix well conditioned. As discussed below, however, this reduction of variables often removes information, makes the interpretation misleading and increases the risk of spurious models. An often better alternative than variable reduction is to divide the variables into conceptually meaningful blocks and then apply hierarchical multi-block PLS (or PC) models. These ideas were presented by Wold, Martens and co-workers around 1986, but in rather obscure papers. With multivariate projection models such as PLS and PCA, however, the situation is different. These methods work well also with many variables, even when the number of observations, N, is small. In fact, the larger the number of relevant variables, the more precise are the scores t (and u in PLS), because they have the characteristics of weighted averages of all the X- or Y-variables, and an average is more precise the larger the number of elements forming the basis of the average. There is therefore no real need for keeping the number of variables small; only really unimportant variables should be deleted to stabilize the model and its predictions."

45 And the statisticians did show up again, now with good news. Let us assume we are under the same model
    x | y = μ_x + Θ ν_y + ω,   (5)
where ν_y ∈ R^d with ν ~ N(0, I_d), Θ ∈ R^{p×d}, and ω ∈ R^p with ω ~ N(0, π^2 I_p). Again
    Σ = ΘΘ^T + π^2 I_p = H(Θ^T Θ + π^2 I_d) H^T + π^2 Q_H.
Then the squared prediction error of PLS converges at the rate
    p / (n Σ_{i=1}^p ||θ_i||^2).

48 Consequences. Order of convergence for the squared prediction error: p / (n Σ_{i=1}^p ||θ_i||^2). Chun and Keles's case: Σ_{i=1}^p ||θ_i||^2 bounded, so we have consistency only if p/n → 0. If Σ_{i=1}^p ||θ_i||^2 ≍ p^α, the order of convergence of the squared prediction error is p^{1-α}/n. When the maximum amount of information is accumulated (Σ_{i=1}^p ||θ_i||^2 ≍ p) we recover the traditional √n consistency.
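A worked instance of the rate (numbers chosen purely for illustration; the choice n = p/2 anticipates the simulation below):

```latex
% If \sum_{i=1}^{p}\|\theta_i\|^{2} \asymp p^{\alpha} and n = p/2, the bound becomes
\frac{p}{n \sum_{i=1}^{p}\|\theta_i\|^{2}}
  \asymp \frac{p^{\,1-\alpha}}{n}
  = \frac{2}{p^{\alpha}} ,
% which vanishes as p grows whenever \alpha > 0 and stays of order one when \alpha = 0.
```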

49 More? Yes, we have a general consistency result (not only for the model presented in Chun and Keles's paper). The consistency of the predictions and the rate of convergence depend, roughly, on the ratio between the information that new predictors contribute about y and the amount of noise they contribute.

51 Simulation, n = p/2. Data were generated from
    x | y = μ_x + Θ ν_y + ω.   (6)
The columns of Θ were constructed to be orthogonal, with diagonal elements diag(Θ^T Θ) = (4p^a, p^a) for a = 1/2, 3/4, 1, and diag(Θ^T Θ) = (4c, c) = c(4p^0, p^0) where c is a constant. The theoretical result for this case indicates D_N = O_p(√φ) with φ = p / (n Σ_{i=1}^p ||θ_i||^2). Here Σ_{i=1}^p ||θ_i||^2 ≍ p^a with a = 1, 3/4, 1/2 and 0. (A rough re-creation of this experiment is sketched below.)
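Below is a rough, hedged re-creation of this experiment in Python/numpy. Only the scaling diag(Θ^T Θ) = (4p^a, p^a), n = p/2, and the latent-variable form of x follow the slide; the link between y and the latent ν, the value of d, and the noise scales are our assumptions for illustration, not the authors' code.

```python
# Abundance simulation sketch: PLS prediction error as p grows with n = p/2,
# for Theta columns scaled so that diag(Theta^T Theta) = (4 p^a, p^a).
import numpy as np

def pls_krylov_beta(X, y, d):
    n = X.shape[0]
    Sig, sig = X.T @ X / n, X.T @ y / n
    S = np.column_stack([np.linalg.matrix_power(Sig, k) @ sig for k in range(d)])
    return S @ np.linalg.lstsq(X @ S, y, rcond=None)[0]

def simulate(p, a, rng, d=2, n_test=500):
    n = p // 2
    Q, _ = np.linalg.qr(rng.normal(size=(p, d)))            # orthonormal columns
    Theta = Q * np.sqrt(np.array([4.0, 1.0]) * p ** a)      # diag(Theta^T Theta) = (4p^a, p^a)
    def draw(m):
        nu = rng.normal(size=(m, d))                         # latent coordinates
        x = nu @ Theta.T + rng.normal(size=(m, p))           # pi^2 = 1 isotropic noise
        y = nu.sum(axis=1) + 0.2 * rng.normal(size=m)        # assumed link to the latents
        return x - x.mean(axis=0), y - y.mean()
    Xtr, ytr = draw(n)
    Xte, yte = draw(n_test)
    beta = pls_krylov_beta(Xtr, ytr, d)
    return float(np.mean((yte - Xte @ beta) ** 2))           # squared prediction error

rng = np.random.default_rng(3)
for a in (1.0, 0.5, 0.0):   # abundant signal (a = 1) down to bounded signal (a = 0)
    print(a, [round(simulate(p, a, rng), 3) for p in (50, 200, 800)])
```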

52 Simulation results. Theoretical result: D_N = O_p(√(p / (n Σ_{i=1}^p ||θ_i||^2))). Only in the case diag(Θ^T Θ) ≍ c is there no convergence.

54 Tetracycline data. Goicoechea and Olivieri (1999) used PLS to develop a predictor of tetracycline concentration in human blood. The 50 training samples were constructed by spiking blank sera with various amounts of tetracycline in the range 0 to 4 µg/ml. A validation set of 57 samples was constructed in the same way. For each sample, the values of the predictors were determined by measuring fluorescence intensity at p = 101 equally spaced wavelengths (in nm). The authors determined, using leave-one-out cross validation, that the best predictions of the training data were obtained with d = 4 linear combinations of the original 101 predictors.

55 Tetracycline data. We use these data to illustrate the behavior of PLS predictions in chemometrics as the number of predictors increases. We used PLS with d = 4 to predict the validation data based on p equally spaced spectral points, with p ranging between 10 and 101. For those five values of p we computed the root mean squared error (MSE). (A sketch of this computation follows below.)
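A hedged sketch of this computation in Python/numpy. The arrays X_train, y_train, X_val, y_val stand in for the tetracycline spectra and concentrations and are assumed to be loaded elsewhere; the five p values in the usage comment are illustrative, and the PLS fit is the didactic Krylov-basis version used earlier, with d = 4.

```python
# Validation RMSE of PLS (d = 4) when only p_sub roughly equally spaced wavelengths are kept.
import numpy as np

def pls_krylov_beta(X, y, d):
    n = X.shape[0]
    Sig, sig = X.T @ X / n, X.T @ y / n
    S = np.column_stack([np.linalg.matrix_power(Sig, k) @ sig for k in range(d)])
    return S @ np.linalg.lstsq(X @ S, y, rcond=None)[0]

def rmse_for_subset(X_train, y_train, X_val, y_val, p_sub, d=4):
    idx = np.linspace(0, X_train.shape[1] - 1, p_sub).round().astype(int)  # wavelength subset
    Xtr, Xva = X_train[:, idx], X_val[:, idx]
    mx, my = Xtr.mean(axis=0), y_train.mean()
    beta = pls_krylov_beta(Xtr - mx, y_train - my, d)
    pred = my + (Xva - mx) @ beta
    return float(np.sqrt(np.mean((y_val - pred) ** 2)))

# Hypothetical usage, once the spectra are loaded:
# for p_sub in (10, 30, 50, 75, 101):
#     print(p_sub, rmse_for_subset(X_train, y_train, X_val, y_val, p_sub))
```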

56 MSE for the tetracycline data for different values of p. There is a relatively steep drop in MSE for small p, say less than 30, and a slow but steady decrease in MSE thereafter.

57 Thanks!
