arxiv: v2 [math.st] 31 Oct 2017

Size: px
Start display at page:

Download "arxiv: v2 [math.st] 31 Oct 2017"

Transcription

1 arxiv: v2 [math.st] 31 Oct 2017 Unbiased Shrinkage Estimation Jann Spiess Working Paper This version: October 31, 2017 Abstract Shrinkage estimation usually reduces variance at the cost of bias. But when we care only about some parameters of a model, I show that we can reduce variance without incurring bias if we have additional information about the distribution of covariates. In a linear regression model with homoscedastic Normal noise, I consider shrinkage estimation of the nuisance parameters associated with control variables. For at least three control variables and exogenous treatment, I establish that the standard least-squares estimator is dominated with respect to squared-error loss in the treatment effect even among unbiased estimators and even when the target parameter is low-dimensional. I construct the dominating estimator by a variant of James Stein shrinkage in a high-dimensional Normal-means problem. It can be interpreted as an invariant generalized Bayes estimator with an uninformative (improper Jeffreys prior in the target parameter. Introduction Many inference tasks have the following feature: the researcher wants to obtain a high-quality estimate of a set of target parameters (for example, a Jann Spiess, Department of Economics, Harvard University, jspiess@fas.harvard.edu. I thank Gary Chamberlain, Maximilian Kasy, Carl Morris, and Jim Stock for insightful conversations, and seminar participants at Harvard for helpful comments. 1

2 set of treatment effects in an RCT, but also estimates a number of nuisance parameters she does not care about separately (for example, coefficients on control variables. In these cases, can we reduce variance in the estimation of a target parameter without inducing bias by shrinking in the estimation of possibly high-dimensional nuisance parameters? In a linear regression model with homoscedastic, Normal noise, I show that a natural application of James Stein shrinkage to the parameters associated with at least three control variables reduces loss in the possibly low-dimensional treatment effect parameter without producing bias provided that treatment is random. The proposed estimator effectively averages between regression models with and without control variables, similar to the Hansen (2016 modelaveraging estimator and coinciding up to a degrees-of-freedom correction with the corresponding Mallows estimator from Hansen (2007. For the specific choice of shrinkage, I contribute three finite-sample properties: First, I note that by averaging over the distribution of controls we obtain dominance of the shrinkage estimator even for low-dimensional target parameters, unlike other available results that require a loss function that is at least three-dimensional. Second, I establish that the resulting estimator remains unbiased under exogeneity of treatment. Third, I conceptualize it as a two-step estimator with a first-stage prediction component. Fourth, I show that it can be seen as a natural, invariant generalized Bayes estimator with respect to a partially improper prior corresponding to uninformativeness in the target parameter. The linear regression model is set up in Section 1. Section 2 proposes the estimator and establishes loss improvement relative to a benchmark OLS estimator provided treatment is exogenous. Section 3 motivates the estimator as an invariant generalized Bayes estimator (with respect to an improper prior in a suitably transformed many-means problem. 2

3 1 Linear Regression Setup I consider estimation of the structural parameter β R k in the canonical linear regression model Y i = α+x i β +W i γ +U i (1 from n iid observations (Y i,x i,w i, where X i R m are the regressors of interest, W i R k control variables, and U i R is homoscedastic, Normal noise. α is an intercept, 1 and γ is a nuisance parameter. To obtain identification of β in Equation (1, I assume that U i is orthogonal to X i and W i (no omitted variables. Throughout this document, I write upper-case letters for random variables (such as Y i and lower-case letters for fixed values (such as when I condition on X i = x i. When I suppress indices, I refer to the associated vector or matrix of observations, e.g. Y R n is the vector of outcome variables Y i and X R n m is the matrix with rows X i. 2 Two-Step Partial Shrinkage Estimator By assumption there are control variables W available with Y X=x,W=w N(1α+xβ +wγ,σ 2 I n where σ 2 need not be known. We care about the (possibly high-dimensional nuisance parameter γ only in so far as it helps us to estimate the (typically low-dimensional target parameter β, which is our object of interest. 2.1 A canonical form that preserves structure Given x R n m and w R n k, where we assume that (1,x,w has full rank 1 + m + k n, let q = (q 1,q x,q w,q r R n n orthonormal where 1 We could alternatively include a constant regressor in X i and subsume α in β. I choose to treat α separately since I will focus on the loss in estimating β, ignoring the performance in recovering the intercept α. 3

4 q 1 R n,q x R n m,q w R n k such that 1 is in the linear subspace of R n spanned by q 1 R n (that is, q 1 {1/ n, 1/ n}, the columns of (1,x are in the space spanned by the columns of (q 1,q x, and the columns of (1,x,w are in the space spanned by the columns of (q 1,q x,q w. (Such a basis exists, for example, by an iterated singular value decomposition. Then, q 1 1α+q 1 xβ +q 1 wγ Y = q Y X=x,W=w N q x xβ +q x wγ q w wγ,σ2 I n. 0 n 1 m k Writing Y x, Y w,y r for the appropriate subvectors of Y, we find, in particular, that Yx µ x +aµ w Yw X=x,W=w N µ w,σ 2 I n 1 Yr 0 n 1 m k where µ x = q xxβ R m, µ w = q wwγ R k, and a = q xw(q ww 1 R m k. 2 In transforming linear regression to this Normal-means problem, as well as in partitioning the coefficient vector into two groups, for only one of which I will propose shrinkage, I follow Sclove ( Two-step estimator Conditional on X=x,W=w and given an estimator ˆµ w = ˆµ w (Yw,Y r of µ w, a natural estimator of µ x is ˆµ x = ˆµ x (Yx,Y w,y r = Yx aˆµ w. An estimator of β is obtained by setting ˆβ = (q xx 1ˆµ x. (The linear least-squares estimator for β is obtained from ˆµ w = Y w. A natural loss function for ˆβ that represents prediction loss units is the weighted loss(ˆβ β (x q x q x x(ˆβ β = ˆµ x µ x 2. We can therefore focus on the (conditional expected squared- 2 Alternatively, we could have denoted by µ x the mean of Y x. However, by separating out µ x from aµ w I feel that the role of µ w as a relevant nuisance parameter becomes more transparent. 4

5 error loss in estimating µ x, for which we find E[ ˆµ x µ x 2 X=x,W=w] = mσ 2 + E[ ˆµ w µ w 2 a a X=x,W=w] with the seminorm v a a = v a av on R k. For high-dimensional µ w (k 3, a natural estimator ˆµ w with low expected squared-error loss is a shrinkage estimator of the form ˆµ w = CYw with scalar C, such as the James and Stein (1961 estimator for which C = 1 (k 2 Y r 2 (or its positive part. While improving with respect to (n m k+1 Yw 2 expected squared-error loss (a a = const. I k, this specific estimator may yield higher (conditional expected loss in µ x when the implied loss function for µ w deviates from squared-error loss (a a const. I k, so the loss function is not invariant under rotations. We will show below that it is still appropriate in the case of independence of treatment and control. 2.3 From conditional to unconditional loss For conditional inference it is known that the least-squares estimator is admissible for estimating β provided m 2 and inadmissible provided m 3 no matter what the dimensionality k of the nuisance parameter γ is (James and Stein, 1961, as the rank of the loss function is decisive. The above construction does not provide a counter-example to this result: the rank of a a = (w q w 1 w q x q xw(q ww 1 is at most m, so for m 2, ˆµ w = Y w remains admissible for the loss function on the right. While we could achieve improvements for m 3 through shrinkage in ˆµ w and/or directly in ˆµ x our interest is in the case where m is low and k is high. Conditional on X=x,W=w we can thus not hope to achieve improvements that hold for any (β,γ, but we can still hope that shrinkage estimation of µ w yields better estimates of β on average over draws of the data. To this end, assume that vec(w X=x N(vec(1α W +xβ W,Σ W I n (that is, W i X=x iid N(1α W +x i β W,Σ W. Here, Σ W R k k is symmetric 5

6 positive-definite (but not necessarily known. α W R 1 k,β W R m k describe the conditional expectation of control variables given the regressors X=x. The case where x and W are orthogonal (β W = O m k and controls W thus not required for identification will play a special role below. Given X=x, assume (q 1,q x is deterministic, and fix q such that q = (q 1,q x,q R n n is orthonormal. Note that vec((q x,q W X=x N = ( vec (( q x xβ W O n 1 m k,σ W I n 1 In particular, q x W q W. It follows with (( (q x,q q Y X=x,W=w N xxβ +q xwγ q wγ,σ 2 I n 1 that indeed q x(y,w = q (Y,W. Conditional on W=w in the above derivation, aˆµ w aµ w = q xwˆγ q xwγ for ˆγ = (q ww 1ˆµ w a function of q w and (Y w,y r = (q wq,q rq (q Y, so ˆγ = ˆγ(q Y,q w. Assuming measurability, ˆγ(q Y,q W (q xy,q xw. Now writing ˆγ = ˆγ(q y,q w this implies that =. E[ ˆµ w µ w 2 a a X=x,q (Y,W = q (y,w] = ˆγ γ 2 E[W q xq x W X=x] with E[W q x q x W X=x] = β W x q x q x xβ W + mσ W of full rank k. For the expectation of the implied ˆβ, we find E[ˆβ X=x,q (Y,W = q (y,w] = β β W(ˆγ γ. We obtain the following characterization of conditional bias and squarederror loss of the implied estimator ˆβ: Lemma 1 (Properties of the two-step estimator. Let (Ỹ, W be jointly 6

7 distributed according as vec( W N(0 k(n 1 m,σ W I n 1 m, Ỹ W = w N( wγ,σ 2 I n 1 m, and write Ẽ for the corresponding expectation operator. For any measurable estimator ˆγ : R n m 1 R n m 1 k R k with Ẽ[ ˆγ(Ỹ, W 2 ] <, the estimator ˆβ(y,w = (q x x 1 q x y (q x x 1 q x wˆγ(q y,q w defined for convenience for fixed x, has conditional bias E[ˆβ(Y,W X=x] β = β W (Ẽ[ˆγ(Ỹ, W] γ and expected (prediction-norm loss E[ ˆβ(Y,W β 2 x q xq xx X=x] = mσ2 + Ẽ[ ˆγ(Ỹ, W γ 2 φ ] for φ = β W x q x q xxβ W +mσ W. Note that this lemma does not rely on n 1 + m + k, and indeed generalizes to the case n > 1+m for any k 1, including k > n. 2.4 Exogenous treatment We consider the special case where treatment is exogenous, and thus β W = O m k. This assumption could be justified, for example, in a randomized trial. Note that in this case in addition to the linear least-squares estimator in the long regression that includes controls W another natural unbiased (conditional on X=x estimator is available, namely the coefficient (q x x 1 q x Y in the short regression without controls. The long and short regression represent special (edge cases in the class of two-step estimators introduced above, which are all unbiased in that sense under the exogeneity assumption: Corollary 1 (A class of unbiased two-step estimators. If β W = O m k then 7

8 for any ˆγ and ˆβ as in Lemma 1 E[ˆβ(Y,W X=x] = β. Furthermore, E[ ˆβ(Y,W β 2 x q xq x x X=x] = mẽ[(ỹ0 W 0ˆγ((Ỹi, W i n 1 m i=1 2 ] for (Ỹi, W i n 1 m i=0 iid with W i N(0 k,σ W, Ỹ i W i = w i N( w i γ,σ2 (here, (Ỹi, W i n 1 m i=1 is the training sample and (Ỹ0, W 0 an additional test point drawn from the same distribution. This corollary clarifies that the class of natural estimators derived above are unbiased conditional on X=x (but not necessarily on X=x, W =w jointly, with expected loss equal to the expected out-of-sample prediction loss in a prediction problem where the prediction function w 0 w 0ˆγ is trained on n 1 m iid draws, and evaluated on an additional, independent draw (Ỹ0, W 0 from the same distribution. The long and short regressions are included as the special cases ˆγ( w,ỹ = ( w w 1 w ỹ and ˆγ 0 k, respectively. The covariates in training and test sample follow the same distribution, which suggests an estimator that is invariant to rotations in the corresponding k-means problem. Indeed, the dominating estimator I construct in the following results is of the form ˆµ w = (1 p Y r 2 Y w 2 Y w, where the standard James and Stein (1961 estimator (for unnknown σ 2 is recovered at p = k 2 n m k+1. Theorem 1 (Inadmissibility of OLS among unbiased estimators. Maintain β W = O m k. Denote by (ˆα OLS, ˆβ OLS,ˆγ OLS the coefficients and by SSR = Y 1ˆα OLS Xˆβ OLS Wˆγ OLS 2 the sum of squared residuals in a linear least-squares regression of Y on an 1, X, and W. Write h = I n 1 n 1 n /n (the annihilator matrix with respect to the intercept. Assume that k 3 and n m+k +2. Then, the two-step estimator ˆβ = (X hx 1 X h(y Wˆγ with ( p SSR ˆγ = 1 ˆγ OLS 2 W h(i X(X hx 1 X hw ˆγ OLS 8

9 ( 2(k 2 where p 0, n m k+2 is unbiased for β given X=x and dominates in the sense that E[ ˆβ β 2 X hx X=x] < E[ ˆβ OLS β 2 X hx X=x]. ˆβ OLS Proof. The OLS estimator in the theorem corresponds to ˆγ OLS (ỹ, w = ( w w 1 ỹ w in Lemma 1, which yields the maximum-likelihood estimator ˆγ OLS (Ỹ, W for γ given data vec( W N(0 k(n 1 m,σ W I n 1 m, Ỹ W = w N( wγ,σ 2 I n 1 m. By Baranchik (1973, this maximum-likelihood estimator is inadmissible with respect to the risk Ẽ[ ˆγ γ 2 Σ W ] and thus for Ẽ[ ˆγ γ 2 φ in Lemma 1, as φ = mσ W for β W = O m k. However, Baranchik (1973 also includes an intercept that is estimated, but does not enter the loss function. To formally use the result for our case without intercept in the first-step prediction exercise, I construct an augmented problem such that the dominance result in the augmented problem implies the theorem. To this end, let vec(w a N(0 k(n m,σ W I n m, Y a W a = w a N(w a γ,σ 2 I n m. (which has one additional sample point, and could without loss include intercepts in W a, Y a. By Baranchik (1973, Theorem 1, the estimator ˆγ a = ( 1 p (Y a h a Y a ˆγ a,ols 2 (W a h a W a ˆγ a,ols 2 (W a h a W a ˆγ a,ols strictly dominates ˆγ a,ols = ((W a h a W a 1 (W a h a Z a, where h a = I n m 1 n m 1 n m /(n m, in the sense that E a [(ˆγ a γ Σ W (ˆγ a γ] < E a [(ˆγ a,ols γ Σ W (ˆγ a,ols γ] 9

10 ( for any γ R k 2(k 2, provided that p 0, n m k+2 with k 3 and n m k +2. We now show that this implies dominance of ˆγ(Ỹ, W for ( ˆγ(ỹ, w = 1 pỹ ỹ ˆγ OLS (ỹ, w 2 w w ˆγ OLS (ỹ, w 2 w w ˆγ OLS (ỹ, w in the original problem. Letq a R (n m (n m 1 be such that(q a,1 n m /(n m is orthonormal (that is, the columns of q a complete 1 n m /(n m to an orthonormal basis of R m n. This implies that q a (q a = h a and (q a q a = I n m 1. Then, (q a (Y a,w a d = (Ỹ, W. In particular, ((Y a h a (Y a,(y a h a (W a,(w a h a (W a d = (Ỹ Ỹ,Ỹ W, W W and thus (ˆγ a,ˆγ a,ols d = (ˆγ(Ỹ, W,ˆγ OLS (Ỹ, W. We have thus established Ẽ[(ˆγ(Ỹ, W γ Σ W (ˆγ(Ỹ, W γ] < Ẽ[(ˆγOLS (Ỹ, W γ OLS Σ W (ˆγ (Ỹ, W γ]. Note that ˆγ OLS (ỹ, w = ( w w 1 ỹ w in Lemma 1 does indeed yield ˆγ OLS and ˆβ OLS in the theorem, and that this extends to ˆγ and ˆβ by (1 p y 2 ˆγ(q y,q w = q q ˆγ OLS (... 2 w q q w ˆγ OLS (... 2 ˆγ OLS (... w q q w with q q = h(i x(x hx 1 x h and SSR = Y 1ˆα OLS Xˆβ OLS Wˆγ OLS 2 = Y Wˆγ OLS 2 h(i X(X hx 1 X h = Y 2 h(i X(X hx 1 X h WˆγOLS 2 h(i X(X hx 1 X h = Y 2 h(i X(X hx 1 X h ˆγOLS 2 W h(i X(X hx 1 X hw. Unbiasedness and dominance follow with β W = O m k in Lemma 1. 10

11 Note that the result extends to the positive-part analog for which the shrinkage factor is set to zero whenever the expression is negative. For m = 1, the following dominance is immediate: Corollary 2 (A non-contradiction of Gauss Markov. For exogenous treatment, m = 1, k 3, and n k + 3, there exists an estimator ˆβ with E[ˆβ X=x] = β and Var(ˆβ X=x < Var(ˆβ OLS X=x. The assumption of exogenous treatment is essential for this result, as dropping conditioning on W and restricting interest to β would not suffice to break optimality of linear least-squares. 3 Invariance Properties and Bayesian Interpretation Starting with the transformations in Section 2.3, we consider the decision problem of estimating β (equivalently, µ x. Guided by the treatment of a linear panel-data model in Chamberlain and Moreira (2009, I develop the specific estimator proposed in Theorem 1 as (the empirical Bayes version of an invariant Bayes estimator with respect to a partially uninformative (improper Jeffreys prior. 3.1 Decision problem set-up In this section, we condition on X throughout and assume that covariates W are Normally distributed given X. Writing W x = q x W,W = q W,Y x = q x Y,Y = q Y, the transformation developed in Section 2.3 yields the joint distribution ( W x W = ( µ W O s k ( ( Yx µ x +Wx Y = γ W γ +V W Σ 1/2 W (V W ij iid N(0,1 +V Y σ 2 (V Y i iid N(0,1 (2 where Σ 1/2 W is the unique symmetric positive-definite square-root of the symmetric positive-definite matrix Σ W, and V W and V Y are independent. Here, 11

12 in addition to µ x = q xxβ, also µ W = q xxβ W, and s = n m 1. I write Z = R m+s R (m+s k for the sample space from which (Y,W is drawn according to this P θ, where I parametrize θ = (µ x,γ Θ = R m R k. (I take σ 2,Σ W,µ W to be constants. The action space is A = R m, from which an estimate of µ x is chosen. As the loss function L : Θ A R I take squared-error loss L(θ,a = µ x a 2. An estimator ˆβ : Z A from the previous section is a feasible decision rule in this decision problem. 3.2 A set of transformations For an element g = (g µ,g x,g W,g in the (product group G = R m O(m O(k O(s, where R m denotes the group of real numbers with addition (neutral element 0 and O(k the group of ortho-normal matrices in R k k with matrix multiplication (neutral element I k, consider the following set of transformations (which are actions of G on Z,Θ,A: Sample space: m Z : G Z Z, (g,(y x,y,w x,w (g x y x +g µ,g y,g x w x Σ 1/2 W g WΣ 1/2 W,g w Σ 1/2 W g WΣ 1/2 W Parameter space: m Θ : G Θ Θ, (g,(µ x,γ (g x µ x +g µ,σ 1/2 W g WΣ 1/2 W γ Action space: m A : G A A,(g,a g x a+g µ For exogenous treatment, these transformations are tied together by leaving model and loss invariant. Indeed, the following is immediate from Equation 2: 3 Proposition 1 (Invariance of model and loss. For µ W = O m k : 3 Alternatively, we could have treated µ W as an element of the parameter space and extend the analysis to the case of endogenous treatment. Adding (g,µ W g xµ WΣ 1/2 W g WΣ 1/2 W to the action on the parameter space would have retained invariance. 12

13 1. The model is invariant: m Z (g,(y,w P mθ (g,θ for all g G. 2. The loss is invariant: L(m Θ (g,θ,m A (g,a = L(θ,a for all g G. 3.3 An invariant Bayes estimator... By Proposition 1, a natural (generalized Bayes estimator of µ x is derived from an improper prior on θ that is invariant under the action of G on Θ, as this will yield a decision rule d : Z A that is invariant in the sense that d(m Z (g,(y,w = m A (g,d((y,w for all (g,(y,w G Z. This implies for µ x as an improper prior the Haar measure with respect to the translation action (i.e. up to a multiplicative constant the σ-finite Lebesgue measure on R m, and for γ a prior that is uniform on ellipsoids γ Σ W γ = ω. Taking ω χ 2 τ 2 m with some τ > 0 yields the prior γ N(0,τ2 Σ 1 W. With a product prior for θ, the resulting generalized Bayes estimator for µ x which minimizes posterior loss conditional on the data is E[µ x Y = y,w = w] = y x w x E[γ Y = y,w = w] = y x w x (w w +σ 2 Σ W /τ 2 1 w y and a specific empirical Bayes implementation Replacing Σ W by the specific sample analog W W /s, we obtain the estimator Y x sτ2 Similarly assuming that W sτ 2 +σ 2 x ((W W 1 (W Y. γ N(0,sτ 2 W W sτ, an unbiased estimator of 2 (given W is sτ 2 +σ 2 C = 1 (Y (I s W ((W W 1 (W Y /(s k (Y W ((W W 1 (W Y /(k 2. This estimator corresponds to the estimator from Theorem 1 at p = k 2 k 2 n m k 1 s k =. By construction, it retains the invariance of the associated generalized Bayes estimator. This is not specific to this value of p: Proposition 2 (Invariance of estimator. For any p, the estimator ˆβ from Theorem 1 is invariant with respect to the above actions of G. 13

14 Conclusion A natural application of James Stein shrinkage to control variables in a Normal linear model consistently reduces expected prediction error without introducing bias in the treatment parameter of interest provided treatment is random. In this case, the linear least-squares estimator is thus inadmissible even among unbiased estimators. In a companion paper (Spiess, 2017, I show how shrinkage in at least four instrumental variables in a canonical structural form provides consistent bias improvement over the two-stage least-squares estimator. Together, these results suggests different roles of overfitting in control and instrumental variable coefficients, respectively: while overfitting to control variables induces variance, overfitting to instrumental variables in the first stage of a two-stage least-squares procedure induces bias. References Baranchik, A. J. (1973. Inadmissibility of Maximum Likelihood Estimators in Some Multiple Regression Problems with Three or More Independent Variables. The Annals of Statistics, 1(2: Chamberlain, G. and Moreira, M. J. (2009. Decision Theory Applied to a Linear Panel Data Model. Econometrica, 77(1: Hansen, B. E. (2007. Least Squares Model Averaging. Econometrica, 75(4: Hansen, B. E. (2016. Efficient shrinkage in parametric models. Journal of Econometrics, 190(1: James, W. and Stein, C. (1961. Estimation with quadratic loss. Fourth Berkeley Symposium. Sclove, S. L. (1968. Improved estimators for coefficients in linear regression. Journal of the American Statistical Association, 63(322:

15 Spiess, J. (2017. Bias Reduction in Instrumental Variable Estimation through First-Stage Shrinkage. arxiv preprint arxiv:

Fixed Effects, Invariance, and Spatial Variation in Intergenerational Mobility

Fixed Effects, Invariance, and Spatial Variation in Intergenerational Mobility American Economic Review: Papers & Proceedings 2016, 106(5): 400 404 http://dx.doi.org/10.1257/aer.p20161082 Fixed Effects, Invariance, and Spatial Variation in Intergenerational Mobility By Gary Chamberlain*

More information

LECTURE 2 LINEAR REGRESSION MODEL AND OLS

LECTURE 2 LINEAR REGRESSION MODEL AND OLS SEPTEMBER 29, 2014 LECTURE 2 LINEAR REGRESSION MODEL AND OLS Definitions A common question in econometrics is to study the effect of one group of variables X i, usually called the regressors, on another

More information

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley Review of Classical Least Squares James L. Powell Department of Economics University of California, Berkeley The Classical Linear Model The object of least squares regression methods is to model and estimate

More information

Introduction to Estimation Methods for Time Series models. Lecture 1

Introduction to Estimation Methods for Time Series models. Lecture 1 Introduction to Estimation Methods for Time Series models Lecture 1 Fulvio Corsi SNS Pisa Fulvio Corsi Introduction to Estimation () Methods for Time Series models Lecture 1 SNS Pisa 1 / 19 Estimation

More information

Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines

Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines Maximilian Kasy Department of Economics, Harvard University 1 / 37 Agenda 6 equivalent representations of the

More information

Lecture 6: Geometry of OLS Estimation of Linear Regession

Lecture 6: Geometry of OLS Estimation of Linear Regession Lecture 6: Geometry of OLS Estimation of Linear Regession Xuexin Wang WISE Oct 2013 1 / 22 Matrix Algebra An n m matrix A is a rectangular array that consists of nm elements arranged in n rows and m columns

More information

Linear Regression and Its Applications

Linear Regression and Its Applications Linear Regression and Its Applications Predrag Radivojac October 13, 2014 Given a data set D = {(x i, y i )} n the objective is to learn the relationship between features and the target. We usually start

More information

This model of the conditional expectation is linear in the parameters. A more practical and relaxed attitude towards linear regression is to say that

This model of the conditional expectation is linear in the parameters. A more practical and relaxed attitude towards linear regression is to say that Linear Regression For (X, Y ) a pair of random variables with values in R p R we assume that E(Y X) = β 0 + with β R p+1. p X j β j = (1, X T )β j=1 This model of the conditional expectation is linear

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7

MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7 MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7 1 Random Vectors Let a 0 and y be n 1 vectors, and let A be an n n matrix. Here, a 0 and A are non-random, whereas y is

More information

Bayesian Linear Regression

Bayesian Linear Regression Bayesian Linear Regression Sudipto Banerjee 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. September 15, 2010 1 Linear regression models: a Bayesian perspective

More information

Multivariate Regression Analysis

Multivariate Regression Analysis Matrices and vectors The model from the sample is: Y = Xβ +u with n individuals, l response variable, k regressors Y is a n 1 vector or a n l matrix with the notation Y T = (y 1,y 2,...,y n ) 1 x 11 x

More information

Linear Regression. Junhui Qian. October 27, 2014

Linear Regression. Junhui Qian. October 27, 2014 Linear Regression Junhui Qian October 27, 2014 Outline The Model Estimation Ordinary Least Square Method of Moments Maximum Likelihood Estimation Properties of OLS Estimator Unbiasedness Consistency Efficiency

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1

MA 575 Linear Models: Cedric E. Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1 MA 575 Linear Models: Cedric E Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1 1 Within-group Correlation Let us recall the simple two-level hierarchical

More information

IEOR 165 Lecture 7 1 Bias-Variance Tradeoff

IEOR 165 Lecture 7 1 Bias-Variance Tradeoff IEOR 165 Lecture 7 Bias-Variance Tradeoff 1 Bias-Variance Tradeoff Consider the case of parametric regression with β R, and suppose we would like to analyze the error of the estimate ˆβ in comparison to

More information

Multinomial Data. f(y θ) θ y i. where θ i is the probability that a given trial results in category i, i = 1,..., k. The parameter space is

Multinomial Data. f(y θ) θ y i. where θ i is the probability that a given trial results in category i, i = 1,..., k. The parameter space is Multinomial Data The multinomial distribution is a generalization of the binomial for the situation in which each trial results in one and only one of several categories, as opposed to just two, as in

More information

Flexible Estimation of Treatment Effect Parameters

Flexible Estimation of Treatment Effect Parameters Flexible Estimation of Treatment Effect Parameters Thomas MaCurdy a and Xiaohong Chen b and Han Hong c Introduction Many empirical studies of program evaluations are complicated by the presence of both

More information

Multivariate Regression

Multivariate Regression Multivariate Regression The so-called supervised learning problem is the following: we want to approximate the random variable Y with an appropriate function of the random variables X 1,..., X p with the

More information

Least Squares Regression

Least Squares Regression CIS 50: Machine Learning Spring 08: Lecture 4 Least Squares Regression Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may not cover all the

More information

3. Linear Regression With a Single Regressor

3. Linear Regression With a Single Regressor 3. Linear Regression With a Single Regressor Econometrics: (I) Application of statistical methods in empirical research Testing economic theory with real-world data (data analysis) 56 Econometrics: (II)

More information

STAT 540: Data Analysis and Regression

STAT 540: Data Analysis and Regression STAT 540: Data Analysis and Regression Wen Zhou http://www.stat.colostate.edu/~riczw/ Email: riczw@stat.colostate.edu Department of Statistics Colorado State University Fall 205 W. Zhou (Colorado State

More information

Linear models. Linear models are computationally convenient and remain widely used in. applied econometric research

Linear models. Linear models are computationally convenient and remain widely used in. applied econometric research Linear models Linear models are computationally convenient and remain widely used in applied econometric research Our main focus in these lectures will be on single equation linear models of the form y

More information

Review of Econometrics

Review of Econometrics Review of Econometrics Zheng Tian June 5th, 2017 1 The Essence of the OLS Estimation Multiple regression model involves the models as follows Y i = β 0 + β 1 X 1i + β 2 X 2i + + β k X ki + u i, i = 1,...,

More information

Ridge regression. Patrick Breheny. February 8. Penalized regression Ridge regression Bayesian interpretation

Ridge regression. Patrick Breheny. February 8. Penalized regression Ridge regression Bayesian interpretation Patrick Breheny February 8 Patrick Breheny High-Dimensional Data Analysis (BIOS 7600) 1/27 Introduction Basic idea Standardization Large-scale testing is, of course, a big area and we could keep talking

More information

Quantitative Analysis of Financial Markets. Summary of Part II. Key Concepts & Formulas. Christopher Ting. November 11, 2017

Quantitative Analysis of Financial Markets. Summary of Part II. Key Concepts & Formulas. Christopher Ting. November 11, 2017 Summary of Part II Key Concepts & Formulas Christopher Ting November 11, 2017 christopherting@smu.edu.sg http://www.mysmu.edu/faculty/christophert/ Christopher Ting 1 of 16 Why Regression Analysis? Understand

More information

In the bivariate regression model, the original parameterization is. Y i = β 1 + β 2 X2 + β 2 X2. + β 2 (X 2i X 2 ) + ε i (2)

In the bivariate regression model, the original parameterization is. Y i = β 1 + β 2 X2 + β 2 X2. + β 2 (X 2i X 2 ) + ε i (2) RNy, econ460 autumn 04 Lecture note Orthogonalization and re-parameterization 5..3 and 7.. in HN Orthogonalization of variables, for example X i and X means that variables that are correlated are made

More information

Peter Hoff Linear and multilinear models April 3, GLS for multivariate regression 5. 3 Covariance estimation for the GLM 8

Peter Hoff Linear and multilinear models April 3, GLS for multivariate regression 5. 3 Covariance estimation for the GLM 8 Contents 1 Linear model 1 2 GLS for multivariate regression 5 3 Covariance estimation for the GLM 8 4 Testing the GLH 11 A reference for some of this material can be found somewhere. 1 Linear model Recall

More information

Bayes Estimators & Ridge Regression

Bayes Estimators & Ridge Regression Readings Chapter 14 Christensen Merlise Clyde September 29, 2015 How Good are Estimators? Quadratic loss for estimating β using estimator a L(β, a) = (β a) T (β a) How Good are Estimators? Quadratic loss

More information

Introductory Econometrics

Introductory Econometrics Based on the textbook by Wooldridge: : A Modern Approach Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna December 17, 2012 Outline Heteroskedasticity

More information

Statistical Inference with Monotone Incomplete Multivariate Normal Data

Statistical Inference with Monotone Incomplete Multivariate Normal Data Statistical Inference with Monotone Incomplete Multivariate Normal Data p. 1/4 Statistical Inference with Monotone Incomplete Multivariate Normal Data This talk is based on joint work with my wonderful

More information

Least Squares Regression

Least Squares Regression E0 70 Machine Learning Lecture 4 Jan 7, 03) Least Squares Regression Lecturer: Shivani Agarwal Disclaimer: These notes are a brief summary of the topics covered in the lecture. They are not a substitute

More information

Econometrics of Panel Data

Econometrics of Panel Data Econometrics of Panel Data Jakub Mućk Meeting # 6 Jakub Mućk Econometrics of Panel Data Meeting # 6 1 / 36 Outline 1 The First-Difference (FD) estimator 2 Dynamic panel data models 3 The Anderson and Hsiao

More information

Homoskedasticity. Var (u X) = σ 2. (23)

Homoskedasticity. Var (u X) = σ 2. (23) Homoskedasticity How big is the difference between the OLS estimator and the true parameter? To answer this question, we make an additional assumption called homoskedasticity: Var (u X) = σ 2. (23) This

More information

Advanced Econometrics I

Advanced Econometrics I Lecture Notes Autumn 2010 Dr. Getinet Haile, University of Mannheim 1. Introduction Introduction & CLRM, Autumn Term 2010 1 What is econometrics? Econometrics = economic statistics economic theory mathematics

More information

Business Statistics. Tommaso Proietti. Linear Regression. DEF - Università di Roma 'Tor Vergata'

Business Statistics. Tommaso Proietti. Linear Regression. DEF - Università di Roma 'Tor Vergata' Business Statistics Tommaso Proietti DEF - Università di Roma 'Tor Vergata' Linear Regression Specication Let Y be a univariate quantitative response variable. We model Y as follows: Y = f(x) + ε where

More information

14 Multiple Linear Regression

14 Multiple Linear Regression B.Sc./Cert./M.Sc. Qualif. - Statistics: Theory and Practice 14 Multiple Linear Regression 14.1 The multiple linear regression model In simple linear regression, the response variable y is expressed in

More information

g-priors for Linear Regression

g-priors for Linear Regression Stat60: Bayesian Modeling and Inference Lecture Date: March 15, 010 g-priors for Linear Regression Lecturer: Michael I. Jordan Scribe: Andrew H. Chan 1 Linear regression and g-priors In the last lecture,

More information

General Linear Model: Statistical Inference

General Linear Model: Statistical Inference Chapter 6 General Linear Model: Statistical Inference 6.1 Introduction So far we have discussed formulation of linear models (Chapter 1), estimability of parameters in a linear model (Chapter 4), least

More information

ECE521 week 3: 23/26 January 2017

ECE521 week 3: 23/26 January 2017 ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear

More information

Reference: Davidson and MacKinnon Ch 2. In particular page

Reference: Davidson and MacKinnon Ch 2. In particular page RNy, econ460 autumn 03 Lecture note Reference: Davidson and MacKinnon Ch. In particular page 57-8. Projection matrices The matrix M I X(X X) X () is often called the residual maker. That nickname is easy

More information

Data Mining Stat 588

Data Mining Stat 588 Data Mining Stat 588 Lecture 02: Linear Methods for Regression Department of Statistics & Biostatistics Rutgers University September 13 2011 Regression Problem Quantitative generic output variable Y. Generic

More information

Estimating Estimable Functions of β. Copyright c 2012 Dan Nettleton (Iowa State University) Statistics / 17

Estimating Estimable Functions of β. Copyright c 2012 Dan Nettleton (Iowa State University) Statistics / 17 Estimating Estimable Functions of β Copyright c 202 Dan Nettleton (Iowa State University) Statistics 5 / 7 The Response Depends on β Only through Xβ In the Gauss-Markov or Normal Theory Gauss-Markov Linear

More information

Statistics 910, #5 1. Regression Methods

Statistics 910, #5 1. Regression Methods Statistics 910, #5 1 Overview Regression Methods 1. Idea: effects of dependence 2. Examples of estimation (in R) 3. Review of regression 4. Comparisons and relative efficiencies Idea Decomposition Well-known

More information

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept,

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, Linear Regression In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, y = Xβ + ɛ, where y t = (y 1,..., y n ) is the column vector of target values,

More information

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018 Econometrics I KS Module 2: Multivariate Linear Regression Alexander Ahammer Department of Economics Johannes Kepler University of Linz This version: April 16, 2018 Alexander Ahammer (JKU) Module 2: Multivariate

More information

Regression: Lecture 2

Regression: Lecture 2 Regression: Lecture 2 Niels Richard Hansen April 26, 2012 Contents 1 Linear regression and least squares estimation 1 1.1 Distributional results................................ 3 2 Non-linear effects and

More information

Quick Review on Linear Multiple Regression

Quick Review on Linear Multiple Regression Quick Review on Linear Multiple Regression Mei-Yuan Chen Department of Finance National Chung Hsing University March 6, 2007 Introduction for Conditional Mean Modeling Suppose random variables Y, X 1,

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

Lecture 7 October 13

Lecture 7 October 13 STATS 300A: Theory of Statistics Fall 2015 Lecture 7 October 13 Lecturer: Lester Mackey Scribe: Jing Miao and Xiuyuan Lu 7.1 Recap So far, we have investigated various criteria for optimal inference. We

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

Empirical Economic Research, Part II

Empirical Economic Research, Part II Based on the text book by Ramanathan: Introductory Econometrics Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna December 7, 2011 Outline Introduction

More information

Basic Distributional Assumptions of the Linear Model: 1. The errors are unbiased: E[ε] = The errors are uncorrelated with common variance:

Basic Distributional Assumptions of the Linear Model: 1. The errors are unbiased: E[ε] = The errors are uncorrelated with common variance: 8. PROPERTIES OF LEAST SQUARES ESTIMATES 1 Basic Distributional Assumptions of the Linear Model: 1. The errors are unbiased: E[ε] = 0. 2. The errors are uncorrelated with common variance: These assumptions

More information

The Multiple Regression Model Estimation

The Multiple Regression Model Estimation Lesson 5 The Multiple Regression Model Estimation Pilar González and Susan Orbe Dpt Applied Econometrics III (Econometrics and Statistics) Pilar González and Susan Orbe OCW 2014 Lesson 5 Regression model:

More information

Nonstationary Panels

Nonstationary Panels Nonstationary Panels Based on chapters 12.4, 12.5, and 12.6 of Baltagi, B. (2005): Econometric Analysis of Panel Data, 3rd edition. Chichester, John Wiley & Sons. June 3, 2009 Agenda 1 Spurious Regressions

More information

Bayesian methods in economics and finance

Bayesian methods in economics and finance 1/26 Bayesian methods in economics and finance Linear regression: Bayesian model selection and sparsity priors Linear Regression 2/26 Linear regression Model for relationship between (several) independent

More information

BIOS 2083 Linear Models c Abdus S. Wahed

BIOS 2083 Linear Models c Abdus S. Wahed Chapter 5 206 Chapter 6 General Linear Model: Statistical Inference 6.1 Introduction So far we have discussed formulation of linear models (Chapter 1), estimability of parameters in a linear model (Chapter

More information

Online Appendix FIXED EFFECTS, INVARIANCE, AND SPATIAL VARIATION IN INTERGENERATIONAL MOBILITY. Gary Chamberlain Harvard University ABSTRACT

Online Appendix FIXED EFFECTS, INVARIANCE, AND SPATIAL VARIATION IN INTERGENERATIONAL MOBILITY. Gary Chamberlain Harvard University ABSTRACT Online Appendix 10 October 2014 Revised, 11 January 2016 FIXED EFFECTS, INVARIANCE, AND SPATIAL VARIATION IN INTERGENERATIONAL MOBILITY Gary Chamberlain Harvard University ABSTRACT Chetty, Hendren, Kline,

More information

Robustness to Parametric Assumptions in Missing Data Models

Robustness to Parametric Assumptions in Missing Data Models Robustness to Parametric Assumptions in Missing Data Models Bryan Graham NYU Keisuke Hirano University of Arizona April 2011 Motivation Motivation We consider the classic missing data problem. In practice

More information

Association studies and regression

Association studies and regression Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration

More information

Introduction to Econometrics Midterm Examination Fall 2005 Answer Key

Introduction to Econometrics Midterm Examination Fall 2005 Answer Key Introduction to Econometrics Midterm Examination Fall 2005 Answer Key Please answer all of the questions and show your work Clearly indicate your final answer to each question If you think a question is

More information

Generalized Elastic Net Regression

Generalized Elastic Net Regression Abstract Generalized Elastic Net Regression Geoffroy MOURET Jean-Jules BRAULT Vahid PARTOVINIA This work presents a variation of the elastic net penalization method. We propose applying a combined l 1

More information

The risk of machine learning

The risk of machine learning / 33 The risk of machine learning Alberto Abadie Maximilian Kasy July 27, 27 2 / 33 Two key features of machine learning procedures Regularization / shrinkage: Improve prediction or estimation performance

More information

INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION. 1. Introduction

INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION. 1. Introduction INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION VICTOR CHERNOZHUKOV CHRISTIAN HANSEN MICHAEL JANSSON Abstract. We consider asymptotic and finite-sample confidence bounds in instrumental

More information

Regression and Statistical Inference

Regression and Statistical Inference Regression and Statistical Inference Walid Mnif wmnif@uwo.ca Department of Applied Mathematics The University of Western Ontario, London, Canada 1 Elements of Probability 2 Elements of Probability CDF&PDF

More information

Part 8: GLMs and Hierarchical LMs and GLMs

Part 8: GLMs and Hierarchical LMs and GLMs Part 8: GLMs and Hierarchical LMs and GLMs 1 Example: Song sparrow reproductive success Arcese et al., (1992) provide data on a sample from a population of 52 female song sparrows studied over the course

More information

Ch 2: Simple Linear Regression

Ch 2: Simple Linear Regression Ch 2: Simple Linear Regression 1. Simple Linear Regression Model A simple regression model with a single regressor x is y = β 0 + β 1 x + ɛ, where we assume that the error ɛ is independent random component

More information

Sensitivity of GLS estimators in random effects models

Sensitivity of GLS estimators in random effects models of GLS estimators in random effects models Andrey L. Vasnev (University of Sydney) Tokyo, August 4, 2009 1 / 19 Plan Plan Simulation studies and estimators 2 / 19 Simulation studies Plan Simulation studies

More information

Advanced Econometrics

Advanced Econometrics Based on the textbook by Verbeek: A Guide to Modern Econometrics Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna May 16, 2013 Outline Univariate

More information

Econometrics II - EXAM Answer each question in separate sheets in three hours

Econometrics II - EXAM Answer each question in separate sheets in three hours Econometrics II - EXAM Answer each question in separate sheets in three hours. Let u and u be jointly Gaussian and independent of z in all the equations. a Investigate the identification of the following

More information

Applied Econometrics (QEM)

Applied Econometrics (QEM) Applied Econometrics (QEM) The Simple Linear Regression Model based on Prinicples of Econometrics Jakub Mućk Department of Quantitative Economics Jakub Mućk Applied Econometrics (QEM) Meeting #2 The Simple

More information

Finite Sample Performance of A Minimum Distance Estimator Under Weak Instruments

Finite Sample Performance of A Minimum Distance Estimator Under Weak Instruments Finite Sample Performance of A Minimum Distance Estimator Under Weak Instruments Tak Wai Chau February 20, 2014 Abstract This paper investigates the nite sample performance of a minimum distance estimator

More information

Reminders. Thought questions should be submitted on eclass. Please list the section related to the thought question

Reminders. Thought questions should be submitted on eclass. Please list the section related to the thought question Linear regression Reminders Thought questions should be submitted on eclass Please list the section related to the thought question If it is a more general, open-ended question not exactly related to a

More information

Lecture 20 May 18, Empirical Bayes Interpretation [Efron & Morris 1973]

Lecture 20 May 18, Empirical Bayes Interpretation [Efron & Morris 1973] Stats 300C: Theory of Statistics Spring 2018 Lecture 20 May 18, 2018 Prof. Emmanuel Candes Scribe: Will Fithian and E. Candes 1 Outline 1. Stein s Phenomenon 2. Empirical Bayes Interpretation of James-Stein

More information

Carl N. Morris. University of Texas

Carl N. Morris. University of Texas EMPIRICAL BAYES: A FREQUENCY-BAYES COMPROMISE Carl N. Morris University of Texas Empirical Bayes research has expanded significantly since the ground-breaking paper (1956) of Herbert Robbins, and its province

More information

Eco517 Fall 2014 C. Sims FINAL EXAM

Eco517 Fall 2014 C. Sims FINAL EXAM Eco517 Fall 2014 C. Sims FINAL EXAM This is a three hour exam. You may refer to books, notes, or computer equipment during the exam. You may not communicate, either electronically or in any other way,

More information

A Bayesian Treatment of Linear Gaussian Regression

A Bayesian Treatment of Linear Gaussian Regression A Bayesian Treatment of Linear Gaussian Regression Frank Wood December 3, 2009 Bayesian Approach to Classical Linear Regression In classical linear regression we have the following model y β, σ 2, X N(Xβ,

More information

Advanced Quantitative Methods: ordinary least squares

Advanced Quantitative Methods: ordinary least squares Advanced Quantitative Methods: Ordinary Least Squares University College Dublin 31 January 2012 1 2 3 4 5 Terminology y is the dependent variable referred to also (by Greene) as a regressand X are the

More information

Machine Learning for OR & FE

Machine Learning for OR & FE Machine Learning for OR & FE Regression II: Regularization and Shrinkage Methods Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com

More information

Birkbeck Working Papers in Economics & Finance

Birkbeck Working Papers in Economics & Finance ISSN 1745-8587 Birkbeck Working Papers in Economics & Finance Department of Economics, Mathematics and Statistics BWPEF 1809 A Note on Specification Testing in Some Structural Regression Models Walter

More information

2. Linear regression with multiple regressors

2. Linear regression with multiple regressors 2. Linear regression with multiple regressors Aim of this section: Introduction of the multiple regression model OLS estimation in multiple regression Measures-of-fit in multiple regression Assumptions

More information

Panel Data Models. James L. Powell Department of Economics University of California, Berkeley

Panel Data Models. James L. Powell Department of Economics University of California, Berkeley Panel Data Models James L. Powell Department of Economics University of California, Berkeley Overview Like Zellner s seemingly unrelated regression models, the dependent and explanatory variables for panel

More information

20.1. Balanced One-Way Classification Cell means parametrization: ε 1. ε I. + ˆɛ 2 ij =

20.1. Balanced One-Way Classification Cell means parametrization: ε 1. ε I. + ˆɛ 2 ij = 20. ONE-WAY ANALYSIS OF VARIANCE 1 20.1. Balanced One-Way Classification Cell means parametrization: Y ij = µ i + ε ij, i = 1,..., I; j = 1,..., J, ε ij N(0, σ 2 ), In matrix form, Y = Xβ + ε, or 1 Y J

More information

The properties of L p -GMM estimators

The properties of L p -GMM estimators The properties of L p -GMM estimators Robert de Jong and Chirok Han Michigan State University February 2000 Abstract This paper considers Generalized Method of Moment-type estimators for which a criterion

More information

The linear model is the most fundamental of all serious statistical models encompassing:

The linear model is the most fundamental of all serious statistical models encompassing: Linear Regression Models: A Bayesian perspective Ingredients of a linear model include an n 1 response vector y = (y 1,..., y n ) T and an n p design matrix (e.g. including regressors) X = [x 1,..., x

More information

Lecture: Simultaneous Equation Model (Wooldridge s Book Chapter 16)

Lecture: Simultaneous Equation Model (Wooldridge s Book Chapter 16) Lecture: Simultaneous Equation Model (Wooldridge s Book Chapter 16) 1 2 Model Consider a system of two regressions y 1 = β 1 y 2 + u 1 (1) y 2 = β 2 y 1 + u 2 (2) This is a simultaneous equation model

More information

ISyE 691 Data mining and analytics

ISyE 691 Data mining and analytics ISyE 691 Data mining and analytics Regression Instructor: Prof. Kaibo Liu Department of Industrial and Systems Engineering UW-Madison Email: kliu8@wisc.edu Office: Room 3017 (Mechanical Engineering Building)

More information

Matrix Factorizations

Matrix Factorizations 1 Stat 540, Matrix Factorizations Matrix Factorizations LU Factorization Definition... Given a square k k matrix S, the LU factorization (or decomposition) represents S as the product of two triangular

More information

Studentization and Prediction in a Multivariate Normal Setting

Studentization and Prediction in a Multivariate Normal Setting Studentization and Prediction in a Multivariate Normal Setting Morris L. Eaton University of Minnesota School of Statistics 33 Ford Hall 4 Church Street S.E. Minneapolis, MN 55455 USA eaton@stat.umn.edu

More information

A Note on Bootstraps and Robustness. Tony Lancaster, Brown University, December 2003.

A Note on Bootstraps and Robustness. Tony Lancaster, Brown University, December 2003. A Note on Bootstraps and Robustness Tony Lancaster, Brown University, December 2003. In this note we consider several versions of the bootstrap and argue that it is helpful in explaining and thinking about

More information

Maximum Likelihood Estimation

Maximum Likelihood Estimation Maximum Likelihood Estimation Merlise Clyde STA721 Linear Models Duke University August 31, 2017 Outline Topics Likelihood Function Projections Maximum Likelihood Estimates Readings: Christensen Chapter

More information

ESTIMATORS FOR GAUSSIAN MODELS HAVING A BLOCK-WISE STRUCTURE

ESTIMATORS FOR GAUSSIAN MODELS HAVING A BLOCK-WISE STRUCTURE Statistica Sinica 9 2009, 885-903 ESTIMATORS FOR GAUSSIAN MODELS HAVING A BLOCK-WISE STRUCTURE Lawrence D. Brown and Linda H. Zhao University of Pennsylvania Abstract: Many multivariate Gaussian models

More information

Least Squares Estimation-Finite-Sample Properties

Least Squares Estimation-Finite-Sample Properties Least Squares Estimation-Finite-Sample Properties Ping Yu School of Economics and Finance The University of Hong Kong Ping Yu (HKU) Finite-Sample 1 / 29 Terminology and Assumptions 1 Terminology and Assumptions

More information

WLS and BLUE (prelude to BLUP) Prediction

WLS and BLUE (prelude to BLUP) Prediction WLS and BLUE (prelude to BLUP) Prediction Rasmus Waagepetersen Department of Mathematics Aalborg University Denmark April 21, 2018 Suppose that Y has mean X β and known covariance matrix V (but Y need

More information

Part 6: Multivariate Normal and Linear Models

Part 6: Multivariate Normal and Linear Models Part 6: Multivariate Normal and Linear Models 1 Multiple measurements Up until now all of our statistical models have been univariate models models for a single measurement on each member of a sample of

More information

Econometric Analysis of Cross Section and Panel Data

Econometric Analysis of Cross Section and Panel Data Econometric Analysis of Cross Section and Panel Data Jeffrey M. Wooldridge / The MIT Press Cambridge, Massachusetts London, England Contents Preface Acknowledgments xvii xxiii I INTRODUCTION AND BACKGROUND

More information

STK-IN4300 Statistical Learning Methods in Data Science

STK-IN4300 Statistical Learning Methods in Data Science Outline of the lecture STK-I4300 Statistical Learning Methods in Data Science Riccardo De Bin debin@math.uio.no Model Assessment and Selection Cross-Validation Bootstrap Methods Methods using Derived Input

More information

5.1 Consistency of least squares estimates. We begin with a few consistency results that stand on their own and do not depend on normality.

5.1 Consistency of least squares estimates. We begin with a few consistency results that stand on their own and do not depend on normality. 88 Chapter 5 Distribution Theory In this chapter, we summarize the distributions related to the normal distribution that occur in linear models. Before turning to this general problem that assumes normal

More information

2 Statistical Estimation: Basic Concepts

2 Statistical Estimation: Basic Concepts Technion Israel Institute of Technology, Department of Electrical Engineering Estimation and Identification in Dynamical Systems (048825) Lecture Notes, Fall 2009, Prof. N. Shimkin 2 Statistical Estimation:

More information

The Simple Regression Model. Part II. The Simple Regression Model

The Simple Regression Model. Part II. The Simple Regression Model Part II The Simple Regression Model As of Sep 22, 2015 Definition 1 The Simple Regression Model Definition Estimation of the model, OLS OLS Statistics Algebraic properties Goodness-of-Fit, the R-square

More information

[y i α βx i ] 2 (2) Q = i=1

[y i α βx i ] 2 (2) Q = i=1 Least squares fits This section has no probability in it. There are no random variables. We are given n points (x i, y i ) and want to find the equation of the line that best fits them. We take the equation

More information

Quantile Regression for Panel/Longitudinal Data

Quantile Regression for Panel/Longitudinal Data Quantile Regression for Panel/Longitudinal Data Roger Koenker University of Illinois, Urbana-Champaign University of Minho 12-14 June 2017 y it 0 5 10 15 20 25 i = 1 i = 2 i = 3 0 2 4 6 8 Roger Koenker

More information