STAT 135 Lab 13 (Review) Linear Regression, Multivariate Random Variables, Prediction, Logistic Regression and the δ-method.
Rebecca Barter, May 5, 2015
2 Linear Regression Review
3 Linear Regression Review

$$y = \beta_0 + \beta_1 x + \epsilon$$

Suppose that there is some global linear relationship between height and weight in the population. For example,

$$\text{Weight} = \beta_0 + \beta_1\,\text{Height} + \epsilon$$

Obviously height and weight don't fall on a perfectly straight line: there is some random noise/deviation from the line ($\epsilon$ is a random variable with $E(\epsilon) = 0$ and $\mathrm{Var}(\epsilon) = \sigma^2$). Moreover, we don't observe everyone in the population, so we cannot know $\beta_0$ and $\beta_1$ exactly; what we can do is estimate them from a sample.
4 Linear Regression Review

We observe a sample (blue points) and fit the best line through the sample. The red fitted line corresponds to

$$\hat{y} = \hat\beta_0 + \hat\beta_1 x$$

where $\hat\beta_0$ and $\hat\beta_1$ are estimated from our sample using least squares (finding the $\beta$ values that minimize the sum of squared vertical distances from the observed points to the line).
5 Linear Regression Review

We showed that we can write our linear regression model in matrix form:

$$Y = X\beta + \epsilon$$

and the least squares estimate of $\beta$ is given by

$$\hat\beta = \arg\min_{\beta} \|Y - X\beta\|_2^2 = (X^T X)^{-1} X^T Y$$

We further showed that $\hat\beta$ is unbiased, $E(\hat\beta) = \beta$, and that its variance is given by

$$\mathrm{Var}(\hat\beta) = \sigma^2 (X^T X)^{-1}$$
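The closed-form estimate above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the simulated height/weight data, the function name `least_squares`, and the parameter values are illustrative assumptions, not part of the lab.

```python
import numpy as np

def least_squares(X, y):
    """Solve the normal equations (X^T X) beta = X^T y for beta."""
    # Solving the linear system avoids forming the explicit inverse of X^T X.
    return np.linalg.solve(X.T @ X, X.T @ y)

# Simulated data (hypothetical values): weight = -100 + 1 * height + noise.
rng = np.random.default_rng(0)
n = 200
height = rng.normal(170.0, 10.0, size=n)
X = np.column_stack([np.ones(n), height])  # first column is the intercept
beta_true = np.array([-100.0, 1.0])
y = X @ beta_true + rng.normal(0.0, 5.0, size=n)

beta_hat = least_squares(X, y)
# Reference solution from NumPy's built-in least squares solver.
beta_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]
```

Solving the normal equations directly mirrors the formula on the slide; in practice a QR- or SVD-based solver such as `np.linalg.lstsq` is usually preferred for numerical stability.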
6 Linear Regression Review

The residuals are defined to be the difference between the observed $y$ (blue dot) and the fitted $\hat{y}$ (corresponding point on the red line):

$$e_i = y_i - \hat{y}_i = y_i - \hat\beta_0 - \hat\beta_1 x_i$$

which is different from the unobservable error

$$\epsilon_i = y_i - \beta_0 - \beta_1 x_i$$

(this is the deviation of the observed $y_i$ from the true population line, which we don't know!)
7 Multivariate Random Variables
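8 Multivariate Random Variables

Suppose we have $n$ random variables, $Y_1, \ldots, Y_n$, such that $E(Y_i) = \mu_i$ and $\mathrm{Cov}(Y_i, Y_j) = \sigma_{ij}$. Suppose that we want to consider the $Y_i$'s jointly as a single multivariate random variable,

$$Y = (Y_1, \ldots, Y_n)$$

Then this multivariate random variable $Y$ has mean vector and covariance matrix:

mean vector: $E(Y) = \mu = (\mu_1, \ldots, \mu_n)$

covariance matrix:

$$\mathrm{Cov}(Y) = \Sigma_{Y,Y} = \begin{pmatrix} \sigma_1^2 & \sigma_{1,2} & \cdots & \sigma_{1,n} \\ \sigma_{2,1} & \sigma_2^2 & \cdots & \sigma_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{n,1} & \sigma_{n,2} & \cdots & \sigma_n^2 \end{pmatrix}$$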
9 Multivariate Random Variables

$$E(Y) = \mu, \qquad \mathrm{Cov}(Y) = \Sigma$$

Suppose we have the following linear transformation of $Y$:

$$Z = b + AY$$

where $b$ is a fixed vector and $A$ a fixed matrix. What are the mean vector and covariance matrix of $Z$?

$$E(Z) = E(b + AY) = b + A\,E(Y) = b + A\mu$$

$$\mathrm{Cov}(Z) = \mathrm{Cov}(b + AY) = \mathrm{Cov}(AY) = A\,\mathrm{Cov}(Y)\,A^T = A \Sigma A^T$$
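As a small worked instance of these two rules, the sketch below evaluates $b + A\mu$ and $A\Sigma A^T$ for a hand-picked example. It assumes NumPy; the function name `transform_moments` and all numeric values are made up for illustration.

```python
import numpy as np

def transform_moments(b, A, mu, Sigma):
    """Mean vector and covariance matrix of Z = b + A Y."""
    mean_Z = b + A @ mu
    cov_Z = A @ Sigma @ A.T
    return mean_Z, cov_Z

# Hypothetical mean vector and covariance matrix for a 3-dimensional Y.
mu = np.array([1.0, 2.0, 3.0])
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])

# Z_1 = 0.5 + Y_1 - Y_2 and Z_2 = -0.5 + Y_2 + Y_3.
A = np.array([[1.0, -1.0, 0.0],
              [0.0, 1.0, 1.0]])
b = np.array([0.5, -0.5])

mean_Z, cov_Z = transform_moments(b, A, mu, Sigma)
```

Note that the shift $b$ changes the mean but drops out of the covariance, as in the derivation above.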
10 Multivariate Random Variables

Suppose that $Y$ is a multivariate random vector: $Y = (Y_1, \ldots, Y_n)$. Recall that for scalars, a quadratic transformation of $y$ is given by

$$x = a y^2 \in \mathbb{R}$$

The equivalent form for vectors is

$$X = Y^T A Y \in \mathbb{R}$$

This is referred to as a random quadratic form in $Y$.
11 Multivariate Random Variables

$X = Y^T A Y \in \mathbb{R}$ is a random quadratic form in $Y$. How can we calculate the expected value of a quadratic form of a random vector? We can use the following facts:

1. Trace and expectation are both linear operators (and so can be interchanged).
2. $X = Y^T A Y \in \mathbb{R}$ is a real number (not a matrix or vector), and if $x \in \mathbb{R}$ then $\mathrm{tr}(x) = x$.
3. If $A$ and $B$ are matrices of appropriate dimension, then $\mathrm{tr}(AB) = \mathrm{tr}(BA)$,

where the trace of a matrix is the sum of its diagonal entries.
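12 Multivariate Random Variables

Recall the three facts:

1. Trace and expectation are both linear operators (and so can be interchanged).
2. If $x \in \mathbb{R}$ then $\mathrm{tr}(x) = x$.
3. If $A$ and $B$ are matrices, then $\mathrm{tr}(AB) = \mathrm{tr}(BA)$.

So

$$\begin{aligned}
E[Y^T A Y] &= E[\mathrm{tr}(Y^T A Y)] && \text{fact (2)} \\
&= E[\mathrm{tr}(A Y Y^T)] && \text{fact (3)} \\
&= \mathrm{tr}(E[A Y Y^T]) && \text{fact (1)} \\
&= \mathrm{tr}(A\,E[Y Y^T]) && \text{fact (1)} \\
&= \mathrm{tr}(A(\Sigma + \mu\mu^T)) && \text{calculating expectation, since } \Sigma = E[YY^T] - \mu\mu^T \\
&= \mathrm{tr}(A\Sigma) + \mathrm{tr}(A\mu\mu^T) && \text{fact (1)} \\
&= \mathrm{tr}(A\Sigma) + \mathrm{tr}(\mu^T A \mu) && \text{fact (3)} \\
&= \mathrm{tr}(A\Sigma) + \mu^T A \mu && \text{fact (2)}
\end{aligned}$$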
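The identity $E[Y^T A Y] = \mathrm{tr}(A\Sigma) + \mu^T A \mu$ can be sanity-checked by simulation. This is only a sketch, assuming NumPy and, for convenience, a multivariate normal $Y$ (the identity itself needs only the mean and covariance); the matrices and the name `expected_quadratic_form` are illustrative choices.

```python
import numpy as np

def expected_quadratic_form(A, mu, Sigma):
    """Closed-form value of E[Y^T A Y] = tr(A Sigma) + mu^T A mu."""
    return np.trace(A @ Sigma) + mu @ A @ mu

# Hypothetical mean, covariance and quadratic-form matrix.
mu = np.array([1.0, 2.0, 3.0])
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 0.0],
              [0.0, 0.0, 1.0]])

exact = expected_quadratic_form(A, mu, Sigma)

# Monte Carlo estimate: average of y^T A y over many simulated draws of Y.
rng = np.random.default_rng(1)
Y = rng.multivariate_normal(mu, Sigma, size=200_000)
quadratic_forms = np.einsum("ij,jk,ik->i", Y, A, Y)  # one y^T A y per row
simulated = quadratic_forms.mean()
```

The simulated average should land close to the closed-form value, with the gap shrinking as the number of draws grows.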
13 Example: Residual sum of squares

The expected value of a quadratic form of a random vector is given by:

$$E[Y^T A Y] = \mathrm{tr}(A\Sigma) + \mu^T A \mu$$

We can use this to show that an unbiased estimate of $\sigma^2$ is given by

$$\hat\sigma^2 = \frac{RSS}{n - p}$$

where $RSS$ is the quadratic form

$$RSS = \|e\|_2^2 = e^T e$$
14 Example: Residual sum of squares

Recall that the residuals are defined by

$$e = Y - \hat{Y} = Y - X\hat\beta = Y - X(X^T X)^{-1} X^T Y = (I - X(X^T X)^{-1} X^T)\,Y = P_X Y$$

where $P_X := I - X(X^T X)^{-1} X^T$ is a projection matrix onto the space orthogonal to the columns of $X$ (check: $P_X X = 0$).

Note that an (orthogonal) projection matrix is a matrix $A$ that satisfies

$$A^T = A, \qquad AA = A^2 = A$$
15 Example: Residual sum of squares

$$e = (I - X(X^T X)^{-1} X^T)\,Y = P_X Y$$

Recall that the residual sum of squares was given by

$$RSS = \|e\|_2^2 = e^T e = (P_X Y)^T (P_X Y) = Y^T P_X^T P_X Y = Y^T P_X Y$$

Thus, since $E[Y^T A Y] = \mathrm{tr}(A\Sigma) + \mu^T A \mu$, the expected RSS is given by

$$\begin{aligned}
E(RSS) = E(Y^T P_X Y) &= \sigma^2\,\mathrm{tr}(P_X) + E(Y)^T P_X E(Y) && (\Sigma = \sigma^2 I_n) \\
&= \sigma^2 (n - p)
\end{aligned}$$

How did we get that last equality!?
16 Example: Residual sum of squares

$$E(RSS) = \sigma^2\,\mathrm{tr}(P_X) + E(Y)^T P_X E(Y) = \sigma^2 (n - p)$$

To get the last equality, we use our newly acquired knowledge of projection matrices and the trace operator:

$P_X$ is a projection matrix onto the space orthogonal to $X$, thus $P_X E(Y) = P_X X\beta = 0$, so

$$E(Y)^T P_X E(Y) = 0$$

The trace of an $m \times m$ identity matrix is $m$, so

$$\begin{aligned}
\mathrm{tr}(P_X) &= \mathrm{tr}(I_n - X(X^T X)^{-1} X^T) \\
&= n - \mathrm{tr}(X(X^T X)^{-1} X^T) && \text{(linearity of trace)} \\
&= n - \mathrm{tr}((X^T X)^{-1} X^T X) && (\mathrm{tr}(AB) = \mathrm{tr}(BA)) \\
&= n - \mathrm{tr}(I_p) \\
&= n - p
\end{aligned}$$
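The projection-matrix properties used here (symmetry, idempotence, $P_X X = 0$, and $\mathrm{tr}(P_X) = n - p$) are easy to confirm numerically. A minimal sketch, assuming NumPy and a randomly generated full-rank design matrix; the sizes `n = 50` and `p = 4` are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 50, 4
X = rng.normal(size=(n, p))  # hypothetical design matrix, full column rank

# Hat matrix H = X (X^T X)^{-1} X^T, computed without an explicit inverse.
H = X @ np.linalg.solve(X.T @ X, X.T)
# Residual-maker P_X = I - H, which maps Y to the residual vector e.
P = np.eye(n) - H

is_symmetric = np.allclose(P, P.T)
is_idempotent = np.allclose(P @ P, P)
annihilates_X = np.allclose(P @ X, 0.0, atol=1e-10)
trace_P = np.trace(P)
```

Here `trace_P` should equal `n - p` up to floating-point error, matching the trace calculation above.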
17 Example: Residual sum of squares

Note that we have shown that

$$E(RSS) = \sigma^2 (n - p)$$

which tells us that

$$\hat\sigma^2 = \frac{RSS}{n - p}$$

is an unbiased estimate of $\sigma^2$.
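Unbiasedness of $\hat\sigma^2 = RSS/(n-p)$ can also be illustrated by repeated simulation from a fixed design. The sketch below assumes NumPy and normally distributed errors (normality is a convenience here, not a requirement of the result); the design, coefficients, and $\sigma = 2$ are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, sigma = 50, 4, 2.0
X = rng.normal(size=(n, p))            # fixed hypothetical design matrix
beta = np.array([1.0, -2.0, 0.5, 3.0])  # hypothetical true coefficients

# Residual-maker matrix P_X = I - X (X^T X)^{-1} X^T.
P = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)

# Each row of Y is one simulated response vector from Y = X beta + eps.
reps = 5000
Y = X @ beta + rng.normal(0.0, sigma, size=(reps, n))

# Because P is symmetric, row i of Y @ P is the residual vector for sample i.
residuals = Y @ P
rss = np.sum(residuals**2, axis=1)
sigma2_hat = rss / (n - p)
mean_sigma2_hat = sigma2_hat.mean()
```

Averaged over many replications, `mean_sigma2_hat` should be close to $\sigma^2 = 4$, whereas dividing by `n` instead of `n - p` would give an average that is too small.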
18 Prediction
19 Prediction

Suppose we want to predict/fit the responses $Y_1, \ldots, Y_n$ corresponding to the observed predictors $x_1, \ldots, x_n$ (each $x_i$ is a $1 \times p$ vector), which form the rows of the design matrix $X$. Then our predicted $Y_i$'s (note that these are the same $Y$'s used to define the model) are given by

$$\hat{Y}_i = x_i \hat\beta = x_i (X^T X)^{-1} X^T Y$$

Since these $\hat{Y}_i$'s are random variables, we might be interested in calculating the variance of these predictions.
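20 Prediction

Our predicted $Y_i$'s are given by

$$\hat{Y}_i = x_i \hat\beta = x_i (X^T X)^{-1} X^T Y$$

The variances of the $\hat{Y}_i$'s are given by

$$\begin{aligned}
\mathrm{Var}(\hat{Y}_i) &= \mathrm{Var}(x_i (X^T X)^{-1} X^T Y) \\
&= x_i (X^T X)^{-1} X^T\,\mathrm{Var}(Y)\,X (X^T X)^{-1} x_i^T \\
&= \sigma^2\,x_i (X^T X)^{-1} X^T X (X^T X)^{-1} x_i^T \\
&= \sigma^2\,x_i (X^T X)^{-1} x_i^T
\end{aligned}$$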
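21 Prediction

$$\mathrm{Var}(\hat{Y}_i) = \sigma^2\,x_i (X^T X)^{-1} x_i^T$$

Thus the average variance of the fitted values across the $n$ observations is:

$$\begin{aligned}
\frac{1}{n} \sum_{i=1}^n \mathrm{Var}(\hat{Y}_i) &= \frac{\sigma^2}{n} \sum_{i=1}^n x_i (X^T X)^{-1} x_i^T \\
&= \frac{\sigma^2}{n}\,\mathrm{tr}(X (X^T X)^{-1} X^T) \\
&= \frac{\sigma^2}{n}\,\mathrm{tr}((X^T X)^{-1} X^T X) \\
&= \frac{\sigma^2}{n}\,\mathrm{tr}(I_p) \\
&= \sigma^2\,\frac{p}{n}
\end{aligned}$$

The second line holds because the terms $x_i (X^T X)^{-1} x_i^T$ are exactly the diagonal entries of $X (X^T X)^{-1} X^T$. So the more variables we have in our model, the more variable our fitted/predicted values will be!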
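The quantities $x_i (X^T X)^{-1} x_i^T$ are often called leverages, and the result above says they average to $p/n$. A minimal sketch of that check, assuming NumPy and an arbitrary simulated design matrix (the sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 40, 5
X = rng.normal(size=(n, p))  # hypothetical design matrix

XtX_inv = np.linalg.inv(X.T @ X)
# leverages[i] = x_i (X^T X)^{-1} x_i^T, so Var(Y_hat_i) = sigma^2 * leverages[i].
leverages = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
average_leverage = leverages.mean()
```

Multiplying `average_leverage` by $\sigma^2$ gives the average variance of the fitted values, $\sigma^2 p / n$.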
22 Logistic Regression
23 Logistic Regression

For linear regression, we have

$$Y = X\beta + \epsilon$$

and $Y$ is continuous, such that

$$Y_i \mid x_i \sim N(\beta^T x_i, \sigma^2)$$

($x_i$ is the $i$th row/observation in $X$).

We want to use logistic regression if $Y$ doesn't take continuous values, but is instead binary, i.e. $Y_i \in \{0, 1\}$. The above linear model no longer makes sense, since a linear combination of our (continuous) $x$'s is very unlikely to be equal to 0 or 1.

First step: think of a distribution for $Y_i \mid x_i$ that makes more sense for binary $Y$...
24 Logistic Regression

Since $Y_i$ is either 0 or 1, we can model it as a Bernoulli random variable

$$Y_i \mid x_i \sim \text{Bernoulli}(p(x_i, \beta))$$

where the probability that $Y_i = 1$ is given by some function of our predictors, $p(x_i, \beta)$. We need our probability $p(x_i, \beta)$ to be in $[0, 1]$ and to depend on the linear combination of our predictors, $\beta^T x_i$, in some way. We choose:

$$p(x_i, \beta) = \frac{e^{\beta^T x_i}}{1 + e^{\beta^T x_i}}$$

This is a nice choice since

$\beta^T x_i > 0$ implies that $p(x_i, \beta) > 0.5$, so $Y_i = 1$ is most likely

$\beta^T x_i < 0$ implies that $p(x_i, \beta) < 0.5$, so $Y_i = 0$ is most likely
25 Logistic Regression

We have the following logistic regression model:

$$Y_i \mid x_i \sim \text{Bernoulli}\left(\frac{e^{\beta^T x_i}}{1 + e^{\beta^T x_i}}\right)$$

If we had a new sample $x$ and we knew the $\beta$ vector, we could calculate the probability that the corresponding $Y = 1$:

$$P(Y = 1 \mid x) = p(x, \beta) = \frac{e^{\beta^T x}}{1 + e^{\beta^T x}}$$

If this probability is greater than 0.5, we would set $\hat{Y} = 1$. Otherwise, we would set $\hat{Y} = 0$.

But we don't know $\beta$: we need to estimate the unknown $\beta$ vector (just like with linear regression).
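The prediction rule above can be sketched in a few lines. This assumes NumPy and a known coefficient vector; the names `logistic_probability` and `classify`, and the example values of `beta` and `X_new`, are hypothetical.

```python
import numpy as np

def logistic_probability(X, beta):
    """P(Y = 1 | x) = exp(beta^T x) / (1 + exp(beta^T x)) for each row x of X."""
    # 1 / (1 + exp(-t)) is algebraically the same as exp(t) / (1 + exp(t)).
    return 1.0 / (1.0 + np.exp(-(X @ beta)))

def classify(X, beta):
    """Set Y_hat = 1 when the probability exceeds 0.5, otherwise Y_hat = 0."""
    return (logistic_probability(X, beta) > 0.5).astype(int)

# Hypothetical coefficients: intercept 0 and slope 1.
beta = np.array([0.0, 1.0])
# Three new observations; the first column is the intercept term.
X_new = np.array([[1.0, 0.0],
                  [1.0, 2.0],
                  [1.0, -2.0]])

probabilities = logistic_probability(X_new, beta)
predictions = classify(X_new, beta)
```

With these values the linear predictors are 0, 2 and -2, so the first probability is exactly 0.5 (classified as 0 under the "greater than 0.5" rule), the second is above 0.5, and the third is below.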
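26 Logistic Regression

Let's try to calculate the MLE for $\beta$. The likelihood function is given by

$$\begin{aligned}
\text{lik}(\beta) &= \prod_{i=1}^n P(y_i \mid X_i = x_i, \beta) \\
&= \prod_{i=1}^n \left(\frac{e^{\beta^T x_i}}{1 + e^{\beta^T x_i}}\right)^{y_i} \left(\frac{1}{1 + e^{\beta^T x_i}}\right)^{1 - y_i} \\
&= \prod_{i=1}^n \frac{e^{\beta^T x_i y_i}}{1 + e^{\beta^T x_i}}
\end{aligned}$$

So that the log-likelihood is given by

$$l(\beta) = \sum_{i=1}^n \left[\beta^T x_i y_i - \log\left(1 + e^{\beta^T x_i}\right)\right]$$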
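27 Logistic Regression

The log-likelihood is given by

$$l(\beta) = \sum_{i=1}^n \left[\beta^T x_i y_i - \log\left(1 + e^{\beta^T x_i}\right)\right]$$

Differentiating with respect to $\beta$ gives

$$\begin{aligned}
\nabla l(\beta) &= \sum_{i=1}^n \left[x_i y_i - \frac{x_i e^{\beta^T x_i}}{1 + e^{\beta^T x_i}}\right] \\
&= \sum_{i=1}^n x_i (y_i - p(x_i, \beta)) \\
&= X^T (y - p)
\end{aligned}$$

where $p = (p(x_1, \beta), p(x_2, \beta), \ldots, p(x_n, \beta))$. Setting $\nabla l(\beta) = 0$ does not yield a closed-form solution for $\beta$, so we need to use an iterative approach such as Newton's method.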
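The gradient formula $X^T(y - p)$ can be checked against a finite-difference approximation of the log-likelihood. A minimal sketch, assuming NumPy; the simulated data, the evaluation point `beta`, and the helper `numerical_gradient` are illustrative assumptions.

```python
import numpy as np

def log_likelihood(beta, X, y):
    """l(beta) = sum_i [ y_i * beta^T x_i - log(1 + exp(beta^T x_i)) ]."""
    eta = X @ beta
    # np.logaddexp(0, eta) computes log(1 + exp(eta)) in a numerically stable way.
    return np.sum(y * eta - np.logaddexp(0.0, eta))

def gradient(beta, X, y):
    """Analytic gradient X^T (y - p)."""
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    return X.T @ (y - p)

def numerical_gradient(beta, X, y, h=1e-6):
    """Central finite-difference approximation to the gradient."""
    grad = np.zeros_like(beta)
    for j in range(beta.size):
        step = np.zeros_like(beta)
        step[j] = h
        upper = log_likelihood(beta + step, X, y)
        lower = log_likelihood(beta - step, X, y)
        grad[j] = (upper - lower) / (2.0 * h)
    return grad

# Hypothetical data: an intercept plus one predictor, with arbitrary 0/1 labels.
rng = np.random.default_rng(3)
X = np.column_stack([np.ones(100), rng.normal(size=100)])
y = rng.integers(0, 2, size=100).astype(float)
beta = np.array([0.2, -0.4])

analytic = gradient(beta, X, y)
numeric = numerical_gradient(beta, X, y)
```

The two gradients should agree to several decimal places, which is a useful guard against sign errors when coding the derivative by hand.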
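28 Logistic Regression

Newton's method: Suppose that we want to find $x$ such that $f(x) = 0$. Then we could use the iterative approximation given by

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}$$

and, provided the starting point is reasonably close to the root, the more iterations we conduct, the closer we get to the root.

We need to find the vector $\beta$ such that $\nabla l(\beta) = 0$. To do this, we need to conduct a vector version of Newton's method:

$$\beta_{n+1} = \beta_n - \left(\frac{\partial^2 l(\beta)}{\partial\beta\,\partial\beta^T}\right)^{-1} \nabla l(\beta)$$

where the derivatives on the right-hand side are evaluated at $\beta = \beta_n$.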
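29 Logistic Regression

Newton's method for scalars: find $x$ such that $f(x) = 0$ by

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}$$

Vector version of Newton's method for $\beta$:

$$\beta_{n+1} = \beta_n - \left(\frac{\partial^2 l(\beta)}{\partial\beta\,\partial\beta^T}\right)^{-1} \nabla l(\beta)$$

Here $f(x) = \nabla l(x)$ is the vector version of the first derivative, and $f'(x) = \dfrac{\partial^2 l(x)}{\partial x\,\partial x^T}$ is the vector version of the second derivative.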
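The scalar update $x_{n+1} = x_n - f(x_n)/f'(x_n)$ is short enough to write out directly. The sketch below uses plain Python; the function name `newton`, the stopping rule, and the test problem $f(x) = x^2 - 2$ are illustrative choices, not part of the lab.

```python
def newton(f, f_prime, x0, tol=1e-12, max_iter=50):
    """Scalar Newton iteration: repeat x <- x - f(x) / f'(x) until the step is tiny."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / f_prime(x)
        x = x - step
        if abs(step) < tol:
            break
    return x

# Example problem: the positive root of f(x) = x^2 - 2 is sqrt(2).
root = newton(lambda x: x**2 - 2.0, lambda x: 2.0 * x, x0=1.0)
```

Starting from `x0 = 1.0`, the iterates converge to $\sqrt{2}$ within a handful of steps. This simple version does not guard against $f'(x) = 0$ or a poor starting point.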
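30 Logistic Regression

The first derivative (gradient vector) of $l(\beta)$ with respect to the vector $\beta$:

$$\nabla l(\beta) = \begin{pmatrix} \dfrac{\partial l}{\partial \beta_0} \\ \dfrac{\partial l}{\partial \beta_1} \\ \vdots \\ \dfrac{\partial l}{\partial \beta_p} \end{pmatrix}$$

The second derivative (Hessian matrix) of $l(\beta)$ with respect to the vector $\beta$:

$$\frac{\partial^2 l(\beta)}{\partial\beta\,\partial\beta^T} = \begin{pmatrix}
\dfrac{\partial^2 l}{\partial\beta_0\,\partial\beta_0} & \dfrac{\partial^2 l}{\partial\beta_0\,\partial\beta_1} & \cdots & \dfrac{\partial^2 l}{\partial\beta_0\,\partial\beta_p} \\
\dfrac{\partial^2 l}{\partial\beta_1\,\partial\beta_0} & \dfrac{\partial^2 l}{\partial\beta_1\,\partial\beta_1} & \cdots & \dfrac{\partial^2 l}{\partial\beta_1\,\partial\beta_p} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial^2 l}{\partial\beta_p\,\partial\beta_0} & \dfrac{\partial^2 l}{\partial\beta_p\,\partial\beta_1} & \cdots & \dfrac{\partial^2 l}{\partial\beta_p\,\partial\beta_p}
\end{pmatrix}$$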
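31 Logistic Regression

Recall that we were trying to find the MLE for $\beta$ in the logistic regression using Newton's method

$$\beta_{n+1} = \beta_n - \left(\frac{\partial^2 l(\beta)}{\partial\beta\,\partial\beta^T}\right)^{-1} \nabla l(\beta)$$

We already showed that

$$\nabla l(\beta) = X^T (y - p)$$

and one can show that

$$\frac{\partial^2 l(\beta)}{\partial\beta\,\partial\beta^T} = -X^T W X$$

where $W$ is a diagonal matrix whose diagonal entries are given by

$$W_{ii} = \mathrm{Var}(Y_i) = p(x_i, \beta)(1 - p(x_i, \beta))$$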
32 Logistic Regression

So our iterative procedure becomes

$$\begin{aligned}
\beta_{n+1} &= \beta_n + (X^T W X)^{-1} X^T (y - p) \\
&= (X^T W X)^{-1} X^T W (X\beta_n + W^{-1}(y - p)) \\
&= (X^T W X)^{-1} X^T W z
\end{aligned}$$

where $z = X\beta_n + W^{-1}(y - p)$, with $W$ and $p$ evaluated at $\beta_n$.

Recall the least squares linear regression $\beta$ estimator

$$\hat\beta = (X^T X)^{-1} X^T y$$

and our iterative logistic regression estimator

$$\beta_{n+1} = (X^T W X)^{-1} X^T W z$$

We say that the estimate has been reweighted by $W$, and so this estimator is referred to as iteratively reweighted least squares.
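The update $\beta_{n+1} = (X^T W X)^{-1} X^T W z$ translates fairly directly into code. The following is a minimal sketch of iteratively reweighted least squares, assuming NumPy; the function name `fit_logistic_irls`, the zero starting value, the stopping rule, and the simulated data are all illustrative assumptions. It has no safeguards for separable data or for weights that underflow to zero.

```python
import numpy as np

def fit_logistic_irls(X, y, n_iter=25, tol=1e-10):
    """Iteratively reweighted least squares for logistic regression."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        p = 1.0 / (1.0 + np.exp(-eta))   # fitted probabilities p(x_i, beta)
        w = p * (1.0 - p)                # diagonal entries of W
        z = eta + (y - p) / w            # working response z = X beta + W^{-1}(y - p)
        XtW = X.T * w                    # X^T W without building the n x n matrix W
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

# Simulated data with hypothetical true coefficients (-0.5, 1.0).
rng = np.random.default_rng(7)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([-0.5, 1.0])
p_true = 1.0 / (1.0 + np.exp(-(X @ beta_true)))
y = rng.binomial(1, p_true).astype(float)

beta_hat = fit_logistic_irls(X, y)
# At the MLE the score X^T (y - p) should be approximately zero.
p_hat = 1.0 / (1.0 + np.exp(-(X @ beta_hat)))
score_at_mle = X.T @ (y - p_hat)
```

Each pass solves a weighted least squares problem with response `z` and weights `w`, which is exactly the "reweighting" described above.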
33 The δ-method
34 The δ-method

The δ-method tells us that if we have a sequence of random variables, $X_n$, such that

$$\sqrt{n}(X_n - \theta) \xrightarrow{d} N(0, \sigma^2) \qquad (1)$$

then, for any function $g(\cdot)$ that is differentiable at $\theta$ with $g'(\theta) \neq 0$, we have that

$$\sqrt{n}(g(X_n) - g(\theta)) \xrightarrow{d} N(0, \sigma^2 (g'(\theta))^2) \qquad (2)$$

The most common reason we use the δ-method is to identify the (approximate) variance of a function of a random variable.
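One way to see where the factor $(g'(\theta))^2$ comes from is a first-order Taylor expansion of $g$ around $\theta$. The following is a heuristic sketch rather than a full proof; it assumes $g$ is differentiable at $\theta$ and ignores the remainder term.

```latex
\begin{aligned}
g(X_n) &\approx g(\theta) + g'(\theta)\,(X_n - \theta) \\
\sqrt{n}\,\bigl(g(X_n) - g(\theta)\bigr) &\approx g'(\theta)\,\sqrt{n}\,(X_n - \theta) \\
\mathrm{Var}\bigl(g'(\theta)\,Z\bigr) &= (g'(\theta))^2\,\sigma^2 \quad \text{for } Z \sim N(0, \sigma^2)
\end{aligned}
```

So the limiting normal distribution of $\sqrt{n}(X_n - \theta)$ is simply rescaled by the constant $g'(\theta)$.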
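35 Example: The δ-method

Suppose that $X_n \sim \text{Binom}(n, p)$. Then since $\frac{X_n}{n}$ is the mean of $n$ iid Bernoulli($p$) random variables, the central limit theorem tells us that

$$\sqrt{n}\left(\frac{X_n}{n} - p\right) \xrightarrow{d} N(0, p(1 - p))$$

and that

$$\mathrm{Var}\left(\frac{X_n}{n}\right) = \frac{p(1 - p)}{n}$$

δ-method: for any function $g(\cdot)$ that is differentiable at $p$ with $g'(p) \neq 0$, we have

$$\sqrt{n}\left(g\left(\frac{X_n}{n}\right) - g(p)\right) \xrightarrow{d} N(0, p(1 - p)(g'(p))^2)$$

For example, we can use this to calculate $\mathrm{Var}\left(\log\left(\frac{X_n}{n}\right)\right)$.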
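36 Example: The δ-method

$$\sqrt{n}\left(g\left(\frac{X_n}{n}\right) - g(p)\right) \xrightarrow{d} N(0, p(1 - p)(g'(p))^2)$$

For example, we can use this to calculate $\mathrm{Var}\left(\log\left(\frac{X_n}{n}\right)\right)$. Select

$$g(x) = \log(x)$$

So differentiating gives

$$g'(x) = \frac{1}{x}$$

and the δ-method says:

$$\sqrt{n}\left(\log\left(\frac{X_n}{n}\right) - \log(p)\right) \xrightarrow{d} N\left(0, \frac{p(1 - p)}{p^2}\right)$$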
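37 Example: The δ-method

The δ-method says:

$$\sqrt{n}\left(\log\left(\frac{X_n}{n}\right) - \log(p)\right) \xrightarrow{d} N\left(0, \frac{p(1 - p)}{p^2}\right)$$

so we can see that, for large $n$, the variance of $\log\left(\frac{X_n}{n}\right)$ is approximately

$$\mathrm{Var}\left(\log\left(\frac{X_n}{n}\right)\right) \approx \frac{1}{n} \cdot \frac{p(1 - p)}{p^2} = \frac{1 - p}{pn}$$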
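The approximation $\mathrm{Var}(\log(X_n/n)) \approx (1 - p)/(pn)$ can be compared with a simulated variance. This is only a sketch, assuming NumPy; the values `n = 1000`, `p = 0.3`, and the number of replications are arbitrary choices, picked so that $X_n = 0$ (where the log is undefined) essentially never occurs.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 1000, 0.3
reps = 200_000

# Draw many independent copies of X_n ~ Binom(n, p).
X_n = rng.binomial(n, p, size=reps)

# Empirical variance of log(X_n / n) across the simulated copies.
simulated_var = np.var(np.log(X_n / n))

# Delta-method approximation (1 - p) / (p n).
delta_var = (1.0 - p) / (p * n)
```

For these settings the two numbers should agree to within a few percent; the agreement typically degrades for small `n` or for `p` near 0, where the linear approximation to the log is less accurate.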
Maximum Likelihood Estimation
Maximum Likelihood Estimation Merlise Clyde STA721 Linear Models Duke University August 31, 2017 Outline Topics Likelihood Function Projections Maximum Likelihood Estimates Readings: Christensen Chapter
More informationMLES & Multivariate Normal Theory
Merlise Clyde September 6, 2016 Outline Expectations of Quadratic Forms Distribution Linear Transformations Distribution of estimates under normality Properties of MLE s Recap Ŷ = ˆµ is an unbiased estimate
More informationMatrix Approach to Simple Linear Regression: An Overview
Matrix Approach to Simple Linear Regression: An Overview Aspects of matrices that you should know: Definition of a matrix Addition/subtraction/multiplication of matrices Symmetric/diagonal/identity matrix
More information3 Multiple Linear Regression
3 Multiple Linear Regression 3.1 The Model Essentially, all models are wrong, but some are useful. Quote by George E.P. Box. Models are supposed to be exact descriptions of the population, but that is
More informationPeter Hoff Linear and multilinear models April 3, GLS for multivariate regression 5. 3 Covariance estimation for the GLM 8
Contents 1 Linear model 1 2 GLS for multivariate regression 5 3 Covariance estimation for the GLM 8 4 Testing the GLH 11 A reference for some of this material can be found somewhere. 1 Linear model Recall
More informationLinear Methods for Prediction
Chapter 5 Linear Methods for Prediction 5.1 Introduction We now revisit the classification problem and focus on linear methods. Since our prediction Ĝ(x) will always take values in the discrete set G we
More informationSTA 2201/442 Assignment 2
STA 2201/442 Assignment 2 1. This is about how to simulate from a continuous univariate distribution. Let the random variable X have a continuous distribution with density f X (x) and cumulative distribution
More informationMIT Spring 2015
Regression Analysis MIT 18.472 Dr. Kempthorne Spring 2015 1 Outline Regression Analysis 1 Regression Analysis 2 Multiple Linear Regression: Setup Data Set n cases i = 1, 2,..., n 1 Response (dependent)
More informationLecture 13: Simple Linear Regression in Matrix Format. 1 Expectations and Variances with Vectors and Matrices
Lecture 3: Simple Linear Regression in Matrix Format To move beyond simple regression we need to use matrix algebra We ll start by re-expressing simple linear regression in matrix form Linear algebra is
More informationSTAT5044: Regression and Anova. Inyoung Kim
STAT5044: Regression and Anova Inyoung Kim 2 / 51 Outline 1 Matrix Expression 2 Linear and quadratic forms 3 Properties of quadratic form 4 Properties of estimates 5 Distributional properties 3 / 51 Matrix
More informationEcon 620. Matrix Differentiation. Let a and x are (k 1) vectors and A is an (k k) matrix. ) x. (a x) = a. x = a (x Ax) =(A + A (x Ax) x x =(A + A )
Econ 60 Matrix Differentiation Let a and x are k vectors and A is an k k matrix. a x a x = a = a x Ax =A + A x Ax x =A + A x Ax = xx A We don t want to prove the claim rigorously. But a x = k a i x i i=
More informationLinear Models in Machine Learning
CS540 Intro to AI Linear Models in Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu We briefly go over two linear models frequently used in machine learning: linear regression for, well, regression,
More informationPh.D. Qualifying Exam Friday Saturday, January 3 4, 2014
Ph.D. Qualifying Exam Friday Saturday, January 3 4, 2014 Put your solution to each problem on a separate sheet of paper. Problem 1. (5166) Assume that two random samples {x i } and {y i } are independently
More informationChapter 5 Matrix Approach to Simple Linear Regression
STAT 525 SPRING 2018 Chapter 5 Matrix Approach to Simple Linear Regression Professor Min Zhang Matrix Collection of elements arranged in rows and columns Elements will be numbers or symbols For example:
More informationLecture 16 Solving GLMs via IRWLS
Lecture 16 Solving GLMs via IRWLS 09 November 2015 Taylor B. Arnold Yale Statistics STAT 312/612 Notes problem set 5 posted; due next class problem set 6, November 18th Goals for today fixed PCA example
More informationWeighted Least Squares
Weighted Least Squares The standard linear model assumes that Var(ε i ) = σ 2 for i = 1,..., n. As we have seen, however, there are instances where Var(Y X = x i ) = Var(ε i ) = σ2 w i. Here w 1,..., w
More informationLinear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept,
Linear Regression In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, y = Xβ + ɛ, where y t = (y 1,..., y n ) is the column vector of target values,
More informationMATH 829: Introduction to Data Mining and Analysis Linear Regression: statistical tests
1/16 MATH 829: Introduction to Data Mining and Analysis Linear Regression: statistical tests Dominique Guillot Departments of Mathematical Sciences University of Delaware February 17, 2016 Statistical
More informationPh.D. Qualifying Exam Friday Saturday, January 6 7, 2017
Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017 Put your solution to each problem on a separate sheet of paper. Problem 1. (5106) Let X 1, X 2,, X n be a sequence of i.i.d. observations from a
More informationLinear Methods for Prediction
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this
More information3. For a given dataset and linear model, what do you think is true about least squares estimates? Is Ŷ always unique? Yes. Is ˆβ always unique? No.
7. LEAST SQUARES ESTIMATION 1 EXERCISE: Least-Squares Estimation and Uniqueness of Estimates 1. For n real numbers a 1,...,a n, what value of a minimizes the sum of squared distances from a to each of
More informationChapter 1. Linear Regression with One Predictor Variable
Chapter 1. Linear Regression with One Predictor Variable 1.1 Statistical Relation Between Two Variables To motivate statistical relationships, let us consider a mathematical relation between two mathematical
More informationAssociation studies and regression
Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration
More informationMa 3/103: Lecture 24 Linear Regression I: Estimation
Ma 3/103: Lecture 24 Linear Regression I: Estimation March 3, 2017 KC Border Linear Regression I March 3, 2017 1 / 32 Regression analysis Regression analysis Estimate and test E(Y X) = f (X). f is the
More informationSTA 2101/442 Assignment Four 1
STA 2101/442 Assignment Four 1 One version of the general linear model with fixed effects is y = Xβ + ɛ, where X is an n p matrix of known constants with n > p and the columns of X linearly independent.
More informationCh4. Distribution of Quadratic Forms in y
ST4233, Linear Models, Semester 1 2008-2009 Ch4. Distribution of Quadratic Forms in y 1 Definition Definition 1.1 If A is a symmetric matrix and y is a vector, the product y Ay = i a ii y 2 i + i j a ij
More informationLinear Algebra Review
Linear Algebra Review Yang Feng http://www.stat.columbia.edu/~yangfeng Yang Feng (Columbia University) Linear Algebra Review 1 / 45 Definition of Matrix Rectangular array of elements arranged in rows and
More informationTopic 7 - Matrix Approach to Simple Linear Regression. Outline. Matrix. Matrix. Review of Matrices. Regression model in matrix form
Topic 7 - Matrix Approach to Simple Linear Regression Review of Matrices Outline Regression model in matrix form - Fall 03 Calculations using matrices Topic 7 Matrix Collection of elements arranged in
More informationNotes on Random Vectors and Multivariate Normal
MATH 590 Spring 06 Notes on Random Vectors and Multivariate Normal Properties of Random Vectors If X,, X n are random variables, then X = X,, X n ) is a random vector, with the cumulative distribution
More informationMultiple regression. CM226: Machine Learning for Bioinformatics. Fall Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar
Multiple regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Multiple regression 1 / 36 Previous two lectures Linear and logistic
More informationSimple Linear Regression
Simple Linear Regression ST 430/514 Recall: A regression model describes how a dependent variable (or response) Y is affected, on average, by one or more independent variables (or factors, or covariates)
More informationCS229 Lecture notes. Andrew Ng
CS229 Lecture notes Andrew Ng Part X Factor analysis When we have data x (i) R n that comes from a mixture of several Gaussians, the EM algorithm can be applied to fit a mixture model. In this setting,
More informationPreliminaries. Copyright c 2018 Dan Nettleton (Iowa State University) Statistics / 38
Preliminaries Copyright c 2018 Dan Nettleton (Iowa State University) Statistics 510 1 / 38 Notation for Scalars, Vectors, and Matrices Lowercase letters = scalars: x, c, σ. Boldface, lowercase letters
More information4 Multiple Linear Regression
4 Multiple Linear Regression 4. The Model Definition 4.. random variable Y fits a Multiple Linear Regression Model, iff there exist β, β,..., β k R so that for all (x, x 2,..., x k ) R k where ε N (, σ
More informationSTA 2101/442 Assignment 3 1
STA 2101/442 Assignment 3 1 These questions are practice for the midterm and final exam, and are not to be handed in. 1. Suppose X 1,..., X n are a random sample from a distribution with mean µ and variance
More information5.2 Expounding on the Admissibility of Shrinkage Estimators
STAT 383C: Statistical Modeling I Fall 2015 Lecture 5 September 15 Lecturer: Purnamrita Sarkar Scribe: Ryan O Donnell Disclaimer: These scribe notes have been slightly proofread and may have typos etc
More informationMatrix Algebra, Class Notes (part 2) by Hrishikesh D. Vinod Copyright 1998 by Prof. H. D. Vinod, Fordham University, New York. All rights reserved.
Matrix Algebra, Class Notes (part 2) by Hrishikesh D. Vinod Copyright 1998 by Prof. H. D. Vinod, Fordham University, New York. All rights reserved. 1 Converting Matrices Into (Long) Vectors Convention:
More informationPart IB Statistics. Theorems with proof. Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua. Lent 2015
Part IB Statistics Theorems with proof Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua Lent 2015 These notes are not endorsed by the lecturers, and I have modified them (often significantly)
More informationSTA442/2101: Assignment 5
STA442/2101: Assignment 5 Craig Burkett Quiz on: Oct 23 rd, 2015 The questions are practice for the quiz next week, and are not to be handed in. I would like you to bring in all of the code you used to
More informationRegression Models - Introduction
Regression Models - Introduction In regression models, two types of variables that are studied: A dependent variable, Y, also called response variable. It is modeled as random. An independent variable,
More informationProperties of the least squares estimates
Properties of the least squares estimates 2019-01-18 Warmup Let a and b be scalar constants, and X be a scalar random variable. Fill in the blanks E ax + b) = Var ax + b) = Goal Recall that the least squares
More informationLinear Regression (9/11/13)
STA561: Probabilistic machine learning Linear Regression (9/11/13) Lecturer: Barbara Engelhardt Scribes: Zachary Abzug, Mike Gloudemans, Zhuosheng Gu, Zhao Song 1 Why use linear regression? Figure 1: Scatter
More informationSTAT 151A: Lab 1. 1 Logistics. 2 Reference. 3 Playing with R: graphics and lm() 4 Random vectors. Billy Fang. 2 September 2017
STAT 151A: Lab 1 Billy Fang 2 September 2017 1 Logistics Billy Fang (blfang@berkeley.edu) Office hours: Monday 9am-11am, Wednesday 10am-12pm, Evans 428 (room changes will be written on the chalkboard)
More informationSimple and Multiple Linear Regression
Sta. 113 Chapter 12 and 13 of Devore March 12, 2010 Table of contents 1 Simple Linear Regression 2 Model Simple Linear Regression A simple linear regression model is given by Y = β 0 + β 1 x + ɛ where
More informationLecture 13: Simple Linear Regression in Matrix Format
See updates and corrections at http://www.stat.cmu.edu/~cshalizi/mreg/ Lecture 13: Simple Linear Regression in Matrix Format 36-401, Section B, Fall 2015 13 October 2015 Contents 1 Least Squares in Matrix
More informationSampling Distributions
Merlise Clyde Duke University September 8, 2016 Outline Topics Normal Theory Chi-squared Distributions Student t Distributions Readings: Christensen Apendix C, Chapter 1-2 Prostate Example > library(lasso2);
More informationMA 575 Linear Models: Cedric E. Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1
MA 575 Linear Models: Cedric E Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1 1 Within-group Correlation Let us recall the simple two-level hierarchical
More informationWeighted Least Squares
Weighted Least Squares The standard linear model assumes that Var(ε i ) = σ 2 for i = 1,..., n. As we have seen, however, there are instances where Var(Y X = x i ) = Var(ε i ) = σ2 w i. Here w 1,..., w
More informationGeneralized Linear Models. Kurt Hornik
Generalized Linear Models Kurt Hornik Motivation Assuming normality, the linear model y = Xβ + e has y = β + ε, ε N(0, σ 2 ) such that y N(μ, σ 2 ), E(y ) = μ = β. Various generalizations, including general
More informationLecture 9 SLR in Matrix Form
Lecture 9 SLR in Matrix Form STAT 51 Spring 011 Background Reading KNNL: Chapter 5 9-1 Topic Overview Matrix Equations for SLR Don t focus so much on the matrix arithmetic as on the form of the equations.
More informationRegression Review. Statistics 149. Spring Copyright c 2006 by Mark E. Irwin
Regression Review Statistics 149 Spring 2006 Copyright c 2006 by Mark E. Irwin Matrix Approach to Regression Linear Model: Y i = β 0 + β 1 X i1 +... + β p X ip + ɛ i ; ɛ i iid N(0, σ 2 ), i = 1,..., n
More informationStat 579: Generalized Linear Models and Extensions
Stat 579: Generalized Linear Models and Extensions Linear Mixed Models for Longitudinal Data Yan Lu April, 2018, week 15 1 / 38 Data structure t1 t2 tn i 1st subject y 11 y 12 y 1n1 Experimental 2nd subject
More informationSTAT 100C: Linear models
STAT 100C: Linear models Arash A. Amini June 9, 2018 1 / 56 Table of Contents Multiple linear regression Linear model setup Estimation of β Geometric interpretation Estimation of σ 2 Hat matrix Gram matrix
More informationSTAT 100C: Linear models
STAT 100C: Linear models Arash A. Amini April 27, 2018 1 / 1 Table of Contents 2 / 1 Linear Algebra Review Read 3.1 and 3.2 from text. 1. Fundamental subspace (rank-nullity, etc.) Im(X ) = ker(x T ) R
More informationSTAT Homework 8 - Solutions
STAT-36700 Homework 8 - Solutions Fall 208 November 3, 208 This contains solutions for Homework 4. lease note that we have included several additional comments and approaches to the problems to give you
More informationST 740: Linear Models and Multivariate Normal Inference
ST 740: Linear Models and Multivariate Normal Inference Alyson Wilson Department of Statistics North Carolina State University November 4, 2013 A. Wilson (NCSU STAT) Linear Models November 4, 2013 1 /
More information18.S096 Problem Set 3 Fall 2013 Regression Analysis Due Date: 10/8/2013
18.S096 Problem Set 3 Fall 013 Regression Analysis Due Date: 10/8/013 he Projection( Hat ) Matrix and Case Influence/Leverage Recall the setup for a linear regression model y = Xβ + ɛ where y and ɛ are
More informationLecture Notes on Different Aspects of Regression Analysis
Andreas Groll WS 2012/2013 Lecture Notes on Different Aspects of Regression Analysis Department of Mathematics, Workgroup Financial Mathematics, Ludwig-Maximilians-University Munich, Theresienstr. 39,
More informationXβ is a linear combination of the columns of X: Copyright c 2010 Dan Nettleton (Iowa State University) Statistics / 25 X =
The Gauss-Markov Linear Model y Xβ + ɛ y is an n random vector of responses X is an n p matrix of constants with columns corresponding to explanatory variables X is sometimes referred to as the design
More informationLinear Models Review
Linear Models Review Vectors in IR n will be written as ordered n-tuples which are understood to be column vectors, or n 1 matrices. A vector variable will be indicted with bold face, and the prime sign
More informationRegression. ECO 312 Fall 2013 Chris Sims. January 12, 2014
ECO 312 Fall 2013 Chris Sims Regression January 12, 2014 c 2014 by Christopher A. Sims. This document is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License What
More informationBANA 7046 Data Mining I Lecture 2. Linear Regression, Model Assessment, and Cross-validation 1
BANA 7046 Data Mining I Lecture 2. Linear Regression, Model Assessment, and Cross-validation 1 Shaobo Li University of Cincinnati 1 Partially based on Hastie, et al. (2009) ESL, and James, et al. (2013)
More informationBasic Distributional Assumptions of the Linear Model: 1. The errors are unbiased: E[ε] = The errors are uncorrelated with common variance:
8. PROPERTIES OF LEAST SQUARES ESTIMATES 1 Basic Distributional Assumptions of the Linear Model: 1. The errors are unbiased: E[ε] = 0. 2. The errors are uncorrelated with common variance: These assumptions
More informationLecture 15. Hypothesis testing in the linear model
14. Lecture 15. Hypothesis testing in the linear model Lecture 15. Hypothesis testing in the linear model 1 (1 1) Preliminary lemma 15. Hypothesis testing in the linear model 15.1. Preliminary lemma Lemma
More informationSTAT 135 Lab 3 Asymptotic MLE and the Method of Moments
STAT 135 Lab 3 Asymptotic MLE and the Method of Moments Rebecca Barter February 9, 2015 Maximum likelihood estimation (a reminder) Maximum likelihood estimation Suppose that we have a sample, X 1, X 2,...,
More information[y i α βx i ] 2 (2) Q = i=1
Least squares fits This section has no probability in it. There are no random variables. We are given n points (x i, y i ) and want to find the equation of the line that best fits them. We take the equation
More informationRestricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model
Restricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model Xiuming Zhang zhangxiuming@u.nus.edu A*STAR-NUS Clinical Imaging Research Center October, 015 Summary This report derives
More informationChapter 17: Undirected Graphical Models
Chapter 17: Undirected Graphical Models The Elements of Statistical Learning Biaobin Jiang Department of Biological Sciences Purdue University bjiang@purdue.edu October 30, 2014 Biaobin Jiang (Purdue)
More informationCh 2: Simple Linear Regression
Ch 2: Simple Linear Regression 1. Simple Linear Regression Model A simple regression model with a single regressor x is y = β 0 + β 1 x + ɛ, where we assume that the error ɛ is independent random component
More informationGeneralized linear models
Generalized linear models Søren Højsgaard Department of Mathematical Sciences Aalborg University, Denmark October 29, 202 Contents Densities for generalized linear models. Mean and variance...............................
More informationSummer School in Statistics for Astronomers V June 1 - June 6, Regression. Mosuk Chow Statistics Department Penn State University.
Summer School in Statistics for Astronomers V June 1 - June 6, 2009 Regression Mosuk Chow Statistics Department Penn State University. Adapted from notes prepared by RL Karandikar Mean and variance Recall
More informationRegression Models - Introduction
Regression Models - Introduction In regression models there are two types of variables that are studied: A dependent variable, Y, also called response variable. It is modeled as random. An independent
More informationMath 423/533: The Main Theoretical Topics
Math 423/533: The Main Theoretical Topics Notation sample size n, data index i number of predictors, p (p = 2 for simple linear regression) $y_i$: response for individual i $x_i = (x_{i1}, \ldots, x_{ip})$ $(1 \times p)$
The Multivariate Normal Distribution 1
The Multivariate Normal Distribution 1 STA 302 Fall 2014 1 See last slide for copyright information. 1 / 37 Overview 1 Moment-generating Functions 2 Definition 3 Properties 4 χ 2 and t distributions 2
Weighted Least Squares I
Weighted Least Squares I for $i = 1, 2, \ldots, n$ we have, see [1, Bradley], data: $Y_i \mid x_i \overset{\text{i.n.i.d.}}{\sim} f(y_i \mid \theta_i)$, where $\theta_i = E(Y_i \mid x_i)$ co-variates: $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})^T$ let $X_{n \times p}$ be the matrix of
Lecture 4: Types of errors. Bayesian regression models. Logistic regression
Lecture 4: Types of errors. Bayesian regression models. Logistic regression A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting more generally COMP-652 and ECSE-68, Lecture
Ridge Regression. Flachs, Munkholt og Skotte. May 4, 2009
Ridge Regression Flachs, Munkholt og Skotte May 4, 2009 As in usual regression we consider a pair of random variables $(X, Y)$ with values in $\mathbb{R}^p \times \mathbb{R}$ and assume that for some $(\beta_0, \beta) \in \mathbb{R}^{1+p}$ it holds that E(Y
Stat 206: Sampling theory, sample moments, mahalanobis
Stat 206: Sampling theory, sample moments, mahalanobis topology James Johndrow (adapted from Iain Johnstone's notes) 2016-11-02 Notation My notation is different from the book's. This is partly because
Inverse of a Square Matrix. For an $N \times N$ square matrix $A$, the inverse of $A$, $A^{-1}$
Inverse of a Square Matrix For an $N \times N$ square matrix $A$, the inverse of $A$, $A^{-1}$, exists if and only if $A$ is of full rank, i.e., if and only if no column of $A$ is a linear combination of the others. A is
STAT 540: Data Analysis and Regression
STAT 540: Data Analysis and Regression Wen Zhou http://www.stat.colostate.edu/~riczw/ Email: riczw@stat.colostate.edu Department of Statistics Colorado State University Fall 205 W. Zhou (Colorado State
MAT2377. Rafał Kulik. Version 2015/November/26. Rafał Kulik
MAT2377 Rafał Kulik Version 2015/November/26 Rafał Kulik Bivariate data and scatterplot Data: Hydrocarbon level (x) and Oxygen level (y): x: 0.99, 1.02, 1.15, 1.29, 1.46, 1.36, 0.87, 1.23, 1.55, 1.40,
Topic 2: Logistic Regression
CS 4850/6850: Introduction to Machine Learning Fall 208 Topic 2: Logistic Regression Instructor: Daniel L. Pimentel-Alarcón c Copyright 208 2. Introduction Arguably the simplest task that we can teach
STAT5044: Regression and Anova
STAT5044: Regression and Anova Inyoung Kim 1 / 15 Outline 1 Fitting GLMs 2 / 15 Fitting GLMs We study how to find the maximum likelihood estimator $\hat\beta$ of GLM parameters The likelihood equations are usually
Multivariate Analysis and Likelihood Inference
Multivariate Analysis and Likelihood Inference Outline 1 Joint Distribution of Random Variables 2 Principal Component Analysis (PCA) 3 Multivariate Normal Distribution 4 Likelihood Inference Joint density
3d scatterplots. You can also make 3d scatterplots, although these are less common than scatterplot matrices.
3d scatterplots You can also make 3d scatterplots, although these are less common than scatterplot matrices. > library(scatterplot3d) > y par(mfrow=c(2,2)) > scatterplot3d(y,highlight.3d=t,angle=20)
REGRESSION WITH SPATIALLY MISALIGNED DATA. Lisa Madsen Oregon State University David Ruppert Cornell University
REGRESSION WITH SPATIALLY MISALIGNED DATA Lisa Madsen Oregon State University David Ruppert Cornell University SPATIALLY MISALIGNED DATA OUTLINE 1. Introduction 2.
Regression Estimation - Least Squares and Maximum Likelihood. Dr. Frank Wood
Regression Estimation - Least Squares and Maximum Likelihood Dr. Frank Wood Least Squares Max(min)imization Function to minimize w.r.t. $\beta_0, \beta_1$: $Q = \sum_{i=1}^{n} (Y_i - (\beta_0 + \beta_1 X_i))^2$ Minimize this by maximizing
This model of the conditional expectation is linear in the parameters. A more practical and relaxed attitude towards linear regression is to say that
Linear Regression For $(X, Y)$ a pair of random variables with values in $\mathbb{R}^p \times \mathbb{R}$ we assume that $E(Y \mid X) = \beta_0 + \sum_{j=1}^{p} X_j \beta_j = (1, X^T)\beta$ with $\beta \in \mathbb{R}^{p+1}$. This model of the conditional expectation is linear
STAT 135 Lab 11 Tests for Categorical Data (Fisher's Exact test, χ² tests for Homogeneity and Independence) and Linear Regression
STAT 135 Lab 11 Tests for Categorical Data (Fisher's Exact test, χ² tests for Homogeneity and Independence) and Linear Regression Rebecca Barter April 20, 2015 Fisher's Exact Test Fisher's Exact Test
Lecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods.
Lecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods. Linear models for classification Logistic regression Gradient descent and second-order methods
The Multivariate Normal Distribution 1
The Multivariate Normal Distribution 1 STA 302 Fall 2017 1 See last slide for copyright information. 1 / 40 Overview 1 Moment-generating Functions 2 Definition 3 Properties 4 χ 2 and t distributions 2
Asymptotic Theory. L. Magee revised January 21, 2013
Asymptotic Theory L. Magee revised January 21, 2013 1 Convergence 1.1 Definitions Let $a_n$ refer to a random variable that is a function of n random variables. Convergence in Probability The scalar a
Linear Regression. September 27, Chapter 3. Chapter 3 September 27, / 77
Linear Regression Chapter 3 September 27, 2016 Chapter 3 September 27, 2016 1 / 77 1 3.1. Simple linear regression 2 3.2 Multiple linear regression 3 3.3. The least squares estimation 4 3.4. The statistical
Administration. Homework 1 on web page, due Feb 11 NSERC summer undergraduate award applications due Feb 5 Some helpful books
STA 44/04 Jan 6, 00 / 5 Administration Homework on web page, due Feb NSERC summer undergraduate award applications due Feb 5 Some helpful books STA 44/04 Jan 6, 00... administration / 5 STA 44/04 Jan 6,
. a m1 a mn. a 1 a 2 a = a n
Biostat 140655, 2008: Matrix Algebra Review 1 Definition: An $m \times n$ matrix, $A_{m \times n}$, is a rectangular array of real numbers with m rows and n columns Element in the $i$th row and the $j$th column is denoted by
Linear models. Linear models are computationally convenient and remain widely used in. applied econometric research
Linear models Linear models are computationally convenient and remain widely used in applied econometric research Our main focus in these lectures will be on single equation linear models of the form y
For more information about how to cite these materials visit
Author(s): Kerby Shedden, Ph.D., 2010 License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution Share Alike 3.0 License: http://creativecommons.org/licenses/by-sa/3.0/
Last lecture 1/35. General optimization problems Newton Raphson Fisher scoring Quasi Newton
EM Algorithm Last lecture 1/35 General optimization problems Newton Raphson Fisher scoring Quasi Newton Nonlinear regression models Gauss-Newton Generalized linear models Iteratively reweighted least squares
Lecture 6: Linear models and Gauss-Markov theorem
Lecture 6: Linear models and Gauss-Markov theorem Linear model setting Results in simple linear regression can be extended to the following general linear model with independently observed response variables