Weighted Least Squares I
1 Weighted Least Squares I

For $i = 1, 2, \ldots, n$ we have (see [1, Bradley]):

data: $Y_i \mid x_i \overset{\text{i.n.i.d.}}{\sim} f(y_i \mid \theta_i)$, where $\theta_i = E(Y_i \mid x_i)$ (i.n.i.d.: independent, not identically distributed)
covariates: $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})^T$; let $X_{n \times p}$ be the matrix of covariates with rows $x_i^T$
parameter of interest: $\beta = (\beta_1, \beta_2, \ldots, \beta_p)$, $p < n$
$\theta_i = E(Y_i \mid x_i) = \beta^T x_i$
$\mathrm{Var}(Y_i \mid x_i) = v_i(\phi)$ has a known form that doesn't depend on $\beta$; the $v_i(\phi)$'s are not all the same, and $\phi$ is known

We want to estimate $\beta$. Ignoring the underlying density, one could use the Weighted Least Squares (WLS) estimator:

$$\hat\beta_{WLS} = \arg\min_{\beta} \sum_{i=1}^{n} v_i(\phi)^{-1} \left(Y_i - \beta^T x_i\right)^2$$

February 17, 2006, © Gopi Goswami (goswami@stat.harvard.edu)

2 WLS II

One could also use the Maximum Likelihood Estimator (MLE):

$$\hat\beta_{MLE} = \arg\max_{\beta} \log L(\beta) = \arg\max_{\beta} \sum_{i=1}^{n} \log f(Y_i \mid \beta^T x_i)$$

For WLS we solve the following normal equations:

$$\sum_{i=1}^{n} v_i(\phi)^{-1} \left(Y_i - \beta^T x_i\right) x_{ij} = 0, \quad j = 1, 2, \ldots, p \qquad (1)$$

For the MLE we solve the following system of equations:

$$\frac{\partial}{\partial \beta_j} \sum_{i=1}^{n} \log f(Y_i \mid \beta^T x_i) = 0, \quad j = 1, 2, \ldots, p \qquad (2)$$

For certain choices of $f(\cdot)$, $\hat\beta_{WLS} = \hat\beta_{MLE}$. What are those?

3 NEF of Distributions: I

NEF stands for Natural Exponential Family. A NEF looks like:

$$f(y \mid \theta) = h(y) \exp\left[P(\theta)\, y - Q(\theta)\right]$$

where $\theta = E(Y)$ and the range of $Y$ doesn't depend on $\theta$.

Consider $\int f(y \mid \theta)\, dy = 1$, that is, $\int h(y) \exp\left[P(\theta)\, y - Q(\theta)\right] dy = 1$, and assume differentiation under the integral sign is possible.

Apply $\frac{d}{d\theta}$ to both sides of the above to get: $\theta = E(Y) = \dfrac{Q'(\theta)}{P'(\theta)}$ (why?)

Apply $\frac{d^2}{d\theta^2}$ to both sides of the above to get: $\mathrm{Var}(Y) = \dfrac{1}{P'(\theta)}$ (why?)

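One way to fill in the two "why?" steps is sketched below. It assumes, as the slide does, that differentiation under the integral sign is valid, and additionally that $P'(\theta) \neq 0$ (an assumption made here so that the divisions are defined).

```latex
\begin{aligned}
0 &= \frac{d}{d\theta} \int h(y)\, e^{P(\theta) y - Q(\theta)}\, dy
   = \int \bigl(P'(\theta)\, y - Q'(\theta)\bigr) f(y \mid \theta)\, dy
   = P'(\theta)\, E(Y) - Q'(\theta) \\
  &\Longrightarrow\quad \theta = E(Y) = \frac{Q'(\theta)}{P'(\theta)},
   \qquad\text{so}\qquad P'(\theta)\, y - Q'(\theta) = P'(\theta)\,(y - \theta). \\[6pt]
0 &= \frac{d^2}{d\theta^2} \int h(y)\, e^{P(\theta) y - Q(\theta)}\, dy
   = \int \Bigl[ \bigl(P''(\theta)\, y - Q''(\theta)\bigr)
        + P'(\theta)^2 (y - \theta)^2 \Bigr] f(y \mid \theta)\, dy \\
  &= P''(\theta)\, \theta - Q''(\theta) + P'(\theta)^2\, \mathrm{Var}(Y). \\[6pt]
  &\text{Differentiating } Q'(\theta) = \theta\, P'(\theta) \text{ gives }
   Q''(\theta) = P'(\theta) + \theta\, P''(\theta),
   \text{ so } P''(\theta)\, \theta - Q''(\theta) = -P'(\theta), \\
  &\Longrightarrow\quad \mathrm{Var}(Y) = \frac{1}{P'(\theta)}.
\end{aligned}
```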
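4 WLS and MLE I

If the $f(y_i \mid \beta^T x_i)$ all come from a NEF, then $\hat\beta_{WLS} = \hat\beta_{MLE}$.

Sketch of proof:

$$
\begin{aligned}
\frac{\partial}{\partial \beta_j} \sum_{i=1}^{n} \log f(Y_i \mid \beta^T x_i)
&= \frac{\partial}{\partial \beta_j} \sum_{i=1}^{n} \left\{ \log h(Y_i) + P(\beta^T x_i)\, Y_i - Q(\beta^T x_i) \right\} \\
&= \sum_{i=1}^{n} \left\{ P'(\theta_i)\, x_{ij}\, Y_i - Q'(\theta_i)\, x_{ij} \right\} \\
&= \sum_{i=1}^{n} P'(\theta_i) \left( Y_i - \frac{Q'(\theta_i)}{P'(\theta_i)} \right) x_{ij} \\
&= \sum_{i=1}^{n} v_i(\phi)^{-1} \left( Y_i - E(Y_i \mid x_i) \right) x_{ij} \\
&= \sum_{i=1}^{n} v_i(\phi)^{-1} \left( Y_i - \beta^T x_i \right) x_{ij}
\end{aligned}
$$
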
5 WLS and MLE II

So equation (2) boils down to solving:

$$\sum_{i=1}^{n} v_i(\phi)^{-1} \left( Y_i - \beta^T x_i \right) x_{ij} = 0, \quad j = 1, 2, \ldots, p$$

This is exactly the same as equation (1), Q.E.D.

Note that the solution to the above equations also satisfies (how?):

$$\left( X^T W X \right) \hat\beta_{WLS} = X^T W Y \;\Longrightarrow\; \hat\beta_{WLS} = \left( X^T W X \right)^{-1} X^T W Y$$

where $W$ is diagonal with $(W)_{ii} = v_i(\phi)^{-1}$.

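The closed form above can be checked numerically. The following is a minimal sketch, not the author's code: the function name `wls`, the simulated design, and the variance function `0.25 * (1 + x^2)` are all assumptions chosen for illustration. It solves $(X^T W X)\beta = X^T W Y$ and then evaluates the left-hand side of the normal equations (1) at the solution.

```python
import numpy as np

def wls(X, y, v):
    """Solve (X^T W X) beta = X^T W y with W = diag(1 / v_i)."""
    w = 1.0 / np.asarray(v, dtype=float)
    XtW = X.T * w                      # scales column i of X^T by w_i, i.e. X^T W
    return np.linalg.solve(XtW @ X, XtW @ y)

# Simulated data (hypothetical): intercept plus one covariate.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.uniform(0.0, 2.0, n)])
beta_true = np.array([1.0, -0.5])
v = 0.25 * (1.0 + X[:, 1] ** 2)        # assumed known variances v_i(phi)
y = X @ beta_true + rng.normal(0.0, np.sqrt(v))

beta_wls = wls(X, y, v)

# Left-hand side of the normal equations (1); it should vanish at beta_wls.
score = X.T @ ((y - X @ beta_wls) / v)
print("beta_wls:", beta_wls)
print("normal equations residual:", score)
```

Solving the linear system with `np.linalg.solve` avoids forming the inverse explicitly; the same estimate is obtained by ordinary least squares after dividing each row of `X` and `y` by $\sqrt{v_i}$.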
6 Example I

Heteroskedastic Least Squares: for $i = 1, 2, \ldots, n$ we have

$$Y_i \mid x_i \overset{\text{i.n.i.d.}}{\sim} \mathrm{Normal}_1\!\left(\theta_i,\; \sigma^2 k(x_i)\right)$$

for some known constant $\sigma^2$ and a known function $k(\cdot)$ with $k : \mathbb{R}^p \to (0, \infty)$.

$\theta_i = E(Y_i \mid x_i) = \beta^T x_i$, and we want to estimate $\beta$.

So we take diagonal $W$ such that $(W)_{ii} = 1 / (\sigma^2 k(x_i))$ and

$$\hat\beta_{WLS} = \left( X^T W X \right)^{-1} \left( X^T W Y \right)$$

Now $\hat\beta_{WLS} = \hat\beta_{MLE}$, because the Normal distribution comes from a NEF:

$$
\mathrm{Normal}_1\!\left(\theta, \sigma^2 k(x);\, y\right)
= \underbrace{\frac{\exp\!\left[-\dfrac{y^2}{2\sigma^2 k(x)}\right]}{\sqrt{2\pi\sigma^2 k(x)}}}_{h(y)}
\exp\Biggl[\, \underbrace{\frac{\theta}{\sigma^2 k(x)}}_{P(\theta)}\, y
- \underbrace{\frac{\theta^2}{2\sigma^2 k(x)}}_{Q(\theta)} \Biggr]
$$

7 Iteratively Reweighted Least Squares I

Suppose that in the previous setting, for a known non-linear function $m(\cdot, \cdot)$ with first derivative, we have:

$$\theta_i = m(\beta, x_i)$$

We want to estimate $\beta$. Ignoring the underlying density, one uses the Iteratively Reweighted Least Squares (IRLS) estimator:

$$\hat\beta_{IRLS} = \arg\min_{\beta} \sum_{i=1}^{n} v_i(\phi)^{-1} \left( Y_i - m(\beta, x_i) \right)^2$$

One can show that under this setup as well, $\hat\beta_{IRLS} = \hat\beta_{MLE}$. The proof is very similar to the proof of $\hat\beta_{WLS} = \hat\beta_{MLE}$, which we did before; it is left as an assignment problem.

8 IRLS II

Here we need to solve the following normal equations:

$$\sum_{i=1}^{n} v_i(\phi)^{-1} \left( Y_i - m(\beta, x_i) \right) \frac{\partial}{\partial \beta_j} m(\beta, x_i) = 0, \quad j = 1, 2, \ldots, p \qquad (3)$$

The problem is that the normal equations (3) are not easily solved for $\beta$.

One could use the Newton-Raphson (NR) algorithm; instead we are going to use something different.

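9 IRLS III

A new iterative route: let the current update be $\hat\beta_{n-1}$ (the subscript $n-1$ indexes the iteration, not the sample size $n$).

Linearize the problem using a Taylor expansion:

$$m(\beta, x_i) \approx m(\hat\beta_{n-1}, x_i) + \left( \beta - \hat\beta_{n-1} \right)^T \left[ \frac{\partial}{\partial \beta} m(\hat\beta_{n-1}, x_i) \right]$$

Now solve the simpler problem:

$$\hat\beta_n = \arg\min_{\beta} \sum_{i=1}^{n} v_i(\phi)^{-1} \left\{ Y_i - m(\hat\beta_{n-1}, x_i) + \hat\beta_{n-1}^T \left[ \frac{\partial}{\partial \beta} m(\hat\beta_{n-1}, x_i) \right] - \beta^T \left[ \frac{\partial}{\partial \beta} m(\hat\beta_{n-1}, x_i) \right] \right\}^2$$
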
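10 IRLS IV

The simpler problem can be solved with the following normal equations:

$$\sum_{i=1}^{n} v_i(\phi)^{-1} \left\{ Y_i - m(\hat\beta_{n-1}, x_i) + \hat\beta_{n-1}^T \left[ \frac{\partial}{\partial \beta} m(\hat\beta_{n-1}, x_i) \right] - \beta^T \left[ \frac{\partial}{\partial \beta} m(\hat\beta_{n-1}, x_i) \right] \right\} \frac{\partial}{\partial \beta_j} m(\hat\beta_{n-1}, x_i) = 0, \quad j = 1, 2, \ldots, p \qquad (4)$$

Now take:

$$(\hat X_{n-1})_{ij} = \frac{\partial}{\partial \beta_j} m(\hat\beta_{n-1}, x_i)$$

$$(\hat W_{n-1})_{ij} = \begin{cases} v_i(\phi)^{-1} & \text{if } i = j \\ 0 & \text{otherwise} \end{cases}$$

$$(\hat Y_{n-1})_i = Y_i - m(\hat\beta_{n-1}, x_i)$$
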
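11 IRLS V

Equation (4) amounts to solving (why?):

$$
\begin{aligned}
\hat X_{n-1}^T \hat W_{n-1} \hat Y_{n-1} &= \left( \hat X_{n-1}^T \hat W_{n-1} \hat X_{n-1} \right) \left( \hat\beta_n - \hat\beta_{n-1} \right) \\
\Longrightarrow\quad \hat\beta_n - \hat\beta_{n-1} &= \left( \hat X_{n-1}^T \hat W_{n-1} \hat X_{n-1} \right)^{-1} \hat X_{n-1}^T \hat W_{n-1} \hat Y_{n-1} \\
\Longrightarrow\quad \hat\beta_n &= \hat\beta_{n-1} + \left( \hat X_{n-1}^T \hat W_{n-1} \hat X_{n-1} \right)^{-1} \hat X_{n-1}^T \hat W_{n-1} \hat Y_{n-1} \qquad (5)
\end{aligned}
$$

So the second term above looks like the WLS solution of regressing $\hat Y_{n-1}$ on $\hat X_{n-1}$ with weights $\hat W_{n-1}$. We iterate this procedure, hence the name.

The IRLS algorithm: start with a properly chosen initial $\hat\beta_0$ and apply the above updating scheme (until convergence) to get

$$\hat\beta_0 \to \hat\beta_1 \to \hat\beta_2 \to \cdots \to \hat\beta_{IRLS}$$
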
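Update (5) can be sketched in a few lines of NumPy. Everything specific here is an assumption made for illustration rather than part of the slides: the function name `irls`, the exponential-decay mean $m(\beta, x) = \beta_1 e^{-\beta_2 x}$, the variance function, the simulated data, and the choice of a log-linear fit as the "properly chosen" starting value.

```python
import numpy as np

def irls(y, x, v, m, grad_m, beta0, tol=1e-10, max_iter=100):
    """Iterate update (5): beta <- beta + (Xh^T W Xh)^{-1} Xh^T W Yh."""
    beta = np.asarray(beta0, dtype=float)
    w = 1.0 / v                                  # diagonal of W, fixed across iterations
    for _ in range(max_iter):
        X_hat = grad_m(beta, x)                  # (n, p) matrix of d m / d beta_j
        y_hat = y - m(beta, x)                   # working response Y_i - m(beta, x_i)
        XtW = X_hat.T * w
        step = np.linalg.solve(XtW @ X_hat, XtW @ y_hat)
        beta = beta + step
        if np.linalg.norm(step) < tol:           # stop once the update is negligible
            break
    return beta

def m(beta, x):
    """Hypothetical non-linear mean: beta_1 * exp(-beta_2 * x)."""
    return beta[0] * np.exp(-beta[1] * x)

def grad_m(beta, x):
    """Partial derivatives of m with respect to beta_1 and beta_2."""
    e = np.exp(-beta[1] * x)
    return np.column_stack([e, -beta[0] * x * e])

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(0.0, 2.0, n)
beta_true = np.array([2.0, 1.5])
v = 0.005 ** 2 * (1.0 + x)                       # assumed known variances sigma^2 k(x_i)
y = m(beta_true, x) + rng.normal(0.0, np.sqrt(v))

# Starting value from a straight-line fit of log(y) on x (valid here since y > 0).
A = np.column_stack([np.ones(n), -x])
c = np.linalg.lstsq(A, np.log(y), rcond=None)[0]
beta0 = np.array([np.exp(c[0]), c[1]])

beta_irls = irls(y, x, v, m, grad_m, beta0)
print("beta_irls:", beta_irls)
```

In this setting the weights do not depend on $\beta$, so only the linearized design $\hat X_{n-1}$ and the working response $\hat Y_{n-1}$ change between iterations.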
12 IRLS VI

Note that equation (5) has the form of a Newton-Raphson (NR) type update; IRLS is a so-called Newton-Raphson-like algorithm.

IRLS may or may not converge depending on the starting values, much like NR.

13 Example I

Heteroskedastic Non-linear Least Squares: for $i = 1, 2, \ldots, n$ we have

$$Y_i \mid x_i \overset{\text{i.n.i.d.}}{\sim} \mathrm{Normal}_1\!\left(\theta_i,\; \sigma^2 k(x_i)\right)$$

for some known constant $\sigma^2$ and a known function $k(\cdot)$ with $k : \mathbb{R}^p \to (0, \infty)$.

$\theta_i = E(Y_i \mid x_i) = m(\beta, x_i)$, for a known non-linear function $m(\cdot)$ with first derivative. We want to estimate $\beta$.

Here, for computing $\hat\beta_{IRLS}$ ($= \hat\beta_{MLE}$, why?), we will need:

$$(\hat X_{n-1})_{ij} = \frac{\partial}{\partial \beta_j} m(\hat\beta_{n-1}, x_i)$$

$$(\hat W_{n-1})_{ij} = \begin{cases} 1 / (\sigma^2 k(x_i)) & \text{if } i = j \\ 0 & \text{otherwise} \end{cases}$$

$$(\hat Y_{n-1})_i = Y_i - m(\hat\beta_{n-1}, x_i)$$

14 IRLS and Scoring I

Consider the Generalized Linear Model (GLM) setup (a quick recap):

random component: the $f(y_i \mid \theta_i)$ come from a NEF, with $\theta_i = E(Y_i \mid x_i)$
systematic component: $\eta_i = \beta^T x_i$, also called the linear predictor
link function: an invertible function $g(\cdot)$ with first derivative such that $\eta_i = g(\theta_i)$

Let $\mathrm{Var}(Y_i \mid x_i) = v_i(\beta, \phi)$, for some known parameter $\phi$.

We want to estimate $\beta$, and are going to use scoring to find the MLE $\hat\beta_{MLE}$.

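15 IRLS and Scoring II

The log likelihood and its derivative, the score:

$$\sum_{i=1}^{n} \log f(Y_i \mid \theta_i) = \sum_{i=1}^{n} \left\{ \log h(Y_i) + P(\theta_i)\, Y_i - Q(\theta_i) \right\} \qquad (6)$$

$$
\begin{aligned}
\frac{\partial}{\partial \beta_j} \sum_{i=1}^{n} \log f(Y_i \mid \theta_i)
&= \frac{\partial}{\partial \beta_j} \sum_{i=1}^{n} \left\{ P(\theta_i)\, Y_i - Q(\theta_i) \right\} \\
&= \sum_{i=1}^{n} v_i(\beta, \phi)^{-1} \left( Y_i - E(Y_i \mid x_i) \right) d_i\, x_{ij} = u_j, \text{ say (why?)}
\end{aligned}
$$

Here $d_i := \dfrac{\partial \theta_i}{\partial \eta_i}$ for all $i$, and both $d_i$ and $u_j$ are functions of $\beta$.
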
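The "why?" for the score can be sketched with the chain rule, using the NEF identities $E(Y) = Q'(\theta)/P'(\theta)$ and $\mathrm{Var}(Y) = 1/P'(\theta)$ from the NEF slide together with $\eta_i = \beta^T x_i$. This is one possible route, written out under those assumptions.

```latex
\begin{aligned}
\frac{\partial}{\partial \beta_j} \bigl\{ P(\theta_i)\, Y_i - Q(\theta_i) \bigr\}
&= \bigl( P'(\theta_i)\, Y_i - Q'(\theta_i) \bigr)\,
   \frac{\partial \theta_i}{\partial \eta_i}\,
   \frac{\partial \eta_i}{\partial \beta_j} \\
&= P'(\theta_i) \left( Y_i - \frac{Q'(\theta_i)}{P'(\theta_i)} \right) d_i\, x_{ij} \\
&= v_i(\beta, \phi)^{-1} \bigl( Y_i - E(Y_i \mid x_i) \bigr)\, d_i\, x_{ij},
\end{aligned}
\qquad\text{and summing over } i \text{ gives } u_j .
```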
16 IRLS and Scoring III

If $v(\cdot, \cdot)$ doesn't depend on $\beta$ (assume this from now on), then the information matrix entries simplify to:

$$I(\beta)_{kj} = -E\left[ \frac{\partial u_j}{\partial \beta_k} \right] = \sum_{i=1}^{n} v_i(\phi)^{-1}\, d_i^2\, x_{ij}\, x_{ik} \quad \text{(why?)}$$

In case $v(\cdot, \cdot)$ does depend on $\beta$, one needs to compute the information matrix entries carefully on a case-by-case basis.

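17 IRLS and Scoring IV

Define:

$$(X)_{ij} = x_{ij}$$

$$(\hat W_{n-1})_{ij} = \begin{cases} v_i(\phi)^{-1}\, d_i^2(\hat\beta_{n-1}) & \text{if } i = j \\ 0 & \text{otherwise} \end{cases}$$

$$(\hat R_{n-1})_i = \left( Y_i - g^{-1}(\hat\beta_{n-1}^T x_i) \right) / d_i(\hat\beta_{n-1})$$

So we have (why?):

$$I(\hat\beta_{n-1}) = X^T \hat W_{n-1} X$$

$$u(\hat\beta_{n-1}) = \left( \sum_{i=1}^{n} v_i(\phi)^{-1} \left( Y_i - g^{-1}(\hat\beta_{n-1}^T x_i) \right) d_i(\hat\beta_{n-1})\, x_{ij} \right)_{j = 1, \ldots, p} = X^T \hat W_{n-1} \hat R_{n-1}$$
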
18 IRLS and Scoring V

Now the scoring update satisfies:

$$
\begin{aligned}
\hat\beta_n &= \hat\beta_{n-1} + \left[ I(\hat\beta_{n-1}) \right]^{-1} u(\hat\beta_{n-1}) \\
\Longrightarrow\quad \hat\beta_n &= \hat\beta_{n-1} + \left( X^T \hat W_{n-1} X \right)^{-1} X^T \hat W_{n-1} \hat R_{n-1}
\end{aligned}
$$

So the scoring updates for the MLE reduce to IRLS updates for NEF densities.

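The scoring update can be written as a short generic routine. This is a sketch under stated assumptions, not the author's implementation: the names `fisher_scoring`, `inv_link` and `dtheta_deta` are hypothetical, and the demonstration uses Normal responses with known variances and a log link ($\theta_i = e^{\eta_i}$, so $d_i = e^{\eta_i}$), which keeps $v_i$ free of $\beta$ as the slides assume.

```python
import numpy as np

def fisher_scoring(X, y, v, inv_link, dtheta_deta, beta0, tol=1e-10, max_iter=100):
    """Iterate beta <- beta + (X^T W X)^{-1} X^T W R with
    W_ii = d_i^2 / v_i and R_i = (Y_i - theta_i) / d_i."""
    beta = np.asarray(beta0, dtype=float)
    for _ in range(max_iter):
        eta = X @ beta                     # linear predictor
        theta = inv_link(eta)              # fitted means g^{-1}(eta)
        d = dtheta_deta(eta)               # d theta / d eta
        w = d ** 2 / v                     # diagonal of W
        r = (y - theta) / d                # working residual R
        XtW = X.T * w
        step = np.linalg.solve(XtW @ X, XtW @ r)
        beta = beta + step
        if np.linalg.norm(step) < tol:
            break
    return beta

# Simulated data (hypothetical): Normal responses, log link, known variances.
rng = np.random.default_rng(2)
n = 200
X = np.column_stack([np.ones(n), rng.uniform(0.0, 1.0, n)])
beta_true = np.array([0.5, 1.0])
v = 0.05 ** 2 * (1.0 + X[:, 1] ** 2)
y = np.exp(X @ beta_true) + rng.normal(0.0, np.sqrt(v))

# Starting value: least squares fit of log(y) on X (y is positive here).
beta0 = np.linalg.lstsq(X, np.log(y), rcond=None)[0]
beta_hat = fisher_scoring(X, y, v, np.exp, np.exp, beta0)
print("beta_hat:", beta_hat)
```

Passing the inverse link and its derivative as functions keeps the routine independent of any particular GLM; only the two callables and the variances change from model to model.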
19 Example I

Logistic Regression: for $i = 1, 2, \ldots, n$ we have

$$Y_i \mid x_i \overset{\text{i.n.i.d.}}{\sim} \mathrm{Bernoulli}(\theta_i)$$

We have $\eta_i = \beta^T x_i$, and also

$$\eta_i = g(\theta_i) = \log\left( \frac{\theta_i}{1 - \theta_i} \right),$$

the well-known logit transform.

Note that if we take $\eta_i = g(\theta_i) = \Phi^{-1}(\theta_i)$, the well-known probit transform, then we will have the probit regression model (here $\Phi^{-1}(\cdot)$ is the inverse cdf of the $\mathrm{Normal}_1(0, 1)$ distribution).

What will be the expressions for $\hat W_{n-1}$ and $\hat Y_{n-1}$ in this case?

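One possible answer for the logit link, written in the scoring notation $(\hat W, \hat R)$: for a Bernoulli response $\mathrm{Var}(Y_i \mid x_i) = \theta_i(1 - \theta_i)$ and, for the logit link, $d_i = \partial\theta_i/\partial\eta_i = \theta_i(1 - \theta_i)$, so $(\hat W)_{ii} = d_i^2 / v_i = \theta_i(1 - \theta_i)$ and $(\hat R)_i = (Y_i - \theta_i) / (\theta_i(1 - \theta_i))$. Note that here the variance does depend on $\beta$, the case the slides say must be checked individually; for the Bernoulli model the expected information still has the form $X^T \hat W X$. The sketch below uses simulated data and a hypothetical function name `logistic_scoring`.

```python
import numpy as np

def logistic_scoring(X, y, tol=1e-10, max_iter=50):
    """Fisher scoring for logistic regression, started at beta = 0."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        theta = 1.0 / (1.0 + np.exp(-(X @ beta)))   # inverse logit
        d = theta * (1.0 - theta)                   # d theta / d eta
        w = d                                       # d^2 / v with v = theta (1 - theta)
        r = (y - theta) / d                         # working residual
        XtW = X.T * w
        step = np.linalg.solve(XtW @ X, XtW @ r)
        beta = beta + step
        if np.linalg.norm(step) < tol:
            break
    return beta

# Simulated data (hypothetical): intercept plus one standard Normal covariate.
rng = np.random.default_rng(3)
n = 500
X = np.column_stack([np.ones(n), rng.normal(0.0, 1.0, n)])
beta_true = np.array([-0.5, 1.0])
prob = 1.0 / (1.0 + np.exp(-(X @ beta_true)))
y = rng.binomial(1, prob).astype(float)

beta_hat = logistic_scoring(X, y)

# Score at the estimate: X^T (y - theta), which should be close to zero.
score = X.T @ (y - 1.0 / (1.0 + np.exp(-(X @ beta_hat))))
print("beta_hat:", beta_hat)
print("score:", score)
```

The division by `d` can become unstable when fitted probabilities approach 0 or 1; this sketch does not guard against that and assumes data that are not separable.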
20 References

[1] Edwin L. Bradley. The equivalence of maximum likelihood and weighted least squares estimates in the exponential family. Journal of the American Statistical Association, 68.
[2] A. Charnes, E. L. Frome, and P. L. Yu. The equivalence of generalized least squares and maximum likelihood estimates in the exponential family. Journal of the American Statistical Association, 71.
[3] P. J. Green. Iteratively reweighted least squares for maximum likelihood estimation, and some robust and resistant alternatives (with discussion). Journal of the Royal Statistical Society, Series B, Methodological, 46.
More informationMA 575 Linear Models: Cedric E. Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1
MA 575 Linear Models: Cedric E Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1 1 Within-group Correlation Let us recall the simple two-level hierarchical
More informationLecture 17: Likelihood ratio and asymptotic tests
Lecture 17: Likelihood ratio and asymptotic tests Likelihood ratio When both H 0 and H 1 are simple (i.e., Θ 0 = {θ 0 } and Θ 1 = {θ 1 }), Theorem 6.1 applies and a UMP test rejects H 0 when f θ1 (X) f
More informationLikelihoods for Generalized Linear Models
1 Likelihoods for Generalized Linear Models 1.1 Some General Theory We assume that Y i has the p.d.f. that is a member of the exponential family. That is, f(y i ; θ i, φ) = exp{(y i θ i b(θ i ))/a i (φ)
More informationGeneralized Linear Models and Exponential Families
Generalized Linear Models and Exponential Families David M. Blei COS424 Princeton University April 12, 2012 Generalized Linear Models x n y n β Linear regression and logistic regression are both linear
More informationThe equivalence of the Maximum Likelihood and a modified Least Squares for a case of Generalized Linear Model
Applied and Computational Mathematics 2014; 3(5): 268-272 Published online November 10, 2014 (http://www.sciencepublishinggroup.com/j/acm) doi: 10.11648/j.acm.20140305.22 ISSN: 2328-5605 (Print); ISSN:
More informationStat 579: Generalized Linear Models and Extensions
Stat 579: Generalized Linear Models and Extensions Linear Mixed Models for Longitudinal Data Yan Lu April, 2018, week 15 1 / 38 Data structure t1 t2 tn i 1st subject y 11 y 12 y 1n1 Experimental 2nd subject
More informationESTIMATING THE MEAN LEVEL OF FINE PARTICULATE MATTER: AN APPLICATION OF SPATIAL STATISTICS
ESTIMATING THE MEAN LEVEL OF FINE PARTICULATE MATTER: AN APPLICATION OF SPATIAL STATISTICS Richard L. Smith Department of Statistics and Operations Research University of North Carolina Chapel Hill, N.C.,
More informationD-optimal Designs for Factorial Experiments under Generalized Linear Models
D-optimal Designs for Factorial Experiments under Generalized Linear Models Jie Yang Department of Mathematics, Statistics, and Computer Science University of Illinois at Chicago Joint research with Abhyuday
More informationMachine Learning Lecture Notes
Machine Learning Lecture Notes Predrag Radivojac February 2, 205 Given a data set D = {(x i,y i )} n the objective is to learn the relationship between features and the target. We usually start by hypothesizing
More informationIntroduction to Machine Learning
Introduction to Machine Learning Linear Regression Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574 1
More informationIntroduction An approximated EM algorithm Simulation studies Discussion
1 / 33 An Approximated Expectation-Maximization Algorithm for Analysis of Data with Missing Values Gong Tang Department of Biostatistics, GSPH University of Pittsburgh NISS Workshop on Nonignorable Nonresponse
More informationA Very Brief Summary of Statistical Inference, and Examples
A Very Brief Summary of Statistical Inference, and Examples Trinity Term 2009 Prof. Gesine Reinert Our standard situation is that we have data x = x 1, x 2,..., x n, which we view as realisations of random
More informationPh.D. Qualifying Exam Friday Saturday, January 3 4, 2014
Ph.D. Qualifying Exam Friday Saturday, January 3 4, 2014 Put your solution to each problem on a separate sheet of paper. Problem 1. (5166) Assume that two random samples {x i } and {y i } are independently
More information