Introduction to General and Generalized Linear Models

Size: px
Start display at page:

Download "Introduction to General and Generalized Linear Models"

Transcription

1 Introduction to General and Generalized Linear Models Mixed effects models - Part IV Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby January 2011 Henrik Madsen Poul Thyregod (IMM-DTU) Chapman & Hall January / 23

2 This lecture General mixed effects models Laplace approximation Henrik Madsen Poul Thyregod (IMM-DTU) Chapman & Hall January / 23

3 General mixed effects models General mixed effects models Let us now look at methods to deal with nonlinear and non-normal mixed effects models. In general it will be impossible to obtain closed form solutions and hence numerical methods must be used. Estimation and inference will be based on likelihood principles. Henrik Madsen Poul Thyregod (IMM-DTU) Chapman & Hall January / 23

4 General mixed effects models General mixed effects models The general mixed effects model can be represented by its likelihood function: L M (θ;y) = L(θ;u,y)du R q where y is the observed random variables, θ is the model parameters to be estimated, and U is the q unobserved random variables or effects. The likelihood function L is the joint likelihood of both the observed and the unobserved random variables. The likelihood function for estimating θ is the marginal likelihood L M obtained by integrating out the unobserved random variables. Henrik Madsen Poul Thyregod (IMM-DTU) Chapman & Hall January / 23

5 General mixed effects models General mixed effects models The integral shown on the previous slide is generally difficult to solve if the number of unobserved random variables is more than a few, i.e. for large values of q. A large value of q significantly increases the computational demands due to the product rule which states that if an integral is sampled in m points per dimension to evaluate it, the total number of samples needed is m q, which rapidly becomes infeasible even for a limited number of random effects. The likelihood function gives a very broad definition of mixed models: the only requirement for using mixed modeling is to define a joint likelihood function for the model of interest. In this way mixed modeling can be applied to any likelihood based statistical modeling. Examples of applications are linear mixed models (LMM) and nonlinear mixed models (NLMM), generalized linear mixed models, but also models based on Markov chains, ODEs or SDEs. Henrik Madsen Poul Thyregod (IMM-DTU) Chapman & Hall January / 23

6 Hierarchical models General mixed effects models As for the Gaussian linear mixed models it is useful to formulate the model as a hierarchical model containing a first stage model f Y u (y;u,β) which is a model for the data given the random effects, and a second stage model f U (u;ψ) which is a model for the random effects. The total set of parameters is θ = (β,ψ). Hence the joint likelihood is given as L(β,Ψ;u,y) = f Y u (y;u,β)f U (u;ψ) Henrik Madsen Poul Thyregod (IMM-DTU) Chapman & Hall January / 23

7 Hierarchical models General mixed effects models To obtain the likelihood for the model parameters (β, Ψ) the unobserved random effects are again integrated out. The likelihood function for estimating (β, Ψ) is as before the marginal likelihood L M (β,ψ;y) = L(β,Ψ;u,y)du R q where q is the number of random effects, and β and Ψ are the parameters to be estimated. Henrik Madsen Poul Thyregod (IMM-DTU) Chapman & Hall January / 23

8 General mixed effects models Grouping structures and nested effects For nonlinear mixed models where no closed form solution to the likelihood function is available it is necessary to invoke some form of numerical approximation to be able to estimate the model parameters. The complexity of this problem is mainly dependent on the dimensionality of the integration problem which in turn is dependent on the dimension of U and in particular the grouping structure in the data for the random effects. These structures include a single grouping, nested grouping, partially crossed and crossed random effects. For problems with only one level of grouping the marginal likelihood can be simplified as L M (β,ψ;y) = M i=1 R q i f Y ui (y;u i,β)f Ui (u i ;Ψ)du i where q i is the number of random effects for group i and M is the number of groups. Henrik Madsen Poul Thyregod (IMM-DTU) Chapman & Hall January / 23

9 General mixed effects models Grouping structures and nested effects Instead of having to solve an integral of dimension q it is only necessary to solve M smaller integrals of dimension q i. In typical applications there is often just one or only a few random effects for each group, and this thus greatly reduces the complexity of the integration problem. If the data has a nested grouping structure a reduction of the dimensionality of the integral similar to that shown on the previous slide can be performed. An example of a nested grouping structure is data collected from a number of schools, a number of classes within each school and a number of students from each class. Henrik Madsen Poul Thyregod (IMM-DTU) Chapman & Hall January / 23

10 General mixed effects models Grouping structures and nested effects If the nonlinear mixed model is extended to include any structure of random effects such as crossed or partially crossed random effects it is required to evaluate the full multi-dimensional integral Estimation in these models can efficiently be handled using the multivariate Laplace approximation, which only samples the integrand in one point common to all dimensions. Henrik Madsen Poul Thyregod (IMM-DTU) Chapman & Hall January / 23

11 The Laplace approximation For a given set of model parameters θ the joint log-likelihood l(θ,u,y) = log(l(θ,u,y)) is approximated by a second order Taylor approximation around the optimum ũ = û θ of the log-likelihood function w.r.t. the unobserved random variables u, i.e. l(θ,u,y) l(θ,ũ,y) 1 2 (u ũ)t H(ũ)(u ũ) where the first-order term of the Taylor expansion disappears since the expansion is done around the optimum ũ and H(ũ) = l uu(θ,u,y) u=ũ is the negative Hessian of the joint log-likelihood evaluated at ũ which will simply be referred to as the Hessian. Henrik Madsen Poul Thyregod (IMM-DTU) Chapman & Hall January / 23

12 The Laplace approximation Using the approximation in the Laplace approximation of the marginal log-likelihood becomes l M,LA (θ,y) = log exp ( l(θ,ũ,y) 1 R q 2 (u ũ)t H(ũ)(u ũ) ) du = l(θ,ũ,y) 1 2 log H(ũ) 2π Yhe integral is eliminated by transforming it to an integration of a multivariate Gaussian density with mean ũ and covariance H 1 (ũ). Henrik Madsen Poul Thyregod (IMM-DTU) Chapman & Hall January / 23

13 The Laplace approximation The Laplace likelihood only approximates the marginal likelihood for mixed models with nonlinear random effects and thus maximizing the Laplace likelihood will result in some amount of error in the resulting estimates. It can be shown that joint log-likelihood converges to a quadratic function of the random effect for increasing number of observations per random effect and thus that the Laplace approximation is asymptotically exact. In practical applications the accuracy of the Laplace approximation may still be of concern, but often improved numerical approximation of the marginal likelihood (such as Gaussian quadrature) may easily be computationally infeasible to perform. Another option for improving the accuracy is Importance sampling. Henrik Madsen Poul Thyregod (IMM-DTU) Chapman & Hall January / 23

14 Two-level hierarchical model For the two-level or hierarchical model it is readily seen that the joint log-likelihood is l(θ,u,y) = l(β,ψ,u,y) = logf Y u (y;u,β)+logf U (u;ψ) which implies that the Laplace approximation becomes l M,LA (θ,y) = logf Y u (y;ũ,β)+logf U (ũ;ψ) 1 2 log H(ũ) 2π It is clear that as long as a likelihood function of the random effects and model parameters can be defined it is possible to use the Laplace likelihood for estimation in a mixed model framework. Henrik Madsen Poul Thyregod (IMM-DTU) Chapman & Hall January / 23

15 Gaussian second stage model Let us assume that the second stage model is zero mean Gaussian, i.e. u N(0,Ψ) which means that the random effect distribution is completely described by its covariance matrix Ψ. In this case the Laplace likelihood in becomes l M,LA (θ,y) = logf Y u (y;ũ,β) 1 2 log Ψ 1 2ũT Ψ 1 ũ 1 2 log H(ũ) where it is seen that we still have no assumptions on the first stage model f Y u (y;u,β). Henrik Madsen Poul Thyregod (IMM-DTU) Chapman & Hall January / 23

16 Gaussian second stage model If we furthermore assume that the first stage model is Gaussian Y U = u N(µ(β,u),Σ) then the Laplace likelihood can be further specified. For the hierarchical Gaussian model it is rather easy to obtain a numerical approximation of the Hessian H at the optimum, ũ H(ũ) µ u Σ 1 µ T u +Ψ 1 where µ u is the partial derivative with respect to u. The approximation in is called Gauss-Newton approximation In some contexts estimation using this approximation is also called the First Order Conditional Estimation (FOCE) method. Henrik Madsen Poul Thyregod (IMM-DTU) Chapman & Hall January / 23

17 Automatic differentiation A simpel and efficient way to use the Laplace transformation technique outlined is via the open source software package AD Model Builder, which takes advantages of automatic differentiation. Any calculation done via a computer program can be broken down to a long chain of simple operations like +,,, /, exp, log, sin, cos, tan,, and so on. It is simple to write down the analytical derivative of each of these operations by themselves. If our log-likelihood function l consisted of only a few of these simple operations, then it would be tractable to use the chain rule (f g) (x) = f (g(x))g (x) to find the analytical gradient l of the log-likelihood function l. Henrik Madsen Poul Thyregod (IMM-DTU) Chapman & Hall January / 23

18 Automatic differentiation Automatic differentiation is a technique where the chain rule is used by the computer program itself. When the program evaluates the log-likelihood it keeps track of all the operations used along the way, and then runs the program backwards (reverse mode automatic differentiation) and uses the chain rule to update the derivatives one simple operation at a time. Automatic differentiation is accurate, and the computational cost of evaluating the gradient is surprisingly low. Henrik Madsen Poul Thyregod (IMM-DTU) Chapman & Hall January / 23

19 Automatic differentiation Theorem The computational cost of evaluating the gradient of the log-likelihood l with reverse mode automatic differentiation is less than four times the computational cost of evaluating the log-likelihood function l itself. This holds no matter how many parameters the model contain. It is surprising that computational cost does not depend on how many parameters the model contain. There is however a practical concern. The computational cost mentioned above is measured in the number of operations, but reverse mode automatic differentiation requires all the intermediate variables in the calculation of the negative log-likelihood to be stored in the computer s memory, so if the calculation is lengthy, for instance consisting of a long iterative procedure, then the memory requirements can be enormous. Henrik Madsen Poul Thyregod (IMM-DTU) Chapman & Hall January / 23

20 Automatic differentiation combined with the Laplace approximation Finding the gradient of the Laplace approximation of the marginal log-likelihood is challenging, because the approximation itself includes the result of a function minimization, and not just a straightforward sequence of simple operations. It is however possible, but requires up to third order derivatives to be computed internally by clever successive application of automatic differentiation. Henrik Madsen Poul Thyregod (IMM-DTU) Chapman & Hall January / 23

21 Importance sampling Importance sampling is a re-weighting technique for approximating integrals w.r.t. a density f by simulation in cases where it is not feasible to simulate from the distribution with density f. Instead it uses samples from a different distribution with density g, where the support of g includes the support of f. For general mixed effects models it is possible to simulate from the distribution with density proportional to the second order Taylor approximation L(θ,û θ,y ) = exp { l(θ,û θ,y ) 1 2 (u û θ) T ( l uu (θ,u,y) u=û θ )(u û θ ) } as, apart from a normalization constant, it is the density φûθ,ˆv θ (u) of a multivariate normal with mean û θ and covariance ˆV θ = H 1 (û θ ) = ( l uu (θ,u,y ) u=û θ ) 1. Henrik Madsen Poul Thyregod (IMM-DTU) Chapman & Hall January / 23

22 Importance sampling Laplace approximation The integral to be approximated can be rewritten as: L(θ,u,Y ) L M (θ,y ) = L(θ,u,Y )du = φûθ,ˆv θ (u) φ û θ,ˆv θ (u)du. So if u (i), i = 1,...,N is simulated from the multivariate normal distribution with mean û θ and covariance ˆV θ, then the integral can be approximated by the mean of the importance weights L M (θ,y ) = 1 N L(θ,u (i),y ) φûθ,ˆv θ (u (i) ) Henrik Madsen Poul Thyregod (IMM-DTU) Chapman & Hall January / 23

23 AD Model Builder AD Model Builder is a programming language that builds on C++. It includes helper functions for reading in data, defining model parameters, and implementing and optimizing the negative log-likelihood function. The central feature is automatic differentiation (AD), which is implemented in such a way that the user rarely has to think about it at all. AD Model Builder can be used for fixed effects models, but in addition it includes Laplace approximation and importance sampling for dealing with general mixed effects models. AD Model Builder is developed by Dr. Dave Fournier and was a commercial product for many years. Recently AD Model Builder has been placed in the public domain (see Henrik Madsen Poul Thyregod (IMM-DTU) Chapman & Hall January / 23

Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models Introduction to General and Generalized Linear Models Mixed effects models - Part II Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby

More information

Automated Likelihood Based Inference for Stochastic Volatility Models using AD Model Builder. Oxford, November 24th 2008 Hans J.

Automated Likelihood Based Inference for Stochastic Volatility Models using AD Model Builder. Oxford, November 24th 2008 Hans J. Automated Likelihood Based Inference for Stochastic Volatility Models using AD Model Builder Oxford, November 24th 2008 Hans J. Skaug (with Jun Yu and David Fournier) Outline AD Model Builder ADMB Foundation

More information

Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models Introduction to General and Generalized Linear Models Generalized Linear Models - part II Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs.

More information

Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models Introduction to General and Generalized Linear Models Generalized Linear Models - part III Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs.

More information

Lecture 3 September 1

Lecture 3 September 1 STAT 383C: Statistical Modeling I Fall 2016 Lecture 3 September 1 Lecturer: Purnamrita Sarkar Scribe: Giorgio Paulon, Carlos Zanini Disclaimer: These scribe notes have been slightly proofread and may have

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate

More information

CSC 2541: Bayesian Methods for Machine Learning

CSC 2541: Bayesian Methods for Machine Learning CSC 2541: Bayesian Methods for Machine Learning Radford M. Neal, University of Toronto, 2011 Lecture 10 Alternatives to Monte Carlo Computation Since about 1990, Markov chain Monte Carlo has been the dominant

More information

The classifier. Theorem. where the min is over all possible classifiers. To calculate the Bayes classifier/bayes risk, we need to know

The classifier. Theorem. where the min is over all possible classifiers. To calculate the Bayes classifier/bayes risk, we need to know The Bayes classifier Theorem The classifier satisfies where the min is over all possible classifiers. To calculate the Bayes classifier/bayes risk, we need to know Alternatively, since the maximum it is

More information

The classifier. Linear discriminant analysis (LDA) Example. Challenges for LDA

The classifier. Linear discriminant analysis (LDA) Example. Challenges for LDA The Bayes classifier Linear discriminant analysis (LDA) Theorem The classifier satisfies In linear discriminant analysis (LDA), we make the (strong) assumption that where the min is over all possible classifiers.

More information

Uncertainty quantification for Wavefield Reconstruction Inversion

Uncertainty quantification for Wavefield Reconstruction Inversion Uncertainty quantification for Wavefield Reconstruction Inversion Zhilong Fang *, Chia Ying Lee, Curt Da Silva *, Felix J. Herrmann *, and Rachel Kuske * Seismic Laboratory for Imaging and Modeling (SLIM),

More information

Nonparametric Bayesian Methods (Gaussian Processes)

Nonparametric Bayesian Methods (Gaussian Processes) [70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent

More information

Time Series Analysis

Time Series Analysis Time Series Analysis hm@imm.dtu.dk Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby 1 Outline of the lecture Regression based methods, 1st part: Introduction (Sec.

More information

Bayesian Regression Linear and Logistic Regression

Bayesian Regression Linear and Logistic Regression When we want more than point estimates Bayesian Regression Linear and Logistic Regression Nicole Beckage Ordinary Least Squares Regression and Lasso Regression return only point estimates But what if we

More information

POLI 8501 Introduction to Maximum Likelihood Estimation

POLI 8501 Introduction to Maximum Likelihood Estimation POLI 8501 Introduction to Maximum Likelihood Estimation Maximum Likelihood Intuition Consider a model that looks like this: Y i N(µ, σ 2 ) So: E(Y ) = µ V ar(y ) = σ 2 Suppose you have some data on Y,

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear

More information

Fitting Multidimensional Latent Variable Models using an Efficient Laplace Approximation

Fitting Multidimensional Latent Variable Models using an Efficient Laplace Approximation Fitting Multidimensional Latent Variable Models using an Efficient Laplace Approximation Dimitris Rizopoulos Department of Biostatistics, Erasmus University Medical Center, the Netherlands d.rizopoulos@erasmusmc.nl

More information

Logistic Regression. Seungjin Choi

Logistic Regression. Seungjin Choi Logistic Regression Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/

More information

Outline. Mixed models in R using the lme4 package Part 5: Generalized linear mixed models. Parts of LMMs carried over to GLMMs

Outline. Mixed models in R using the lme4 package Part 5: Generalized linear mixed models. Parts of LMMs carried over to GLMMs Outline Mixed models in R using the lme4 package Part 5: Generalized linear mixed models Douglas Bates University of Wisconsin - Madison and R Development Core Team UseR!2009,

More information

Reliability Monitoring Using Log Gaussian Process Regression

Reliability Monitoring Using Log Gaussian Process Regression COPYRIGHT 013, M. Modarres Reliability Monitoring Using Log Gaussian Process Regression Martin Wayne Mohammad Modarres PSA 013 Center for Risk and Reliability University of Maryland Department of Mechanical

More information

Nonlinear Optimization for Optimal Control Part 2. Pieter Abbeel UC Berkeley EECS. From linear to nonlinear Model-predictive control (MPC) POMDPs

Nonlinear Optimization for Optimal Control Part 2. Pieter Abbeel UC Berkeley EECS. From linear to nonlinear Model-predictive control (MPC) POMDPs Nonlinear Optimization for Optimal Control Part 2 Pieter Abbeel UC Berkeley EECS Outline From linear to nonlinear Model-predictive control (MPC) POMDPs Page 1! From Linear to Nonlinear We know how to solve

More information

Lecture 4: Hidden Markov Models: An Introduction to Dynamic Decision Making. November 11, 2010

Lecture 4: Hidden Markov Models: An Introduction to Dynamic Decision Making. November 11, 2010 Hidden Lecture 4: Hidden : An Introduction to Dynamic Decision Making November 11, 2010 Special Meeting 1/26 Markov Model Hidden When a dynamical system is probabilistic it may be determined by the transition

More information

Introduction to gradient descent

Introduction to gradient descent 6-1: Introduction to gradient descent Prof. J.C. Kao, UCLA Introduction to gradient descent Derivation and intuitions Hessian 6-2: Introduction to gradient descent Prof. J.C. Kao, UCLA Introduction Our

More information

Mixed models in R using the lme4 package Part 7: Generalized linear mixed models

Mixed models in R using the lme4 package Part 7: Generalized linear mixed models Mixed models in R using the lme4 package Part 7: Generalized linear mixed models Douglas Bates University of Wisconsin - Madison and R Development Core Team University of

More information

Computer Intensive Methods in Mathematical Statistics

Computer Intensive Methods in Mathematical Statistics Computer Intensive Methods in Mathematical Statistics Department of mathematics johawes@kth.se Lecture 16 Advanced topics in computational statistics 18 May 2017 Computer Intensive Methods (1) Plan of

More information

Outline for today. Computation of the likelihood function for GLMMs. Likelihood for generalized linear mixed model

Outline for today. Computation of the likelihood function for GLMMs. Likelihood for generalized linear mixed model Outline for today Computation of the likelihood function for GLMMs asmus Waagepetersen Department of Mathematics Aalborg University Denmark likelihood for GLMM penalized quasi-likelihood estimation Laplace

More information

CS 450 Numerical Analysis. Chapter 8: Numerical Integration and Differentiation

CS 450 Numerical Analysis. Chapter 8: Numerical Integration and Differentiation Lecture slides based on the textbook Scientific Computing: An Introductory Survey by Michael T. Heath, copyright c 2018 by the Society for Industrial and Applied Mathematics. http://www.siam.org/books/cl80

More information

Logistic Regression Review Fall 2012 Recitation. September 25, 2012 TA: Selen Uguroglu

Logistic Regression Review Fall 2012 Recitation. September 25, 2012 TA: Selen Uguroglu Logistic Regression Review 10-601 Fall 2012 Recitation September 25, 2012 TA: Selen Uguroglu!1 Outline Decision Theory Logistic regression Goal Loss function Inference Gradient Descent!2 Training Data

More information

LINEAR MODELS FOR CLASSIFICATION. J. Elder CSE 6390/PSYC 6225 Computational Modeling of Visual Perception

LINEAR MODELS FOR CLASSIFICATION. J. Elder CSE 6390/PSYC 6225 Computational Modeling of Visual Perception LINEAR MODELS FOR CLASSIFICATION Classification: Problem Statement 2 In regression, we are modeling the relationship between a continuous input variable x and a continuous target variable t. In classification,

More information

Discussion of Maximization by Parts in Likelihood Inference

Discussion of Maximization by Parts in Likelihood Inference Discussion of Maximization by Parts in Likelihood Inference David Ruppert School of Operations Research & Industrial Engineering, 225 Rhodes Hall, Cornell University, Ithaca, NY 4853 email: dr24@cornell.edu

More information

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 13: SEQUENTIAL DATA

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 13: SEQUENTIAL DATA PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 13: SEQUENTIAL DATA Contents in latter part Linear Dynamical Systems What is different from HMM? Kalman filter Its strength and limitation Particle Filter

More information

Machine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall

Machine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall Machine Learning Gaussian Mixture Models Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall 2012 1 The Generative Model POV We think of the data as being generated from some process. We assume

More information

Random Matrix Eigenvalue Problems in Probabilistic Structural Mechanics

Random Matrix Eigenvalue Problems in Probabilistic Structural Mechanics Random Matrix Eigenvalue Problems in Probabilistic Structural Mechanics S Adhikari Department of Aerospace Engineering, University of Bristol, Bristol, U.K. URL: http://www.aer.bris.ac.uk/contact/academic/adhikari/home.html

More information

Generalized Linear. Mixed Models. Methods and Applications. Modern Concepts, Walter W. Stroup. Texts in Statistical Science.

Generalized Linear. Mixed Models. Methods and Applications. Modern Concepts, Walter W. Stroup. Texts in Statistical Science. Texts in Statistical Science Generalized Linear Mixed Models Modern Concepts, Methods and Applications Walter W. Stroup CRC Press Taylor & Francis Croup Boca Raton London New York CRC Press is an imprint

More information

Likelihood-Based Methods

Likelihood-Based Methods Likelihood-Based Methods Handbook of Spatial Statistics, Chapter 4 Susheela Singh September 22, 2016 OVERVIEW INTRODUCTION MAXIMUM LIKELIHOOD ESTIMATION (ML) RESTRICTED MAXIMUM LIKELIHOOD ESTIMATION (REML)

More information

Numerical Optimization Professor Horst Cerjak, Horst Bischof, Thomas Pock Mat Vis-Gra SS09

Numerical Optimization Professor Horst Cerjak, Horst Bischof, Thomas Pock Mat Vis-Gra SS09 Numerical Optimization 1 Working Horse in Computer Vision Variational Methods Shape Analysis Machine Learning Markov Random Fields Geometry Common denominator: optimization problems 2 Overview of Methods

More information

Introduction to Maximum Likelihood Estimation

Introduction to Maximum Likelihood Estimation Introduction to Maximum Likelihood Estimation Eric Zivot July 26, 2012 The Likelihood Function Let 1 be an iid sample with pdf ( ; ) where is a ( 1) vector of parameters that characterize ( ; ) Example:

More information

Dynamic System Identification using HDMR-Bayesian Technique

Dynamic System Identification using HDMR-Bayesian Technique Dynamic System Identification using HDMR-Bayesian Technique *Shereena O A 1) and Dr. B N Rao 2) 1), 2) Department of Civil Engineering, IIT Madras, Chennai 600036, Tamil Nadu, India 1) ce14d020@smail.iitm.ac.in

More information

Linear Dynamical Systems

Linear Dynamical Systems Linear Dynamical Systems Sargur N. srihari@cedar.buffalo.edu Machine Learning Course: http://www.cedar.buffalo.edu/~srihari/cse574/index.html Two Models Described by Same Graph Latent variables Observations

More information

Mixed models in R using the lme4 package Part 4: Theory of linear mixed models

Mixed models in R using the lme4 package Part 4: Theory of linear mixed models Mixed models in R using the lme4 package Part 4: Theory of linear mixed models Douglas Bates 8 th International Amsterdam Conference on Multilevel Analysis 2011-03-16 Douglas Bates

More information

Stochastic Analogues to Deterministic Optimizers

Stochastic Analogues to Deterministic Optimizers Stochastic Analogues to Deterministic Optimizers ISMP 2018 Bordeaux, France Vivak Patel Presented by: Mihai Anitescu July 6, 2018 1 Apology I apologize for not being here to give this talk myself. I injured

More information

Generalized Gradient Descent Algorithms

Generalized Gradient Descent Algorithms ECE 275AB Lecture 11 Fall 2008 V1.1 c K. Kreutz-Delgado, UC San Diego p. 1/1 Lecture 11 ECE 275A Generalized Gradient Descent Algorithms ECE 275AB Lecture 11 Fall 2008 V1.1 c K. Kreutz-Delgado, UC San

More information

Chapter 3 - Estimation by direct maximization of the likelihood

Chapter 3 - Estimation by direct maximization of the likelihood Chapter 3 - Estimation by direct maximization of the likelihood 02433 - Hidden Markov Models Martin Wæver Pedersen, Henrik Madsen Course week 3 MWP, compiled June 7, 2011 Recall: Recursive scheme for the

More information

Sparse Matrix Methods and Mixed-effects Models

Sparse Matrix Methods and Mixed-effects Models Sparse Matrix Methods and Mixed-effects Models Douglas Bates University of Wisconsin Madison 2010-01-27 Outline Theory and Practice of Statistics Challenges from Data Characteristics Linear Mixed Model

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models Brown University CSCI 2950-P, Spring 2013 Prof. Erik Sudderth Lecture 12: Gaussian Belief Propagation, State Space Models and Kalman Filters Guest Kalman Filter Lecture by

More information

Variational Inference via Stochastic Backpropagation

Variational Inference via Stochastic Backpropagation Variational Inference via Stochastic Backpropagation Kai Fan February 27, 2016 Preliminaries Stochastic Backpropagation Variational Auto-Encoding Related Work Summary Outline Preliminaries Stochastic Backpropagation

More information

Last updated: Oct 22, 2012 LINEAR CLASSIFIERS. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition

Last updated: Oct 22, 2012 LINEAR CLASSIFIERS. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition Last updated: Oct 22, 2012 LINEAR CLASSIFIERS Problems 2 Please do Problem 8.3 in the textbook. We will discuss this in class. Classification: Problem Statement 3 In regression, we are modeling the relationship

More information

Outline lecture 6 2(35)

Outline lecture 6 2(35) Outline lecture 35 Lecture Expectation aximization E and clustering Thomas Schön Division of Automatic Control Linöping University Linöping Sweden. Email: schon@isy.liu.se Phone: 13-1373 Office: House

More information

Spring 2017 Econ 574 Roger Koenker. Lecture 14 GEE-GMM

Spring 2017 Econ 574 Roger Koenker. Lecture 14 GEE-GMM University of Illinois Department of Economics Spring 2017 Econ 574 Roger Koenker Lecture 14 GEE-GMM Throughout the course we have emphasized methods of estimation and inference based on the principle

More information

Sufficient Conditions for Finite-variable Constrained Minimization

Sufficient Conditions for Finite-variable Constrained Minimization Lecture 4 It is a small de tour but it is important to understand this before we move to calculus of variations. Sufficient Conditions for Finite-variable Constrained Minimization ME 256, Indian Institute

More information

Nonlinear equations and optimization

Nonlinear equations and optimization Notes for 2017-03-29 Nonlinear equations and optimization For the next month or so, we will be discussing methods for solving nonlinear systems of equations and multivariate optimization problems. We will

More information

Numerical optimization

Numerical optimization Numerical optimization Lecture 4 Alexander & Michael Bronstein tosca.cs.technion.ac.il/book Numerical geometry of non-rigid shapes Stanford University, Winter 2009 2 Longest Slowest Shortest Minimal Maximal

More information

Outline Lecture 2 2(32)

Outline Lecture 2 2(32) Outline Lecture (3), Lecture Linear Regression and Classification it is our firm belief that an understanding of linear models is essential for understanding nonlinear ones Thomas Schön Division of Automatic

More information

A Trust-region-based Sequential Quadratic Programming Algorithm

A Trust-region-based Sequential Quadratic Programming Algorithm Downloaded from orbit.dtu.dk on: Oct 19, 2018 A Trust-region-based Sequential Quadratic Programming Algorithm Henriksen, Lars Christian; Poulsen, Niels Kjølstad Publication date: 2010 Document Version

More information

Part 1: Expectation Propagation

Part 1: Expectation Propagation Chalmers Machine Learning Summer School Approximate message passing and biomedicine Part 1: Expectation Propagation Tom Heskes Machine Learning Group, Institute for Computing and Information Sciences Radboud

More information

Random Eigenvalue Problems Revisited

Random Eigenvalue Problems Revisited Random Eigenvalue Problems Revisited S Adhikari Department of Aerospace Engineering, University of Bristol, Bristol, U.K. Email: S.Adhikari@bristol.ac.uk URL: http://www.aer.bris.ac.uk/contact/academic/adhikari/home.html

More information

1 Computing with constraints

1 Computing with constraints Notes for 2017-04-26 1 Computing with constraints Recall that our basic problem is minimize φ(x) s.t. x Ω where the feasible set Ω is defined by equality and inequality conditions Ω = {x R n : c i (x)

More information

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) = Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,

More information

Regularization in Cox Frailty Models

Regularization in Cox Frailty Models Regularization in Cox Frailty Models Andreas Groll 1, Trevor Hastie 2, Gerhard Tutz 3 1 Ludwig-Maximilians-Universität Munich, Department of Mathematics, Theresienstraße 39, 80333 Munich, Germany 2 University

More information

Mixed models in R using the lme4 package Part 5: Generalized linear mixed models

Mixed models in R using the lme4 package Part 5: Generalized linear mixed models Mixed models in R using the lme4 package Part 5: Generalized linear mixed models Douglas Bates 2011-03-16 Contents 1 Generalized Linear Mixed Models Generalized Linear Mixed Models When using linear mixed

More information

Recent Advances in Bayesian Inference Techniques

Recent Advances in Bayesian Inference Techniques Recent Advances in Bayesian Inference Techniques Christopher M. Bishop Microsoft Research, Cambridge, U.K. research.microsoft.com/~cmbishop SIAM Conference on Data Mining, April 2004 Abstract Bayesian

More information

Principles of Bayesian Inference

Principles of Bayesian Inference Principles of Bayesian Inference Sudipto Banerjee 1 and Andrew O. Finley 2 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. 2 Department of Forestry & Department

More information

Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models Introduction to General and Generalized Linear Models Mixed effects models - II Henrik Madsen, Jan Kloppenborg Møller, Anders Nielsen April 16, 2012 H. Madsen, JK. Møller, A. Nielsen () Chapman & Hall

More information

TRACKING and DETECTION in COMPUTER VISION

TRACKING and DETECTION in COMPUTER VISION Technischen Universität München Winter Semester 2013/2014 TRACKING and DETECTION in COMPUTER VISION Template tracking methods Slobodan Ilić Template based-tracking Energy-based methods The Lucas-Kanade(LK)

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Logistic Regression Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574

More information

Numerical Methods I Solving Nonlinear Equations

Numerical Methods I Solving Nonlinear Equations Numerical Methods I Solving Nonlinear Equations Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 MATH-GA 2011.003 / CSCI-GA 2945.003, Fall 2014 October 16th, 2014 A. Donev (Courant Institute)

More information

Mixed models in R using the lme4 package Part 5: Generalized linear mixed models

Mixed models in R using the lme4 package Part 5: Generalized linear mixed models Mixed models in R using the lme4 package Part 5: Generalized linear mixed models Douglas Bates Madison January 11, 2011 Contents 1 Definition 1 2 Links 2 3 Example 7 4 Model building 9 5 Conclusions 14

More information

Hidden Markov Models. By Parisa Abedi. Slides courtesy: Eric Xing

Hidden Markov Models. By Parisa Abedi. Slides courtesy: Eric Xing Hidden Markov Models By Parisa Abedi Slides courtesy: Eric Xing i.i.d to sequential data So far we assumed independent, identically distributed data Sequential (non i.i.d.) data Time-series data E.g. Speech

More information

Computation fundamentals of discrete GMRF representations of continuous domain spatial models

Computation fundamentals of discrete GMRF representations of continuous domain spatial models Computation fundamentals of discrete GMRF representations of continuous domain spatial models Finn Lindgren September 23 2015 v0.2.2 Abstract The fundamental formulas and algorithms for Bayesian spatial

More information

Parametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a

Parametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a Parametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a Some slides are due to Christopher Bishop Limitations of K-means Hard assignments of data points to clusters small shift of a

More information

Outline of GLMs. Definitions

Outline of GLMs. Definitions Outline of GLMs Definitions This is a short outline of GLM details, adapted from the book Nonparametric Regression and Generalized Linear Models, by Green and Silverman. The responses Y i have density

More information

CSC321 Lecture 18: Learning Probabilistic Models

CSC321 Lecture 18: Learning Probabilistic Models CSC321 Lecture 18: Learning Probabilistic Models Roger Grosse Roger Grosse CSC321 Lecture 18: Learning Probabilistic Models 1 / 25 Overview So far in this course: mainly supervised learning Language modeling

More information

COMS 4771 Lecture Course overview 2. Maximum likelihood estimation (review of some statistics)

COMS 4771 Lecture Course overview 2. Maximum likelihood estimation (review of some statistics) COMS 4771 Lecture 1 1. Course overview 2. Maximum likelihood estimation (review of some statistics) 1 / 24 Administrivia This course Topics http://www.satyenkale.com/coms4771/ 1. Supervised learning Core

More information

Computational methods for mixed models

Computational methods for mixed models Computational methods for mixed models Douglas Bates Department of Statistics University of Wisconsin Madison March 27, 2018 Abstract The lme4 package provides R functions to fit and analyze several different

More information

Learning Parameters of Undirected Models. Sargur Srihari

Learning Parameters of Undirected Models. Sargur Srihari Learning Parameters of Undirected Models Sargur srihari@cedar.buffalo.edu 1 Topics Log-linear Parameterization Likelihood Function Maximum Likelihood Parameter Estimation Simple and Conjugate Gradient

More information

A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models

A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models Jeff A. Bilmes (bilmes@cs.berkeley.edu) International Computer Science Institute

More information

Mark Gales October y (x) x 1. x 2 y (x) Inputs. Outputs. x d. y (x) Second Output layer layer. layer.

Mark Gales October y (x) x 1. x 2 y (x) Inputs. Outputs. x d. y (x) Second Output layer layer. layer. University of Cambridge Engineering Part IIB & EIST Part II Paper I0: Advanced Pattern Processing Handouts 4 & 5: Multi-Layer Perceptron: Introduction and Training x y (x) Inputs x 2 y (x) 2 Outputs x

More information

Gaussian Mixture Models

Gaussian Mixture Models Gaussian Mixture Models Pradeep Ravikumar Co-instructor: Manuela Veloso Machine Learning 10-701 Some slides courtesy of Eric Xing, Carlos Guestrin (One) bad case for K- means Clusters may overlap Some

More information

CSC 412 (Lecture 4): Undirected Graphical Models

CSC 412 (Lecture 4): Undirected Graphical Models CSC 412 (Lecture 4): Undirected Graphical Models Raquel Urtasun University of Toronto Feb 2, 2016 R Urtasun (UofT) CSC 412 Feb 2, 2016 1 / 37 Today Undirected Graphical Models: Semantics of the graph:

More information

Bayesian Analysis (Optional)

Bayesian Analysis (Optional) Bayesian Analysis (Optional) 1 2 Big Picture There are two ways to conduct statistical inference 1. Classical method (frequentist), which postulates (a) Probability refers to limiting relative frequencies

More information

Conditional Random Field

Conditional Random Field Introduction Linear-Chain General Specific Implementations Conclusions Corso di Elaborazione del Linguaggio Naturale Pisa, May, 2011 Introduction Linear-Chain General Specific Implementations Conclusions

More information

Modelling Non-linear and Non-stationary Time Series

Modelling Non-linear and Non-stationary Time Series Modelling Non-linear and Non-stationary Time Series Chapter 2: Non-parametric methods Henrik Madsen Advanced Time Series Analysis September 206 Henrik Madsen (02427 Adv. TS Analysis) Lecture Notes September

More information

Robot Mapping. Least Squares. Cyrill Stachniss

Robot Mapping. Least Squares. Cyrill Stachniss Robot Mapping Least Squares Cyrill Stachniss 1 Three Main SLAM Paradigms Kalman filter Particle filter Graphbased least squares approach to SLAM 2 Least Squares in General Approach for computing a solution

More information

Hidden Markov Models. Aarti Singh Slides courtesy: Eric Xing. Machine Learning / Nov 8, 2010

Hidden Markov Models. Aarti Singh Slides courtesy: Eric Xing. Machine Learning / Nov 8, 2010 Hidden Markov Models Aarti Singh Slides courtesy: Eric Xing Machine Learning 10-701/15-781 Nov 8, 2010 i.i.d to sequential data So far we assumed independent, identically distributed data Sequential data

More information

Log-Linear Models, MEMMs, and CRFs

Log-Linear Models, MEMMs, and CRFs Log-Linear Models, MEMMs, and CRFs Michael Collins 1 Notation Throughout this note I ll use underline to denote vectors. For example, w R d will be a vector with components w 1, w 2,... w d. We use expx

More information

Mathematics 22: Lecture 7

Mathematics 22: Lecture 7 Mathematics 22: Lecture 7 Separation of Variables Dan Sloughter Furman University January 15, 2008 Dan Sloughter (Furman University) Mathematics 22: Lecture 7 January 15, 2008 1 / 8 Separable equations

More information

Fully Bayesian Deep Gaussian Processes for Uncertainty Quantification

Fully Bayesian Deep Gaussian Processes for Uncertainty Quantification Fully Bayesian Deep Gaussian Processes for Uncertainty Quantification N. Zabaras 1 S. Atkinson 1 Center for Informatics and Computational Science Department of Aerospace and Mechanical Engineering University

More information

EM Algorithm. Expectation-maximization (EM) algorithm.

EM Algorithm. Expectation-maximization (EM) algorithm. EM Algorithm Outline: Expectation-maximization (EM) algorithm. Examples. Reading: A.P. Dempster, N.M. Laird, and D.B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, J. R. Stat. Soc.,

More information

Computer Intensive Methods in Mathematical Statistics

Computer Intensive Methods in Mathematical Statistics Computer Intensive Methods in Mathematical Statistics Department of mathematics johawes@kth.se Lecture 5 Sequential Monte Carlo methods I 31 March 2017 Computer Intensive Methods (1) Plan of today s lecture

More information

Lecture 2: From Linear Regression to Kalman Filter and Beyond

Lecture 2: From Linear Regression to Kalman Filter and Beyond Lecture 2: From Linear Regression to Kalman Filter and Beyond January 18, 2017 Contents 1 Batch and Recursive Estimation 2 Towards Bayesian Filtering 3 Kalman Filter and Bayesian Filtering and Smoothing

More information

Pose Tracking II! Gordon Wetzstein! Stanford University! EE 267 Virtual Reality! Lecture 12! stanford.edu/class/ee267/!

Pose Tracking II! Gordon Wetzstein! Stanford University! EE 267 Virtual Reality! Lecture 12! stanford.edu/class/ee267/! Pose Tracking II! Gordon Wetzstein! Stanford University! EE 267 Virtual Reality! Lecture 12! stanford.edu/class/ee267/!! WARNING! this class will be dense! will learn how to use nonlinear optimization

More information

Numerical optimization. Numerical optimization. Longest Shortest where Maximal Minimal. Fastest. Largest. Optimization problems

Numerical optimization. Numerical optimization. Longest Shortest where Maximal Minimal. Fastest. Largest. Optimization problems 1 Numerical optimization Alexander & Michael Bronstein, 2006-2009 Michael Bronstein, 2010 tosca.cs.technion.ac.il/book Numerical optimization 048921 Advanced topics in vision Processing and Analysis of

More information

Lecture 7. Logistic Regression. Luigi Freda. ALCOR Lab DIAG University of Rome La Sapienza. December 11, 2016

Lecture 7. Logistic Regression. Luigi Freda. ALCOR Lab DIAG University of Rome La Sapienza. December 11, 2016 Lecture 7 Logistic Regression Luigi Freda ALCOR Lab DIAG University of Rome La Sapienza December 11, 2016 Luigi Freda ( La Sapienza University) Lecture 7 December 11, 2016 1 / 39 Outline 1 Intro Logistic

More information

MS&E 226: Small Data. Lecture 11: Maximum likelihood (v2) Ramesh Johari

MS&E 226: Small Data. Lecture 11: Maximum likelihood (v2) Ramesh Johari MS&E 226: Small Data Lecture 11: Maximum likelihood (v2) Ramesh Johari ramesh.johari@stanford.edu 1 / 18 The likelihood function 2 / 18 Estimating the parameter This lecture develops the methodology behind

More information

Mobile Robot Localization

Mobile Robot Localization Mobile Robot Localization 1 The Problem of Robot Localization Given a map of the environment, how can a robot determine its pose (planar coordinates + orientation)? Two sources of uncertainty: - observations

More information

The Bayes classifier

The Bayes classifier The Bayes classifier Consider where is a random vector in is a random variable (depending on ) Let be a classifier with probability of error/risk given by The Bayes classifier (denoted ) is the optimal

More information

Hierarchical Models & Bayesian Model Selection

Hierarchical Models & Bayesian Model Selection Hierarchical Models & Bayesian Model Selection Geoffrey Roeder Departments of Computer Science and Statistics University of British Columbia Jan. 20, 2016 Contact information Please report any typos or

More information

GENERALIZED LINEAR MIXED MODELS AND MEASUREMENT ERROR. Raymond J. Carroll: Texas A&M University

GENERALIZED LINEAR MIXED MODELS AND MEASUREMENT ERROR. Raymond J. Carroll: Texas A&M University GENERALIZED LINEAR MIXED MODELS AND MEASUREMENT ERROR Raymond J. Carroll: Texas A&M University Naisyin Wang: Xihong Lin: Roberto Gutierrez: Texas A&M University University of Michigan Southern Methodist

More information

( ).666 Information Extraction from Speech and Text

( ).666 Information Extraction from Speech and Text (520 600).666 Information Extraction from Speech and Text HMM Parameters Estimation for Gaussian Output Densities April 27, 205. Generalization of the Results of Section 9.4. It is suggested in Section

More information

Intelligent Systems:

Intelligent Systems: Intelligent Systems: Undirected Graphical models (Factor Graphs) (2 lectures) Carsten Rother 15/01/2015 Intelligent Systems: Probabilistic Inference in DGM and UGM Roadmap for next two lectures Definition

More information

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science Algorithms For Inference Fall 2014

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science Algorithms For Inference Fall 2014 Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms For Inference Fall 2014 Problem Set 3 Issued: Thursday, September 25, 2014 Due: Thursday,

More information