Introduction to General and Generalized Linear Models
1 Introduction to General and Generalized Linear Models
Mixed effects models - Part IV
Henrik Madsen, Poul Thyregod
Informatics and Mathematical Modelling
Technical University of Denmark
DK-2800 Kgs. Lyngby
January 2011
2 This lecture
General mixed effects models
Laplace approximation
3 General mixed effects models
Let us now look at methods for dealing with nonlinear and non-normal mixed effects models. In general it will be impossible to obtain closed form solutions, and hence numerical methods must be used. Estimation and inference will be based on likelihood principles.
4 General mixed effects models
The general mixed effects model can be represented by its likelihood function:

$$L_M(\theta; y) = \int_{\mathbb{R}^q} L(\theta; u, y) \, du$$

where $y$ is the vector of observed random variables, $\theta$ is the vector of model parameters to be estimated, and $U$ is the vector of $q$ unobserved random variables or effects. The likelihood function $L$ is the joint likelihood of both the observed and the unobserved random variables. The likelihood function for estimating $\theta$ is the marginal likelihood $L_M$, obtained by integrating out the unobserved random variables.
5 General mixed effects models
The integral shown on the previous slide is generally difficult to solve if the number of unobserved random variables is more than a few, i.e. for large values of $q$.

A large value of $q$ significantly increases the computational demands due to the product rule: if an integral is sampled in $m$ points per dimension, the total number of samples needed is $m^q$, which rapidly becomes infeasible even for a limited number of random effects. For example, with $m = 10$ sampling points per dimension and $q = 20$ random effects, $10^{20}$ evaluations of the integrand would be required.

The likelihood function gives a very broad definition of mixed models: the only requirement for using mixed modeling is to define a joint likelihood function for the model of interest. In this way mixed modeling can be applied to any likelihood based statistical modeling. Examples of applications are linear mixed models (LMM), nonlinear mixed models (NLMM) and generalized linear mixed models, but also models based on Markov chains, ODEs or SDEs.
6 Hierarchical models
As for the Gaussian linear mixed models, it is useful to formulate the model as a hierarchical model containing a first stage model

$$f_{Y|u}(y; u, \beta)$$

which is a model for the data given the random effects, and a second stage model

$$f_U(u; \Psi)$$

which is a model for the random effects. The total set of parameters is $\theta = (\beta, \Psi)$. Hence the joint likelihood is given as

$$L(\beta, \Psi; u, y) = f_{Y|u}(y; u, \beta) \, f_U(u; \Psi)$$
7 Hierarchical models
To obtain the likelihood for the model parameters $(\beta, \Psi)$ the unobserved random effects are again integrated out. The likelihood function for estimating $(\beta, \Psi)$ is as before the marginal likelihood

$$L_M(\beta, \Psi; y) = \int_{\mathbb{R}^q} L(\beta, \Psi; u, y) \, du$$

where $q$ is the number of random effects, and $\beta$ and $\Psi$ are the parameters to be estimated.
8 Grouping structures and nested effects
For nonlinear mixed models where no closed form solution to the likelihood function is available, it is necessary to invoke some form of numerical approximation to be able to estimate the model parameters.

The complexity of this problem mainly depends on the dimensionality of the integration problem, which in turn depends on the dimension of $U$ and in particular on the grouping structure of the random effects in the data. These structures include a single grouping, nested grouping, partially crossed and crossed random effects.

For problems with only one level of grouping the marginal likelihood can be simplified as

$$L_M(\beta, \Psi; y) = \prod_{i=1}^{M} \int_{\mathbb{R}^{q_i}} f_{Y|u_i}(y_i; u_i, \beta) \, f_{U_i}(u_i; \Psi) \, du_i$$

where $q_i$ is the number of random effects for group $i$ and $M$ is the number of groups.
9 Grouping structures and nested effects
Instead of having to solve an integral of dimension $q$, it is only necessary to solve $M$ smaller integrals of dimension $q_i$. In typical applications there is often just one or only a few random effects for each group, which greatly reduces the complexity of the integration problem.

If the data has a nested grouping structure, a reduction of the dimensionality of the integral similar to that shown on the previous slide can be performed. An example of a nested grouping structure is data collected from a number of schools, with a number of classes within each school and a number of students from each class. A numerical sketch of the one-level factorization is given below.
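To make the factorization concrete, here is a minimal numerical sketch, assuming a hypothetical Poisson model with one Gaussian random intercept per group ($q_i = 1$); each group contributes a one-dimensional integral that is evaluated by Gauss-Hermite quadrature. The model, data and parameter values are illustrative only, not taken from the lecture.

```python
# Marginal log-likelihood with one random intercept per group: M
# one-dimensional Gauss-Hermite integrals instead of one M-dimensional
# integral. Hypothetical Poisson example for illustration.
import numpy as np
from numpy.polynomial.hermite_e import hermegauss  # probabilists' Hermite
from scipy.stats import poisson

def group_marginal(y_i, beta, psi, nodes, weights):
    """Integral of f(y_i | u_i, beta) f(u_i; psi) du_i for one group.

    The hermegauss rule integrates against exp(-x^2/2), so the nodes
    are rescaled by sqrt(psi) and the result divided by sqrt(2 pi) to
    integrate against the N(0, psi) density."""
    u = np.sqrt(psi) * nodes                       # quadrature points
    lam = np.exp(beta + u)[:, None]                # Poisson mean per node
    f_y_given_u = poisson.pmf(y_i[None, :], lam).prod(axis=1)
    return np.sum(weights * f_y_given_u) / np.sqrt(2.0 * np.pi)

def marginal_loglik(y_groups, beta, psi, n_nodes=30):
    nodes, weights = hermegauss(n_nodes)
    return sum(np.log(group_marginal(y_i, beta, psi, nodes, weights))
               for y_i in y_groups)

rng = np.random.default_rng(0)
y_groups = [rng.poisson(np.exp(0.5 + rng.normal(0.0, 0.3)), size=5)
            for _ in range(10)]                    # M = 10 groups
print(marginal_loglik(y_groups, beta=0.5, psi=0.09))
```

With $M$ groups the cost is $M \times n_{\text{nodes}}$ evaluations of the integrand, instead of the $m^q$ evaluations required for the full $q$-dimensional integral.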
10 Grouping structures and nested effects
If the nonlinear mixed model is extended to include any structure of random effects, such as crossed or partially crossed random effects, it is required to evaluate the full multi-dimensional integral.

Estimation in these models can be handled efficiently using the multivariate Laplace approximation, which only samples the integrand in one point common to all dimensions.
11 The Laplace approximation
For a given set of model parameters $\theta$ the joint log-likelihood $l(\theta, u, y) = \log L(\theta, u, y)$ is approximated by a second order Taylor expansion around the optimum $\tilde{u} = \hat{u}_\theta$ of the log-likelihood function w.r.t. the unobserved random variables $u$, i.e.

$$l(\theta, u, y) \approx l(\theta, \tilde{u}, y) - \frac{1}{2} (u - \tilde{u})^T H(\tilde{u}) (u - \tilde{u})$$

where the first-order term of the Taylor expansion disappears since the expansion is done around the optimum $\tilde{u}$, and

$$H(\tilde{u}) = -\left. \frac{\partial^2}{\partial u \, \partial u^T} l(\theta, u, y) \right|_{u = \tilde{u}}$$

is the negative Hessian of the joint log-likelihood evaluated at $\tilde{u}$, which will simply be referred to as the Hessian.
12 The Laplace approximation
Using this approximation, the Laplace approximation of the marginal log-likelihood becomes

$$l_{M,LA}(\theta, y) = \log \int_{\mathbb{R}^q} \exp\left( l(\theta, \tilde{u}, y) - \frac{1}{2} (u - \tilde{u})^T H(\tilde{u}) (u - \tilde{u}) \right) du = l(\theta, \tilde{u}, y) - \frac{1}{2} \log \left| \frac{H(\tilde{u})}{2\pi} \right|$$

The integral is eliminated by transforming it into an integration of a multivariate Gaussian density with mean $\tilde{u}$ and covariance $H^{-1}(\tilde{u})$.
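As a concrete illustration, the sketch below implements exactly this formula for a small hypothetical Poisson model with Gaussian random intercepts: the inner optimum $\tilde{u}$ is found by Newton iterations, and the Hessian is obtained by automatic differentiation with JAX. The model and all numbers are made up for illustration.

```python
# Laplace approximation of the marginal log-likelihood for a toy
# Poisson mixed model, using JAX for the inner optimization and Hessian.
import jax
import jax.numpy as jnp
from jax.scipy.stats import poisson, norm

def joint_loglik(u, beta, log_psi, y):
    """log f(y | u, beta) + log f(u; psi) for a Poisson model with
    one random intercept per group (y is groups x observations)."""
    lam = jnp.exp(beta + u[:, None])
    lp_y = poisson.logpmf(y, lam).sum()
    lp_u = norm.logpdf(u, 0.0, jnp.exp(0.5 * log_psi)).sum()
    return lp_y + lp_u

def laplace_marginal_loglik(beta, log_psi, y, n_newton=50):
    # Inner problem: u_tilde = argmax_u joint_loglik (Newton iterations)
    grad = jax.grad(joint_loglik)
    hess = jax.hessian(joint_loglik)
    u = jnp.zeros(y.shape[0])
    for _ in range(n_newton):
        u = u - jnp.linalg.solve(hess(u, beta, log_psi, y),
                                 grad(u, beta, log_psi, y))
    H = -hess(u, beta, log_psi, y)       # negative Hessian at the optimum
    q = u.shape[0]
    # l(theta, u_tilde, y) - 1/2 log |H(u_tilde) / (2 pi)|
    return (joint_loglik(u, beta, log_psi, y)
            - 0.5 * (jnp.linalg.slogdet(H)[1] - q * jnp.log(2 * jnp.pi)))

y = jnp.array([[3, 2, 4], [1, 0, 2], [5, 6, 4]])   # made-up counts
print(laplace_marginal_loglik(beta=1.0, log_psi=-1.0, y=y))
```

The resulting laplace_marginal_loglik can then be maximized over $(\beta, \log\psi)$ with any gradient-based optimizer.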
13 The Laplace approximation
The Laplace likelihood only approximates the marginal likelihood for mixed models with nonlinear random effects, and thus maximizing the Laplace likelihood will introduce some amount of error in the resulting estimates.

It can be shown that the joint log-likelihood converges to a quadratic function of the random effects as the number of observations per random effect increases, and thus that the Laplace approximation is asymptotically exact.

In practical applications the accuracy of the Laplace approximation may still be of concern, but more accurate numerical approximations of the marginal likelihood (such as Gaussian quadrature) may easily be computationally infeasible. Another option for improving the accuracy is importance sampling.
14 Two-level hierarchical model
For the two-level or hierarchical model it is readily seen that the joint log-likelihood is

$$l(\theta, u, y) = l(\beta, \Psi, u, y) = \log f_{Y|u}(y; u, \beta) + \log f_U(u; \Psi)$$

which implies that the Laplace approximation becomes

$$l_{M,LA}(\theta, y) = \log f_{Y|u}(y; \tilde{u}, \beta) + \log f_U(\tilde{u}; \Psi) - \frac{1}{2} \log \left| \frac{H(\tilde{u})}{2\pi} \right|$$

It is clear that as long as a likelihood function of the random effects and model parameters can be defined, it is possible to use the Laplace likelihood for estimation in a mixed model framework.
15 Gaussian second stage model
Let us assume that the second stage model is zero mean Gaussian, i.e.

$$U \sim N(0, \Psi)$$

which means that the random effect distribution is completely described by its covariance matrix $\Psi$. In this case the Laplace likelihood becomes

$$l_{M,LA}(\theta, y) = \log f_{Y|u}(y; \tilde{u}, \beta) - \frac{1}{2} \log |\Psi| - \frac{1}{2} \tilde{u}^T \Psi^{-1} \tilde{u} - \frac{1}{2} \log |H(\tilde{u})|$$

Note that we still have made no assumptions about the first stage model $f_{Y|u}(y; u, \beta)$.
16 Gaussian second stage model
If we furthermore assume that the first stage model is Gaussian,

$$Y \mid U = u \sim N(\mu(\beta, u), \Sigma)$$

then the Laplace likelihood can be further specified. For the hierarchical Gaussian model it is rather easy to obtain a numerical approximation of the Hessian $H$ at the optimum $\tilde{u}$:

$$H(\tilde{u}) \approx \mu_u \Sigma^{-1} \mu_u^T + \Psi^{-1}$$

where $\mu_u$ is the partial derivative of $\mu$ with respect to $u$. This approximation is called the Gauss-Newton approximation. In some contexts estimation using this approximation is also called the First Order Conditional Estimation (FOCE) method. A numerical sketch is given below.
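A minimal sketch of this Gauss-Newton approximation, assuming a made-up nonlinear mean function $\mu(\beta, u)$; the derivative $\mu_u$ is obtained with JAX rather than derived by hand.

```python
# Gauss-Newton (FOCE) approximation of the Hessian for a Gaussian first
# stage Y | U = u ~ N(mu(beta, u), Sigma). The mean function and all
# parameter values below are hypothetical.
import jax
import jax.numpy as jnp

def mu(beta, u):
    """Made-up nonlinear mean: one random effect shifts a decay rate."""
    t = jnp.linspace(0.0, 1.0, 6)
    return beta[0] * jnp.exp(-(beta[1] + u[0]) * t)

def gauss_newton_hessian(beta, u_tilde, Sigma, Psi):
    # J has shape (n, q); the slide's mu_u corresponds to J^T (q x n)
    J = jax.jacobian(mu, argnums=1)(beta, u_tilde)
    # H(u_tilde) ~= mu_u Sigma^{-1} mu_u^T + Psi^{-1}
    return J.T @ jnp.linalg.solve(Sigma, J) + jnp.linalg.inv(Psi)

beta = jnp.array([2.0, 1.5])
u_tilde = jnp.array([0.1])
Sigma = 0.05 * jnp.eye(6)
Psi = jnp.array([[0.2]])
print(gauss_newton_hessian(beta, u_tilde, Sigma, Psi))
```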
17 Automatic differentiation
A simple and efficient way to use the Laplace approximation technique outlined above is via the open source software package AD Model Builder, which takes advantage of automatic differentiation.

Any calculation done by a computer program can be broken down into a long chain of simple operations like $+$, $-$, $\times$, $/$, $\exp$, $\log$, $\sin$, $\cos$, $\tan$, and so on. It is simple to write down the analytical derivative of each of these operations by itself. If our log-likelihood function $l$ consisted of only a few of these simple operations, then it would be tractable to use the chain rule

$$(f \circ g)'(x) = f'(g(x)) \, g'(x)$$

to find the analytical gradient $\nabla l$ of the log-likelihood function $l$.
18 Automatic differentiation
Automatic differentiation is a technique where the chain rule is applied by the computer program itself. When the program evaluates the log-likelihood it keeps track of all the operations used along the way, and then runs the program backwards (reverse mode automatic differentiation), using the chain rule to update the derivatives one simple operation at a time.

Automatic differentiation is accurate, and the computational cost of evaluating the gradient is surprisingly low.
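A minimal JAX illustration of reverse mode automatic differentiation: a single call to jax.grad performs the reverse sweep and returns the exact gradient, here for a made-up log-likelihood with 1000 parameters.

```python
# Reverse mode automatic differentiation with JAX: one reverse sweep
# yields the exact gradient with respect to all parameters at once.
import jax
import jax.numpy as jnp

def neg_loglik(theta, y):
    """Made-up Gaussian negative log-likelihood with many parameters."""
    mu = jnp.tanh(theta).sum()             # a chain of simple operations
    return 0.5 * jnp.sum((y - mu) ** 2)

y = jnp.array([0.3, -0.1, 0.7])
theta = jnp.zeros(1000)                    # 1000 parameters
g = jax.grad(neg_loglik)(theta, y)         # one reverse sweep
print(g.shape)                             # (1000,)
```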
19 Automatic differentiation
Theorem: The computational cost of evaluating the gradient of the log-likelihood $l$ with reverse mode automatic differentiation is less than four times the computational cost of evaluating the log-likelihood function $l$ itself. This holds no matter how many parameters the model contains.

It is surprising that the computational cost does not depend on the number of parameters in the model.

There is however a practical concern. The computational cost mentioned above is measured in the number of operations, but reverse mode automatic differentiation requires all the intermediate variables in the calculation of the negative log-likelihood to be stored in the computer's memory. If the calculation is lengthy, for instance consisting of a long iterative procedure, the memory requirements can therefore be enormous.
20 Automatic differentiation combined with the Laplace approximation
Finding the gradient of the Laplace approximation of the marginal log-likelihood is challenging, because the approximation itself includes the result of a function minimization, and not just a straightforward sequence of simple operations.

It is however possible, although it requires up to third order derivatives to be computed internally by clever successive application of automatic differentiation.
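With a general reverse mode AD tool such as JAX, one pragmatic alternative is to differentiate straight through the unrolled inner Newton iterations, letting nested automatic differentiation produce the required higher-order derivatives. A minimal sketch, assuming the laplace_marginal_loglik function and the data y from the Laplace sketch earlier:

```python
import jax

# The gradient w.r.t. beta flows through the Newton iterations and the
# log-determinant of the Hessian, which implicitly involves third order
# derivatives of the joint log-likelihood.
grad_wrt_beta = jax.grad(laplace_marginal_loglik, argnums=0)(1.0, -1.0, y)
print(grad_wrt_beta)
```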
21 Importance sampling
Importance sampling is a re-weighting technique for approximating integrals w.r.t. a density $f$ by simulation, in cases where it is not feasible to simulate from the distribution with density $f$. Instead it uses samples from a different distribution with density $g$, where the support of $g$ includes the support of $f$.

For general mixed effects models it is possible to simulate from the distribution with density proportional to the second order Taylor approximation

$$\tilde{L}(\theta, u, y) = \exp\left\{ l(\theta, \hat{u}_\theta, y) - \frac{1}{2} (u - \hat{u}_\theta)^T \left( -l''_{uu}(\theta, u, y)\big|_{u = \hat{u}_\theta} \right) (u - \hat{u}_\theta) \right\}$$

since, apart from a normalization constant, it is the density $\phi_{\hat{u}_\theta, \hat{V}_\theta}(u)$ of a multivariate normal with mean $\hat{u}_\theta$ and covariance

$$\hat{V}_\theta = H^{-1}(\hat{u}_\theta) = \left( -l''_{uu}(\theta, u, y)\big|_{u = \hat{u}_\theta} \right)^{-1}$$
22 Importance sampling
The integral to be approximated can be rewritten as

$$L_M(\theta, y) = \int L(\theta, u, y) \, du = \int \frac{L(\theta, u, y)}{\phi_{\hat{u}_\theta, \hat{V}_\theta}(u)} \, \phi_{\hat{u}_\theta, \hat{V}_\theta}(u) \, du$$

So if $u^{(i)}$, $i = 1, \ldots, N$ are simulated from the multivariate normal distribution with mean $\hat{u}_\theta$ and covariance $\hat{V}_\theta$, then the integral can be approximated by the mean of the importance weights:

$$\hat{L}_M(\theta, y) = \frac{1}{N} \sum_{i=1}^{N} \frac{L(\theta, u^{(i)}, y)}{\phi_{\hat{u}_\theta, \hat{V}_\theta}(u^{(i)})}$$
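A sketch of this estimator, computing the log of the mean importance weight with a log-sum-exp for numerical stability. The toy joint likelihood at the bottom is chosen so the exact marginal is known ($2\pi$ in two dimensions), which makes the sketch easy to check; all of it is illustrative.

```python
# Importance sampling estimate of the marginal likelihood: sample
# u^(i) from the Gaussian proposal N(u_hat, V_hat) and average the
# weights L(theta, u^(i), y) / phi(u^(i)).
import numpy as np
from scipy.stats import multivariate_normal

def importance_sampling_loglik(joint_loglik, u_hat, V_hat,
                               n_samples=10_000, seed=1):
    rng = np.random.default_rng(seed)
    proposal = multivariate_normal(mean=u_hat, cov=V_hat)
    u = proposal.rvs(size=n_samples, random_state=rng)
    # log weights: log L(theta, u^(i), y) - log phi(u^(i))
    log_w = np.array([joint_loglik(u_i) for u_i in u]) - proposal.logpdf(u)
    m = log_w.max()                      # log of the mean weight, stably
    return m + np.log(np.mean(np.exp(log_w - m)))

# Toy check: with log L(u) = -u'u/2 in two dimensions the marginal is
# exactly 2*pi, so the estimate should be close to log(2*pi) ~ 1.8379.
joint = lambda u: -0.5 * u @ u
print(importance_sampling_loglik(joint, np.zeros(2), np.eye(2)))
```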
23 AD Model Builder
AD Model Builder is a programming language that builds on C++. It includes helper functions for reading in data, defining model parameters, and implementing and optimizing the negative log-likelihood function. The central feature is automatic differentiation (AD), which is implemented in such a way that the user rarely has to think about it at all.

AD Model Builder can be used for fixed effects models, but in addition it includes the Laplace approximation and importance sampling for dealing with general mixed effects models.

AD Model Builder was developed by Dr. Dave Fournier and was a commercial product for many years. Recently AD Model Builder has been placed in the public domain (see