Using P-splines to smooth two-dimensional Poisson data

Maria Durbán (1), Iain Currie (2), Paul Eilers (3)
17th IWSM, July
(1) Dept. Statistics and Econometrics, Universidad Carlos III de Madrid, Spain
(2) Dept. Actuarial Mathematics and Statistics, Heriot-Watt University, Edinburgh, UK
(3) Department of Medical Statistics, Leiden University Medical Center, The Netherlands

What is this talk about?

Introduction
  The data
  P-splines
  Smoothing Poisson data with P-splines (one-dimensional case)
Several models for two-dimensional Poisson data
  Generalized additive model
  Two-dimensional smoothing with penalties
  Dimension reduction using P-splines
Discuss computational issues for large data sets
Analysis of mortality data

The data

Male policyholders; source: Continuous Mortality Investigation Bureau (CMIB).
For each calendar year ( ) and each age (11-100) we have:
  Number of years lived (the exposure).
  Number of policy claims (deaths).
Mortality of male policyholders has improved rapidly over the last 30 years.
Model mortality trends over time and their dependence on age.

P-splines

Use B-splines as the basis for the regression.
Modify the log-likelihood by a difference penalty on the regression coefficients.

$$y = f(x) + \varepsilon, \qquad f(x) \approx Ba$$
$$S = (y - Ba)'(y - Ba) + \lambda a'D'Da$$
$$\hat{a} = (B'B + \lambda D'D)^{-1} B'y$$

[Figures: B-spline basis; scaled B-splines and their sum.]
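The penalized least-squares estimate on this slide can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the helper names `bbase` and `pspline_fit`, the equally spaced knots, the cubic degree and the second-order difference penalty are choices made here, not details given in the talk.

```python
import numpy as np
from math import factorial


def bbase(x, xl, xr, nseg=10, bdeg=3):
    """Equally spaced B-spline basis built from truncated power functions.

    Hypothetical helper; returns a matrix with nseg + bdeg columns.
    """
    dx = (xr - xl) / nseg
    knots = xl + dx * np.arange(-bdeg, nseg + bdeg + 1)
    diff = x[:, None] - knots[None, :]
    # Truncated powers (x - t)_+^bdeg, one column per knot.
    T = np.where(diff > 0, diff ** bdeg, 0.0)
    # Differences of order bdeg + 1 across the knots give the B-splines.
    Dk = np.diff(np.eye(len(knots)), n=bdeg + 1, axis=0)
    return (-1) ** (bdeg + 1) * (T @ Dk.T) / (factorial(bdeg) * dx ** bdeg)


def pspline_fit(x, y, lam, nseg=10, bdeg=3, pord=2):
    """Solve (B'B + lam D'D) a = B'y and return fitted values and coefficients."""
    B = bbase(x, x.min(), x.max(), nseg, bdeg)
    D = np.diff(np.eye(B.shape[1]), n=pord, axis=0)   # difference matrix
    a_hat = np.linalg.solve(B.T @ B + lam * D.T @ D, B.T @ y)
    return B @ a_hat, a_hat


# Straight-line data: a second-order penalty leaves a line unchanged.
x = np.linspace(0.0, 10.0, 101)
y = 2.0 + 3.0 * x
fitted, a_hat = pspline_fit(x, y, lam=100.0)
```

With a second-order difference penalty a straight line incurs no penalty, so the fit above should reproduce the data up to rounding whatever the value of `lam`.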

Poisson data and P-splines, 1D case

$E_x$ = number of years lived aged $x$
$y_x$ = number of deaths aged $x$

$$Y_x \sim \mathcal{P}(E_x \theta_x), \qquad \eta = \log(\theta_x) = Ba$$

Maximise
$$l(a; y_x) - \tfrac{1}{2} \lambda a'D'Da$$
$$\hat{a}_{t+1} = (B'W_t B + \lambda D'D)^{-1} B'W_t z_t$$
where $z = \eta + W^{-1}(y - \mu)$ is the working variable and $W = \mathrm{diag}(\mu)$ is the diagonal matrix of weights.
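The iteration above might be coded as follows. It is a minimal sketch, not the authors' implementation: the function name `poisson_pspline`, the starting value, the stopping rule and the toy data are assumptions, and the usage example takes an identity basis (one degree-0 B-spline per age) purely to keep the block short.

```python
import numpy as np


def poisson_pspline(B, y, E, lam, pord=2, max_iter=100, tol=1e-10):
    """Penalized IRLS for y ~ Poisson(E * exp(B a)) with a difference penalty."""
    D = np.diff(np.eye(B.shape[1]), n=pord, axis=0)
    P = lam * D.T @ D
    # Start from a constant rate (assumes the rows of B sum to one).
    a = np.full(B.shape[1], np.log(y.sum() / E.sum()))
    for _ in range(max_iter):
        eta = B @ a
        mu = E * np.exp(eta)
        z = eta + (y - mu) / mu            # working variable
        BtW = B.T * mu                     # B'W with W = diag(mu)
        a_new = np.linalg.solve(BtW @ B + P, BtW @ z)
        done = np.max(np.abs(a_new - a)) < tol
        a = a_new
        if done:
            break
    return a, E * np.exp(B @ a)


# Toy data (made up for illustration): 30 ages, constant exposure.
ages = np.arange(30)
E = np.full(30, 1000.0)
y = np.round(E * np.exp(-5.0 + 0.08 * ages))
B = np.eye(30)                             # placeholder basis, bdeg = 0
a_hat, mu_hat = poisson_pspline(B, y, E, lam=10.0)
```

A useful check at convergence is that fitted and observed deaths have the same total: the penalized score equations give this whenever the rows of $B$ sum to one, because a difference penalty ignores a constant shift in the coefficients.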

1. A generalized additive model

$Y = (y_{ij})$, matrix of deaths at age $i = 1, \dots, m$ and year $j = 1, \dots, n$.
$E = (E_{ij})$, exposure.
$\Theta = (\theta_{ij})$ and $\log \theta = Ba$, with
$$a = (\alpha, a_1', a_2')', \qquad B = (1 : B_a : B_y)$$
$B_a$, $N \times n_a$: set of B-splines for age.
$B_y$, $N \times n_y$: set of B-splines for years.

$$\hat{a}_{t+1} = (B'W_t B + P)^{-} B'W_t z_t$$
$P = \mathrm{blockdiag}(0, P_a, P_y)$; $P_a = \lambda_a D_a'D_a$ and $P_y = \lambda_y D_y'D_y$ are the penalty matrices for age and year.

Smoothing parameter selection: minimise
$$\mathrm{dev}(y; a, \lambda_a, \lambda_y) + \delta\, \mathrm{tr}(H)$$
$\delta = 2$: AIC; $\delta = \log N$: BIC.
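The pieces of the additive model can be assembled as sketched below. The function names, the year-by-year stacking of the data (age running fastest), the use of a pseudo-inverse for the singular system and the identity placeholder bases are all assumptions made for the illustration.

```python
import numpy as np


def additive_design(Ba, By):
    """B = (1 : B_a : B_y) for data stacked year by year (age runs fastest)."""
    m, n = Ba.shape[0], By.shape[0]
    ones = np.ones((m * n, 1))
    age_part = np.kron(np.ones((n, 1)), Ba)     # same age basis in every year
    year_part = np.kron(By, np.ones((m, 1)))    # same year basis at every age
    return np.hstack([ones, age_part, year_part])


def additive_penalty(na, ny, lam_a, lam_y, pord=2):
    """P = blockdiag(0, lam_a Da'Da, lam_y Dy'Dy)."""
    Da = np.diff(np.eye(na), n=pord, axis=0)
    Dy = np.diff(np.eye(ny), n=pord, axis=0)
    P = np.zeros((1 + na + ny, 1 + na + ny))
    P[1:1 + na, 1:1 + na] = lam_a * Da.T @ Da
    P[1 + na:, 1 + na:] = lam_y * Dy.T @ Dy
    return P


def effective_dimension(B, w, P):
    """tr(H) computed as tr((B'WB + P)^- B'WB) with a pseudo-inverse."""
    BtWB = (B.T * w) @ B
    return np.trace(np.linalg.pinv(BtWB + P, rcond=1e-10) @ BtWB)


def criterion(dev, tr_h, delta):
    """dev + delta * tr(H); delta = 2 gives AIC, delta = log(N) gives BIC."""
    return dev + delta * tr_h


# Placeholder bases (identity, i.e. bdeg = 0) for 6 ages and 5 years.
m, n = 6, 5
B = additive_design(np.eye(m), np.eye(n))
w = np.ones(m * n)                              # illustrative weights
ed_rough = effective_dimension(B, w, additive_penalty(m, n, 0.0, 0.0))
ed_smooth = effective_dimension(B, w, additive_penalty(m, n, 10.0, 10.0))
```

With these placeholder bases the design matrix has 12 columns but rank 10, which is the singularity the talk goes on to discuss; the effective dimension falls from 10 with no penalty towards 3 as the smoothing parameters grow.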

Computational issues

No need for backfitting: closed form for $H$.
Singular matrix:
  Ridge penalty
  Generalized inverse
  Use a different parametrisation
Number of parameters = ncol($B$), much smaller than $N$.
Fast when $N$ is large; not possible with cubic smoothing splines.
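The closed form referred to here is presumably the usual hat matrix of penalized iteratively reweighted least squares, evaluated at convergence. Written with the symbols of the additive model (the generalized inverse is an assumption that follows from the singularity noted above):

$$H = B\,(B'WB + P)^{-}\,B'W, \qquad \mathrm{tr}(H) = \mathrm{tr}\{(B'WB + P)^{-}\,B'WB\}$$

The second expression uses the cyclic property of the trace, so only matrices of size ncol($B$) are needed and the $N \times N$ matrix $H$ is never formed.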

[Figure: Model 1. Two panels of log(mu) against Year, one for age 34; the second age label is not recoverable.]

2. Two-dimensional smoothing with penalties

Suppose the log mortalities form a matrix of parameters,
$$\log \Theta = A = (a_1, \dots, a_n), \qquad A' = (a^r_1, \dots, a^r_m),$$
and impose a smoothness condition on each row and column of $A$:
$$l(A; Y) - \tfrac{1}{2} \lambda_a \sum_{j=1}^{n} a_j' D_a' D_a a_j - \tfrac{1}{2} \lambda_y \sum_{i=1}^{m} a^{r\prime}_i D_y' D_y a^r_i = l(a; y) - \tfrac{1}{2} a'(\lambda_a P_a + \lambda_y P_y) a$$
where
$$a' = (a_1', \dots, a_n'), \qquad P_a = I_n \otimes D_a' D_a, \qquad P_y = D_y' D_y \otimes I_m.$$
$$\hat{a}_{t+1} = (W_t + P)^{-1} W_t z_t, \qquad P = \lambda_a P_a + \lambda_y P_y$$
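The equality between the row and column sums of penalties and the single quadratic form in $a$ can be checked numerically. The sketch below assumes second-order differences and arbitrary small dimensions; the variable names are chosen here for illustration.

```python
import numpy as np

m, n = 5, 4                                   # ages, years (illustrative sizes)
A = np.arange(m * n, dtype=float).reshape(m, n) ** 1.5   # any parameter matrix
Da = np.diff(np.eye(m), n=2, axis=0)          # differences along age
Dy = np.diff(np.eye(n), n=2, axis=0)          # differences along year

Pa = np.kron(np.eye(n), Da.T @ Da)            # I_n (x) Da'Da
Py = np.kron(Dy.T @ Dy, np.eye(m))            # Dy'Dy (x) I_m

a = A.reshape(-1, order="F")                  # stack columns: a' = (a_1', ..., a_n')

col_penalty = sum(np.sum((Da @ A[:, j]) ** 2) for j in range(n))
row_penalty = sum(np.sum((Dy @ A[i, :]) ** 2) for i in range(m))
quad_a = a @ Pa @ a
quad_y = a @ Py @ a
```

Here `quad_a` should equal `col_penalty` and `quad_y` should equal `row_penalty`; the ordering of the Kronecker factors matters and goes with stacking $A$ column by column.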

Computational issues

Algorithm: iterate between rows and columns.
Working variable to update the column estimates:
$$Z = (z_1, \dots, z_n) = A + (Y - M - \lambda_y A D_y' D_y)/M$$
where $M = (\mu_1, \dots, \mu_n)$ is the matrix of means and the division is elementwise.
The updated estimate of $a_j$, $j = 1, \dots, n$, is
$$a_j = (\mathrm{diag}(\mu_j) + \lambda_a D_a' D_a)^{-1} \mathrm{diag}(\mu_j) z_j.$$
Copes with the potential computational problems associated with two-dimensional smoothing with large data sets.
Problem: $\mathrm{tr}(H)$ cannot be calculated, so AIC and BIC cannot be computed.
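A rough sketch of the row and column iteration follows. Only the column update is given on the slide; the row update used here is the same formula applied to the transposed problem, which is an assumption, as are the function names, the starting value, the fixed number of sweeps and the toy data. With $M = E \circ \exp(A)$ the exposure enters through the means.

```python
import numpy as np


def second_diff_penalty(k):
    """D'D for second-order differences on k coefficients."""
    D = np.diff(np.eye(k), n=2, axis=0)
    return D.T @ D


def column_sweep(A, Y, E, lam_a, lam_y, Pa, Py):
    """Update every column a_j from the working variable Z."""
    M = E * np.exp(A)
    Z = A + (Y - M - lam_y * A @ Py) / M
    A_new = np.empty_like(A)
    for j in range(A.shape[1]):
        W = np.diag(M[:, j])
        A_new[:, j] = np.linalg.solve(W + lam_a * Pa, M[:, j] * Z[:, j])
    return A_new


def fit_model2(Y, E, lam_a, lam_y, n_iter=50):
    m, n = Y.shape
    Pa, Py = second_diff_penalty(m), second_diff_penalty(n)
    A = np.log(Y / E)                          # crude start; assumes Y > 0
    for _ in range(n_iter):
        A = column_sweep(A, Y, E, lam_a, lam_y, Pa, Py)
        # Row sweep: the same update on the transposed problem (assumption).
        A = column_sweep(A.T, Y.T, E.T, lam_y, lam_a, Py, Pa).T
    return A


# Toy data (made up): 8 ages, 6 years, constant exposure.
i = np.arange(8)[:, None]
j = np.arange(6)[None, :]
E = np.full((8, 6), 5000.0)
Y = np.round(E * np.exp(-4.0 + 0.1 * i - 0.02 * j))
A_hat = fit_model2(Y, E, lam_a=1.0, lam_y=1.0)

# Penalized score; it should vanish at the solution.
score = (Y - E * np.exp(A_hat)
         - 1.0 * second_diff_penalty(8) @ A_hat
         - 1.0 * A_hat @ second_diff_penalty(6))
```

Each sweep solves only systems of size $m$ or $n$, which is where the saving comes from. In this sketch the penalty in the other direction is lagged by one sweep, so convergence is not guaranteed for every combination of smoothing parameters and counts; the toy example uses large counts and small penalties, where it behaves well.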

[Figure: Model 2. Two panels of log(mu) against Year, one for age 34; the second age label is not recoverable.]

3. Dimension reduction using P-splines

$B_a$, $m \times n_a$: one-dimensional B-spline basis for smoothing by age for a single year.
$B_y$, $n \times n_y$: one-dimensional B-spline basis for smoothing by year for a single age.
Assume that
$$\log \theta = Ba, \qquad B = B_y \otimes B_a.$$
Equivalent to Model 2 with $a$ in matrix form:
$$A = (a_1, \dots, a_{n_y}), \qquad A' = (a^r_1, \dots, a^r_{n_a}),$$
$$l(a; y) - \tfrac{1}{2} a'(\lambda_a P_a + \lambda_y P_y) a,$$
$$P_a = I_{n_y} \otimes D_a' D_a \quad \text{and} \quad P_y = D_y' D_y \otimes I_{n_a}.$$
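The Kronecker-product basis can be related to the matrix form of the coefficients with a small numerical check. The random matrices below merely stand in for real B-spline bases, and all sizes and names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, na, ny = 7, 5, 4, 3            # illustrative sizes
Ba = rng.random((m, na))             # stand-in for the age basis
By = rng.random((n, ny))             # stand-in for the year basis
A = rng.standard_normal((na, ny))    # coefficients in matrix form

B = np.kron(By, Ba)                  # (m n) x (na ny) regression matrix
a = A.reshape(-1, order="F")         # a = vec(A), columns stacked

eta_vec = B @ a                      # linear predictor from the full basis
eta_mat = Ba @ A @ By.T              # the same values as an m x n matrix
```

Stacking the columns of `eta_mat` reproduces `eta_vec`, so the $m \times n$ table of log mortalities can be evaluated from the two small marginal bases alone.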

Computational issues

bdeg = 0, $n_a = m$, $n_y = n$ $\Rightarrow$ $B = I_{nm}$ and Model 2 = Model 3, but it is not possible to fit it.
Matrix $B$ is $N \times n_a n_y$: storage problems. Solution:
  work with the partitioned matrix $B = [B_1, B_2, B_3]$
  take advantage of the banded nature of $B$
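One way to read the partitioning idea is to build $B'WB$ block by block so that the full $N \times n_a n_y$ matrix is never held in memory. The sketch below partitions by year, which is an assumption (the slide does not say how $B_1, B_2, B_3$ are formed), and it does not exploit the banded structure; the function name and test matrices are hypothetical.

```python
import numpy as np


def normal_matrix_by_blocks(Ba, By, W):
    """Accumulate B'WB for B = By (x) Ba one year block at a time.

    W is the m x n matrix of weights (one per age and year).
    """
    m, na = Ba.shape
    n, ny = By.shape
    BtWB = np.zeros((na * ny, na * ny))
    for j in range(n):
        Bj = np.kron(By[j:j + 1, :], Ba)      # rows of B for year j
        BtWB += (Bj.T * W[:, j]) @ Bj         # weights of year j on the diagonal
    return BtWB


# Check against the full matrix on a small example.
rng = np.random.default_rng(1)
Ba = rng.random((7, 4))                       # stand-in for the age basis
By = rng.random((5, 3))                       # stand-in for the year basis
W = rng.random((7, 5)) + 0.5
B_full = np.kron(By, Ba)
full = (B_full.T * W.reshape(-1, order="F")) @ B_full
blocked = normal_matrix_by_blocks(Ba, By, W)
```

Only one $m \times n_a n_y$ block exists at any time, at the cost of a loop over years.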

[Figure: Model 3. Two panels of log(mu) against Year, one for age 34; the second age label is not recoverable.]

[Figure: three panels of log(mu) plotted against Year and Age.]

Conclusions and future work

P-splines are a useful tool to model two-dimensional Poisson data.
Investigate a method for approximating the value of $\mathrm{tr}(H)$ in Model 2.
Develop methods for dealing with over-dispersion.
Fit the models in the context of GLMM.
Comparison with age-period-cohort models.



More information

Counts using Jitters joint work with Peng Shi, Northern Illinois University

Counts using Jitters joint work with Peng Shi, Northern Illinois University of Claim Longitudinal of Claim joint work with Peng Shi, Northern Illinois University UConn Actuarial Science Seminar 2 December 2011 Department of Mathematics University of Connecticut Storrs, Connecticut,

More information

Estimating the term structure of mortality

Estimating the term structure of mortality Insurance: Mathematics and Economics 42 (2008) 492 504 www.elsevier.com/locate/ime Estimating the term structure of mortality Norbert Hári a, Anja De Waegenaere b,, Bertrand Melenberg b,a, Theo E. Nijman

More information

Penalized Splines, Mixed Models, and Recent Large-Sample Results

Penalized Splines, Mixed Models, and Recent Large-Sample Results Penalized Splines, Mixed Models, and Recent Large-Sample Results David Ruppert Operations Research & Information Engineering, Cornell University Feb 4, 2011 Collaborators Matt Wand, University of Wollongong

More information

Monitoring actuarial assumptions in life insurance

Monitoring actuarial assumptions in life insurance Monitoring actuarial assumptions in life insurance Stéphane Loisel ISFA, Univ. Lyon 1 Joint work with N. El Karoui & Y. Salhi IAALS Colloquium, Barcelona, 17 LoLitA Typical paths with change of regime

More information

A DECOMPOSITION PROCEDURE BASED ON APPROXIMATE NEWTON DIRECTIONS

A DECOMPOSITION PROCEDURE BASED ON APPROXIMATE NEWTON DIRECTIONS Working Paper 01 09 Departamento de Estadística y Econometría Statistics and Econometrics Series 06 Universidad Carlos III de Madrid January 2001 Calle Madrid, 126 28903 Getafe (Spain) Fax (34) 91 624

More information

Massachusetts Institute of Technology Department of Economics Statistics. Lecture Notes on Matrix Algebra

Massachusetts Institute of Technology Department of Economics Statistics. Lecture Notes on Matrix Algebra Massachusetts Institute of Technology Department of Economics 14.381 Statistics Guido Kuersteiner Lecture Notes on Matrix Algebra These lecture notes summarize some basic results on matrix algebra used

More information

Estimation in Generalized Linear Models with Heterogeneous Random Effects. Woncheol Jang Johan Lim. May 19, 2004

Estimation in Generalized Linear Models with Heterogeneous Random Effects. Woncheol Jang Johan Lim. May 19, 2004 Estimation in Generalized Linear Models with Heterogeneous Random Effects Woncheol Jang Johan Lim May 19, 2004 Abstract The penalized quasi-likelihood (PQL) approach is the most common estimation procedure

More information

Linear Models and Estimation by Least Squares

Linear Models and Estimation by Least Squares Linear Models and Estimation by Least Squares Jin-Lung Lin 1 Introduction Causal relation investigation lies in the heart of economics. Effect (Dependent variable) cause (Independent variable) Example:

More information

MAT 1332: CALCULUS FOR LIFE SCIENCES. Contents. 1. Review: Linear Algebra II Vectors and matrices Definition. 1.2.

MAT 1332: CALCULUS FOR LIFE SCIENCES. Contents. 1. Review: Linear Algebra II Vectors and matrices Definition. 1.2. MAT 1332: CALCULUS FOR LIFE SCIENCES JING LI Contents 1 Review: Linear Algebra II Vectors and matrices 1 11 Definition 1 12 Operations 1 2 Linear Algebra III Inverses and Determinants 1 21 Inverse Matrices

More information

Linear Regression Models P8111

Linear Regression Models P8111 Linear Regression Models P8111 Lecture 25 Jeff Goldsmith April 26, 2016 1 of 37 Today s Lecture Logistic regression / GLMs Model framework Interpretation Estimation 2 of 37 Linear regression Course started

More information

Analysis Methods for Supersaturated Design: Some Comparisons

Analysis Methods for Supersaturated Design: Some Comparisons Journal of Data Science 1(2003), 249-260 Analysis Methods for Supersaturated Design: Some Comparisons Runze Li 1 and Dennis K. J. Lin 2 The Pennsylvania State University Abstract: Supersaturated designs

More information

A short introduction to INLA and R-INLA

A short introduction to INLA and R-INLA A short introduction to INLA and R-INLA Integrated Nested Laplace Approximation Thomas Opitz, BioSP, INRA Avignon Workshop: Theory and practice of INLA and SPDE November 7, 2018 2/21 Plan for this talk

More information

Statistics 360/601 Modern Bayesian Theory

Statistics 360/601 Modern Bayesian Theory Statistics 360/601 Modern Bayesian Theory Alexander Volfovsky Lecture 5 - Sept 12, 2016 How often do we see Poisson data? 1 2 Poisson data example Problem of interest: understand different causes of death

More information

MTH5112 Linear Algebra I MTH5212 Applied Linear Algebra (2017/2018)

MTH5112 Linear Algebra I MTH5212 Applied Linear Algebra (2017/2018) MTH5112 Linear Algebra I MTH5212 Applied Linear Algebra (2017/2018) COURSEWORK 3 SOLUTIONS Exercise ( ) 1. (a) Write A = (a ij ) n n and B = (b ij ) n n. Since A and B are diagonal, we have a ij = 0 and

More information

Bayesian density estimation from grouped continuous data

Bayesian density estimation from grouped continuous data Bayesian density estimation from grouped continuous data Philippe Lambert,a, Paul H.C. Eilers b,c a Université de Liège, Institut des sciences humaines et sociales, Méthodes quantitatives en sciences sociales,

More information

Nonlinear Support Vector Machines through Iterative Majorization and I-Splines

Nonlinear Support Vector Machines through Iterative Majorization and I-Splines Nonlinear Support Vector Machines through Iterative Majorization and I-Splines P.J.F. Groenen G. Nalbantov J.C. Bioch July 9, 26 Econometric Institute Report EI 26-25 Abstract To minimize the primal support

More information

Flexible modelling of the cumulative effects of time-varying exposures

Flexible modelling of the cumulative effects of time-varying exposures Flexible modelling of the cumulative effects of time-varying exposures Applications in environmental, cancer and pharmaco-epidemiology Antonio Gasparrini Department of Medical Statistics London School

More information

Problem # Max points possible Actual score Total 120

Problem # Max points possible Actual score Total 120 FINAL EXAMINATION - MATH 2121, FALL 2017. Name: ID#: Email: Lecture & Tutorial: Problem # Max points possible Actual score 1 15 2 15 3 10 4 15 5 15 6 15 7 10 8 10 9 15 Total 120 You have 180 minutes to

More information

Machine Learning for OR & FE

Machine Learning for OR & FE Machine Learning for OR & FE Regression II: Regularization and Shrinkage Methods Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com

More information

arxiv: v3 [stat.me] 11 Apr 2018

arxiv: v3 [stat.me] 11 Apr 2018 Gaussian Process Models for Mortality Rates and Improvement Factors Mike Ludkovski Jimmy Risk Howard Zail arxiv:1608.08291v3 [stat.me] 11 Apr 2018 April 13, 2018 Abstract We develop a Gaussian process

More information

Chapter 5: Generalized Linear Models

Chapter 5: Generalized Linear Models w w w. I C A 0 1 4. o r g Chapter 5: Generalized Linear Models b Curtis Gar Dean, FCAS, MAAA, CFA Ball State Universit: Center for Actuarial Science and Risk Management M Interest in Predictive Modeling

More information

Exercise Set Suppose that A, B, C, D, and E are matrices with the following sizes: A B C D E

Exercise Set Suppose that A, B, C, D, and E are matrices with the following sizes: A B C D E Determine the size of a given matrix. Identify the row vectors and column vectors of a given matrix. Perform the arithmetic operations of matrix addition, subtraction, scalar multiplication, and multiplication.

More information

Neural Networks: Backpropagation

Neural Networks: Backpropagation Neural Networks: Backpropagation Seung-Hoon Na 1 1 Department of Computer Science Chonbuk National University 2018.10.25 eung-hoon Na (Chonbuk National University) Neural Networks: Backpropagation 2018.10.25

More information

CSL361 Problem set 4: Basic linear algebra

CSL361 Problem set 4: Basic linear algebra CSL361 Problem set 4: Basic linear algebra February 21, 2017 [Note:] If the numerical matrix computations turn out to be tedious, you may use the function rref in Matlab. 1 Row-reduced echelon matrices

More information

Compressive Inference

Compressive Inference Compressive Inference Weihong Guo and Dan Yang Case Western Reserve University and SAMSI SAMSI transition workshop Project of Compressive Inference subgroup of Imaging WG Active members: Garvesh Raskutti,

More information

Estimation of spatiotemporal effects by the fused lasso for densely sampled spatial data using body condition data set from common minke whales

Estimation of spatiotemporal effects by the fused lasso for densely sampled spatial data using body condition data set from common minke whales Estimation of spatiotemporal effects by the fused lasso for densely sampled spatial data using body condition data set from common minke whales Mariko Yamamura 1, Hirokazu Yanagihara 2, Keisuke Fukui 3,

More information

Maria Cameron Theoretical foundations. Let. be a partition of the interval [a, b].

Maria Cameron Theoretical foundations. Let. be a partition of the interval [a, b]. Maria Cameron 1 Interpolation by spline functions Spline functions yield smooth interpolation curves that are less likely to exhibit the large oscillations characteristic for high degree polynomials Splines

More information