School on Modelling Tools and Capacity Building in Climate and Public Health April 2013

Similar documents
Beyond Mean Regression

gamboostlss: boosting generalized additive models for location, scale and shape

Improving linear quantile regression for

Analysing geoadditive regression data: a mixed model approach

Detection of risk factors for obesity in early childhood with quantile regression methods for longitudinal data

Impact Evaluation Technical Workshop:

LASSO-Type Penalization in the Framework of Generalized Additive Models for Location, Scale and Shape

Simple Linear Regression

Spatio-Temporal Expectile Regression Models

Unconditional Quantile Regressions

Principal components in an asymmetric norm

Chapter 1 - Lecture 3 Measures of Location

INTRODUCING LINEAR REGRESSION MODELS Response or Dependent variable y

Multivariate Lineare Modelle

Asymmetric least squares estimation and testing

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

Lecture 12: Small Sample Intervals Based on a Normal Population Distribution

Quantile methods. Class Notes Manuel Arellano December 1, Let F (r) =Pr(Y r). Forτ (0, 1), theτth population quantile of Y is defined to be

Introduction to Quantile Regression

Statistics & Data Sciences: First Year Prelim Exam May 2018

Approximate Median Regression via the Box-Cox Transformation

Post-exam 2 practice questions 18.05, Spring 2014

Treatment Effects Beyond the Mean: A Practical Guide. Using Distributional Regression

Analyzing Highly Dispersed Crash Data Using the Sichel Generalized Additive Models for Location, Scale and Shape

STA216: Generalized Linear Models. Lecture 1. Review and Introduction

Generalized Linear Models Introduction

Stat 704 Data Analysis I Probability Review

Module 11: Linear Regression. Rebecca C. Steorts

1. Exploratory Data Analysis

Modelling geoadditive survival data

STA 216: GENERALIZED LINEAR MODELS. Lecture 1. Review and Introduction. Much of statistics is based on the assumption that random

Subject CS1 Actuarial Statistics 1 Core Principles

SUMMARIZING MEASURED DATA. Gaia Maselli

Summary statistics. G.S. Questa, L. Trapani. MSc Induction - Summary statistics 1

Gaussian Random Fields

Lecture 2: Basic Concepts and Simple Comparative Experiments Montgomery: Chapter 2

Gradient boosting in Markov-switching generalized additive models for location, scale and shape

UNIVERSITY OF TORONTO Faculty of Arts and Science

IAM 530 ELEMENTS OF PROBABILITY AND STATISTICS LECTURE 3-RANDOM VARIABLES

Conditional Transformation Models

A Resampling Method on Pivotal Estimating Functions

Spatial Modeling of grub density of the may beetle (forest cockchafer: Melolontha hippocastani) in the Hessischen Ried area

The Simulation Study to Test the Performance of Quantile Regression Method With Heteroscedastic Error Variance

Semiparametric Generalized Linear Models

Variable Selection and Model Choice in Survival Models with Time-Varying Effects

Economics 536 Lecture 21 Counts, Tobit, Sample Selection, and Truncation

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models

THE PRINCIPLES AND PRACTICE OF STATISTICS IN BIOLOGICAL RESEARCH. Robert R. SOKAL and F. James ROHLF. State University of New York at Stony Brook

12 The Analysis of Residuals

Topic 17 - Single Factor Analysis of Variance. Outline. One-way ANOVA. The Data / Notation. One way ANOVA Cell means model Factor effects model

Additive Terms. Flexible Regression and Smoothing. Mikis Stasinopoulos 1 Bob Rigby 1

MATH 644: Regression Analysis Methods

QUEEN S UNIVERSITY FINAL EXAMINATION FACULTY OF ARTS AND SCIENCE DEPARTMENT OF ECONOMICS APRIL 2018

Ø Set of mutually exclusive categories. Ø Classify or categorize subject. Ø No meaningful order to categorization.

Quantile regression in varying coefficient models

Quantile Regression Methods for Reference Growth Charts

Generalized Linear Models. Last time: Background & motivation for moving beyond linear

unadjusted model for baseline cholesterol 22:31 Monday, April 19,

Simple Linear Regression

3 Joint Distributions 71

Stat 5102 Final Exam May 14, 2015

Deccan Education Society s FERGUSSON COLLEGE, PUNE (AUTONOMOUS) SYLLABUS UNDER AUTOMONY. SECOND YEAR B.Sc. SEMESTER - III

Ø Set of mutually exclusive categories. Ø Classify or categorize subject. Ø No meaningful order to categorization.

On the L p -quantiles and the Student t distribution

Chapter 8. Quantile Regression and Quantile Treatment Effects

On the Dependency of Soccer Scores - A Sparse Bivariate Poisson Model for the UEFA EURO 2016

Exam Applied Statistical Regression. Good Luck!

Single Index Quantile Regression for Heteroscedastic Data

Quartiles, Deciles, and Percentiles

PEP-PMMA Technical Note Series (PTN 01) Percentile Weighed Regression

BIOS 2083: Linear Models

Learning Objectives for Stat 225

Quantile regression for longitudinal data using the asymmetric Laplace distribution

Lecture 7: Dynamic panel models 2

Continuous random variables

Generalized Linear Models. Kurt Hornik

Sampling Distributions of Statistics Corresponds to Chapter 5 of Tamhane and Dunlop

TABLE OF CONTENTS CHAPTER 1 COMBINATORIAL PROBABILITY 1

Economics 326 Methods of Empirical Research in Economics. Lecture 7: Con dence intervals

A Complete Spatial Downscaler

Chapter 2. Some Basic Probability Concepts. 2.1 Experiments, Outcomes and Random Variables

Simple Linear Regression

IV Quantile Regression for Group-level Treatments, with an Application to the Distributional Effects of Trade

Model-free prediction intervals for regression and autoregression. Dimitris N. Politis University of California, San Diego

An Introduction to Bayesian Linear Regression

STAT5044: Regression and Anova

Lec 1: An Introduction to ANOVA

Statistics II. Management Degree Management Statistics IIDegree. Statistics II. 2 nd Sem. 2013/2014. Management Degree. Simple Linear Regression

Regression Models - Introduction

Final Exam STAT On a Pareto chart, the frequency should be represented on the A) X-axis B) regression C) Y-axis D) none of the above

ECON 5350 Class Notes Review of Probability and Distribution Theory

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Introduction. The Linear Regression Model One popular model is the linear regression model. It writes as :

9. Linear Regression and Correlation

Simple Linear Regression Analysis

K. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij =

Problem Set # 1. Master in Business and Quantitative Methods

Z score indicates how far a raw score deviates from the sample mean in SD units. score Mean % Lower Bound

MIDTERM EXAMINATION (Spring 2011) STA301- Statistics and Probability

Last Lecture. Distinguish Populations from Samples. Knowing different Sampling Techniques. Distinguish Parameters from Statistics

Transcription:

2453-14 School on Modelling Tools and Capacity Building in Climate and Public Health 15-26 April 2013 Expectile and Quantile Regression and Other Extensions KAZEMBE Lawrence University of Malawi Chancellor College Faculty of Science Department of Mathematical Sciences 18 Chirunga Road, 0000 Zomba MALAWI

Expectile and Quantile Regression and Other Extensions Lawrence Kazembe University of Namibia Windhoek, Namibia A presentation at School on Modelling Tools and Capacity Building in Climate and Public Health ICTP, Trieste, Italy 1

Objectives and Questions Objective: Introduce other regression beyond the mean Are there median regression models What about regression at the mode? Can we fit regression model at any other locations too? What of the variance, skewness and kurtosis? 2

Prelimaries Given a r.v. y f( ) then -E(Y )=μ defines the mean; -Var(Y )=σ 2 is the variance. General assumption: two measures completely define the distribution (stationarity concept) Classical regression is often characterized by relating to the mean -E(y x) =βx if x is the set of covariates. -y N(μ, σ 2 ), then μ = βx 3

-y Bern(π), then log ( ) π 1 π = βx -y P ois(λ), then log(λ) =βx. Statisticians are mean lovers (Friedman, Friedman & Amoo) It goes like: We are mean lovers. Deviation is considered normal. We are right 95% of the time. Statisticians do it discretely and continuously. We can legally comment on someone s posterior distribution. Why the mean? -Mean regression reduces complexity -However, the mean is not sufficient to describe a distribution.

Examples Plot (Rent in Munich, Kneib et al) 4

Examples Quantile Plot (Rent in Munich, Kneib et al) 5

Example (Malaria vs Altitude, Kazembe et al) 6

Example (Botswana data) 7

Motivating Examples Epidemiology and Public Health: -Investigating height, weight and body mass index as a function of different covariates e.g. age (Wei et al. 2006) -Exploring stunting curves in African and Indian children (Yue et al. 2012) -Generating age-specific centile charts (Chitty et al. 1994). Economics: -Study of determinants of wages (Koenker, 2005). 8

Education: -The performance of students in public schools (Koenker and Hallock, 2001). Climate data: - Spatiotemporal analysis of Boston temperature (Reich 2012)

Double GLM Classical regression assumes a homogeneous variance y x,ε N(βx,σ 2 ) ε N(0,σ 2 ) However variance heterogeneity (heteroscedasticity) is an order in real-life problems The variance of the response may depend on the covariates 9

Normal regression example: -Regression for mean and variance of a normal distribution y i = η i1 + exp(η 2i )ε i, ε i N(0, 1) such that E(y i x i )=η 1i Var(y i x i ) = exp(η 2i ) 2

Regression for location, scale and shape In general: Specify a distribution for the response on all parameters and relate to the predictors. The generalized additive model location, scale and shape (GAMLSS) is a statistical model developed by Rigby and Stasinopoulos (2005). For a probability (density) function f(y i μ i,σ i,ν i,τ i ) conditional on (μ i,σ i,ν i,τ i ) a vector of four distribution parameters, each of which can be a function 10

to the explanatory variables (X) g 1 (μ) =η 1 = β 1 X 1 + g 2 (σ) =η 2 = β 2 X 2 + g 3 (ν) =η 3 = β 3 X 3 + g 4 (τ) =η 4 = β 4 X 4 + J 1 j=1 J 2 j=1 J 3 j=1 J 4 j=1 h j1 (x j1 ) (1) h j2 (x j2 ) (2) h j3 (x j3 ) (3) h j4 (x j4 ) (4)

GAMLSS assumes different models for each parameter -Model 1: Mean regression -Model 2: Dispersion regression -Model 3: Skewness regression -Model 4: Kurtosis regression Other summary measures are also permissible

Quantile Regression Quantile, Centile, and Percentile are terms that can be used interchangeably -A 0.5 quantile 50 percentile, which is a median. Related terms are quartiles, quintiles and deciles: divides distribution into 4, 5, and 10 parts Quantiles are related to the median. Suppose Y has a cumulative distribution F (y) = P (Y τ). Then τth quantile of Y is defined to be Q(τ) = τ f(y)dy =inf{y : τ F (y)} 11

for 0 <τ<1. The quantile regression drops the parametric assumption for the error/ response distribution. Fit separate models for different asymmetries τ [0, 1]: y i = η iτ + ε iτ Instead of assuming E(y i η iτ = 0, one assumes or P (ε iτ 0) = τ F yi (η iτ )=P (ε iτ 0) = τ

This gives a set of regression function at any assumed quantile value. Assumptions: -it is distribution-free since it does not make any specific assumption on the type of errors -it does not even require i.i.d errors -it allows for heterogeneity.

Expectile Regression Expectiles are related to the mean, as are quantiles related to the median. ni=1 y i η i min median regression ni=1 w τ (y i,η iτ ) y i η iτ min quantile regression ni=1 y i η i 2 min n i=1 w τ (y i,η iτ ) y i η iτ 2 min mean regression expectile regression where w τ is the check function defined by 12

w τ = τ if y i >μ(τ) 1 τ if y i μ(τ) for some population expectile μ(τ) for different values of an asymmetric parameter 0 <τ<1. Expectiles are obtained by solving eτ τ = y e τ f y (y)dy y e τ f y (y)dy = G y (e τ ) e τ F y (e τ ) 2(G y (e τ ) e τ F y (e τ ))+(e τ μ) where -f y ( ) andf y ( ) denote the density and cumulative distribution function of y. -G y (e) = e yf y (y)dy is the partial moment function of y and -G y ( ) =μ is the expectation of y.

Worked example - binomial data 13

500 1500 ALT 500 1500 ALT 500 1500 ALT s(alt, df = c(2, 4, 2)):1 1 0 1 2 s(alt, df = c(2, 4, 2)):2 0.4 0.0 0.2 0.4 0.6 0.8 s(alt, df = c(2, 4, 2)):3 0.2 0.0 0.1 0.2 500 1500 ALT 500 1500 ALT 500 1500 ALT s(alt, df = c(2, 4, 2)):4 0.10 0.05 0.00 0.05 0.10 s(alt, df = c(2, 4, 2)):5 0.3 0.2 0.1 0.0 0.1 s(alt, df = c(2, 4, 2)):6 0.3 0.2 0.1 0.0 0.1

8 10 14 TMIN 8 10 14 TMIN 8 10 14 TMIN s(tmin, df = c(2, 4, 2)):1 1.0 0.5 0.0 0.5 1.0 1.5 s(tmin, df = c(2, 4, 2)):2 0.4 0.0 0.2 0.4 0.6 0.8 s(tmin, df = c(2, 4, 2)):3 0.1 0.0 0.1 0.2 0.3 0.4 8 10 14 TMIN 8 10 14 TMIN 8 10 14 TMIN 14 s(tmin, df = c(2, 4, 2)):4 0.2 0.1 0.0 0.1 0.2 s(tmin, df = c(2, 4, 2)):5 0.5 0.3 0.1 0.1 s(tmin, df = c(2, 4, 2)):6 0.5 0.3 0.1 0.1

25 30 35 TMAX 25 30 35 TMAX 25 30 35 TMAX s(tmax, df = c(2, 4, 2)):1 2 1 0 1 s(tmax, df = c(2, 4, 2)):2 0.8 0.4 0.0 0.4 s(tmax, df = c(2, 4, 2)):3 0.2 0.0 0.2 0.4 25 30 35 TMAX 25 30 35 TMAX 25 30 35 TMAX 15 s(tmax, df = c(2, 4, 2)):4 0.10 0.00 0.10 0.20 s(tmax, df = c(2, 4, 2)):5 0.2 0.1 0.0 0.1 0.2 s(tmax, df = c(2, 4, 2)):6 0.05 0.00 0.05 0.10 0.15

Reference Hudson, I. L., Rea, A., Dalrymple, M. L., Eilers, P. H. C (2008). Climate impacts on sudden infant death syndrome: a GAMLSS approach. Proceedings of the 23rd international workshop on statistical modelling pp. 277280. Beyerlein, A., Fahrmeir, L., Mansmann, U., Toschke., A. M (2008). Alternative regression models to assess increase in childhood BM. BMC Medical Research Methodology, 8(59). Smyth, G. K (1989). Generalized linear models with varying dispersion. J. R. Statist. Soc. B, 51, 4760. Sobotka, F., and T. Kneib, 2010. Geoadditive Expectile Regression. Computational Statistics and Data Analysis, doi: 10.1016/j.csda.2010.11.015. 16