Robust Inference in Generalized Linear Models


Robust Inference in Generalized Linear Models
Claudio Agostinelli (claudio@unive.it)
Dipartimento di Statistica, Università Ca' Foscari di Venezia
San Giobbe, Cannaregio 873, Venezia
Tel. 041 2347446, Fax 041 2347444
http://www.dst.unive.it/~claudio
useR!2008 Conference, Dortmund, 12 August 2008 (ver. 1.0)

Outline
1 Introduction: Weighted Likelihood Estimators; How to construct the Weights
2 Robust estimation for GLM: Standard weights; Weights based on the Anscombe residuals; What is available
3 TODO list
4 Conclusions

Cyclamen dataset
[Histogram of the number of flowers: frequency versus number of flowers.]
These data (www.statsci.org/data/general/cyclamen.html) come from an experiment on the induction of flowering in cyclamen.

Weighted Likelihood Estimating Equations
The WLEE are a modified version of the maximum likelihood estimating equations in which each score is given a weight defined as
$$w(x; \theta, f) = \frac{A(\delta(x; \theta, f)) + 1}{\delta(x; \theta, f) + 1}$$
where A(δ) is the Residual Adjustment Function. The WLEE estimator is then a solution of
$$\sum_{i=1}^{n} w(x_i; \theta, f)\, u(x_i; \theta) = 0$$
where u(x; θ) is the score function of the model.
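As an illustration, the weight above can be coded directly; the sketch below (the name wlee.weight is made up here, and this is not the wle package code) assumes the Hellinger-distance RAF A(δ) = 2(√(δ+1) - 1) as the default and truncates the weights to [0, 1]:

wlee.weight <- function(delta, raf = function(d) 2 * (sqrt(d + 1) - 1)) {
    ## w = (A(delta) + 1) / (delta + 1), truncated to the unit interval
    w <- (raf(delta) + 1) / (delta + 1)
    pmin(pmax(w, 0), 1)
}
wlee.weight(c(0, 1, 10, 100))  # large Pearson residuals get small weights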

Pearson Residuals
In our approach, outliers are observations that are highly unlikely to occur under the assumed model [see Markatou et al., 1995, 1998]. This definition is well suited to many contexts since it is based on a probabilistic distance. One way to measure the discrepancy is the Pearson residual [Lindsay, 1994], defined as
$$\delta(x; \theta, f) = \frac{f(x)}{m^*(x; \theta)} - 1$$
where f(x) is a nonparametric density estimate based on the data and m*(x; θ) is a smoothed version of the model density. Note that in the discrete case no smoothing is needed.
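In the discrete case a minimal sketch is immediate: take f(x) to be the observed relative frequencies and m(x; θ) the model probability mass function. The Poisson model, the contaminated sample and the function name pearson.residuals below are assumptions for illustration only:

pearson.residuals <- function(x, lambda) {
    ## empirical pmf on the observed support versus the Poisson(lambda) pmf
    f <- table(factor(x, levels = 0:max(x))) / length(x)
    m <- dpois(0:max(x), lambda)
    as.numeric(f / m - 1)
}
set.seed(1)
x <- c(rpois(980, 8), rpois(20, 20))   # 2% contamination
round(pearson.residuals(x, 8), 2)      # large residuals where Poisson(8) puts little mass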

Pearson Residuals
[Plot of the Pearson residuals δ(x; λ = 8, f(x)) against x, for f(x) = 0.98 m(x; 8) + 0.02 m(x; 20), where m(x; λ) is the probability mass function of a Poisson(λ) distribution.]

Residual Adjustment Function
In order to construct the weight attached to each score we need to choose a Residual Adjustment Function (RAF) [see Lindsay, 1994]. Here we choose it from two different families: the Power Divergence Measure [Cressie and Read, 1984, 1988] and the Generalized Kullback-Leibler disparity [Park and Basu, 2003].

Power Divergence Measure RAF
The Residual Adjustment Function based on the Power Divergence Measure was introduced in Lindsay [1994]:
$$A_{\mathrm{pdm}}(\delta, \tau) = \begin{cases} \tau\left[(\delta + 1)^{1/\tau} - 1\right] & \tau < \infty \\ \log(\delta + 1) & \tau = \infty \end{cases}$$
Special cases: τ = 1: maximum likelihood; τ = 2: Hellinger distance; τ → ∞: Kullback-Leibler divergence; τ = -1: Neyman's chi-square.

Power Divergence Measure
[Plot of the Residual Adjustment Function A(δ) against δ for the Power Divergence Measure family: maximum likelihood (ML), Hellinger distance (HD), Kullback-Leibler (KL), Neyman's chi-square (NCS).]

Generalized Kullback-Leibler RAF
The Residual Adjustment Function based on the Generalized Kullback-Leibler divergence was introduced in Park and Basu [2003]:
$$A_{\mathrm{gkl}}(\delta, \tau) = \frac{\log(\tau \delta + 1)}{\tau}, \qquad 0 \le \tau \le 1$$
(with A_gkl(δ, 0) = δ by continuity). Special cases: τ → 0: maximum likelihood; τ = 1: Kullback-Leibler divergence. It can be interpreted as a linear combination of the likelihood divergence and the Kullback-Leibler divergence.

Generalized Kullback-Leibler
[Plot of the Residual Adjustment Function A(δ) against δ for the Generalized Kullback-Leibler family: ML, GKL(τ = 0.25), GKL(τ = 0.5), KL.]

Example: Poisson distribution
The WLEE for the Poisson distribution is the solution (or solutions) of the fixed-point equation
$$\lambda = \frac{\sum_{i=1}^{n} w(x_i; \lambda)\, x_i}{\sum_{i=1}^{n} w(x_i; \lambda)}$$
This is implemented in the function wle.poisson in the package wle. For the cyclamen dataset we have

> wlepois <- wle.poisson(Flowers)
> wlepois

Call:
wle.poisson(x = Flowers)

lambda: [1] 8.66

Number of solutions 1
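A toy version of this fixed-point iteration is sketched below (it is not the wle.poisson implementation: it assumes the Hellinger-distance RAF and raw discrete Pearson residuals without smoothing, and the name wlee.pois is made up):

wlee.pois <- function(x, tol = 1e-8, maxit = 200) {
    f <- table(x) / length(x)                 # empirical pmf of the data
    lambda <- mean(x)                         # start from the MLE
    for (i in seq_len(maxit)) {
        delta <- as.numeric(f[as.character(x)]) / dpois(x, lambda) - 1
        w <- pmin(pmax((2 * (sqrt(delta + 1) - 1) + 1) / (delta + 1), 0), 1)
        lambda.new <- sum(w * x) / sum(w)     # the fixed-point update above
        if (abs(lambda.new - lambda) < tol) break
        lambda <- lambda.new
    }
    lambda
}
wlee.pois(Flowers)   # should land near the wle.poisson estimate reported above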

Weights
[Plot of the WLEE weights against the number of flowers for the cyclamen dataset.]

Distribution functions
[Cyclamen dataset: comparison between the empirical distribution function and the estimated model distribution function.]

Literature review
A. Bianco and V.J. Yohai. Robust estimation in the logistic regression model. In H. Rieder, editor, Robust Statistics, Data Analysis and Computer Intensive Methods: Proceedings of the workshop in honor of Peter J. Huber, 1996.
E. Cantoni and E. Ronchetti. Robust inference for generalized linear models. Journal of the American Statistical Association, 2001.
M. Markatou, A. Basu, and B.G. Lindsay. Weighted likelihood estimating equations: The discrete case with applications to logistic regression. Journal of Statistical Planning and Inference, 1997.
C.H. Muller and N. Neykov. Breakdown points of trimmed likelihood estimators and related estimators in generalized linear models. Journal of Statistical Planning and Inference, 2003.
P.J. Rousseeuw and A. Christmann. Robustness against separation and outliers in logistic regression. Computational Statistics & Data Analysis, 2003.
L.A. Stefanski, R.J. Carroll, and D. Ruppert. Optimally bounded score functions for generalized linear models with applications to logistic regression. Biometrika, 1986.

WLEE for GLM
We need to consider two situations:
1. levels of the predictors with a sufficient number of replications: here we can define the weights as usual, by considering the conditional distribution of the response variable given the corresponding level (a per-level sketch follows below);
2. levels of the predictors with one or too few replications.
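A minimal sketch of case 1, assuming a Poisson response: reuse wle.poisson (shown earlier) on each group of replicates, one level at a time (wle.glm treats this case in a more general way):

library(wle)                                   # provides wle.poisson and wle.glm
fits <- lapply(split(Flowers, Variety), wle.poisson)
sapply(fits, function(f) f$lambda)             # one robust Poisson mean per Variety level
lapply(fits, function(f) summary(f$weights))   # the corresponding weights, level by level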

Example: Flowers ~ Variety
[Barplots of the conditional distribution of Flowers given Variety, one panel per variety (Variety = 1, 2, 3, 4).]

Example: Flowers ~ Variety

> outvar <- glm(Flowers ~ Variety, family = poisson)
> outvar

Call: glm(formula = Flowers ~ Variety, family = poisson)

Coefficients:
(Intercept)     Variety2     Variety3     Variety4
  2.1925842   -0.0989213   -0.0009307   -0.0434527

Degrees of Freedom: 1917 Total (i.e. Null); 1914 Residual
Null Deviance: 1256
Residual Deviance: 1230    AIC: 8868

> wleoutvar <- wle.glm(Flowers ~ Variety, family = poisson)
> wleoutvar

Call: wle.glm(formula = Flowers ~ Variety, family = poisson)

Root: 1

Coefficients:
(Intercept)    Variety2    Variety3    Variety4
   2.192945   -0.099111   -0.005113   -0.044739

Degrees of Freedom: 1917 Total (i.e. Null); 1914 Residual
Null Deviance: 1227
Residual Deviance: 1202    AIC: 8815

Number of solutions 1

Weights
[Plots of the weights against Flowers for the cyclamen dataset in the model Flowers ~ Variety, one panel per variety (Variety = 1, 2, 3, 4).]

Distribution functions
[Conditional distribution functions (empirical versus estimated) for the cyclamen dataset in the model Flowers ~ Variety, one panel per variety (Variety = 1, 2, 3, 4).]

WLEE for GLM
Case 2:
- use observations in a neighborhood of the predictor level in order to evaluate the weights of the responses at that level (implemented via the parameter window.size in the wle.glm.control function);
- use weights based on the asymptotic distribution of the Anscombe residuals (use.asymptotic in wle.glm.control).

Anscombe residuals
They were introduced and discussed in:
D.R. Cox and E.J. Snell. A general definition of residuals. Journal of the Royal Statistical Society, Series B, 30(2):248-275, 1968.
R.M. Loynes. On Cox and Snell's general definition of residuals. Journal of the Royal Statistical Society, Series B, 31(1):103-106, 1969.
D.A. Pierce and D.W. Schafer. Residuals in generalized linear models. Journal of the American Statistical Association, 81(396):977-986, 1986.
R. Brant. Residual components in generalized linear models. The Canadian Journal of Statistics, 15(2):115-126, 1987.

Anscombe residuals
The Anscombe residuals are obtained by applying the transformation
$$A(y) = \int \mathrm{Var}(\mu)^{-1/3}\, d\mu$$
to both Y and µ, where Var(µ) is the variance function expressed in terms of µ. The Anscombe residual adjusts for the scale of the variance by dividing A(Y) - A(µ) by $\frac{\partial A(\mu)}{\partial \mu}\, \tau$, where $\tau^2 = \frac{\partial^2}{\partial \theta^2} b(\theta)$.

Anscombe residuals
Poisson [Cox and Snell, 1968]:
$$\frac{3}{2}\,\frac{Y^{2/3} - (\mu - 1/6)^{2/3}}{\mu^{1/6}}$$
Binomial (this version includes a bias-correction term) [Cox and Snell, 1968]:
$$\frac{\phi(Y/m) - \phi\!\left(\theta - \tfrac{1}{6}(1 - 2\theta)/m\right)}{\theta^{1/6}(1 - \theta)^{1/6}/\sqrt{m}}$$
where $\phi(u) = \int_0^u t^{-1/3}(1 - t)^{-1/3}\, dt$, 0 ≤ u ≤ 1, which can be computed using the incomplete beta function.
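As a small check, the Poisson version above is easy to code directly; the sketch below (the name anscombe.pois is made up; the package function residuals.anscombe mentioned later covers several families) evaluates the bias-corrected residual:

anscombe.pois <- function(y, mu) {
    ## bias-corrected Poisson Anscombe residual from the formula above
    (3/2) * (y^(2/3) - (mu - 1/6)^(2/3)) / mu^(1/6)
}
set.seed(2)
y <- rpois(10, 8)
round(anscombe.pois(y, 8), 2)   # roughly N(0, 1) under the Poisson(8) model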

Example: Flowers ~ Variety

> wleoutvarasy <- wle.glm(Flowers ~ Variety, family = poisson,
+     control = list(glm = glm.control(),
+         wle = wle.glm.control(use.asymptotic = length(Flowers))))
> wleoutvarasy

Call: wle.glm(formula = Flowers ~ Variety, family = poisson,
    control = list(glm = glm.control(),
    wle = wle.glm.control(use.asymptotic = length(Flowers))))

Root: 1

Coefficients:
(Intercept)    Variety2    Variety3    Variety4
   2.194016   -0.102452   -0.005794   -0.045259

Degrees of Freedom: 1917 Total (i.e. Null); 1914 Residual
Null Deviance: 1209
Residual Deviance: 1183    AIC: 8594

Number of solutions 1

Weights
[Comparison of the weights based on the Poisson conditional distributions and on the Anscombe residuals for the cyclamen dataset in the model Flowers ~ Variety, one panel per variety (Variety = 1, 2, 3, 4).]

What is implemented
Family: Poisson, Binomial, QuasiPoisson, QuasiBinomial;
Estimation process;
Print method.

How it is implemented
wle.glm: the main function. It accepts all the arguments of glm. The control argument has two components: glm, the usual argument produced by glm.control(), and wle, produced by wle.glm.control().
wle.glm.fit: the function that performs the fit; it calls glm.fit and wle.glm.weights.
wle.glm.weights: the function that evaluates the weights for a given set of parameters.
wle.glm.control: controls the arguments for the WLEE part.
residuals.anscombe: evaluates the Anscombe residuals for several families; so far Poisson, Binomial, Gamma, InverseGamma.

TODO
Short term:
- documentation;
- check the print method (S3);
- summary method (S3);
- plot method (S3), similar to the one available for wle.lm.
Long term:
- anova function (anova.wle.glm, S3), deviance (deviance.wle.glm, S3) and tests, extending the results in Agostinelli and Markatou [2001];
- add more families: Gamma, InverseGamma, Normal;
- model selection: the weighted AIC reported by the print method is currently based on the fitted model itself, which is not the best choice; we will implement an extractAIC.wle.glm method with weights based on the full model.
Very long term:
- build a function wle.dglm for over- and under-dispersed models;
- build a function wle.bigglm for big datasets.

Conclusions
We introduced robust estimators for Generalized Linear Models based on Weighted Likelihood Estimating Equations;
We use the package wle;
The Sweave source of this presentation will be available at www.dst.unive.it/~claudio/r/index.html;
The functions will be available in the next release of the wle package, version 0.9-4.

> cyclamen <- na.omit(read.table("./cyclamen.txt",
+     header = TRUE))
> cyclamen$variety <- factor(cyclamen$variety)
> cyclamen$regimem <- factor(cyclamen$regimem)
> cyclamen$fertilizer <- factor(cyclamen$fertilizer)
> Flowers <- cyclamen$flowers
> Variety <- cyclamen$variety
> par(mai = c(1, 1, 0, 0))
> barplot(table(factor(Flowers, levels = 0:26)),
+     xlab = "Number of Flowers",
+     ylab = "Frequency")

Return

> delta <- function(x, eps = 0.02) {
+     res <- ((1 - eps) * dpois(x, 8) + eps * dpois(x, 20))/dpois(x, 8) - 1
+     return(res)
+ }
> plot(0:25, delta(0:25), xlab = "x",
+     ylab = expression(delta(x, 8, f(x))))

Return

> Apdm <- function(x, tau) {
+     if (tau == Inf)
+         a <- log(x + 1)
+     else a <- tau * ((x + 1)^(1/tau) - 1)
+     return(a)
+ }
> plot(function(x) Apdm(x, tau = 2),
+     from = -1, to = 10, xlab = expression(delta),
+     ylab = expression(A(delta)), col = 4)
> plot(function(x) Apdm(x, tau = Inf),
+     from = -1, to = 10, col = 5, add = TRUE)
> plot(function(x) Apdm(x, tau = -1),
+     from = -1, to = 10, col = 6, add = TRUE)

> abline(0, 1, col = 3)
> abline(h = 0, lty = 3)
> abline(v = 0, lty = 3)
> legend(6, -0.1, legend = c("ML", "HD", "KL", "NCS"),
+     col = 3:6, lty = rep(1, 4))

Return

> Agkl <- function(x, tau) {
+     if (tau == 0)
+         a <- x
+     else a <- log(tau * x + 1)/tau
+     return(a)
+ }
> plot(function(x) Agkl(x, tau = 0.25),
+     from = -1, to = 10, xlab = expression(delta),
+     ylab = expression(A(delta)), col = 4)
> plot(function(x) Agkl(x, tau = 0.5),
+     from = -1, to = 10, col = 5, add = TRUE)
> plot(function(x) Agkl(x, tau = 1),
+     from = -1, to = 10, col = 6, add = TRUE)
> abline(0, 1, col = 3)

> abline(h = 0, lty = 3)
> abline(v = 0, lty = 3)
> legend(6, 0.8, legend = c("ML", "GKL0.25", "GKL0.5", "KL"),
+     col = 3:6, lty = rep(1, 4))

Return

> nodup <- !duplicated(Flowers)
> plot(Flowers[nodup], wlepois$weights[nodup],
+     xlim = c(0, 26), xlab = "Flowers",
+     ylab = "Weights", pch = 19, col = 2)

Return

> plot(ecdf(Flowers), main = "",
+     xlab = "Flowers", ylab = "Distribution functions")
> plot(function(x) ppois(x, lambda = wlepois$lambda),
+     type = "s", add = TRUE, col = 2, lty = 2)
> legend(x = "bottomright", legend = c("Empirical", "Estimated"),
+     col = 1:2, lty = 1:2, inset = 0.1)

Return

> flowersvariety <- sapply(split(factor(Flowers, 0:26),
+     Variety), table)
> layout(matrix(1:4, nrow = 2, byrow = TRUE))
> barplot(flowersvariety[, 1], xlab = "Flowers Variety=1",
+     ylim = c(0, 105))
> barplot(flowersvariety[, 2], xlab = "Flowers Variety=2",
+     ylim = c(0, 105))
> barplot(flowersvariety[, 3], xlab = "Flowers Variety=3",
+     ylim = c(0, 105))
> barplot(flowersvariety[, 4], xlab = "Flowers Variety=4",
+     ylim = c(0, 105))

Return

> fv <- split(1:length(Flowers), Variety)
> nodup <- !duplicated(Flowers)
> layout(matrix(1:4, nrow = 2, byrow = TRUE))
> plot(Flowers[fv[[1]]], wleoutvar$root1$wle.weights[fv[[1]]],
+     xlim = c(0, 26), xlab = "Flowers Variety=1",
+     ylab = "Weights", pch = 19, col = 1, ylim = c(0, 1))
> points(Flowers[nodup], wlepois$weights[nodup], col = 2)
> plot(Flowers[fv[[2]]], wleoutvar$root1$wle.weights[fv[[2]]],
+     xlim = c(0, 26), xlab = "Flowers Variety=2",
+     ylab = "Weights", pch = 19, col = 1, ylim = c(0, 1))
> points(Flowers[nodup], wlepois$weights[nodup], col = 2)
> plot(Flowers[fv[[3]]], wleoutvar$root1$wle.weights[fv[[3]]],
+     xlim = c(0, 26), xlab = "Flowers Variety=3",
+     ylab = "Weights", pch = 19, col = 1, ylim = c(0, 1))
> points(Flowers[nodup], wlepois$weights[nodup], col = 2)
> plot(Flowers[fv[[4]]], wleoutvar$root1$wle.weights[fv[[4]]],
+     xlim = c(0, 26), xlab = "Flowers Variety=4",
+     ylab = "Weights", pch = 19, col = 1, ylim = c(0, 1))
> points(Flowers[nodup], wlepois$weights[nodup], col = 2)

Return

> fit.val <- sapply(split(wleoutvar$root1$fitted.values,
+     Variety), function(x) x[1])
> layout(matrix(1:4, nrow = 2, byrow = TRUE))
> plot(0:26, cumsum(flowersvariety[, 1]/sum(flowersvariety[, 1])),
+     main = "", xlab = "Flowers Variety=1",
+     ylab = "Distribution functions", type = "s")
> plot(function(x) ppois(x, lambda = fit.val[1]),
+     type = "s", add = TRUE, col = 2, lty = 2)
> plot(0:26, cumsum(flowersvariety[, 2]/sum(flowersvariety[, 2])),
+     main = "", xlab = "Flowers Variety=2",
+     ylab = "Distribution functions", type = "s")
> plot(function(x) ppois(x, lambda = fit.val[2]),
+     type = "s", add = TRUE, col = 2, lty = 2)
> plot(0:26, cumsum(flowersvariety[, 3]/sum(flowersvariety[, 3])),
+     main = "", xlab = "Flowers Variety=3",
+     ylab = "Distribution functions", type = "s")
> plot(function(x) ppois(x, lambda = fit.val[3]),
+     type = "s", add = TRUE, col = 2, lty = 2)
> plot(0:26, cumsum(flowersvariety[, 4]/sum(flowersvariety[, 4])),
+     main = "", xlab = "Flowers Variety=4",
+     ylab = "Distribution functions", type = "s")
> plot(function(x) ppois(x, lambda = fit.val[4]),
+     type = "s", add = TRUE, col = 2, lty = 2)

Return

> fv <- split(1:length(Flowers), Variety)
> layout(matrix(1:4, nrow = 2, byrow = TRUE))
> plot(Flowers[fv[[1]]], wleoutvar$root1$wle.weights[fv[[1]]],
+     xlim = c(0, 26), xlab = "Flowers Variety=1",
+     ylab = "Weights", pch = 19, col = 1, ylim = c(0, 1))
> points(Flowers[fv[[1]]], wleoutvarasy$root1$wle.weights[fv[[1]]],
+     col = 2)
> plot(Flowers[fv[[2]]], wleoutvar$root1$wle.weights[fv[[2]]],
+     xlim = c(0, 26), xlab = "Flowers Variety=2",
+     ylab = "Weights", pch = 19, col = 1, ylim = c(0, 1))
> points(Flowers[fv[[2]]], wleoutvarasy$root1$wle.weights[fv[[2]]],
+     col = 2)
> plot(Flowers[fv[[3]]], wleoutvar$root1$wle.weights[fv[[3]]],
+     xlim = c(0, 26), xlab = "Flowers Variety=3",
+     ylab = "Weights", pch = 19, col = 1, ylim = c(0, 1))
> points(Flowers[fv[[3]]], wleoutvarasy$root1$wle.weights[fv[[3]]],
+     col = 2)
> plot(Flowers[fv[[4]]], wleoutvar$root1$wle.weights[fv[[4]]],
+     xlim = c(0, 26), xlab = "Flowers Variety=4",
+     ylab = "Weights", pch = 19, col = 1, ylim = c(0, 1))
> points(Flowers[fv[[4]]], wleoutvarasy$root1$wle.weights[fv[[4]]],
+     col = 2)

Return

C. Agostinelli and M. Markatou. Test of hypotheses based on the weighted likelihood methodology. Statistica Sinica, 11(2):499-514, 2001.
D.R. Cox and E.J. Snell. A general definition of residuals. Journal of the Royal Statistical Society, Series B, 30(2):248-275, 1968. URL http://www.jstor.org/stable/2984505.
N. Cressie and T.R.C. Read. Multinomial goodness-of-fit tests. Journal of the Royal Statistical Society, Series B, 46:440-464, 1984.
N. Cressie and T.R.C. Read. Cressie-Read statistic. In S. Kotz and N.L. Johnson, editors, Encyclopedia of Statistical Sciences, Supplementary Volume, pages 37-39. Wiley, 1988.
B.G. Lindsay. Efficiency versus robustness: The case for minimum Hellinger distance and related methods. Annals of Statistics, 22:1081-1114, 1994.

M. Markatou, A. Basu, and B.G. Lindsay. Weighted likelihood estimating equations: The continuous case. Technical report, Department of Statistics, Columbia University, New York, 1995.
M. Markatou, A. Basu, and B.G. Lindsay. Weighted likelihood estimating equations with a bootstrap root search. Journal of the American Statistical Association, 93:740-750, 1998.
C. Park and A. Basu. The generalized Kullback-Leibler divergence and robust inference. Journal of Statistical Computation and Simulation, 73(5):311-332, 2003.

These slides were prepared using LaTeX, the beamer class and the Sweave package in R. They were compiled with R ver. 2.6.0 running under darwin8.10.1, with the wle package ver. 0.9-3.

Copyright 2008 Claudio Agostinelli. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and with the Back-Cover Texts being as in (a) below. A copy of the license is included in the section entitled GNU Free Documentation License. The R code available in this document is released under the GNU General Public License. (a) The Back-Cover Text is: You have freedom to copy, distribute and/or modify this document under the GNU Free Documentation License. You have freedom to copy, distribute and/or modify the R code available in this document under the GNU General Public License.