Two-step centered spatio-temporal auto-logistic regression model

Similar documents
Easter bracelets for years

Analysis of Boyer and Moore s MJRTY algorithm

Methylation-associated PHOX2B gene silencing is a rare event in human neuroblastoma.

Analyzing large-scale spike trains data with spatio-temporal constraints

Evolution of the cooperation and consequences of a decrease in plant diversity on the root symbiont diversity

A new simple recursive algorithm for finding prime numbers using Rosser s theorem

Thomas Lugand. To cite this version: HAL Id: tel

Completeness of the Tree System for Propositional Classical Logic

Dispersion relation results for VCS at JLab

Passerelle entre les arts : la sculpture sonore

On size, radius and minimum degree

Multiple sensor fault detection in heat exchanger system

Case report on the article Water nanoelectrolysis: A simple model, Journal of Applied Physics (2017) 122,

Smart Bolometer: Toward Monolithic Bolometer with Smart Functions

Can we reduce health inequalities? An analysis of the English strategy ( )

Best linear unbiased prediction when error vector is correlated with other random vectors in the model

Full-order observers for linear systems with unknown inputs

On Symmetric Norm Inequalities And Hermitian Block-Matrices

Soundness of the System of Semantic Trees for Classical Logic based on Fitting and Smullyan

Polychotomous regression : application to landcover prediction

The FLRW cosmological model revisited: relation of the local time with th e local curvature and consequences on the Heisenberg uncertainty principle

L institution sportive : rêve et illusion

A review on Sliced Inverse Regression

Heavy Metals - What to do now: To use or not to use?

On the simultaneous stabilization of three or more plants

Vibro-acoustic simulation of a car window

On production costs in vertical differentiation models

TECHNICAL REPORT NO. 1078R. Modeling Spatial-Temporal Binary Data Using Markov Random Fields

b-chromatic number of cacti

Unbiased minimum variance estimation for systems with unknown exogenous inputs

Classification of high dimensional data: High Dimensional Discriminant Analysis

Using multitable techniques for assessing Phytoplankton Structure and Succession in the Reservoir Marne (Seine Catchment Area, France)

Methods for territorial intelligence.

Some explanations about the IWLS algorithm to fit generalized linear models

A simple kinetic equation of swarm formation: blow up and global existence

approximation results for the Traveling Salesman and related Problems

Impedance Transmission Conditions for the Electric Potential across a Highly Conductive Casing

From Unstructured 3D Point Clouds to Structured Knowledge - A Semantics Approach

Influence of network metrics in urban simulation: introducing accessibility in graph-cellular automata.

On infinite permutations

Stickelberger s congruences for absolute norms of relative discriminants

A Simple Proof of P versus NP

Posterior Covariance vs. Analysis Error Covariance in Data Assimilation

Exogenous input estimation in Electronic Power Steering (EPS) systems

A CONDITION-BASED MAINTENANCE MODEL FOR AVAILABILITY OPTIMIZATION FOR STOCHASTIC DEGRADING SYSTEMS

Factorisation of RSA-704 with CADO-NFS

Interactions of an eddy current sensor and a multilayered structure

Reduced Models (and control) of in-situ decontamination of large water resources

Impairment of Shooting Performance by Background Complexity and Motion

On Symmetric Norm Inequalities And Hermitian Block-Matrices

The beam-gas method for luminosity measurement at LHCb

The status of VIRGO. To cite this version: HAL Id: in2p

A new approach of the concept of prime number

Finite Volume for Fusion Simulations

RHEOLOGICAL INTERPRETATION OF RAYLEIGH DAMPING

DEM modeling of penetration test in static and dynamic conditions

Negative results on acyclic improper colorings

On Politics and Argumentation

Parallel Repetition of entangled games on the uniform distribution

3D-CE5.h: Merge candidate list for disparity compensated prediction

Some approaches to modeling of the effective properties for thermoelastic composites

On Poincare-Wirtinger inequalities in spaces of functions of bounded variation

Spatial representativeness of an air quality monitoring station. Application to NO2 in urban areas

Sound intensity as a function of sound insulation partition

Solving an integrated Job-Shop problem with human resource constraints

Solving the neutron slowing down equation

Linear Quadratic Zero-Sum Two-Person Differential Games

Low frequency resolvent estimates for long range perturbations of the Euclidean Laplacian

Cutwidth and degeneracy of graphs

The Windy Postman Problem on Series-Parallel Graphs

Widely Linear Estimation with Complex Data

Climbing discrepancy search for flowshop and jobshop scheduling with time-lags

Territorial Intelligence and Innovation for the Socio-Ecological Transition

On Newton-Raphson iteration for multiplicative inverses modulo prime powers

The Mahler measure of trinomials of height 1

There are infinitely many twin primes 30n+11 and 30n+13, 30n+17 and 30n+19, 30n+29 and 30n+31

SOLAR RADIATION ESTIMATION AND PREDICTION USING MEASURED AND PREDICTED AEROSOL OPTICAL DEPTH

Tropical Graph Signal Processing

Entropies and fractal dimensions

On the longest path in a recursively partitionable graph

Understanding SVM (and associated kernel machines) through the development of a Matlab toolbox

Higgs searches at L3

Towards an active anechoic room

Hook lengths and shifted parts of partitions

On path partitions of the divisor graph

Christian Défarge, Pascale Gautret, Joachim Reitner, Jean Trichet. HAL Id: insu

Stochastic invariances and Lamperti transformations for Stochastic Processes

Solution to Sylvester equation associated to linear descriptor systems

Near-Earth Asteroids Orbit Propagation with Gaia Observations

Influence of a Rough Thin Layer on the Potential

Prompt Photon Production in p-a Collisions at LHC and the Extraction of Gluon Shadowing

Urban noise maps in a GIS

Numerical Modeling of Eddy Current Nondestructive Evaluation of Ferromagnetic Tubes via an Integral. Equation Approach

Impulse response measurement of ultrasonic transducers

A Simple Model for Cavitation with Non-condensable Gases

Basic concepts and models in continuum damage mechanics

Nonlocal computational methods applied to composites structures

AC Transport Losses Calculation in a Bi-2223 Current Lead Using Thermal Coupling With an Analytical Formula

A Study of the Regular Pentagon with a Classic Geometric Approach

How to make R, PostGIS and QGis cooperate for statistical modelling duties: a case study on hedonic regressions

Transcription:

Two-step centered spatio-temporal auto-logistic regression model Anne Gégout-Petit, Shuxian Li To cite this version: Anne Gégout-Petit, Shuxian Li. Two-step centered spatio-temporal auto-logistic regression model. SADA416, Applied Statistics for Development in Africa, Nov 2016, Cotonou, Benin. <hal-01394868> HAL Id: hal-01394868 https://hal.inria.fr/hal-01394868 Submitted on 10 Nov 2016 HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. L archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

Two-step centered spatio-temporal auto-logistic regression model Anne Gégout-Petit 1 & Shuxian Li 2 1 Université de Lorraine, Institut Elie Cartan, Faculté des Sciences et Technologies, B.P. 70239 54506 VANDOEUVRE LES NANCY CEDEX 2 UMR SAVE INRA, Bordeaux Sciences Agro Abstract. In our study, we focus on spatio-temporal causal auto-logistic model and proposed a two-step-centered parametrization version of it. We study the existence of the joint law according to the conditional marginals. The simulation study show that the one-step model can not reflect the temporal data structure when both spatial and temporal dependance are strong, while for the two-step model, there is an adequate agreement between the data structure and the temporal large-scale structure. The results of estimation for simulated lattices over years were performed by expectation-maximization (EM) pseudo-likelihood in two stages. They show that under the two-step centered parametrization, the inference for parameters of both temporal and spatial regressions are accurate, while under one-step centered parametrization their inference are always conflicting. Keywords: Spatio-temporal modelling; autologistic model; Markov random field; largescale, small-scale model structure; binary response. 1 Introduction Since spatial and spatio-temporal binary data are commonly existing in nature, the models on such data had drawn large interests of scientists from various fields such as ecology, epidemiology and image analysis, during past years. 40 years ago, Besag [2] firstly proposed an auto-logistic model for spatial binary data, assuming a simple dependence on surrounding neighbors. This model was proved to be very useful and then was extended in order to integrate the regression on the covariates [6,7] and temporal dependance as in [9]. However, the non-centered parametrization of the auto-logistic regression models present parameter interpretation difficulties across varying levels of statistical dependence. This problem has been first pointed out by Caragea [3] who proposed a centered parametrization for spatial auto-logistic regression model to overcome this difficulty. Wang [8] developed a centered spatiotemporal auto-logistic regression model but (probably due to constrain about existence of the joint law) this model depends both on the past and future and is not well adapted to real world problems modelling. In this paper, we propose a two-step centered auto-logistic causal model and will show its advantage over the existing centered parametrization for the auto-logistic models only depending on the past. We tackle the existence of the joint distribution but present the expectationmaximization pseudo-likelihood estimation in two steps. The paper is organized as follows: first of all, after a brief introduction about auto-logistic models and the interest to center them in some cases, we will present the two-step centered auto-logistic model. Then we show several 1

simulations to compare the spatio-temporal auto-logistic model depending on past under onestep and two-step centered parametrization. Afterward we present the results of estimation via maximization of the pseudo-likelihood. 2 Centered autologistic models We are interested by a set of variables Z = {Z i : i = 1,..., n} indexed by a lattice with n points denoted by S = {1,..., n}. A spatial auto-model for Z is given by the n marginal conditional laws of the Z i s given the other variables {Z j, j i]}. If Z i is binary, we speak about auto-logistic model and we have to model the p i = P(Z i = 1 Z j, j i) for 1 i n. Hammesley-Clifford theorem (see for instance [2] or [4]) makes the link between auto-model and Markov Random Field and gives some conditions for the existence of the joint law. If probability p i does not depend on the whole set of variables {Z j, j i]} but only on the neighborhood of i denoted by N i, we have p i = P(Z i = 1 Z j, j N i ) and the N i s are link to the structure of cliques of the Markov Random Field. In 1997, Gumpertz followed by many other papers, introduce covariates in the model leading to p i = exp(xt i β + β ij Z j ) 1 + exp(x T i β + β ij Z j ) logit(p i ) = X T i β + β ij Z j j N largescale i smallscale where large-scale component is the overall level of a process given also by covariates and the small-scale component take into account high-order parts of the data structure like variances or covariances. Owever, Caragea [3] made the remark that if we put c i = exp(xt i β) 1 + exp(x T i β) (independent case), then log[ p i/(1 p i ) c i /(1 c i ) ] = β ij Z j, and odds of Z i = 1 relative to the independence model increases for any nonzero neighbors, and can never decrease. It is not reasonable if most of neighbors are zeros and could bias the realizations towards 1. For this reason, they proposed the centered model (1) in 2009 that allows the p i s to increase or decrease around the mean behavior (large scale) of the model. logit(p i ) = X T i β + ρ exp(x T j (Z j β) 1 + exp(x T (1) j β)) We study now the opportunity to propose an autologistic spatio-temporal model that is centered in two steps, the first one according to the current covariates and the second one according to the regression on the past. It is given by the marginal conditional laws [Z i,t Z j,t ; j i, Z t 1, X t ] = [Z i,t Z j,t : j N i, Z i,t 1, X t ] and where the status at point i is influenced by the past at point i and the the current neighborhood status of N i. More precisely: 2

( logit(p i,t ) = X T i,tβ + ρ 2 Z i,t 1 + ρ 1 Z j,t exp(xt j,t β + ρ ) 2Z ij,t 1 ) 1 + exp(x T reg covariates reg past i,t β + ρ 2Z j,t 1 ) auto-reg centered To prove the existence of the joint law of the Z i,t, we have to consider the process as a Markov chain of Markov field. Framework of Guyon-Hardouin [5] is useful for this and allows us to express the transition probabilities from value y to value z according to the covariates at time t. They are given by P(y, z Ft X ) = C(y, Ft X ) exp( z i X T i,tβ + ρ 1 1 j Ni z i z j + exp(x T j,t z i (ρ 2 y i ρ β + ρ 2y j ) 1 1 + exp(x T i S i S j N i j,t β + ρ )) 2y j ) instantaneous past 3 Simulation and inference We will use Perfect sampling preceded by a Gibbs sampling step for the burn-in to simulate data and compare the large scale structure measured by the mean overall the points of the lattice in model (2 ) and the following one-step model ( logit(p i,t ) = X T i,tβ + ρ 2 Z i,t 1 + ρ 1 Z j,t exp(xt j,t β) 1 + exp(x T i,t β) Simulation were performed on a 20 20 lattice and we find that this centering is fairly accurate to capture the small-scale effect. For inference, because of the intractable constant in joint distribution, we deal with the pseudo-likelihood and a Expectation Maximisation algorithm to infer the parameters. We give in Tables 1 and 2 the result of inference and see that the two-step centered model is more suitable for estimation. ) (2) (3) β 1 β 2 ρ 1 ρ 2 mean -1.7917(-1.5) 0.2322(0.5) 0.1941(0.8) 0.8400(0.8) variance 0.0221 0.0013 0.0421 0.0159 Table 1: One-step centered model simulation estimated by PL, (true values in the brackets). β 1 β 2 ρ 1 ρ 2 mean -1.4835(-1.5) 0.5167(0.5) 0.7858(0.8) 0.7994(0.8) variance 0.0117 0.0057 0.0061 0.0084 Table 2: two-step centered model simulation estimated by PL, (true values in the brackets). 3

4 Discussion In this chapter, we developed a two-step centered spatio-temporal auto-logistic regression model to fit the binary spatio-temporal data over a lattice with exogenous covariates. The marginal data structure of the model under such centered parametrization can better reflect the largescale structure, and in this way, the correct interpretation of the parameters is ensured. The objective of the work is to use it for the analysis of epidemiological data about esca disease for vines. But lattice in this context are very large (around 2000 vines over 12 years).we have to develop more adapted methods of inference. Here we benefited the easy calculation of pseudo-likelihood function to obtain a numerical estimation of the model which is statistically imperfect. In perspective, several methods based on approximation likelihood could be tried to estimate our models. Recently, Bee [1] proposed a AMLE (Approximate Maximize Likelihood Estimation) which is very interesting for our case. The advantage of AMLE method is since the only requirement of approximating maximum likelihood estimates is the ability to simulate the model to be estimated. It does not need to evaluate the likelihood function, only sufficient statistics but no joint distribution need to be clarified. When the inference of this model will be improved and adapted to large data set, we will apply the model to esca data in order to test the dependence on neighbors and the effects of environmental covariates. Bibliographie [1] Bee, M., Espa, G., and Giuliani, D. (2015). Approximate maximum likelihood estimation of the autologistic model. Computational Statistics & Data Analysis, 84:14-26. [2] Besag, J. (1974). Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society. Series B (Methodological), 192-236 [3] Caragea, P. C. and Kaiser, M. S. (2009). Autologistic models with interpretable parameters. Journal of agricultural, biological, and environmental statistics, 14(3):281-300 [4] Gaetan, C. and Guyon, X. (2008). Modélisation et statistique spatiales, volume 63. Springer. [5] Guyon, X. and Hardouin, C. (2002). Markov chain markov field dynamics: models and statistics. Statistics: A Journal of Theoretical and Applied Statistics, 36(4):339-363. [6] Gumpertz, M. L., Graham, J. M., and Ristaino, J. B. (1997). Autologistic model of spatial pattern of phytophthora epidemic in bell pepper: effects of soil variables on disease presence. Journal of Agricultural, Biological, and Environmental Statistics, 131-156 [7] Huffer, F. W. and Wu, H. (1998). Markov chain monte carlo for autologistic regression models with application to the distribution of plant species. Biometrics, 509-524 [8] Wang, Z. and Zheng, Y. (2013). Analysis of binary data via a centered spatial-temporal autologistic regression model. Environmental and Ecological Statistics, 20(1): 37-57. [9] Zhu, J., Huang, H.-C., andwu, J. (2005). Modeling spatial-temporal binary data using markov random fields. Journal of Agricultural, Biological, and Environmental Statistics, 10(2): 212-225. 4