Simultaneity in the Multivariate Count Data Autoregressive Model

Similar documents
EXPLANATORYVARIABLES IN THE AR(1) COUNT DATAMODEL

Vector autoregressions, VAR

A Non-Parametric Approach of Heteroskedasticity Robust Estimation of Vector-Autoregressive (VAR) Models

Econ 423 Lecture Notes: Additional Topics in Time Series 1

Multivariate Time Series Analysis and Its Applications [Tsay (2005), chapter 8]

Missing dependent variables in panel data models

Improving GMM efficiency in dynamic models for panel data with mean stationarity

Title. Description. var intro Introduction to vector autoregressive models

Gravity Models, PPML Estimation and the Bias of the Robust Standard Errors

Time Series Models and Inference. James L. Powell Department of Economics University of California, Berkeley

A Guide to Modern Econometric:

Katz Family of Distributions and Processes

Final Exam. Economics 835: Econometrics. Fall 2010

State-space Model. Eduardo Rossi University of Pavia. November Rossi State-space Model Fin. Econometrics / 53

Asymptotic distribution of the Yule-Walker estimator for INAR(p) processes

Vector Auto-Regressive Models

VAR Models and Applications

Discussion of Bootstrap prediction intervals for linear, nonlinear, and nonparametric autoregressions, by Li Pan and Dimitris Politis

2. Multivariate ARMA

GLS and FGLS. Econ 671. Purdue University. Justin L. Tobias (Purdue) GLS and FGLS 1 / 22

Econometría 2: Análisis de series de Tiempo

Multivariate ARMA Processes

Economics Division University of Southampton Southampton SO17 1BJ, UK. Title Overlapping Sub-sampling and invariance to initial conditions

Inference in VARs with Conditional Heteroskedasticity of Unknown Form

Generalized Autoregressive Score Models

Linear Regression. Junhui Qian. October 27, 2014

Financial Econometrics

Forecasting the Size Distribution of Financial Plants in Swedish Municipalities

Integer-Valued Moving Average Modelling of the Number of Transactions in Stocks

Economics Department LSE. Econometrics: Timeseries EXERCISE 1: SERIAL CORRELATION (ANALYTICAL)

Bernoulli Difference Time Series Models

Conditional Maximum Likelihood Estimation of the First-Order Spatial Non-Negative Integer-Valued Autoregressive (SINAR(1,1)) Model

ECON3327: Financial Econometrics, Spring 2016

Bootstrap tests of multiple inequality restrictions on variance ratios

AR, MA and ARMA models

A Note on Demand Estimation with Supply Information. in Non-Linear Models

ARIMA Modelling and Forecasting

You must continuously work on this project over the course of four weeks.

Econometric Analysis of Cross Section and Panel Data

GMM estimation of spatial panels

Testing an Autoregressive Structure in Binary Time Series Models

Econometrics Summary Algebraic and Statistical Preliminaries

A Robust Approach to Estimating Production Functions: Replication of the ACF procedure

Lesson 17: Vector AutoRegressive Models

A Bootstrap Test for Conditional Symmetry

Short T Panels - Review

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Longitudinal Data?

Multivariate Time Series

The Size and Power of Four Tests for Detecting Autoregressive Conditional Heteroskedasticity in the Presence of Serial Correlation

Economics 536 Lecture 7. Introduction to Specification Testing in Dynamic Econometric Models

Multivariate Time Series: VAR(p) Processes and Models

DSGE Methods. Estimation of DSGE models: GMM and Indirect Inference. Willi Mutschler, M.Sc.

Single Equation Linear GMM with Serially Correlated Moment Conditions

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Longitudinal Data?

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Efficient Estimation of Dynamic Panel Data Models: Alternative Assumptions and Simplified Estimation

7. MULTIVARATE STATIONARY PROCESSES

The BLP Method of Demand Curve Estimation in Industrial Organization

at least 50 and preferably 100 observations should be available to build a proper model

Multivariate GARCH models.

2b Multivariate Time Series

Applied Microeconometrics (L5): Panel Data-Basics

1 Estimation of Persistent Dynamic Panel Data. Motivation

The regression model with one stochastic regressor (part II)

MFE Financial Econometrics 2018 Final Exam Model Solutions

Economics 472. Lecture 10. where we will refer to y t as a m-vector of endogenous variables, x t as a q-vector of exogenous variables,

Panel Data Models. Chapter 5. Financial Econometrics. Michael Hauser WS17/18 1 / 63

ESTIMATION PROBLEMS IN MODELS WITH SPATIAL WEIGHTING MATRICES WHICH HAVE BLOCKS OF EQUAL ELEMENTS*

Econometrics of Panel Data

Testing Random Effects in Two-Way Spatial Panel Data Models

Diagnostic Test for GARCH Models Based on Absolute Residual Autocorrelations

Introduction to Estimation Methods for Time Series models. Lecture 1

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley

Latent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent

A Few Special Distributions and Their Properties

Linear models. Linear models are computationally convenient and remain widely used in. applied econometric research

FE670 Algorithmic Trading Strategies. Stevens Institute of Technology

Week 5 Quantitative Analysis of Financial Markets Characterizing Cycles

9. Multivariate Linear Time Series (II). MA6622, Ernesto Mordecki, CityU, HK, 2006.

Appendix of the paper: Are interest rate options important for the assessment of interest rate risk?

Cross-sectional space-time modeling using ARNN(p, n) processes

Consistency and Asymptotic Normality for Equilibrium Models with Partially Observed Outcome Variables

Flexible Estimation of Treatment Effect Parameters

State-space Model. Eduardo Rossi University of Pavia. November Rossi State-space Model Financial Econometrics / 49

Measurement Errors and the Kalman Filter: A Unified Exposition

DSGE-Models. Limited Information Estimation General Method of Moments and Indirect Inference

GMM-based inference in the AR(1) panel data model for parameter values where local identi cation fails

Specification Test on Mixed Logit Models

INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION. 1. Introduction

Chapter 3 - Temporal processes

5: MULTIVARATE STATIONARY PROCESSES

Network Connectivity and Systematic Risk

Time Series Analysis. James D. Hamilton PRINCETON UNIVERSITY PRESS PRINCETON, NEW JERSEY

Least Squares Estimation of a Panel Data Model with Multifactor Error Structure and Endogenous Covariates

Estimation and Testing for Common Cycles

Autoregressive Moving Average (ARMA) Models and their Practical Applications

The PPP Hypothesis Revisited

Estimating and Identifying Vector Autoregressions Under Diagonality and Block Exogeneity Restrictions

Analyzing spatial autoregressive models using Stata

Lecture 8: Multivariate GARCH and Conditional Correlation Models

Transcription:

Simultaneity in the Multivariate Count Data Autoregressive Model Kurt Brännäs Department of Economics Umeå School of Business and Economics, Umeå University email: kurt.brannas@econ.umu.se Umeå Economic Studies 870, 2013 Abstract This short paper proposes a simultaneous equations model formulation for time series of count data. Some of the basic moment properties of the model are obtained. The inclusion of real valued exogenous variables is suggested to be through the parameters of the model. Some remarks on the application of the model to spatial data are made. Instrumental variable and generalized method of moments estimators of the structural form parameters are also discussed. Key Words Integer-valued, Spatial, INAR, Interdependence, Properties, Estimation JEL C35, C36, C39, C51 MSC2010 60G10, 62F10, 62H05, 62P20

1 Introduction This paper introduces a simultaneous equations model for a vector of jointly determined count (integer-valued) data endogenous variables. The model representation extends a strand of literature originating from McKenzie (1985) and Al-Osh and Alzaid (1987) on the univariate integer-valued autoregressive model of order one (INAR(1)) by allowing for simultaneous effects in a way related to classical econometric treatments for continuous variables. Simultaneity is here and elsewhere mostly taken to be a consequence of a low sampling frequency. With a higher sampling frequency, the causal effects are taken to be unidirectional, but at a lower frequency, say, at an annual frequency, the causal directions may not empirically be separable. Other previous extensions of the INAR(1) model are surveyed by McKenzie (2003) and later contributors to the research field. The extensions include allowing for higher order lags (e.g., Alzaid and Al-Osh, 1990), multivariate models (e.g., McKenzie, 1988), and the inclusion of exogenous variables (Brännäs, 1995). The INAR models have also been extended to include moving average terms. In the economics field we, e.g., find applied INAR(1) works reported for the pharmaceutical market (Rudholm, 2001), ownership of shares (Brännäs, 2013), and the entry and exit behavior of firms in a regional setting (Berglund and Brännäs, 2001). So is there a real need for a model extension of this type? The answer is not surprisingly yes, and it is here motivated by two areas of application finance and regional economics. In finance, integer-valued time series models have been found useful, at least, in an interpretational sense for variables such as the number of share owners in stocks, the number of traded stocks (trading volume) and the number of transactions. It is quite clear that one can expect simultaneity at all but the highest (intra-day) sampling frequencies simply by the fast information flows in the financial sector as well as by portfolio arguments. Empirically it appears plausible to expect either positive or negative effects between jointly determined variables within each of the sub-categories. In a regional economics context, there are many examples of count data variables, such as child births, accident frequencies, and the number of, say, crimes. Given annual data there are bound to be simultaneous effects that by reduced form modelling approaches are hidden in a covariance matrix for the error term vector, and where only total rather than the direct and indirect effects, that a simultaneous equations model offers, can be 1

estimated. Section 2 defines the proposed simultaneous model for integer-valued or count data, and Section 3 gives a few of its moment properties. In Section 4 we discuss some particulars that appear relevant for regional econometric applications of the model type. Section 5 gives some general comments on the estimation of unknown parameters. 2 The Model The structural form of a simultaneous integer-valued autoregressive model of order one (SINAR(1)) can be written as y t = A y t + B y t 1 + ɛ t, t = 1,..., T, (1) where the M M matrix A is of the general form 0 α 12 α 13 α 1M α 21 0 α 23 α 2M. A =................. αm 1,M α M1 α M,M 1 0. The endogenous y t = (y 1t,..., y Mt ) vector and its lags are all integer-valued. The model contains simultaneity or interdependence across these y it variables as reflected by the non-zero off-diagonal elements in the A matrix. The symbol represents binomial thinning which replaces standard multiplication in order for the model to generate integer-valued outcomes. For instance, for a scalar integer-valued random y variable the thinning operation is defined as α y = y i=1 u i, where {u i } y i=1 is an iid sequence of 0 1 random variables and Pr(u i ) = α. It follows that the integer-valued α y [0, y] and that for a given y, α y is binomially distributed. This motivates the label binomial thinning. The parameters in the A and B matrices are interpreted as probabilities, so that α ij [0, 1], β ij [0, 1], for all relevant i, j. Thinning operations are performed element by element, such that for the ith equation we have from (1) that y it = M M α ij y j,t + β ij y j,t 1 + ɛ it. j=1,j =i j=1 2

The different thinning operations are assumed independent and independent of the disturbance term ɛ t, for all t. 1 For the unobservable random ɛ t vector we assume E(ɛ t ) = λ 0 and E(ɛ t ɛ s) = Σ, for t = s and equal to 0 when t = s. A few key results for binomial thinning operations are given in the Appendix. Note that the model in (1) does not yet contain exogenous variables, see more on this below. By this specification there can only be contemporaneous positive effects between the y it, i..., M variables for the given specification of A. This is beneficial in guaranteeing that y it 0, for all i and t. Even if this condition is to hold true we may account for smaller negative effects by using a minus sign for some of the α ij in A. In such cases thinnings are to be interpreted as (α ij y jt ). Related to this, we may define A = I M A, with I M the M M identity matrix, and write the model as A y t = B y t 1 + ɛ t. This form reveals the closeness to structural VAR models. With up to 2 M 2 M potential parameters in the A and B matrices the general model in (1) is likely to be too rich in parameters for many practical purposes unless some additional restrictions are enforced, beyond the zeroes in the diagonal of A. These zeroes correspond to the normalization convention (cf. the ones in the A matrix). Moreover, the standard assumptions E(ɛ t ) = λ 0 and E(ɛ t ɛ s) = Σ, bring along M + M(M + 1)/2 additional and potentially free parameters. Note that in the important and parsimoniously parameterized special case of independently Poisson distributed ɛ it, i = 1,..., M, we have diag(σ) = λ and zeroes elsewhere in Σ. Various special cases of (1) have previously been considered in the literature. When A = 0, the model in (1) simplifies to a multivariate INAR(1) (e.g., Berglund and Brännäs, 1996; Pedeli and Karlis, 2013). If in addition, B = βi, i.e. the matrix is diagonal with a scalar parameter, and there is no dependence between the elements in the ɛ t vector and all elements have the same first two moments, so that, e.g., λ = λ 0 1 M, with λ 0 an unknown scalar parameter and 1 M = (1,..., 1), and Σ = σ 2 I M, the model simplifies to a replicated INAR(1) (e.g., Silva, 2005). With also B = 0 we then simply have M independent variables. 1 Brännäs and Hellström (2001) consider the consequences of relaxing such independence assumptions in the INAR(1) model. Most often only higher order moments will change when such assumptions are varied. 3

The model in (1) may be extended in, at least, two important directions. It is obviously possible to incorporate higher order lags of y t. Importantly, exogenous variables, that may vary over time and may be included in lagged form, can be incorporated through the parameters of the model. For α ij we may adopt, say, a logistic specification (cf. Brännäs, 1995) to get α ij,t = 1/(1 + exp(x ij,t θ x)), with θ x a k x dimensional parameter vector, and for β ij,t = 1/(1 + exp(w ij,t θ w)), there is a k w 1 parameter vector θ w. The exogenous variables are collected into the vectors x ij,t and w ij,t, respectively. For the λ vector we may let the elements be of an exponential form, i.e. λ it = exp(z it θ z), with θ z a k z 1 parameter vector. Such seemingly extending specifications are likely to reduce the number of unknown parameters rather than to increase the number, i.e. we expect that k x + k w + k z is much smaller than 2 M 2 M + M. Other ways of incorporating exogenous variables cannot in a general way guarantee that the endogenous y t variable vector comes out as integer-valued. While it is possible to incorporate exogenous variables in the y t vector in terms of a linear VAR/reduced form any such attempt comes at the cost of a non-interpretable corresponding structural form, at least, when the exogenous variables are continuous ones. Obviously, there can be mixed forms of continuous and integer-valued variables, but where the continuous variables effect the integer-valued ones through nonlineary specified parameters. The specification that comes closest to the classical, static simultaneous equations model has B = 0, A time invariant, and with exogenous variables included only through λ t, i.e. A y t = λ t + ɛ t with ɛ t = ɛ t λ t. If y t takes on large numbers λ t can potentially be specified as linear in z without violating the non negativity constraint in y t. The general simultaneous equations model with time dependent parameters based on exogenous variables is written y t = A t y t + B t y t 1 + λ t + ɛ t, t = 1,..., T. In Section 5, we make some more specific remarks on how the simultaneous model can be adapted for spatial econometrics applications. 4

3 Model Properties The structural form representation in (1) is not very useful when we wish to study the moment properties of the y t vector and when we are interested in the dynamic properties or in forecasting. For such purposes the reduced or the final forms are the typical points of departure. Here, we focus on the reduced form as this also serves as the basis for some widely employed estimators. To illustrate the difficulties that emerge in attempting to find such a reduced form in this setting, consider as an example the following simple structural form: By substitution we get y 1t = α 1 y 2t + β 1 y 1t 1 + ɛ 1t y 2t = α 2 y 1t + β 2 y 2t 1 + ɛ 2t. y 1t α 1 α 2 y 1t = α 1 β 2 y 2t 1 + β 1 y 1t 1 + ɛ 1t + α 1 ɛ 2t y 2t α 1 α 2 y 2t = α 2 β 1 y 1t 1 + β 2 y 2t 1 + ɛ 2t + α 2 ɛ 1t, but with the thinning operators rather than multiplications it is difficult to explicitly solve for y 1t and y 2t, respectively, if we are to respect distributional rules. Under the assumption of a Poisson distributed y it, the distribution of the left hand side y it α 1 α 2 y it = 1 y it α 1 α 2 y it of either equation remains Poisson with parameter (1 α 1 α 2 )E(y it ). The Poisson arises from Poisson assumptions for ɛ it and y i0, for i = 1, 2, and the right hand side will then be Poisson distributed, say, Poisson(η it ). Assuming equality in distribution of the left and right hand sides we obtain that y it is Poisson distributed with parameter (1 α 1 α 2 ) 1 η it. Unfortunately, there does not appear to be general results for the direct inversion of thinning operations, and this makes giving general reduced form results complicated. Hence, it appears difficult to obtain explicit distributional results and functional expressions for the y t vector except perhaps in some special cases. While different, the current specification then shares the property of a nonexplicit reduced form with general nonlinear simultaneous equation models. Still, some moment results can be obtained. We start by conditioning the model in (1) on past observations Y t 1 to get E(y t Y t 1 ) = AE(y t Y t 1 ) + By t 1 + λ. 5

The matrix A = I A matrix is assumed invertible in this complete system, so that the conditional expectation is E(y t Y t 1 ) = A 1 By t 1 + A 1 λ = Cy t 1 + λ. (2) This is the key part of the reduced form and the full reduced form can now be written y t = E(y t Y t 1 ) + ξ t = Cy t 1 + λ + ξ t, (3) where E(ξ t Y t 1 ) = 0. This way of writing the model is useful for model analysis, though it is not useful as a description of the data generating process as there is no guarantee that integer-valued y t can be generated. For this the ξ t needs to have an explicit form. The structural form is mostly seen as the ideal description of the data generating process with its explicit direct and indirect effect interpretations of the parameterization. The reduced form only gives total effects. The model in (3) is of a VAR(1) form, which makes model based analysis by analogy to the VAR literature straightforward. For stationarity we require that the largest eigenvalue of C is smaller than one in absolute value. Under stationarity the unconditional expected value is E(y t ) = (I C) 1 λ. Unfortunately, there does not seem to be a general result translating the eigenvalue condition into conditions on the underlying A and B matrices. Likewise, given the probability interpretations of the structural form parameters α ij and β ij there is no general guarantee that the elements of the C matrix can be interpreted as probabilities. Note also, that in a univariate setting c = 1 is to be rejected whenever an observed change y t y t 1 < 0 emerges. as: To obtain the conditional variance V(y t Y t 1 ) we rewrite the structural form in (1) y t = Ay t + By t 1 + λ + ɛ + (A y t Ay t ) + (B y t 1 By t 1 ) = Ay t + By t 1 + λ + ɛ t, (4) where the composite disturbance terms ɛ t = ɛ t λ and ɛ t both have zero means, but obviously the latter contains both current values and lags of y t. The corresponding reduced form is identical to the one in (10) with ξ t = (I A) 1 ɛ t = A 1 ɛ t. The important ingredient for the conditional variance is the one-step-ahead prediction error 6

ỹ t = y t E(y t Y t 1 ). We get ỹ t = Ay t + By t 1 + λ + ɛ t AE(y t Y t 1 ) By t 1 λ, = Aỹ t + ɛ t = A 1 ɛ t and therefore we get the conditional variance expression V(y t Y t 1 ) = E(ỹ t ỹ t Y t 1 ) = A 1 E(ɛ t ɛ t )(A ) 1 = A 1 Σ(A ) 1 after some tedious calculations. This corresponds to the covariance matrix for the reduced form of conventional simultaneous equations models. If the exogenous variables x t, w t and z t enter the A and B matrices as well as the λ vector, the conditioning needs to be extended to include the set of exogenous variables χ τ = {x τ, w τ, z τ τ t}. The only consequence for conditional moments is that the structural form parameter matrices and the λ vector are then to be time-indexed. 4 Remarks on Spatial Modelling In the spatial econometrics literature (e.g., Anselin, Florax and Rey, 2004) the spatial distance between observable units is an important ingredient. The simultaneous equation model in (1) can be adapted to a spatial context by letting the elements of the y t vector represent the same variable but measured in the different spatial units, such as municipalities or regions. Here, the spatial effects are perhaps best implicitly incorporated through matrices A and B and the λ vector. The spatial econometrics literature makes frequent use of a weight matrix W, that may be time-varying, to reflect the spatial distance and it can, e.g., have elements in the form of gravity models W ij = ω 0 M i M j /D ij, i = j and W ii = 0, for all i. Here, ω 0 is an unknown parameter, M i represents a measure of mass for unit i, and D ij is a measure of distance between units i and j. The M i and M j may be measured by wealth or some other economic size variable and will therefore likely be time-varying, implying timedependence also in W ij,t. As the distance increases W ij will typically become smaller and then implying a smaller spatial correlation. Since, the A and B matrices in (1) contain probabilities a logistic specification may 7

be usefully applied (Brännäs, 1995). Thus, thanks to symmetry α ij = 1/(1 + exp(α 0 + W ij )) = α ji, i = j, which holds also in the presence of time-dependence. The M diagonal elements of A are again all set equal to zero. The effect of this simple parametrization is to reduce M 2 M potentially unknown α ij to two unknown parameters, α 0 and ω 0. When ω 0 = 0 there is no spatial effect. If W contains no unknown parameter, as, e.g., when W ij is set equal to 1/D ij we could use α ij = 1/(1 + exp(α 0 + αw ij )) = α ji, i = j. If we were to also include explanatory variables in α ij the symmetry is likely to be lost. The probabilities of the B matrix may be specified in some analogous manner, and the λ vector as well as the covariance matrix Σ in some related ways. A direct use of the more conventional spatial econometrics analogues, such as, AW y or A Wy are less suitable than the illustrated A(W) y specification if we wish to adhere to the count data interpretation of y t. For small numbers of spatial units or when the spatial dependence is limited to the nearest neighbours it is not always necessary to include distance explicitly, but to instead use the α ij and β ij parameters directly. Such studies have up till now assumed α ij = 0. For instance, Brännäs and Brännäs (1998) used a binomial INAR(1) model for fish visits in a closed experimental tank system, and Ghodsi, Shitan and Bakouch (2012) studied the moment properties of a space-time INAR model of order (1,1). 5 Notes on Estimation The presence of y t variables in the right hand side of (1) implies a dependence with the disturbance term ɛ t. This dependence renders, e.g., the ordinary (conditional) least squares estimator inconsistent. In addition, previous sections have indicated that obtaining a distributionally welldefined reduced form is nontrivial in general. For that reason maximum likelihood estimation appears to be beyond reach in most cases. Instead, we first focus on directly estimating the unknown parameters based on the structural form in (1). We consider both single equation as well as joint estimation of the full system. We later discuss estimating the parameters in terms of the reduced form. The presence of the thinning operations in (1) complicates a direct use of consistent estimation approaches such as conventional instrumental variable (IV) or generalized 8

method of moments (GMM). Therefore, the rewritten simultaneous equation model in (4), i.e. y t = Ay t + By t 1 + λ + ɛ t, is a more convenient starting point. Hence, whether instrumental variables y t k, k 1, can or cannot be used to consistently estimate the parameters needs to be studied. Since, E[(A y t Ay t )y t k Y t k ] = 0 E[(B y t 1 By t 1 )y t k Y t k ] = 0 it follows that these instrumental variables are also unconditionally uncorrelated with the composite disturbance term ɛ t. In addition, it follows from the model that the instrumental variables are correlated with the right hand side variables. Therefore, IV or GMM estimators based on a single equation or on all equations jointly will be consistent and asymptotically, normally distributed. Since the number of instrumental variables will typically be large in this setting, the GMM estimator will be quite efficient. In fact, the current context is very close to the one treated in the literature on the estimation of the first order dynamic panel data model. When exogenous variables are nonlinearly included in the A t and B t matrices and in the λ t vector, the estimation of the parameter vector θ = (θ x, θ w, θ z) is to be made by some nonlinear version of the IV or GMM estimators. In this case, the exogenous variables and their lags can also be used as instrumental variables. Depending on the parametrization, there may be cases where systemwide rather than single equation estimation should be pursued. constant across equations as, e.g., in the spatial model. This is the case when some parameters are viewed as Next, we consider estimation based on the prediction error or alternatively on the reduced form. The ith equation of (1) is y it = M j=1,j =i α ij y jt + M j=1,j β ij y j,t 1 + ɛ it and simultaneity implies that the right hand side current y jt variables are correlated with ɛ it. The prediction error is ỹ it = y it E(y it Y t 1 ), which by specialization of (2) can be written ỹ it = y it M γ ij y j,t 1 κ i, j=1 where γ ij is the jth element in the ith row of (I A) 1 B and κ i is the ith element of the M 1 vector (I A) 1 λ. The prediction error has mean zero, but its variance is likely to be heteroskedastic. For the full system, the prediction error is ỹ t = y t (I A) 1 (By t 1 λ). 9

Whether we consider limited or full information estimation methods, nonlinearity is a key ingredient of any least squares estimator based on this prediction error. Obviously, it will also be important to consider the identifiability of individual equations. 10

Appendix: Some results for binomial thinning operators The binomial thinning operator is defined as α y = y u i, i=1 where y is an integer-valued random variable, and the {u i } sequence is made up of 0 1 iid random variables. The probability Pr(u i = 1) = α is constant across i. Hence, the integer-valued α y [0, y], and for a given y it follows a binomial distribution. If, e.g., y is Poisson distributed the distribution of α y is Poisson distributed as well. Results are most easily obtained using the probability generating function. For the case of a given y we have E(t α y y) = E(t u 1+u 2 +...+u y y) = E(t u ) y = [t 0 Pr(u = 0) + t 1 Pr(u = 1)]y = [1 α + αt]y. The conditional expectation is then E(α y y) = E(t α y y)/ t t=1 = αy and unconditionally we get E(α y) = E y [E(α y y)] = αe(y). The corresponding second order moments are E[(α y) 2 y] = αy + α 2 y(y 1) and unconditionally E[(α y) 2 ] = α 2 E(y 2 ) + α(1 α)e(y). From these results it follows that the conditional variance is V(α y y) = α(1 α)y and that the unconditional variance is V(α y) = α(1 α)e(y) + α 2 V(y). In addition, E[(α y i )(α y j ) y i, y j ] = α 2 y i y j and E(α y i )(α y j ) = α 2 E(y i y j ). Among other useful results we mention the following equalities in distribution: α (β y) = (αβ) y; α β y = β α y and α (y + x) = α y + α x. Note that α y + β y and (α + β) y are not equal in distribution. 11

References Al-Osh, M.A. and Alzaid, A.A. (1987). First Order Integer-valued Autoregressive INAR(1) Process. Journal of Time Series Analysis 8, 261-275. Alzaid, A.A. and Al-Osh, M.A. (1990). An Integer-Valued p-order Autoregressive Structure (INAR(p)). Journal of Applied Probability 27, 314-324. Anselin, L., Florax, R.J.G.M. and Rey, S.J. (eds) (2004). Advances in Spatial Econometrics. Springer, Berlin. Berglund, E. and Brännäs, K. (1996). Entry and Exit of Plants: A Study Based on Swedish Panel Count Data for Municipalities. In Yearbook of the Finnish Statistical Society 1995, 95-111, Helsinki. Berglund, E. and Brännäs, K. (2001). Plants Entry and Exit in Swedish Municipalities. Annals of Regional Science 35, 431-448. Brännäs, K. (1995). Explanatory Variables in the AR(1) Count Data Model. Umeå Economic Studies 381. Brännäs, K. (2013). The Number of Shareholders: Time Series Modelling and Some Empirical Results. Umeå Economic Studies 860. Brännäs, E. and Brännäs, K. (1998). A Model of Patch Visit Behaviour in Fish. Biometrical Journal 40, 717-724. Brännäs, K. and Hellström, J. (2001). Generalized Integer-Valued Autoregression. Econometric Reviews 20, 425-443. Ghodsi, A., Shitan, M. and Bakouch, H. (2012). A First-Order Spatial Integer-Valued Autoregressive SINAR(1,1) Model. Communications in Statistics - Theory and Methods 41, 2773-2787. McKenzie, E. (1985). Some Simple Models for Discrete Variate Time Series. Water Resources Bulletin 21, 645-650. McKenzie, E. (1988). Some ARMA models for dependent sequences of Poisson counts. Advances in Applied Probability 20, 822-835. McKenzie, E. (2003) Discrete Variate Time Series. In Handbook of Statistics, Volume 21, Shanbhag, D.N. and Rao, C.R. (eds), Elsevier, Amsterdam, pp. 573-606. Pedeli, X. and Karlis, D. (2013). Some properties of multivariate INAR(1) processes. Computational Statistics & Data Analysis 67, 213Ð225. Rudholm, N. (2001). Entry and the number of firms in the Swedish pharmaceuticals 12

market. Review of Industrial Organization 19, 351-364. Silva, I. (2005). Contributions to the Analysis of Discrete-Valued Time Series. PhD thesis. Department of Applied Mathematics, University of Porto. (Published as Analysis of discrete-valued time series. LAP Lambert Academic Publishing, 2012). 13