A GENERALIZED ESTIMATING EQUATIONS APPROACH FOR ANALYSIS OF THE IMPACT OF NEW TECHNOLOGY ON A TRAWL FISHERY


Aust. N. Z. J. Statist. 42(2), 2000

JANET BISHOP¹, DAVID DIE¹ AND YOU-GAN WANG²,³
CSIRO Marine Research and CSIRO Mathematical & Information Sciences

Summary

The article describes a generalized estimating equations approach that was used to investigate the impact of technology on vessel performance in a trawl fishery during 1988-1996, while accounting for spatial and temporal correlations in the catch-effort data. Robust estimation of parameters in the presence of several levels of clustering depended more on the choice of cluster definition than on the choice of correlation structure within the cluster. Models with smaller cluster sizes produced stable results, while models with larger cluster sizes, which may have had complex within-cluster correlation structures and which had within-cluster covariates, produced estimates sensitive to the correlation structure. The preferred model arising from this dataset assumed that catches from a vessel were correlated in the same years and the same areas, but independent in different years and areas. The model that assumed catches from a vessel were correlated in all years and areas, equivalent to a random effects term for vessel, produced spurious results. This was an unexpected finding that highlighted the need to adopt a systematic strategy for modelling. The article proposes a modelling strategy of selecting the best cluster definition first, and the working correlation structure (within clusters) second. The article discusses the selection and interpretation of the model in the light of background knowledge of the data and utility of the model, and the potential for this modelling approach to apply in similar statistical situations.

Key words: covariance; fishing power; generalized estimating equations; overdispersion; Poisson; spatial and temporal correlations.

Received November 1998; revised August 1999; accepted November
Author to whom correspondence should be addressed.
¹ CSIRO Division of Marine Research, PO Box 120, Cleveland, QLD 4163, Australia. janet.bishop@marine.csiro.au
² CSIRO Mathematical & Information Sciences, PO Box 120, Cleveland, QLD 4163, Australia.
³ Dept Biostatistics, Harvard School of Public Health, 655 Huntington Avenue, Boston, MA 02115, USA.
Acknowledgments. The authors thank the fishers of the Northern Prawn Fishery who provided their catch records, and Carolyn M. Robins (Bureau of Rural Sciences, Canberra, Australia) and Margot Sachse (Australian Fisheries Management Authority, Canberra) for establishing a validated dataset. The authors are grateful for comments on earlier drafts from Richard Morton (CSIRO CMIS, Canberra) and Andre Punt (CSIRO Marine Research, Hobart) and three anonymous reviewers, which influenced the direction of the final version. This project was supported by the Australian Fish Management Authority Research Fund.
Published by Blackwell Publishers Ltd, 108 Cowley Road, Oxford OX4 1JF, UK and 350 Main Street, Malden MA 02148, USA

1. Introduction

In this paper, we show some practical complications in using the generalized estimating equations (GEE) approach to analyse data with spatial and temporal correlations, and suggest a strategy for resolving some of these problems, illustrating it by analysis of data from a fisheries application. The aim of the analysis is to estimate the extent to which new technology has increased the catching power of a fishing fleet. This topic is important for fishery managers, who generally share a common objective: to ensure the long-term viability of the fishery. To achieve this objective, many fisheries are managed by so-called input controls such as restrictions on licences, fishing seasons, gear or vessels. The need for, and effectiveness of, such restrictions are shown by several analyses, among which is the topic of the current paper: fishing power, also known as catching power, or the effectiveness of the time spent fishing. Accurate estimates of fishing power, and the monitoring of changes in the fishing power of the fleet over time, are very important for supporting good decision-making in input-controlled fisheries. To estimate fishing power, catch is modelled as a function of days spent fishing (fishing effort), abundance of the stock, and vessel characteristics including the presence and configuration of new technology on board (Hilborn & Walters, 1992; Robins, Wang & Die, 1998).

In this paper we analyse factors that contribute to fishing power in the tiger prawn component of Australia's Northern Prawn Fishery. This fishery extends across northern Australia for a distance of nearly 1000 nautical miles (Robins et al., 1998). Commercial logbook records of daily catch weights and location of catch describe 98% of the catch of tiger prawns landed by the 243 trawlers in the fishery since the late 1980s. These data are typical of fisheries. They are repeated observations of catch rates from the same vessels fishing in various statistical areas. The data are catch weights of prawns, and we assume they arise from a Poisson process with overdispersion. They are observational, not experimental, data. Biological characteristics of the stocks lead to seasonal abundance fluctuations that affect catch rates, and that vary spatially at a scale finer than that of any statistical area. Any of these characteristics of the data may introduce spatial or temporal correlations into the error structure unless accounted for in analyses.
Statistical analyses that ignore correlated errors can increase the risk of invalid scientific inferences (Cressie, 1993 Chapter 1.3; Diggle, Liang & Zeger, 1994 Chapter 1). Therefore, we reasoned that intercorrelations in the data should be accounted for in the analysis of fishing power. Statistical methods that account for such correlations in analyses of binary, count and ordinal data have been developed over the last decade (Liang & Zeger, 1986; Zeger & Liang, 1986, 1992; Diggle et al., 1994). We use GEE methods (Liang & Zeger, 1986; Zeger & Liang, 1986) to investigate the impact of new technology in the fishery, and to take account of any spatial and temporal correlations in catch rates. The GEE methods offer several advantages; however, to gain these advantages certain conditions apply. Recently, Balemi & Lee (1999) obtained finite-sample expansions of bias and efficiency of estimates from the GEE approach with mis-specified working correlation matrices. Their main findings are (i) bias and efficiency depend on the combination of a number of characteristics of the data: cluster size, intra-cluster correlation of covariates, intra-cluster correlation of response variable, variability of cluster size, and the relative response association, and (ii) the performance of GEE is excellent for moderate degrees of response correlation and small clusters. The literature on GEE does not yet include comprehensive rules or recommendations for modelling strategies, such as are available for generalized linear modelling (GLM), although we acknowledge this situation is changing rapidly as the literature on theory and applications of GEE expands. Considerable time and effort can be spent on modelling correlation structures with the GEE approach, and beforehand there is no way to know with any certainty how much advantage can be obtained from a GEE model for a particular dataset. 
We are interested in a practical strategy for modelling to limit the amount of effort required in developing the model while aiming to use the most appropriate one for the analysis. Section 2 gives some background on GEE methods, followed by a description of the problem and the data. Section 3 describes the statistical model and the modelling strategy.

Section 4 reports the results. Finally, Section 5 discusses the model selection and interpretation in the light of background knowledge of the data and utility of the model, and the appropriateness of our modelling approach to tackle similar statistical applications.

2. Background

2.1. GEE

Let $Y_i = (Y_{i1}, \ldots, Y_{in_i})^T$ represent the vector of $n_i$ measurements on the $i$th cluster and $X_i = (x_{i1}, \ldots, x_{ip})^T$ be the $p$ vectors of independent variables (the $p$ explanatory variables) on the $i$th cluster. The GEE method of Liang & Zeger (1986) allows regression modelling of data that are not multivariate normal, when the data can be modelled as a GLM except for the correlation among responses. Only the marginal distributions are modelled parametrically. The mean vector of $Y_i$ is assumed to be $\mu_i = h^{-1}(X_i \beta)$, where $h$ is a link function. The estimating equation for $\beta$ is

$$\sum_{i=1}^{n} \left( \frac{\partial \mu_i}{\partial \beta} \right)^T V_i^{-1} (Y_i - \mu_i) = 0,$$

where $V_i$ is a working covariance matrix for $Y_i$, given by

$$V_i = \phi A_i^{1/2} R_i A_i^{1/2},$$

where $\phi$ is a scale parameter following the quasi-likelihood approach (where the variance of $Y_{ij}$ is expressed as a known function of the expectation $\mu_{ij}$, that is, $\mathrm{var}(Y_{ij}) = \phi g(\mu_{ij})$, where $\phi$ is treated as a nuisance parameter and the focus of quasi-likelihood is on methods for inference about $\beta$); $A_i$ is an $n_i \times n_i$ diagonal matrix with $g(\mu_{ij})$ as the $j$th diagonal element; $R_i$ is an $n_i \times n_i$ working correlation matrix for $Y_i$. Here, $R_i$ is referred to as a working correlation matrix because it does not have to be correctly specified. An approximate correlation structure is assumed. We often specify $R_i$ in terms of a vector of parameters which is denoted by $\rho$, in which case we write $R_i(\rho)$ to emphasize this dependence.
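As a small worked illustration (not taken from the article), consider a cluster with $n_i = 2$ observations, the Poisson-type variance function $g(\mu) = \mu$, and an exchangeable working correlation with a single parameter $\rho$. Under those assumptions the working covariance defined above reduces to

$$A_i = \begin{pmatrix} \mu_{i1} & 0 \\ 0 & \mu_{i2} \end{pmatrix}, \qquad R_i(\rho) = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}, \qquad V_i = \phi \begin{pmatrix} \mu_{i1} & \rho\sqrt{\mu_{i1}\mu_{i2}} \\ \rho\sqrt{\mu_{i1}\mu_{i2}} & \mu_{i2} \end{pmatrix}.$$

Setting $\rho = 0$ recovers the independence working model, $V_i = \phi\,\mathrm{diag}(\mu_{i1}, \mu_{i2})$. With a log link, and writing $z_{ij}$ for the covariate vector of observation $j$ in cluster $i$ (a symbol introduced here only for this illustration), $\mu_{ij} = \exp(z_{ij}^T \beta)$ and therefore $\partial \mu_{ij} / \partial \beta = \mu_{ij} z_{ij}$, which is the derivative that enters the estimating equation.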
Liang & Zeger (1986) describe several possible choices for the working correlation matrix $R_i$, ranging from the simple assumption that repeated observations are uncorrelated, through to the most complex possibility that the $n(n-1)/2$ correlations vary. The selection of working correlation structures has been discussed or illustrated in the statistical literature (Lipsitz, Kim & Zhao, 1994; Pepe & Anderson, 1994; Albert & McShane, 1995; Lumley, 1996; Chen & Ahn, 1997; O'Hara Hines, 1997; Sutradhar & Das, 1999).

In the GEE approach the specified model is fitted by computing an initial estimate of $\beta$, for example with an ordinary generalized linear model assuming independence, estimating the dispersion parameter $\phi$ from the standardized residuals, and computing the working correlations $R$ based on the standardized residuals and the assumed structure of $R$. Then an estimate of the covariance is computed, and the estimate of $\beta$ is updated. These steps are iterated until convergence.

Liang & Zeger (1986) showed that the estimator $\hat\beta$ is consistent and asymptotically normal and its variance can be consistently estimated by a sandwich-type estimator. The estimator

$$M = I_0^{-1} I_1 I_0^{-1}$$

is called the empirical or robust estimator of the covariance matrix of $\hat\beta$, where

$$I_1 = \sum_{i=1}^{n} \left( \frac{\partial \mu_i}{\partial \beta} \right)^T V_i^{-1} \, \mathrm{cov}(Y_i) \, V_i^{-1} \, \frac{\partial \mu_i}{\partial \beta}.$$
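The fitting cycle and the sandwich estimator described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a log link, the Poisson variance function $g(\mu) = \mu$, an exchangeable working correlation, a moment estimator of $\rho$ of the kind given by Liang & Zeger (1986), and $\mathrm{cov}(Y_i)$ replaced by the outer product of the cluster residuals. All function names and the toy data are hypothetical. Because $\phi$ cancels from the update for $\beta$ and from $M$, it is applied only to the model-based covariance $\phi I_0^{-1}$.

```python
import math

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * e for a, e in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def inverse(A):
    n = len(A)
    cols = [solve(A, [float(i == j) for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def means(X, beta):
    """Log link: mu_j = exp(x_j . beta)."""
    return [math.exp(sum(x * b for x, b in zip(row, beta))) for row in X]

def cluster_terms(X, y, beta, rho):
    """D^T V^-1 (y - mu) and D^T V^-1 D for one cluster, V = A^(1/2) R A^(1/2)."""
    n, p, mu = len(y), len(beta), means(X, beta)
    V = [[(1.0 if a == b else rho) * math.sqrt(mu[a] * mu[b]) for b in range(n)]
         for a in range(n)]
    D = [[mu[j] * X[j][k] for k in range(p)] for j in range(n)]  # d mu / d beta
    Vr = solve(V, [y[j] - mu[j] for j in range(n)])
    VD = [solve(V, [D[j][k] for j in range(n)]) for k in range(p)]
    u = [sum(D[j][k] * Vr[j] for j in range(n)) for k in range(p)]
    H = [[sum(D[j][a] * VD[b][j] for j in range(n)) for b in range(p)] for a in range(p)]
    return u, H

def moments(clusters, beta):
    """Moment estimates of phi and the exchangeable rho from Pearson residuals."""
    p, ss, cross, n_obs, n_pairs = len(beta), 0.0, 0.0, 0, 0
    for X, y in clusters:
        e = [(yj - m) / math.sqrt(m) for yj, m in zip(y, means(X, beta))]
        ss += sum(v * v for v in e)
        cross += (sum(e) ** 2 - sum(v * v for v in e)) / 2.0  # sum over pairs j < k
        n_obs += len(y)
        n_pairs += len(y) * (len(y) - 1) // 2
    phi = ss / (n_obs - p)
    return phi, cross / (phi * (n_pairs - p))

def accumulate(clusters, beta, rho):
    """Total score U, information I0, and the per-cluster scores."""
    p = len(beta)
    terms = [cluster_terms(X, y, beta, rho) for X, y in clusters]
    U = [sum(t[0][k] for t in terms) for k in range(p)]
    I0 = [[sum(t[1][a][b] for t in terms) for b in range(p)] for a in range(p)]
    return U, I0, [t[0] for t in terms]

def fit_gee(clusters, p, tol=1e-10, max_iter=200):
    beta, rho = [0.0] * p, 0.0
    for estimate_rho in (False, True):  # independence fit first, then exchangeable
        for _iteration in range(max_iter):
            if estimate_rho:
                rho = moments(clusters, beta)[1]
            U, I0, _scores = accumulate(clusters, beta, rho)
            delta = solve(I0, U)
            beta = [b + d for b, d in zip(beta, delta)]
            if max(abs(d) for d in delta) < tol:
                break
    phi, rho = moments(clusters, beta)
    _, I0, scores = accumulate(clusters, beta, rho)
    I0_inv = inverse(I0)
    I1 = [[sum(s[a] * s[b] for s in scores) for b in range(p)] for a in range(p)]
    robust = matmul(matmul(I0_inv, I1), I0_inv)         # sandwich: I0^-1 I1 I0^-1
    model = [[phi * v for v in row] for row in I0_inv]  # model-based: phi * I0^-1
    return beta, rho, phi, model, robust

# Toy data (made up): 4 clusters of 3 counts each, intercept-only design.
clusters = [([[1.0]] * 3, y) for y in ([2, 3, 4], [10, 12, 11], [5, 6, 4], [20, 18, 22])]
beta, rho, phi, model, robust = fit_gee(clusters, p=1)
print(beta, rho, phi, math.sqrt(model[0][0]), math.sqrt(robust[0][0]))
```

In this balanced, intercept-only toy case the fitted mean is simply the overall sample mean whatever the value of $\rho$, which makes the sketch easy to check by hand. With within-cluster covariates or unbalanced clusters the choice of working correlation would generally change the estimates.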

The robust estimator $M$ is a consistent estimator of the covariance matrix of $\hat\beta$ even if the working correlation matrix is mis-specified, that is, if $\mathrm{cov}(Y_i) \neq V_i$, whereas the model-based estimator of $\mathrm{cov}(\hat\beta)$ is consistent only if both the mean model and the correlation matrix are correctly specified. The model-based estimator of $\mathrm{cov}(\hat\beta)$ is

$$\mathrm{cov}_M(\hat\beta) = I_0^{-1}, \qquad \text{where} \qquad I_0 = \sum_{i=1}^{n} \left( \frac{\partial \mu_i}{\partial \beta} \right)^T V_i^{-1} \, \frac{\partial \mu_i}{\partial \beta}.$$

A well-known and important property (the robust property) of the GEE approach is that the estimates of mean parameters remain consistent even if the correlation or the covariance structure is mis-specified. Intuitively, careful modelling of the correlation structure would improve the statistical inference (Albert & McShane, 1995). Even higher moments can be incorporated into estimation using the generalized version of GEE, GEE2 (Liang, Zeger & Qaqish, 1992). However, bias can arise and efficiency can be lost if higher moment assumptions are incorrect. So to extract full information from the data, as robustly and efficiently as possible, safe modelling is important.

In fact, in many practical cases, the cluster is not naturally defined and very different estimates may be produced when different sensible working matrices are used. It is not an easy task to pick the right one. In our case, year, month, vessel and spatial area variations have to be taken into account. The cluster level is not clear. If we allow all the possible combinations of factors to be different clusters, we have the naïve independence model, the simplest model. On the other hand, if we assume that all the observations are correlated, we end up with only one cluster, the most complicated model (Lumley & Heagerty, 1999). Another idea is simply to use the independence model, ignoring data dependence completely (Pepe & Anderson, 1994; Sutradhar & Das, 1999). Developments along these lines are useful for certain types of data.
However, it is not well known how much loss in estimation efficiency occurs in complicated cluster settings. One should not give up modelling correlation structure solely to secure regression estimation robustness. In fact, ignoring the correlations between observations is in many cases simply an unacceptable sacrifice of information (Fitzmaurice, 1995).

One unsatisfactory aspect of the GEE approach is a lack of goodness-of-fit statistics for variable selection or correlation structure selection. For example, the goodness-of-fit statistics produced by the PROC GENMOD procedure recently developed in SAS (SAS Institute, Inc., 1997) are all derived from the independent model and hence are invalid for clustered data. Therefore, no guidelines are available for comparing various working correlation structures.

The impact of new technology factors on fishing power

The main aim of this analysis is to achieve accurate parameter estimates for the impact of the new technology factors on fishing power, especially for the technology factors that are potential targets for management restrictions. Modelling of the variance components or correlation structure is of little direct interest in itself. Nevertheless, to interpret the results, it is necessary to understand where the inter-correlations in this dataset might come from. The possible sources of inter-correlations in the data are described here, along with the model of fishing power, considering in turn each of the main factors that predict fishing power. The main categories of factors that affect catch per day are abundance of the stock, and fishing power. Fishing power is determined by swept area (the area or volume covered by the trawl gear, per time unit), and the quality of the vessel and crew as a catching unit.

Abundance

We are unable to describe the population dynamics (recruitment and mortality) of prawn stocks formally at the spatial and temporal scales of a single fishing day for a single vessel. However, we know that population biomass is likely to change with time (between years and months) and area. The generation time of tiger prawns is 6-12 months. There is a strong seasonal pattern in abundance, and catch rates in adjacent months are correlated, while abundance in adjacent years is only moderately correlated (see Figures 1 and 2; and Wang & Die, 1996).

Two species of prawns constitute the tiger prawn component of the fishery. The two species differ in habitat preferences and life cycles. There are probably two or more distinct stocks of each species within the large area of the fishery. Some localities support a single species, others a mixture. The main predictor of species distribution is whether the habitat sediment type is predominantly mud or sand (Somers, 1994). Localities with historically high catches occur within each statistical area, but abundance at these localities does not always correlate with the overall abundance pattern for the month, year and area. Although we expect spatial variation in stocks, we have no reason to suspect the existence of a consistent geographical trend in catch rates. In theory, abundances of separate stocks do not have to be correlated, but there is some evidence that large-scale environmental variability affects wide areas of the fishery. This can generate correlation in abundance between statistical areas. Localities supporting different species may have negatively correlated abundance.
In summary, there may be spatial and temporal correlations arising from the biology and distribution of the two species. Thus, the observations must be classified by area and time factors to account for changes in the characteristics of populations and thus to help explain variations in the catch. For this reason, area, year and month terms with all their interactions are in the main model (Robins et al., 1998). However, variations in abundance that occur at a scale of resolution other than those of the area, year and month classifiers lead to correlated errors.

Swept area performance

The main factors determining swept area are the width spanned by the nets under operational conditions, the trawl speed, and the vessel's ability to maintain trawl speed. The trawl speed is determined by such characteristics as the size of the vessel, engine power and propeller configuration, and drag from towing the gear. To represent these factors that determine swept area performance in this analysis we included vessel length, hull type (wooden, old steel and new steel), A-units (a measure of the vessel's ability to maintain trawl speed, defined as engine power plus tonnage), the width of the nets (headrope length) and the presence of a kort nozzle (a cylindrical device around the propeller that has the hydrodynamic effect of increasing the thrust of the propeller, allowing larger nets to be towed for the same horsepower, or, perhaps more importantly, allowing faster towing speeds for the same net).

Catching unit: vessel and crew

The vessel is traditionally the observational unit with which catch is associated. It is often assumed that vessel characteristics equate to the characteristics of the catching unit, when in fact the crew are also part of the catching unit. Skipper experience contributes an important dimension to the catching power of each vessel, and skippers do not always skipper the same boat from year to year. Technology that helps the skippers target their nets to the areas of highest catch is also important. In this analysis we have investigated the type of trygear (a small trawl gear that is used as a sampling device to monitor catches), the towing position for trygear, the presence of Global Positioning System (GPS) and plotter systems, and the experience of the skippers. Information from catch in the trygear is used as the basis of decision-making about how to locate the highest concentrations of prawns, and accurate position information from GPS and plotter systems aids in retracing successful routes and avoiding undesirable areas.

Figure 1. Catch of tiger prawns (kg per day per vessel) for four months per year (August to November) over nine years for four areas in the Northern Prawn Fishery. Panels: AREA = Weipa (1), AREA = Mornington (3), AREA = Vanderlins (4), AREA = Melville (8).

Figure 2. Catch of tiger prawns (kg per day per vessel) over four months per year (August to November) for nine years, for area 5 in the Northern Prawn Fishery (Groote). Observations for randomly selected boats are joined by lines within each year.

Catch rates from the same vessel might differ between different areas due to specializations of the boat and its equipment, or specializations of the skipper, for fishing in different areas. Catch rates might vary between different months due to change of decision-making strategies during the fishing season in response to combinations of declining catch rates and external factors such as fuel or market prices, bad weather, mechanical problems on board, or health problems among the crew.

About half the vessels in the fleet are operated by one of several major companies. The remainder of the vessels are single owner-operated or small-company boats. While vessels in the same company share information and strategies, many skippers (including both company and independent operators) work closely with one, two or three others, sharing information to an even greater extent, for example by phoning each other several times each fishing night to discuss catch and strategies (Bishop & Sterling, 1999).
Major changes in vessel and crew characteristics occur between years rather than between months within years, because vessel refits and changes in skipper occur at the end of the fishing season, coinciding with the end of calendar years.

Data sources

The analysis used commercial fishery logbook data collected from trawler skippers. The logbooks record each trawler's daily catch, and the position of the greatest catch of the day. The total catch, and the corresponding fishing effort, were determined for each vessel in the fishery in each of eight possible statistical areas each month, in each year (depicted in Figures 1 and 2). We excluded any observations based on fewer than four fishing days in any month, to remove less reliable data. Records from only August to November, the four main months of the fishing season, were included. The final dataset used for the analysis accounted for between 63% and 71% of the total catch and fishing effort per year between 1988 and 1996 in the fishery.

Information about vessel characteristics and technology on board was collated from vessel licence registers and interviews with skippers, trawler owners, fleet managers and sometimes ships' chandlers. Two indices of skipper experience were obtained by linking lists of skippers with lists of equipment on board their trawlers: first, the number of seasons (there are two seasons in each year) of experience as skipper in the fishery since 1988; and second, the number of years' experience with a GPS and plotter system on board. Each record contained catch and effort for vessel $i$ in area $j$ during month $t$ of year $k$, along with the vessel's characteristics, technology on board, and skipper experience, all as at the start of the fishing season in year $k$.

The dataset had some further characteristics. The main covariates of interest, the technology factors, could change within vessels over time. However, these factors changed only between years, never between months. Only one vessel characteristic, vessel length, never changed for a vessel. The time series had unevenly spaced time steps (catches for each month from August to November for nine years) because the fishing season is restricted in time each year. Vessels could fish in more than one area each month, but vessels usually fished in only a subset of all possible areas in any one year, and areas fished by a vessel varied from year to year. This produced an unbalanced design. The population of vessels was dynamic, with vessels entering and leaving the fleet each year, for reasons that might or might not be associated with their success at fishing (Dann & Pascoe, 1994 pp. 10-14).
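The aggregation and exclusion rules described under Data sources can be sketched as follows. This is an illustration only: the row layout (one row per vessel per fishing day, already assigned to a statistical area), the function name and the toy values are assumptions, not the authors' processing code.

```python
from collections import defaultdict

def monthly_records(daily_rows, months=(8, 9, 10, 11), min_days=4):
    """Aggregate daily logbook rows to vessel-area-year-month records.

    daily_rows: iterable of (vessel, area, year, month, catch_kg), one row per
    fishing day (hypothetical layout). Keeps August to November only and drops
    records based on fewer than `min_days` fishing days.
    Returns {(vessel, area, year, month): (total_catch_kg, effort_days)}.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for vessel, area, year, month, catch_kg in daily_rows:
        if month in months:
            cell = totals[(vessel, area, year, month)]
            cell[0] += catch_kg   # total catch for the record
            cell[1] += 1          # fishing effort in days
    return {key: (catch, days)
            for key, (catch, days) in totals.items() if days >= min_days}

# Made-up example rows
daily = ([("A", 5, 1990, 8, 100.0)] * 5      # kept: 5 days in August
         + [("A", 5, 1990, 9, 80.0)] * 3     # dropped: fewer than 4 days
         + [("B", 1, 1990, 7, 60.0)] * 6)    # dropped: July is outside the season
records = monthly_records(daily)
print(records)
```

Each surviving record carries both the total catch and the number of fishing days, which correspond to the response and the fishing effort term of the model in Section 3.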
In summary, the aim of the analysis was to estimate the contribution of new technology to increasing the catching power of the fleet, by either increasing swept area or improving the catching success of the vessel-plus-crew combination. While we believe that spatial and temporal correlations occur in the data, we are not sure how to describe the cluster and correlation structure. Therefore the robustness of the GEE approach is very attractive as a form of insurance against some of the limitations of our observational data.

3. Statistical model

The model has a log-link function and terms for each of the technology factors and vessel characteristics being investigated, plus skipper experience, fishing effort, and the three-way interaction year-month-area with all its main effects and lower order interactions. A level for 'unknown' was included in each of the categorized vessel technology factors to allow vessels with unknown status on one or more of these items to be retained in the analysis. The expected catch of vessel $i$ fishing in area $j$, year $k$ and month $t$ is such that

$$\log(\mu_{ijkt}) = \alpha + \alpha_1 \log V_{1i} + \alpha_2 \log V_{2ik} + \alpha_3 \log V_{3ik} + \alpha_4 V_{4i} + \beta_1 X_{1ik} + \beta_2 X_{2ik} + \beta_3 X_{3ik} + \beta_4 X_{4ik} + \beta_5 W_{1ik} + \beta_6 W_{2ik} + \delta \log E_{ijkt} + h(A.Y.M_{jkt}),$$

where $\mu_{ijkt}$ is the expected catch (weight in kg) of boat $i$ fishing in area $j$, year $k$ and month $t$; and $\alpha$ is the intercept.

The vessel characteristics are $V_1$ to $V_4$:
$V_{1i}$: the vessel length (metres);
$V_{2ik}$: the total headrope length of gear being used (fathoms);
$V_{3ik}$: the vessel's A-units (continuous);
$V_{4i}$: the hull material category (three dummy indicators to represent four categories: timber, old steel, new steel or unknown);
$\alpha_1$ to $\alpha_3$ are the fixed effects of the respective vessel characteristics $V_1$ to $V_3$; $\alpha_4$ is a vector of effects for $V_4$ with three categories.

The technology factors are $X_1$ to $X_4$:
$X_{1ik}$: the kort nozzle status (two dummy indicators to represent three categories: present, absent, unknown);
$X_{2ik}$: the type of trygear used (two dummy indicators to represent three categories: otter, beam or unknown);
$X_{3ik}$: the position of towing the trygear (two dummy indicators to represent three categories: stern, other or unknown);
$X_{4ik}$: the presence or absence of a GPS.

The skipper experience indicators are $W_1$ and $W_2$:
$W_{1ik}$: the number of years a skipper has worked with a plotter on board (range from 0 to 6);
$W_{2ik}$: the number of seasons as a skipper in the fishery since 1988 (three dummy indicators to represent four categories: < 5 seasons, 5-8, 9-12 and > 12 seasons).

For the respective technology characteristics ($X_1$ to $X_4$), the fixed effects are $\beta_1$ to $\beta_4$. Here $\beta$ indicates vectors of parameters for multiple categories of each $X$; $\beta_5$ is the effect of skipper experience with a plotter (in years); $\beta_6$ is the effect of skipper experience in the fishery (for three season categories); $E_{ijkt}$ is the term for days spent fishing (fishing effort); $\delta$ is the fixed effect of fishing effort; $h(A.Y.M_{jkt})$ represents spatial and temporal changes in abundance of prawns arising from fluctuations in population reproduction and mortality rates, by the three-way interaction of area, year and month plus lower order terms and main effects (fixed effects for areas ($A_j$), years ($Y_k$), months ($M_t$) and all their interactions $A.Y$, $A.M$, $M.Y$ and $A.Y.M$).

3.1. Specifying the variance or cluster structure

We investigated ways of accounting for possible spatial and temporal associations in the data. Fitting a GEE model requires the definition of clusters of observations. Observations from within a cluster are assumed to be correlated, while observations from different clusters are assumed to be independent. Given the complexity of the data characteristics here, it was difficult to specify the cluster and correlation structure with confidence. We investigated several possible cluster structures, and within each, several possible correlation structures.

We considered there were several possible ways of defining a cluster, depending on the assumptions we made about the data. For example, when vessels were clusters, catches from the same vessel were assumed to be correlated at all times and locations, and catches from different vessels were assumed to be independent. When vessel-year-area were clusters, catches from the same vessel were assumed to be correlated between months within any year and area, but independent otherwise. The possible cluster definitions (Table 1) fall into a natural hierarchy of complexity, in terms of the dimension of the associated GEE working correlation matrix (the cluster size).
The model with vessels as clusters is the most complex, with a GEE working correlation matrix dimension of 268 (8 areas × 9 years × 4 months, but not all combinations occur). This cluster definition produced 243 clusters (vessels), and the number of observations per cluster ranged from 1 to 83. At the other extreme, the model with vessel-year-area as clusters is the simplest, with a GEE working correlation matrix dimension of four, because there are observations for only four months. This cluster definition produced 4890 clusters, and the number of observations per cluster ranged from one to four.
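The cluster counts and cluster-size ranges quoted above come from grouping the same records under different keys. A minimal sketch of that bookkeeping follows, using a handful of made-up records; the record layout and the names `DEFINITIONS` and `cluster_summary` are hypothetical. It reports the number of clusters and the minimum and maximum observations per cluster, not the working correlation matrix dimension.

```python
from collections import Counter

# Made-up records: (vessel, area, year, month)
records = [
    ("A", 1, 1990, 8), ("A", 1, 1990, 9), ("A", 2, 1990, 8),
    ("A", 1, 1991, 8), ("B", 1, 1990, 8), ("B", 1, 1990, 9),
]

# Positions of the record fields that jointly define a cluster
DEFINITIONS = {
    "vessel": (0,),
    "vessel-area": (0, 1),
    "vessel-year": (0, 2),
    "vessel-year-month": (0, 2, 3),
    "vessel-year-area": (0, 2, 1),
}

def cluster_summary(rows, fields):
    """Return (number of clusters, min obs per cluster, max obs per cluster)."""
    sizes = Counter(tuple(row[f] for f in fields) for row in rows)
    return len(sizes), min(sizes.values()), max(sizes.values())

for name, fields in DEFINITIONS.items():
    print(name, cluster_summary(records, fields))
```

Coarser keys give fewer, larger clusters; finer keys give many small ones, which is the hierarchy of complexity the text describes.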

TABLE 1
Possible spatial and/or temporal clusters of correlated prawn catch observations, ordered by decreasing complexity (indicated by the GEE working correlation matrix dimension)

Cluster definition            Cluster size (GEE working          No. of clusters    No. of observations
                              correlation matrix dimension)                         per cluster: min-max
vessel                        268                                243                1-83
vessel-area
vessel-year
vessel-year-month
vessel-year-area              4                                  4890               1-4
no clusters (independence)

The forms of correlation structure (and definition of the corresponding correlation parameter, $\rho$) that we report on are:
independent ($\rho = 0$);
exchangeable or equi-correlated, $\mathrm{corr}(y_c, y_d) = \rho$ if $c \neq d$;
auto-regressive (1), $\mathrm{corr}(y_c, y_d) = \rho^{|c-d|}$ if $c \neq d$;
m-dependent or stationary, $\mathrm{corr}(y_c, y_d) = \rho_{|c-d|}$ if $|c-d| \le m$, and 0 otherwise;
unstructured (unspecified, so estimation of all correlations is required).

They are similar to those specified by Liang & Zeger (1986 pp. 17-18). The subscripts $c$ and $d$ refer to the cluster unit specified by the particular cluster-model being investigated. The autoregressive and m-dependent structures were investigated as applied to months ($t = 1, \ldots, 4$), years ($k = 1, \ldots, 9$) and areas ($j = 1, \ldots, 8$, where areas were ordered geographically along the coastline of the fishery from east to west).

Other correlation structures are possible, and we investigated several more complex structures; for example, to take into account two levels of clustering, we specified varying correlations between months within and between years and areas. However, we restrict this report to investigations of correlations between months within year, independent between years, (1) because of prior knowledge of abundance described in Section 2; (2) because of results from the analysis of residuals from the independence GEE model (see Section 3.2, Strategy 5); and (3) because no gain in efficiency was found from the more detailed modelling of correlations.
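The correlation structures listed above can be written out as explicit matrices. The sketch below is illustrative only; the function names and the structure labels are hypothetical, and the parameter counts follow the usual convention of one parameter for exchangeable and auto-regressive, $m$ for m-dependent and $K(K-1)/2$ for an unstructured $K \times K$ matrix.

```python
def working_correlation(n, structure, rho=0.0, m=1):
    """Build an n x n working correlation matrix.

    For "m-dependent", rho is a sequence (rho_1, ..., rho_m) indexed by lag;
    for "exchangeable" and "ar1" it is a single number.
    """
    def entry(c, d):
        lag = abs(c - d)
        if lag == 0:
            return 1.0
        if structure == "independent":
            return 0.0
        if structure == "exchangeable":
            return rho
        if structure == "ar1":
            return rho ** lag
        if structure == "m-dependent":
            return rho[lag - 1] if lag <= m else 0.0
        raise ValueError("unknown structure: " + structure)
    return [[entry(c, d) for d in range(n)] for c in range(n)]

def n_parameters(structure, K, m=1):
    """Number of correlation parameters to estimate for a K x K matrix."""
    return {"independent": 0, "exchangeable": 1, "ar1": 1,
            "m-dependent": m, "unstructured": K * (K - 1) // 2}[structure]

# Four months within a vessel-year-area cluster
for row in working_correlation(4, "ar1", rho=0.5):
    print(row)
print(n_parameters("unstructured", K=4))
```

For a cluster of four months, the unstructured form already needs six parameters, while the exchangeable and auto-regressive forms need one each.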
The correlation structures can be ordered in a hierarchy of complexity, in terms of the number of correlation parameters that need to be estimated. Exchangeable and auto-regressive (the simplest structures) each require the estimation of one parameter; m-dependent requires the estimation of $m$ parameters, where $m \le K - 1$ and $K$ is the dimension of the GEE working correlation matrix. Unstructured, the most complex, requires the estimation of $\tfrac{1}{2}K(K-1)$ parameters.

The combination of correlation structures and cluster definitions provides a convenient framework within which correlation models can be described, where the independent / vessel-year-area and exchangeable / vessel-year-area models are the two simplest models, and the unstructured / vessel model is the most complex.

3.2. Modelling strategy

With the GEE approach, simplistic correlation models (that is, when cluster sizes are too small and/or there are too few correlation parameters) underestimate standard errors. Overly complex correlation models overestimate standard errors. Under certain circumstances, mis-specified correlation models lead to biased estimates. Our strategy was to look for a parsimonious model aided by the guidelines below.

"When regression coefficients are the scientific focus ... one should invest the lion's share of time in modelling the mean structure, while using a reasonable approximation to the covariance" (Diggle et al., 1994 p.145).

The strategy in steps 1-8 below assumes that the aim is to find efficient and consistent estimators of the mean, that the correlation is a nuisance, that the lion's share of prior modelling work has reached a stage of confidence at which the mean model is reasonably correctly specified, and that the choice of link function is correct (Zeger & Liang, 1986 p. 129). A further requirement for achieving consistent GEE results is that any missing data are completely at random or few in number and do not follow a pattern of absence that depends on previous outcomes (Zeger & Liang, 1986). Unbalanced designs can be accommodated.

1. Define a hierarchy of possible clusters, based on expert knowledge of the data.

2. Fit a GEE model with one simple correlation structure for each cluster model (for example, the exchangeable correlation structure). If the results indicate little correlation, adopt the independence model. Little gain in efficiency is found in GEE models over independence models when $\rho$ is less than about 0.3 (Liang & Zeger, 1986 Tables 1 and 2). If there is a sizeable correlation within the clusters, as estimated by $\hat\rho$, then proceed to find the best model to account for the correlations.

3. Fit GEE models with several candidate correlation structures within the chosen cluster definition. Appropriate choices of working correlation structures for various types of data have been discussed or illustrated by Liang & Zeger (1986 pp. 17-18), Lipsitz et al.
(1994), Pepe & Anderson (1994), Albert & McShane (1995), Lumley (1996), Chen & Ahn (1997), and O'Hara Hines (1997). Although the GEE methods are robust to mis-specification, so that in principle the choice should not matter, the robustness breaks down under some conditions, whereupon more accurately modelled correlation structures are required to achieve good results (see steps 4 and 6). Therefore, select candidate correlation structures based on knowledge of the process from which the data were generated.
4. Choose the best cluster definition by choosing the most complex cluster definition possible that provides reasonably consistent parameter estimates across correlation structures within a cluster. The justification for this step is as follows: the estimates from the GEE approach are asymptotically consistent even if the covariance structure is mis-specified (Diggle et al., 1994 p.145), with these exceptions:
(a) coefficients for within-cluster covariates can differ markedly when the working correlation structure is grossly mis-specified (Zeger & Liang, 1986 Table 2) if predictor variables for one observation are correlated with the residuals for the other observation (Pepe & Anderson, 1994 p.949; Lipsitz et al., 1994 p.1161; Fitzmaurice, 1995);
(b) stability can be influenced by the pattern of missing values (even when missing values satisfy general assumptions).
According to Lipsitz et al. (1994 p.1162):

'With missing data, each individual does not contribute equally weighted components to the effects of [within-cluster] covariates (even if the [within-cluster] covariate would have the same pattern across time for all individuals if there was no missing data), and we would expect more complicated correlation structures to be more efficient than the independence structure for estimating these parameters as well.'
5. Analyse the residuals from the independent GEE model to investigate the relative contribution of the different clusters to the variance of the residuals, and to support the choice of cluster definition. We used analysis of variance components to investigate the residuals. Other authors have used different approaches, such as semivariogram models (Albert & McShane, 1995 p.629, Section 4) and time series methods (Lumley & Heagerty, 1999, Figure 2).
6. Choose the best correlation structure within the chosen cluster definition:
(a) aim for as few correlation parameters as possible (Lipsitz et al., 1994; Lumley, 1996; O'Hara Hines, 1997);
(b) look for the structure producing the ratio of model-based to robust standard error estimates closest to 1.0 (e.g. Lipsitz et al., 1994 pp.1159, 1161).
The justification for this step is as follows. There are some circumstances that are exceptions to the rule (of step 6a) that simple is best, namely, when the correlation is large (Liang & Zeger, 1986 Tables 1 and 2, p.19) or when there are within-cluster covariates or missing values that do not have the same pattern for all clusters (as described in step 4). In all these circumstances the estimates are likely to be inefficient unless the correlation structure is reasonably accurately specified, and improvement in efficiency can be made by specifying a correlation matrix closer to the true situation in the data.
When the model-based and robust standard error estimates are similar, this is an indication that the specified working correlation model is more consistent with the observed association. Zeger & Liang used the ratio of t-statistics (rather than standard errors) for model-based to robust results to guide the choice of model, where the preferred model had the ratio of t-statistics closest to one (Zeger & Liang, 1986 Table 3 and p.128).
7. Check the model fit by inspecting residuals and covariance matrix.
8. For inference use the robust standard error estimates. Their use takes little effort, whereas the careful modelling of covariance structure needed before model-based standard error estimates can be used may take a great deal of effort for relatively small gains. The model-based estimate of the variance is consistent if both the mean model and the working correlation matrix are correctly specified; the robust estimate is consistent provided the mean model is correctly specified even if the working correlation is not (SAS Institute, Inc., 1997 p.296).

We also agree with the caution given by O'Hara Hines (1997 p.1554) against overinterpreting the correlation coefficients or final model of cluster and correlations: 'Attempts to interpret the estimated correlations (ρ) meaningfully in terms of what they say about the data can be futile, since quite different models for [the working correlation structure] were found to fit the data equally well.'
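Step 6 can be illustrated with a small sketch. It is not the authors' code: the function names, the tolerance `tol` and the standard errors in `candidates` are all hypothetical, chosen only to show how criterion (b), ratios of robust to model-based standard errors close to 1, might be combined with criterion (a), few correlation parameters. The cluster size K = 4 assumes four months per vessel-year-area cluster.

```python
from math import log

def n_corr_params(structure, K, m=1):
    """Number of working-correlation parameters for a cluster of size K."""
    if structure == "independence":
        return 0
    if structure in ("exchangeable", "ar1"):
        return 1
    if structure == "m-dependent":
        if not 1 <= m <= K - 1:
            raise ValueError("m must satisfy 1 <= m <= K - 1")
        return m
    if structure == "unstructured":
        return K * (K - 1) // 2
    raise ValueError(f"unknown structure: {structure}")

def se_ratio_score(robust_se, model_se):
    """Mean absolute log ratio of robust to model-based standard errors.

    A score of 0 means every ratio equals 1.
    """
    ratios = [r / m for r, m in zip(robust_se, model_se)]
    return sum(abs(log(x)) for x in ratios) / len(ratios)

def choose_structure(candidates, K, tol=0.05):
    """Pick the structure with the fewest parameters among those whose
    score is within `tol` of the best score (tol is an assumed threshold)."""
    scores = {name: se_ratio_score(r, m) for name, (r, m) in candidates.items()}
    best = min(scores.values())
    near_best = [name for name, s in scores.items() if s - best <= tol]
    return min(near_best, key=lambda name: (n_corr_params(name, K), scores[name]))

# Hypothetical (robust SEs, model-based SEs) for two coefficients.
candidates = {
    "independence": ([0.030, 0.026], [0.015, 0.013]),
    "exchangeable": ([0.028, 0.024], [0.021, 0.017]),
    "unstructured": ([0.028, 0.024], [0.0215, 0.0175]),
}

print(choose_structure(candidates, K=4))
```

With these made-up numbers the unstructured model has ratios slightly closer to 1, but the exchangeable model is within the tolerance and needs one parameter rather than six, so the sketch selects it. Setting `tol=0.0` would select purely on the ratio criterion.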

3.3. Model fitting

The GEE models were fitted by PROC GENMOD in SAS Version 6.12 (SAS Institute Inc., 1997; SAS code is available from the authors). The PROC GENMOD procedure implements the methods proposed by Liang & Zeger (1986), and includes options for selection of the various working correlation structures they described. The PROC GENMOD procedure also allows the use of user-specified correlation structures. Two sets of standard errors were calculated as defined by Liang & Zeger (1986) and as described in Section 2: naïve (or model-based) and robust (or empirical). The PROC GENMOD procedure gives optional methods for calculation of the dispersion parameter. We report the dispersion parameter φ and scale factor √φ estimated from the deviance/degrees of freedom, the results being very similar to those from the Pearson chi-squared method proposed by Liang & Zeger (1986 p.17). The residuals from the independence GEE model were analysed with PROC VARCOMP in SAS. Intra-cluster correlations were calculated as σᵢ² / Σσᵢ².

4. Results

Rates of uptake during 1988 to 1996 varied for the different technology items that were studied. A few vessels had GPS and plotter systems in 1988 and all vessels had them by […]. The use of otter type of trygear increased from 30% of vessels in 1988 to 73% of vessels in […]. Towing trygear in the stern position increased from 47% of vessels in 1988 to 68% of vessels in […]. Kort nozzles were quite common in 1988 (found on 86% of vessels) and continued to increase in use, to 99% of vessels by […]. The commissioning of 12 custom-designed trawlers in […] and the departure of older timber vessels in 1993 also affected the fleet profile. Catches of prawns decline each year from the start of fishing in August to the end of the season in November. Different years and areas can have quite different catch rates (Figures 1 and 2).
Parameter estimates from the independence model showed that effort was the main predictor of catch, as expected. After adjusting for background fluctuations in abundance, and for differences in vessel length, A-units, total headrope length of gear, and skipper experience, hull-type, kort nozzle and GPS each affected efficiency by an important amount, of the order of 5–7% (Table 2), and efficiency increased each year following the installation of a plotter, for up to six years. The type of trygear (otter or beam) made no difference, but the position of trygear may have had a small effect.

Correlations

In this dataset, correlation parameter estimates from GEE models with exchangeable and AR(1) working correlation structures in different cluster structures ranged from ρ̂ = 0.25 to ρ̂ = 0.47, suggesting that moderate correlations existed (Table 2).

Clusters

In the simple models with small cluster sizes (clusters defined as vessel-year-area and vessel-year-month), the coefficients for the technology factors of interest were similar across correlation structures within the clusters (Table 2, a and b).
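The 'order of 5–7%' effect sizes quoted earlier in this section can be related to the coefficients in Table 2 with a short calculation. The sketch below assumes a log link, which is usual for Poisson-type catch models but is an assumption here; under that assumption a coefficient β corresponds to a multiplicative change of exp(β) in expected catch. The function name is hypothetical, and the coefficients are the rounded independence-model values for GPS and kort nozzle.

```python
from math import exp

def percent_change(beta):
    """Percentage change in expected catch implied by a coefficient
    on the log scale (assumes a log link)."""
    return 100.0 * (exp(beta) - 1.0)

# Rounded independence-model coefficients from Table 2.
coefficients = {"GPS": 0.05, "kort nozzle": 0.07}

for name, beta in coefficients.items():
    print(f"{name}: {percent_change(beta):.1f}%")
```

For coefficients this small the percentage change is close to 100β, which is why coefficients of 0.05 and 0.07 read directly as effects of roughly 5% and 7%.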

TABLE 2
Results from GEE models with different cluster and correlation specifications (columns 1–3 within each structure type show parameter estimates (robust standard errors), ratio of robust standard errors to model-based standard errors), and correlation parameter estimates, ρ̂

Independence Exchangeable AR(1) Unstructured

(a) Cluster = vessel-year-area, correlations between months
gear 0.41 (.030) (.028) (.029) (.028) 1.3
A-units 0.25 (.026) (.024) (.025) (.024) 1.5
GPS 0.05 (.009) (.009) (.009) (.009) 1.0
plot-exp 0.02 (.002) (.002) (.002) (.002) 1.0
kort 0.07 (.013) (.013) (.013) (.013) 1.2
stern 0.02 (.011) (.011) (.011) (.011) 1.2
ρ̂ (range) (0.27 to 0.48)

(b) Cluster = vessel-year-month, correlations between areas
gear 0.41 (.024) (.025) (.024) (.025) 1.3
A-units 0.25 (.020) (.020) (.020) (.020) 1.4
GPS 0.05 (.008) (.008) (.008) (.008) 1.1
plot-exp 0.02 (.002) (.002) (.002) (.002) 1.0
kort 0.07 (.011) (.011) (.011) (.011) 1.2
stern 0.02 (.010) (.010) (.010) (.010) 1.3
ρ̂ (range) ( 0.75 to 0.36)

(c) Cluster = vessel-year, correlations between month-area
gear 0.41 (.039) (.042) (.038) (.039) 1.9
A-units 0.25 (.033) (.035) (.032) (.033) 2.4
GPS 0.05 (.012) (.013) (.012) (.012) 1.5
plot-exp 0.02 (.003) (.003) (.003) (.003) 1.5
kort 0.07 (.017) (.018) (.017) (.017) 1.7
stern 0.02 (.014) (.016) (.015) (.014) 1.6
ρ̂ (range) ( 0.51 to 0.79)

(d) Cluster = vessel-area, correlations between year-months
gear 0.41 (.037) (.036) (.036) (.033) 1.5
A-units 0.25 (.030) (.032) (.029) (.029) 1.8
GPS 0.05 (.010) (.009) (.010) (.008) 1.0
plot-exp 0.02 (.003) (.003) (.003) (.002) 1.0
kort 0.07 (.016) (.013) (.016) (.013) 1.2
stern 0.02 (.012) (.011) (.012) (.010) 1.3
ρ̂ (range) ( 0.17 to 0.44)

(e) Cluster = vessels, correlations between area-year-month (time within area)
gear 0.41 (.055) (.058) 2.5
A-units 0.25 (.042) (.045) 2.6
GPS 0.05 (.013) (.012) 1.7
plot-exp 0.02 (.004) (.004) 2.0
kort 0.07 (.024) (.016) 1.5
stern 0.02 (.017) (.014) 1.8
ρ̂
AR(1) and Unstructured: not estimable

In the models with intermediate cluster sizes (clusters defined as vessel-year and vessel-area), the coefficients for gear length, GPS and kort nozzle varied between models with different correlation structures (Table 2, c and d), as did the coefficients for vessel length (not shown). In the model with largest cluster size (vessels), coefficients were quite unstable across correlation structures within each cluster (Table 2, e). Estimates of the impact of GPS in the first year, and of kort nozzles, were most sensitive to the different correlation models. The coefficients for vessel length, kort nozzle and GPS differed most between the exchangeable model (vessel length 0.287, kort nozzle 0.033, GPS 0.016) and the independence model (vessel length 0.015, kort nozzle 0.068, GPS 0.047) within this cluster = vessels model.
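The stability check behind step 4 can be sketched with the coefficients just quoted for the cluster = vessels model. This is an illustration only: the helper names and the threshold of 0.01 (agreement to about two decimal places) are assumptions, not criteria stated by the authors.

```python
# Coefficients quoted in the text for the cluster = vessels model.
coefs = {
    "vessel length": {"independence": 0.015, "exchangeable": 0.287},
    "kort nozzle": {"independence": 0.068, "exchangeable": 0.033},
    "GPS": {"independence": 0.047, "exchangeable": 0.016},
}

def spread(estimates):
    """Largest difference between estimates of one coefficient
    across working correlation structures."""
    values = list(estimates.values())
    return max(values) - min(values)

def unstable_terms(coefs, threshold):
    """Names of coefficients whose spread exceeds the threshold."""
    return sorted(name for name, est in coefs.items() if spread(est) > threshold)

for name, est in coefs.items():
    print(f"{name}: spread = {spread(est):.3f}")

# Hypothetical threshold: flag anything that differs beyond two decimal places.
print(unstable_terms(coefs, threshold=0.01))
```

All three coefficients are flagged at this threshold, which matches the description of the vessels cluster as unstable across correlation structures. Applying the same check to each cluster definition in turn would identify the most complex definition whose estimates remain stable.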

TABLE 3
Analysis of residuals from GEE independence model: variance components, random effects model
Columns: Variance; Ratio to error variance; % of total variance
Rows: vessel; vessel-area; vessel-month; vessel-year; vessel-year-area; error; total

Standard errors for gear and A-units were notably greater in the cluster = vessels model compared to those in the model with the simpler cluster structure of cluster = vessel-year-area. Computation of models with unstructured working correlation matrices was not achievable for the model with cluster = vessel. According to our criteria for best cluster model, the two simplest cluster models, vessel-year-month (correlations between areas) and vessel-year-area (correlations between months), were equally acceptable. The medium complexity vessel-year model showed some changes in parameter estimates; it was therefore less acceptable. The two most complex cluster definitions, vessel-area and vessels, produced unacceptable results. In summary, the vessel-year-area or vessel-year-month models appeared to be most likely to include the most parsimonious model.

Analysis of residuals

There were three conclusions from the analysis of residuals from the independence model. First, the analysis of residuals supported the choice of one of the models with clustering, rather than none at all, because variance that could be explained by the vessels term as a random effect was not negligible (Table 3). Further, the main factors vessel, year, month and area were not the only major variation sources. Their two- and three-way interactions explain further variation, suggesting that a complicated mechanism is causing the random variation. Second, the analysis supported the choice of the models with vessels, vessel-year or vessel-year-area as clusters because they accounted for substantial variance in the residuals (Table 3). Third, the analysis of residuals helped to rule out some candidate models.
The model with vessel-month as clusters accounted for negligible variance in the residuals. When vessel-year-month was included in the random effects model of residuals, the variance estimate was large and negative. These two observations led us to reject as unsuitable both models that contained the term for months. In summary, the vessel-year-area model remained as the preferred cluster structure.

Working correlation structures

The working correlation structures were investigated within the preferred vessel-year-area model. Among the models with different correlation structures within the vessel-year-area cluster model, the coefficients, model-based standard errors and robust standard errors were similar, to two decimal places, across all correlation structures within the clusters (Table 2a).

Figure 3. Pearson residuals from the preferred model for area 5 (Groote), over four months per year (August–November) for nine years. Values for the boats depicted in Figure 2 are joined within years.

The results from the three-dependent correlation structure were very similar to those from other correlation structures within this cluster. To choose the correlation structure within the vessel-year-area model, we examined the ratios of standard errors (robust to model-based). Model-based standard errors from the independence model were underestimated by as much as half (compared to robust standard errors) for A-units and vessel length. Similarly, model-based standard errors from the exchangeable model were underestimated by as much as half (compared to robust standard errors) for A-units and vessel length (Table 2a). The unstructured correlation model had the smallest ratios, so was the best model on that basis. However, the exchangeable model gave very similar results, but with slightly larger ratios. We preferred the exchangeable model on the basis of parsimony.

Preferred model: vessel-year-area with exchangeable correlation

The scale parameter was very high (8.8), indicating that serious overdispersion had to be taken into account in the analysis (as was done) to provide reasonable standard errors for the estimates. Goodness of model fits were otherwise adequate according to several criteria. There were no alarming patterns in the residuals (Figure 3), nor in the covariance matrix. Table 4 shows parameter estimates and robust standard errors from the preferred model (vessel-year-area-exchangeable).

5. Conclusion and discussion

In this paper we have attempted to develop a method to find the best cluster and correlation structure to account for the variation not explained by model factors.
We aimed for robust estimation, computational feasibility and a reasonable approximation of complexities. We started with simple assumptions about the cluster and correlation structures that might be appropriate, because we did not know how much variation had been accounted for already by those factors in the model that seek to describe catching unit and population characteristics.


More information

Models for Clustered Data

Models for Clustered Data Models for Clustered Data Edps/Psych/Soc 589 Carolyn J Anderson Department of Educational Psychology c Board of Trustees, University of Illinois Spring 2019 Outline Notation NELS88 data Fixed Effects ANOVA

More information

Models for Clustered Data

Models for Clustered Data Models for Clustered Data Edps/Psych/Stat 587 Carolyn J Anderson Department of Educational Psychology c Board of Trustees, University of Illinois Fall 2017 Outline Notation NELS88 data Fixed Effects ANOVA

More information

A Practitioner s Guide to Generalized Linear Models

A Practitioner s Guide to Generalized Linear Models A Practitioners Guide to Generalized Linear Models Background The classical linear models and most of the minimum bias procedures are special cases of generalized linear models (GLMs). GLMs are more technically

More information

Discrete Response Multilevel Models for Repeated Measures: An Application to Voting Intentions Data

Discrete Response Multilevel Models for Repeated Measures: An Application to Voting Intentions Data Quality & Quantity 34: 323 330, 2000. 2000 Kluwer Academic Publishers. Printed in the Netherlands. 323 Note Discrete Response Multilevel Models for Repeated Measures: An Application to Voting Intentions

More information

ANALYSING BINARY DATA IN A REPEATED MEASUREMENTS SETTING USING SAS

ANALYSING BINARY DATA IN A REPEATED MEASUREMENTS SETTING USING SAS Libraries 1997-9th Annual Conference Proceedings ANALYSING BINARY DATA IN A REPEATED MEASUREMENTS SETTING USING SAS Eleanor F. Allan Follow this and additional works at: http://newprairiepress.org/agstatconference

More information

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS Page 1 MSR = Mean Regression Sum of Squares MSE = Mean Squared Error RSS = Regression Sum of Squares SSE = Sum of Squared Errors/Residuals α = Level

More information

Chapter 22: Log-linear regression for Poisson counts

Chapter 22: Log-linear regression for Poisson counts Chapter 22: Log-linear regression for Poisson counts Exposure to ionizing radiation is recognized as a cancer risk. In the United States, EPA sets guidelines specifying upper limits on the amount of exposure

More information

An Introduction to Multilevel Models. PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 25: December 7, 2012

An Introduction to Multilevel Models. PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 25: December 7, 2012 An Introduction to Multilevel Models PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 25: December 7, 2012 Today s Class Concepts in Longitudinal Modeling Between-Person vs. +Within-Person

More information

RANDOM and REPEATED statements - How to Use Them to Model the Covariance Structure in Proc Mixed. Charlie Liu, Dachuang Cao, Peiqi Chen, Tony Zagar

RANDOM and REPEATED statements - How to Use Them to Model the Covariance Structure in Proc Mixed. Charlie Liu, Dachuang Cao, Peiqi Chen, Tony Zagar Paper S02-2007 RANDOM and REPEATED statements - How to Use Them to Model the Covariance Structure in Proc Mixed Charlie Liu, Dachuang Cao, Peiqi Chen, Tony Zagar Eli Lilly & Company, Indianapolis, IN ABSTRACT

More information

Lab 3: Two levels Poisson models (taken from Multilevel and Longitudinal Modeling Using Stata, p )

Lab 3: Two levels Poisson models (taken from Multilevel and Longitudinal Modeling Using Stata, p ) Lab 3: Two levels Poisson models (taken from Multilevel and Longitudinal Modeling Using Stata, p. 376-390) BIO656 2009 Goal: To see if a major health-care reform which took place in 1997 in Germany was

More information

Semiparametric Generalized Linear Models

Semiparametric Generalized Linear Models Semiparametric Generalized Linear Models North American Stata Users Group Meeting Chicago, Illinois Paul Rathouz Department of Health Studies University of Chicago prathouz@uchicago.edu Liping Gao MS Student

More information

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis Review Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 22 Chapter 1: background Nominal, ordinal, interval data. Distributions: Poisson, binomial,

More information

FAQ: Linear and Multiple Regression Analysis: Coefficients

FAQ: Linear and Multiple Regression Analysis: Coefficients Question 1: How do I calculate a least squares regression line? Answer 1: Regression analysis is a statistical tool that utilizes the relation between two or more quantitative variables so that one variable

More information

Multilevel Methodology

Multilevel Methodology Multilevel Methodology Geert Molenberghs Interuniversity Institute for Biostatistics and statistical Bioinformatics Universiteit Hasselt, Belgium geert.molenberghs@uhasselt.be www.censtat.uhasselt.be Katholieke

More information

VOYAGE (PASSAGE) PLANNING

VOYAGE (PASSAGE) PLANNING VOYAGE (PASSAGE) PLANNING Introduction O Passage planning or voyage planning is a procedure of developing a complete description of a vessel's voyage from start to finish. O Production of a passage plan

More information

Statistical Distribution Assumptions of General Linear Models

Statistical Distribution Assumptions of General Linear Models Statistical Distribution Assumptions of General Linear Models Applied Multilevel Models for Cross Sectional Data Lecture 4 ICPSR Summer Workshop University of Colorado Boulder Lecture 4: Statistical Distributions

More information

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator UNIVERSITY OF TORONTO Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS Duration - 3 hours Aids Allowed: Calculator LAST NAME: FIRST NAME: STUDENT NUMBER: There are 27 pages

More information

Outline of GLMs. Definitions

Outline of GLMs. Definitions Outline of GLMs Definitions This is a short outline of GLM details, adapted from the book Nonparametric Regression and Generalized Linear Models, by Green and Silverman. The responses Y i have density

More information

DISPLAYING THE POISSON REGRESSION ANALYSIS

DISPLAYING THE POISSON REGRESSION ANALYSIS Chapter 17 Poisson Regression Chapter Table of Contents DISPLAYING THE POISSON REGRESSION ANALYSIS...264 ModelInformation...269 SummaryofFit...269 AnalysisofDeviance...269 TypeIII(Wald)Tests...269 MODIFYING

More information

Bivariate Data: Graphical Display The scatterplot is the basic tool for graphically displaying bivariate quantitative data.

Bivariate Data: Graphical Display The scatterplot is the basic tool for graphically displaying bivariate quantitative data. Bivariate Data: Graphical Display The scatterplot is the basic tool for graphically displaying bivariate quantitative data. Example: Some investors think that the performance of the stock market in January

More information

Causal Inference with a Continuous Treatment and Outcome: Alternative Estimators for Parametric Dose-Response Functions

Causal Inference with a Continuous Treatment and Outcome: Alternative Estimators for Parametric Dose-Response Functions Causal Inference with a Continuous Treatment and Outcome: Alternative Estimators for Parametric Dose-Response Functions Joe Schafer Office of the Associate Director for Research and Methodology U.S. Census

More information

10. Time series regression and forecasting

10. Time series regression and forecasting 10. Time series regression and forecasting Key feature of this section: Analysis of data on a single entity observed at multiple points in time (time series data) Typical research questions: What is the

More information

Figure 36: Respiratory infection versus time for the first 49 children.

Figure 36: Respiratory infection versus time for the first 49 children. y BINARY DATA MODELS We devote an entire chapter to binary data since such data are challenging, both in terms of modeling the dependence, and parameter interpretation. We again consider mixed effects

More information

Pseudo-score confidence intervals for parameters in discrete statistical models

Pseudo-score confidence intervals for parameters in discrete statistical models Biometrika Advance Access published January 14, 2010 Biometrika (2009), pp. 1 8 C 2009 Biometrika Trust Printed in Great Britain doi: 10.1093/biomet/asp074 Pseudo-score confidence intervals for parameters

More information

Paper: ST-161. Techniques for Evidence-Based Decision Making Using SAS Ian Stockwell, The Hilltop UMBC, Baltimore, MD

Paper: ST-161. Techniques for Evidence-Based Decision Making Using SAS Ian Stockwell, The Hilltop UMBC, Baltimore, MD Paper: ST-161 Techniques for Evidence-Based Decision Making Using SAS Ian Stockwell, The Hilltop Institute @ UMBC, Baltimore, MD ABSTRACT SAS has many tools that can be used for data analysis. From Freqs

More information

Simulating Longer Vectors of Correlated Binary Random Variables via Multinomial Sampling

Simulating Longer Vectors of Correlated Binary Random Variables via Multinomial Sampling Simulating Longer Vectors of Correlated Binary Random Variables via Multinomial Sampling J. Shults a a Department of Biostatistics, University of Pennsylvania, PA 19104, USA (v4.0 released January 2015)

More information

Bootstrap Approach to Comparison of Alternative Methods of Parameter Estimation of a Simultaneous Equation Model

Bootstrap Approach to Comparison of Alternative Methods of Parameter Estimation of a Simultaneous Equation Model Bootstrap Approach to Comparison of Alternative Methods of Parameter Estimation of a Simultaneous Equation Model Olubusoye, O. E., J. O. Olaomi, and O. O. Odetunde Abstract A bootstrap simulation approach

More information

Econ 423 Lecture Notes: Additional Topics in Time Series 1

Econ 423 Lecture Notes: Additional Topics in Time Series 1 Econ 423 Lecture Notes: Additional Topics in Time Series 1 John C. Chao April 25, 2017 1 These notes are based in large part on Chapter 16 of Stock and Watson (2011). They are for instructional purposes

More information

Lecture 5: Clustering, Linear Regression

Lecture 5: Clustering, Linear Regression Lecture 5: Clustering, Linear Regression Reading: Chapter 10, Sections 3.1-3.2 STATS 202: Data mining and analysis October 4, 2017 1 / 22 .0.0 5 5 1.0 7 5 X2 X2 7 1.5 1.0 0.5 3 1 2 Hierarchical clustering

More information

Testing Homogeneity Of A Large Data Set By Bootstrapping

Testing Homogeneity Of A Large Data Set By Bootstrapping Testing Homogeneity Of A Large Data Set By Bootstrapping 1 Morimune, K and 2 Hoshino, Y 1 Graduate School of Economics, Kyoto University Yoshida Honcho Sakyo Kyoto 606-8501, Japan. E-Mail: morimune@econ.kyoto-u.ac.jp

More information

Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model

Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 1: August 22, 2012

More information

Liang Li, PhD. MD Anderson

Liang Li, PhD. MD Anderson Liang Li, PhD Biostatistics @ MD Anderson Behavioral Science Workshop, October 13, 2014 The Multiphase Optimization Strategy (MOST) An increasingly popular research strategy to develop behavioral interventions

More information

Dr. Junchao Xia Center of Biophysics and Computational Biology. Fall /1/2016 1/46

Dr. Junchao Xia Center of Biophysics and Computational Biology. Fall /1/2016 1/46 BIO5312 Biostatistics Lecture 10:Regression and Correlation Methods Dr. Junchao Xia Center of Biophysics and Computational Biology Fall 2016 11/1/2016 1/46 Outline In this lecture, we will discuss topics

More information

Non-independence in Statistical Tests for Discrete Cross-species Data

Non-independence in Statistical Tests for Discrete Cross-species Data J. theor. Biol. (1997) 188, 507514 Non-independence in Statistical Tests for Discrete Cross-species Data ALAN GRAFEN* AND MARK RIDLEY * St. John s College, Oxford OX1 3JP, and the Department of Zoology,

More information

ST3241 Categorical Data Analysis I Multicategory Logit Models. Logit Models For Nominal Responses

ST3241 Categorical Data Analysis I Multicategory Logit Models. Logit Models For Nominal Responses ST3241 Categorical Data Analysis I Multicategory Logit Models Logit Models For Nominal Responses 1 Models For Nominal Responses Y is nominal with J categories. Let {π 1,, π J } denote the response probabilities

More information

On the econometrics of the Koyck model

On the econometrics of the Koyck model On the econometrics of the Koyck model Philip Hans Franses and Rutger van Oest Econometric Institute, Erasmus University Rotterdam P.O. Box 1738, NL-3000 DR, Rotterdam, The Netherlands Econometric Institute

More information

The scatterplot is the basic tool for graphically displaying bivariate quantitative data.

The scatterplot is the basic tool for graphically displaying bivariate quantitative data. Bivariate Data: Graphical Display The scatterplot is the basic tool for graphically displaying bivariate quantitative data. Example: Some investors think that the performance of the stock market in January

More information

Longitudinal Data Analysis Using Stata Paul D. Allison, Ph.D. Upcoming Seminar: May 18-19, 2017, Chicago, Illinois

Longitudinal Data Analysis Using Stata Paul D. Allison, Ph.D. Upcoming Seminar: May 18-19, 2017, Chicago, Illinois Longitudinal Data Analysis Using Stata Paul D. Allison, Ph.D. Upcoming Seminar: May 18-19, 217, Chicago, Illinois Outline 1. Opportunities and challenges of panel data. a. Data requirements b. Control

More information

ONE MORE TIME ABOUT R 2 MEASURES OF FIT IN LOGISTIC REGRESSION

ONE MORE TIME ABOUT R 2 MEASURES OF FIT IN LOGISTIC REGRESSION ONE MORE TIME ABOUT R 2 MEASURES OF FIT IN LOGISTIC REGRESSION Ernest S. Shtatland, Ken Kleinman, Emily M. Cain Harvard Medical School, Harvard Pilgrim Health Care, Boston, MA ABSTRACT In logistic regression,

More information

Good Confidence Intervals for Categorical Data Analyses. Alan Agresti

Good Confidence Intervals for Categorical Data Analyses. Alan Agresti Good Confidence Intervals for Categorical Data Analyses Alan Agresti Department of Statistics, University of Florida visiting Statistics Department, Harvard University LSHTM, July 22, 2011 p. 1/36 Outline

More information

Chapter 12: An introduction to Time Series Analysis. Chapter 12: An introduction to Time Series Analysis

Chapter 12: An introduction to Time Series Analysis. Chapter 12: An introduction to Time Series Analysis Chapter 12: An introduction to Time Series Analysis Introduction In this chapter, we will discuss forecasting with single-series (univariate) Box-Jenkins models. The common name of the models is Auto-Regressive

More information

ATINER's Conference Paper Series STA

ATINER's Conference Paper Series STA ATINER CONFERENCE PAPER SERIES No: LNG2014-1176 Athens Institute for Education and Research ATINER ATINER's Conference Paper Series STA2014-1255 Parametric versus Semi-parametric Mixed Models for Panel

More information

The Scope and Growth of Spatial Analysis in the Social Sciences

The Scope and Growth of Spatial Analysis in the Social Sciences context. 2 We applied these search terms to six online bibliographic indexes of social science Completed as part of the CSISS literature search initiative on November 18, 2003 The Scope and Growth of Spatial

More information

Restricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model

Restricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model Restricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model Xiuming Zhang zhangxiuming@u.nus.edu A*STAR-NUS Clinical Imaging Research Center October, 015 Summary This report derives

More information

Integrated approaches for analysis of cluster randomised trials

Integrated approaches for analysis of cluster randomised trials Integrated approaches for analysis of cluster randomised trials Invited Session 4.1 - Recent developments in CRTs Joint work with L. Turner, F. Li, J. Gallis and D. Murray Mélanie PRAGUE - SCT 2017 - Liverpool

More information

Lecture 5: Clustering, Linear Regression

Lecture 5: Clustering, Linear Regression Lecture 5: Clustering, Linear Regression Reading: Chapter 10, Sections 3.1-3.2 STATS 202: Data mining and analysis October 4, 2017 1 / 22 Hierarchical clustering Most algorithms for hierarchical clustering

More information

8 Nominal and Ordinal Logistic Regression

8 Nominal and Ordinal Logistic Regression 8 Nominal and Ordinal Logistic Regression 8.1 Introduction If the response variable is categorical, with more then two categories, then there are two options for generalized linear models. One relies on

More information

SRMR in Mplus. Tihomir Asparouhov and Bengt Muthén. May 2, 2018

SRMR in Mplus. Tihomir Asparouhov and Bengt Muthén. May 2, 2018 SRMR in Mplus Tihomir Asparouhov and Bengt Muthén May 2, 2018 1 Introduction In this note we describe the Mplus implementation of the SRMR standardized root mean squared residual) fit index for the models

More information

Threshold estimation in marginal modelling of spatially-dependent non-stationary extremes

Threshold estimation in marginal modelling of spatially-dependent non-stationary extremes Threshold estimation in marginal modelling of spatially-dependent non-stationary extremes Philip Jonathan Shell Technology Centre Thornton, Chester philip.jonathan@shell.com Paul Northrop University College

More information

Latent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent

Latent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent Latent Variable Models for Binary Data Suppose that for a given vector of explanatory variables x, the latent variable, U, has a continuous cumulative distribution function F (u; x) and that the binary

More information

Generalized Linear Models. Kurt Hornik

Generalized Linear Models. Kurt Hornik Generalized Linear Models Kurt Hornik Motivation Assuming normality, the linear model y = Xβ + e has y = β + ε, ε N(0, σ 2 ) such that y N(μ, σ 2 ), E(y ) = μ = β. Various generalizations, including general

More information