Modeling Child Mortality through Multilevel Structured Additive Regression with Varying Coefficients for Asia and Sub-Saharan Africa

Kenneth Harttgen, Stefan Lang, Judith Santer, and Johannes Seiler*

ETH Zurich, University of Innsbruck

May 3, 2017

Despite the decline of under-5 mortality rates throughout developing countries in the last three decades, the rates in Sub-Saharan Africa remain significantly higher than elsewhere. Besides Sub-Saharan Africa, Asia was identified as another region that did not achieve the Millennium Development Goal (MDG) target of reducing under-5 mortality by two thirds by [...]. The effect of determinants of child well-being on child health has often been assumed to be linear. However, it is doubtful that this assumption is valid for all covariates; we therefore allow for potential non-linearities. We analyze a large data set covering 33 Sub-Saharan African countries and 13 Asian countries, using a multilevel discrete time survival model that takes advantage of a recently developed multilevel framework with structured additive predictor in a Bayesian setting. In total we consider data from 116 individual surveys from 1992 to 2014 and compare under-5 mortality rates across countries, allowing for potential non-linear effects and cluster-specific heterogeneity within one model. We find strong non-linear effects for the household size, the age of the mother, the Body Mass Index (BMI) of the mother, and the birth order. Additionally, we find considerable differences between Asian and Sub-Saharan African countries with respect to the baseline hazard, the birth order, and the development over time.

Keywords: Child mortality, Asia, Sub-Saharan Africa, multilevel STAR models, Bayesian inference

JEL-Classification: C11, I15, O57

* J. S. acknowledges financial support from a scholarship by the University of Innsbruck.

1. Introduction

Mortality rates have declined in developing countries throughout the last three decades; in particular, most regions saw under-5 mortality rates decline significantly since [...]. This has been confirmed by the United Nations Inter-agency Group for Child Mortality Estimation (UNIGME), which reported substantial global progress in reducing child mortality since 1990 (You et al. 2012, 2014, You, New & Wardlaw 2015). Most regions experienced an accelerated decline since 2000 compared to the period [...]. The annual rate of reduction increased in Sub-Saharan Africa from 1.6% in [...] to 4.1% in [...], in Central Asia from 1.4% to 4.6%, and in Southern Asia from 3.2% to 3.6%. Despite this progress towards achieving MDG 4, high under-5 mortality rates are now concentrated in these three regions and Oceania, where the rates remain higher than targeted by the MDGs for 2015 (You, New & Wardlaw 2015). Within these regions, countries differed widely in how far they reduced under-5 mortality. These cross-country differences were addressed by Jamison et al. (2016), who asked why the decline in under-5 mortality varied so much between developing countries. In their panel of 95 developing countries over the period [...], under-5 mortality declined by about 2.7% per year. After controlling for socioeconomic and geographic factors, one major cause was found to be the rate of technical progress, which differed across countries. However, potential non-linearities were not taken into account. Besides this shortcoming, another gap in the empirical literature on this topic for Asia and Sub-Saharan Africa is that most contributions focus on single countries or a small number of countries (see for instance Adebayo & Fahrmeir 2005, Ayele et al. 2015, Chamarbagwala 2010, Madise et al. 1999).
A comparison across countries and geographic regions was regarded as not helpful by some authors (e.g. Black et al. 2003), because the causes of under-5 mortality differ and their effects differ in magnitude. However, the structure of the Demographic and Health Survey (DHS) data sets, which are standardized across countries and provide information on single households, and recent theoretical advances in hierarchical modeling in a Bayesian framework (Lang et al. 2014, Belitz et al. 2015) make it possible to jointly analyze under-5 mortality in Asia and Sub-Saharan Africa on an individual level. Furthermore, it is possible to account for regional or country-specific heterogeneity; more specifically, Markov chain Monte Carlo (MCMC) simulation techniques are used to implement this multilevel framework in a Bayesian setting. Addressing these issues is one focus of the paper, since we account for both possible non-linearities and heterogeneity across countries.

Conducting an analysis across countries gives rise to data quality issues. For instance, Limin (2003) mentions that, depending on the data source, variables show huge differences for a given country and a given year. Considering this issue is necessary to get correct empirical estimates. DHS data offer a standardized methodology which is consistent across countries.

When under-5 mortality is analyzed empirically at an individual level, the outcome is binary (0 = alive, 1 = deceased). The standard empirical specification for such an outcome is either a logit or a probit model. Applying this type of model is problematic in that sampled observations have to be fully exposed to under-5 mortality; observations that are not fully exposed cannot be included in the sample, since otherwise a sampling error would be introduced. Several studies therefore try to overcome this limitation by considering a discrete time survival model that accounts for right censoring of those observations not fully exposed to under-5 mortality (see for instance Adebayo & Fahrmeir 2005, Kandala & Ghilagaber 2006, Kandala et al. 2014). However, these studies apply (geoadditive) discrete survival models only to a specific country. By extending this type of model and taking advantage of the recently developed multilevel structured additive regression (STAR) approach (Lang et al. 2014), we are able to systematically analyze under-5 mortality on an individual level, take right censoring of the data into account, and compare the two geographic regions where child mortality was found to be highest. To account for these issues, we apply a (multilevel) discrete time survival model, which is essentially a regression model with categorical response. In contrast to the aforementioned caveat, the analysis is not restricted to observations fully exposed to under-5 mortality, because this approach accounts for right censoring of the data.

Analyzing and understanding the link between the health outcomes of children and the determinants of child mortality is of great interest for future policies and aid. Comparing the two geographic regions of the world where under-5 mortality remains highest will give new insights and bring out potential differences between these regions in the causes of under-5 mortality.
A better understanding of this relationship, applied to future policy, can effectively help to achieve the newly formulated Sustainable Development Goals (SDGs)¹ in the long run. In addition, more purposeful aid can improve the health outcomes of children in the short term. Building on this literature, we allow for possible non-linear covariate effects on under-5 mortality and jointly analyze the two geographic regions where under-5 mortality was found to be highest. Our goal and major research question, besides analyzing non-linear covariate effects, is to systematically analyze the two geographic regions where child mortality remains highest in the world and to find potential differences in the mechanisms causing these high mortality rates. Such a broad comparison across regions will add new insights into the determinants of under-5 mortality. Additionally, we take advantage of the structure of the DHS and analyze under-5 mortality on an individual level, while simultaneously accounting for region- and country-specific heterogeneity in a Bayesian framework.

The remainder of the paper is organized as follows: Section 2 gives an overview of the data set used and a description of the covariates. This is followed by the description of the method. Section 3 includes an overview of the estimated model and the methodology used. The model is estimated within BayesX (Belitz et al. 2015), and the results are later imported into R (R Core Team 2016) for further analysis in R2BayesX (Umlauf et al. 2015). This section is followed by a thorough discussion of the expected effects. Section 5 summarizes the results. Finally, Section 6 offers concluding remarks and a discussion.

¹ The SDGs were adopted by the United Nations in 2015 and can be seen as the successor of the MDGs. In contrast to the MDGs, they do not only cover development-related topics; their focus is on sustainable development.

2. Data

2.1. Source of the data

For our analysis, we use DHS data that are representative at the national level. DHS data are collected by Macro International Inc., Calverton, Maryland, in cooperation with local administrations and are funded by the United States Agency for International Development (USAID). The data are collected in waves, with the 7th wave currently ongoing and normally a period of 5 years between waves. Currently more than 250 standard DHS for over 90 Low and Middle Income Countries (LMICs) are available, with the usual sample size ranging between 5,000 and 30,000 households. DHS are standardized across countries and collect data on topics ranging from child health, infant and child mortality, education, and maternal health to household durables and the quality of the household's sanitation and dwelling. Our study contains information on 825,076 children from 46 countries, sampled in 116 surveys over the period from 1992 to 2014. Additionally, the economic indicators are taken from the Penn World Tables (PWT), Version 8.0 (Feenstra et al. 2015). Our sample consists of 13 Asian² and 33 Sub-Saharan African countries. These countries can further be divided into 348 distinct regions. Figure 1 shows which countries are included and which could not be included. Table 7 summarizes the number of children and mortality rates per country and survey year. Main descriptive statistics, the coding of the variables, and the level at which each covariate enters are found in Table 2 and Table 3. A descriptive overview of the dependent variable can be found in Table 1.

Dependent variable

Analyzing under-5 mortality is equivalent to modeling the probability of dying between birth and the fifth birthday (Rutstein & Rojas 2006).
Information on under-5 mortality comes from DHS data, which offer basic information for deceased children (sex, date of birth, and date of death) and more detailed information for living children, such as the children's health and nutritional status. To apply a discrete time survival model, it is necessary to augment the data. We define an indicator of under-5 mortality for child i that is equal to 0 if the child is still alive in time interval t and 1 if the child died in time interval t:

$$
dead_{it} =
\begin{cases}
1, & \text{if } t = t_i \text{ and } dead_i = 1 \\
0, & \text{otherwise.}
\end{cases}
$$

With a discrete time survival model, our sample is not restricted to observations fully exposed to under-5 mortality, and therefore observations are not limited to children born at least five years before the survey, as suggested by Bhalotra (2006). Applying a discrete time survival model takes censoring into account, and the sample is not restricted to children fully exposed to under-5 mortality; instead, observations younger than 60 months at the date of the interview are considered.

² Asian countries include Central Asian countries such as Armenia, Azerbaijan, Kazakhstan, Kyrgyzstan, Turkey, and Uzbekistan, South Asian and South-East Asian countries such as Bangladesh, Cambodia, India, Maldives, Nepal, and Pakistan, and countries from the Arabian Peninsula geographically located in Asia such as Jordan, see Deaton (2007).

Fig. 1: Map of included countries. Boundary information of the map is taken from [...]. Light gray countries indicate included countries; dark gray countries indicate Arabian countries not located (completely) in Asia; gray countries indicate countries with missing variables or countries where the regional structure changed considerably and a harmonization was not possible.

Tab. 1: Under-5 mortality rates by continent

| Sample | Number of observations (n) | Mortality rate (per 1,000 children) |
|---|---|---|
| Asia | 238,[...] | [...] |
| Sub-Saharan Africa | 586,[...] | [...] |
| Total | 825,076 | [...] |

The sample contains 825,076 observations nested in 348 regions and 46 countries. Source: DHS data sets; calculations by authors.

Table 1 presents under-5 mortality rates by continent. Particularly noticeable is that the observed mortality rate of African countries is 1.5 times as high as that of Asian countries. A similar pattern was already pointed out by Klasen (2000) and Misselhorn & Harttgen (2006).

Independent variables

Explanatory variables can be grouped into child-specific factors, environmental and socio-economic characteristics, household-specific and demographic factors, and maternal characteristics. These characteristics are available either as discrete or as continuous covariates. An early survey on determinants of child mortality and candidates for potential covariates can be found in Mosley & Chen (1984).

Child specific factors

Child-specific determinants include a categorical covariate for sex. Figure 2 illustrates that potential differences between female and male children exist. Including this variable controls for this potential gender gap. The same plot suggests that mortality is not evenly distributed over the age range from 0 to 59 months. This effect will be captured in the baseline effect of the discrete survival model.
In addition to the sex of the children, a categorical indicator of whether an older sibling of observation i died is included in our model, as it is found to be positively related to under-5 mortality. The children's nutritional status is found to be a major cause of child mortality (see for instance Pelletier & Frongillo 2003, Black et al. 2003). However, since information on the children's nutritional status is only available for non-deceased children, this variable enters the model only at the second level. Unfortunately, this leads to a potential bias, since the nutritional situation is potentially overestimated: one can assume that deceased children had a lower z-score. Nevertheless, Alderman et al. (2011) found no evidence for this bias. In that simulation study, with nationally representative data for Indian children, no statistically significant evidence was found that selective mortality influences the anthropometric indicator on a large scale. Including the z-score for stunting on the regional level as anthropometric indicator seems to be the best option to capture the effect of the nutritional status of the child on under-5 mortality. As mentioned, the z-score is not observed for deceased children and can therefore only enter the hierarchical model at the regional level. For a detailed explanation of the calculation of the z-score see Harttgen et al. (2015). In our analysis we use z-stunting, which is the height-for-age z-score. It is used for two reasons. First, according to de Onis et al. (2012), the prevalence of malnutrition is higher for stunted children compared to the other two anthropometric indicators. Second, more observations on the regional level are available for z-stunting than for z-underweight and z-wasting. Including z-wasting or z-underweight instead would have reduced the sample to fewer than 348 regions, as in some regions only z-stunting was observed.

Environmental socio economic covariates

Several studies found a significant difference in the health status of children depending on the location of the household, see for instance Black et al. (2003) and more recently Harttgen & Günther (2011). To control for differences between rural (countryside, villages) and urban (cities, towns) areas, an effect-coded covariate is included. Several recent studies (see for instance Baird et al. 2010, Bhalotra 2006) found a negative relation between a country's economic wealth, measured as Gross Domestic Product (GDP), and not only under-5 mortality but child mortality in general. A country with a higher GDP can potentially invest more in its health system, in food programs, and in schooling and education, which leads to decreasing mortality rates.
Moreover, we assume that the included covariate effects are not equal across regions and countries, so we cluster the data geographically according to each observation's affiliation to a particular region and country (Jain et al. 1999, Korenromp et al. 2004). Clustering the observations into regions and countries is particularly useful for modeling potential heterogeneity between regions and countries. In addition, since one main focus of our study is to find possible differences in the covariate effects between Asian and African countries, we include a categorical variable that indicates on which continent (Asia, Sub-Saharan Africa) an observation is located. This variable is then used as interaction variable to capture potential differences in the covariate effects between Asian and Sub-Saharan African countries through varying coefficients. Hastie & Tibshirani (1993) first introduced this model, in which the effect of the interaction variable varies smoothly over the range of a continuous effect modifier.

Household specific characteristics

A household's material wealth is positively related to the health status of the children living in this household. To measure the wealth of a household, an asset index is included in the analysis. This index is derived through a principal components analysis based on assets (e.g. television, fridge, or motorbike) possessed by the household of observation i and on dwelling and sanitation characteristics of the household. For instance, Limin (2003) finds that having access to flush toilets is reflected in a declining mortality rate. The quality of the sanitation facilities, along with variables like the household's source of drinking water and the possession of various assets, is comprised in the asset index. For a detailed description of how the index is derived see Filmer & Pritchett (2001). The household's wealth enters the hierarchical model at the first level as deviation from the region-specific mean and serves as an approximation of the household's economic wealth. In addition, the regional mean of the asset index is included at the region-specific level of the hierarchical estimation. To capture the effect of the size of the household, the total number of household members is included in the model. The effect is ambiguous: on the one hand, in a larger household more people can potentially look after young children; on the other hand, the conflict about the distribution of the household's wealth is bigger. Furthermore, a categorical indicator of whether the household is led by a female or a male person is included.

Maternal characteristics

Maternal characteristics include the mother's nutritional status, measured as the BMI, which is found to influence the health of the child. The same is assumed for the age of the mother at the birth of her child. Especially young mothers may lack the experience to raise children, which also influences the children's health status. Maternal characteristics also include the number of years of education of the mother. This covariate is assumed to lower the child's mortality risk, for two main reasons. First, women with higher education can better assess relevant information, which, in the end, will benefit their children's health and overall status. Second, higher education has a monetary payoff, since the chance of finding better jobs increases.

Finally, to account for declining mortality rates in the last two decades (Lozano et al. 2011) and to control for a time trend in under-5 mortality, which potentially declined in a non-linear way, the year the survey was conducted is included in the model. Figure 3 shows the trend of under-5 mortality over the last two decades.

Coding of covariates

Categorical covariates

Categorical explanatory variables are effect coded since, as pointed out by Umlauf et al. (2015), sampling and convergence are improved compared to a dummy variable coding of the explanatory variables. As an example, for the variable continent, which indicates whether a particular observation is located in an Asian or in a Sub-Saharan African country, this is done as follows:

$$
continent =
\begin{cases}
1, & \text{if country } i \text{ is located in Sub-Saharan Africa} \\
-1, & \text{if country } i \text{ is located in Asia} \\
0, & \text{else.}
\end{cases}
$$

That is, the variable takes the value 1 if country i is located in Sub-Saharan Africa and -1 if country i is located in Asia, which is the reference category. Obviously, the variable continent only takes the values 1 and -1, as only Asian and Sub-Saharan African countries are analyzed.

Fig. 2: Nelson-Aalen cumulative hazard by sex and continent (curves: female SSA, male SSA, female Asia, male Asia; axes: age in months, Nelson-Aalen cumulative hazard). Source: DHS data sets; calculations by authors.

Fig. 3: Development of under-5 mortality over time (curves: Asia, Sub-Saharan Africa; axes: year, under-5 mortality rates per 1,000 live births). Smooth lines are based on locally weighted regression. Source: DHS data sets; calculations by authors.

Descriptive statistics for our sample can be seen in Table 2, with the following covariates entering the model as categorical covariates: whether an older sibling of observation i is deceased, whether the household has a female head, the sex of the child, and the place of residence. Additionally, to detect potential differences between Asian and Sub-Saharan African countries, a binary indicator is included as the interaction variable in the varying coefficient terms.

Tab. 2: Descriptive statistics of categorical variables

| Definition of categorical covariates | Share in % | Level |
|---|---|---|
| Child has older dead sibling (=1) | 9.49 | Level 1 |
| Household has female head (=1) | [...] | Level 1 |
| Sex of child (1=male) | [...] | Level 1 |
| Place of living (1=urban) | [...] | Level 1 |
| Continent (1=Sub-Saharan Africa) | [...] | Level 3 |

The sample contains 825,076 observations nested in 348 regions and 46 countries. Source: DHS data sets; calculations by authors.
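The effect coding described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper `effect_code`, the label strings, and the toy outcome values are hypothetical and not part of the paper or of BayesX. The least squares fit at the end shows how the coding is read for a balanced two-group case: the intercept is the unweighted mean of the two group means and the coefficient is half their difference.

```python
import numpy as np

def effect_code(labels, positive, reference):
    """Effect-code a categorical variable: +1 for `positive`,
    -1 for `reference` (the reference category), 0 for anything else."""
    labels = np.asarray(labels)
    coded = np.zeros(labels.shape, dtype=int)
    coded[labels == positive] = 1
    coded[labels == reference] = -1
    return coded

# Hypothetical toy data: two observations per continent.
labels = ["SSA", "Asia", "SSA", "Asia"]
continent = effect_code(labels, positive="SSA", reference="Asia")

# Toy outcome with group means 4.0 (SSA) and 1.0 (Asia).
y = np.array([3.0, 1.0, 5.0, 1.0])
X = np.column_stack([np.ones(len(y)), continent])
intercept, slope = np.linalg.lstsq(X, y, rcond=None)[0]
# intercept is the mean of the group means, slope is half their difference.
```

With a dummy coding the intercept would instead equal the mean of the reference group, which is the practical difference between the two schemes.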

Continuous covariates

All other variables enter the model as continuous covariates. The included variables are: the deviation from the regional mean of the asset index, the birth order of the children within a household, the years of education of the mother, the number of people living in household i, the mother's age at the birth of child i, the mother's BMI, and, to capture a trend over time, the year the survey was conducted, all at the first level. All these variables have been found to influence under-5 mortality by various authors. At the second level, the height-for-age z-score is included in the analysis. At the highest level of the hierarchical model, level 3, the country's average GDP is included. Table 3 provides basic descriptive statistics and the level at which each variable enters our baseline model.

Tab. 3: Descriptive statistics of continuous variables

| Definition of continuous covariates | Mean | Std. dev. | Level |
|---|---|---|---|
| Asset index deviation regional mean | [...] | [...] | Level 1 |
| Birth order within the household | [...] | [...] | Level 1 |
| Years of education of mother | [...] | [...] | Level 1 |
| Number of people in household (including children) | [...] | [...] | Level 1 |
| BMI mother | [...] | [...] | Level 1 |
| Age mother at birth in years | [...] | [...] | Level 1 |
| Z-score stunting regional mean | [...] | [...] | Level 2 |
| Real GDP p.c., at 2005 constant prices | 2,[...] | [...] | Level 3 |
| Gini coefficient | [...] | [...] | Level 3 |

The sample contains 825,076 observations in 348 regions and 46 countries. Source: DHS data sets, Penn World Table 8.0; calculations by authors.

3. Method & model

3.1. Discrete survival model

Discrete survival models can be specified as regression models with binary response, for instance as logit or probit models. However, to make estimation feasible, the data need to be augmented such that the number of entries for observation i equals the number of time intervals t during which observation i is exposed to the risk of dying. DHS data provide only monthly information on the age of the child at the date of the interview. Hence, duration is only measured on a discrete time scale.
Additionally, due to restricted memory and to make estimation tractable in BayesX, we further divide the duration time scale into semiannual time intervals, each of length six months: [0, 6), [6, 12), ..., [54, 60), i.e. months 0-5, 6-11, ..., 54-59. Duration time T can thus be considered discrete with $T \in \{1, 2, \dots, q\}$, $q = 10$. Besides the duration time T, the vector of covariates $x_i$ is observed; $x_{it} = (x_{i1}, x_{i2}, \dots, x_{it})$ denotes the covariate information at each time interval up to time interval t. According to Adebayo & Fahrmeir (2005), the discrete hazard function can be written as follows:

$$
\lambda(t; x_{it}) = P(T = t \mid T \geq t, x_{it}), \quad t = 1, \dots, 10, \tag{1}
$$

where $\lambda(t; x_{it})$ is the conditional probability of death in time interval t, given that observation i survived until time interval t. In total 825,076 children are sampled, $i = 1, \dots, 825{,}076$, each with an observed survival interval $t_i \in \{1, \dots, 10\}$ of length six months. The event indicator is $dead_{it} = 0$ if child i is alive in time interval t and $dead_{it} = 1$ if child i died in time interval t. In the former case, $t_i$ corresponds to the current semiannual age interval of child i at the interview, and in the latter case $t_i$ corresponds to the age interval in which the child died. Thus the binary event indicator $dead_{it}$ of the survival status at time interval $t_i$ for child i is defined as follows:

$$
dead_{it} =
\begin{cases}
1, & \text{if } t = t_i \text{ and } dead_i = 1 \\
0, & \text{otherwise.}
\end{cases} \tag{2}
$$

The underlying survival process of child i can therefore be interpreted as a series of binary outcomes, one for each duration interval up to $t_i$. Assuming a probit link for the discrete time survival model, the response function is given by $\pi_{it} = \Phi(\eta_{it})$, where $\Phi$ is the cumulative distribution function (cdf) of the standard normal distribution, and the link function is given by $\Phi^{-1}(\pi_{it}) = \eta_{it}$ (Fahrmeir et al. 2009, Chapter 4). Thus the linear predictor for the variable of interest $dead_{it}$ is given by $\eta_{it}$, and as in Adebayo & Fahrmeir (2005) the partially linear predictor can be written as follows:

$$
\eta_{it}^{lin} = \gamma_0(t) + \gamma_1 x_{it1} + \dots + \gamma_q x_{itq} + \varepsilon_{it}. \tag{3}
$$

Using only the partially linear predictor $\eta_{it}^{lin}$ leads to the restriction that the effects of continuous covariates have to be linear, which is not justified for all covariates.
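The data augmentation behind Equations 1 and 2 can be sketched as follows. This is a minimal illustration, not the authors' preprocessing: the function names, the tuple layout `(age_months, died)`, and the four toy children are assumptions. Each child contributes one row per semiannual interval up to $t_i$, the indicator is 1 only in the interval of death, and the share of deaths among rows at risk gives a crude empirical counterpart of the discrete hazard. The sketch treats a censored child as at risk for the whole of its last interval, which is a simplification.

```python
import math
import numpy as np

INTERVAL_MONTHS = 6   # semiannual intervals, as in the text
N_INTERVALS = 10      # t = 1, ..., 10 covers months 0-59

def expand_child(age_months, died):
    """Return (t, dead_it) rows for one child.

    age_months: completed months of age at death (if died) or at the
    interview (if alive), assumed to lie in 0..59.
    """
    t_i = min(age_months // INTERVAL_MONTHS + 1, N_INTERVALS)
    return [(t, int(died and t == t_i)) for t in range(1, t_i + 1)]

def expand_sample(children):
    """Stack the person-period rows (child_id, t, dead_it) of all children."""
    rows = []
    for child_id, (age_months, died) in enumerate(children):
        for t, dead in expand_child(age_months, died):
            rows.append((child_id, t, dead))
    return np.array(rows)

def empirical_hazard(rows):
    """Share of deaths among the children at risk in each interval t."""
    hazard = {}
    for t in range(1, N_INTERVALS + 1):
        at_risk = rows[rows[:, 1] == t]
        if len(at_risk) > 0:
            hazard[t] = at_risk[:, 2].mean()
    return hazard

def probit_response(eta):
    """pi = Phi(eta), the standard normal cdf used as response function."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

# Hypothetical toy sample: (age in months, died?)
children = [(3, True), (14, False), (59, False), (7, True)]
rows = expand_sample(children)
hazard = empirical_hazard(rows)
```

In this toy sample the child observed alive at 14 months contributes three rows with indicator 0, so it informs the hazard of the first three intervals even though it has not been exposed to the full five years. This is how the augmented binary model handles right censoring.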
In view of this pitfall, the linear predictor $\eta_{it}^{lin}$ is replaced by the more general predictor $\eta_{it}$, in which the continuous covariates are modeled through potentially non-linear functions f. This yields the following additive model:

$$
\eta_{it} = x_{it}'\gamma + f_0(t) + f_1(x_{it1}) + \dots + f_p(x_{itp}) + \varepsilon_{it}, \tag{4}
$$

where $x_{it}'\gamma$ contains the covariates that enter the model linearly (the effect-coded categorical covariates), $f_0(t)$ captures the baseline effect, and $f_1(x_{it1})$ to $f_p(x_{itp})$ are the potentially non-linear effects of the continuous covariates. For our set of covariates and the variable of interest, the additive predictor can be written as follows:

$$
\begin{aligned}
\eta_{it}^{add} ={}& f_0(t) + f_1(t)\,\text{continent} \\
&+ f_2(\text{aidrm}_{it}) + f_3(\text{aidrm}_{it})\,\text{continent} \\
&+ f_4(\text{bord}_{it}) + f_5(\text{bord}_{it})\,\text{continent} \\
&+ f_6(\text{eduyear}_{it}) + f_7(\text{eduyear}_{it})\,\text{continent} \\
&+ f_8(\text{hhs}_{it}) + f_9(\text{hhs}_{it})\,\text{continent} \\
&+ f_{10}(\text{magebirth}_{it}) + f_{11}(\text{magebirth}_{it})\,\text{continent} \\
&+ f_{12}(\text{mbmi}_{it}) + f_{13}(\text{mbmi}_{it})\,\text{continent} \\
&+ f_{14}(\text{surveyyear}_{it}) + f_{15}(\text{surveyyear}_{it})\,\text{continent} \\
&+ \gamma_1\,\text{deadsib}_{it} + \gamma_2\,\text{fhh}_{it} + \gamma_3\,\text{male}_{it} + \gamma_4\,\text{urban}_{it} \\
&+ f_{16}(\text{stuntingrm}_{it}) + f_{17}(\text{stuntingrm}_{it})\,\text{continent} \\
&+ f_{18}(\text{gdpcm}_{it}) + f_{19}(\text{gdpcm}_{it})\,\text{continent} \\
&+ f_{20}(\text{ginicm}_{it}) + f_{21}(\text{ginicm}_{it})\,\text{continent} + \gamma_5\,\text{continent}_{it} + \varepsilon_{it} \\
={}& \eta_{it}^{individual} + f_{16}(\text{stuntingrm}_{it}) + f_{17}(\text{stuntingrm}_{it})\,\text{continent} \\
&+ f_{18}(\text{gdpcm}_{it}) + f_{19}(\text{gdpcm}_{it})\,\text{continent} \\
&+ f_{20}(\text{ginicm}_{it}) + f_{21}(\text{ginicm}_{it})\,\text{continent} + \gamma_5\,\text{continent}_{it} + \varepsilon_{it}.
\end{aligned} \tag{5}
$$

The categorical covariates continent, male, urban, fhh, and deadsib are effect coded and keep their assumed linear relationship. $f_0(t)$ is the baseline effect of the age of the children and $f_1(t)\,\text{continent}$ is the time-varying effect for the categorical covariate continent, as we assume differing baseline effects for Asian and Sub-Saharan African countries. Additionally, it is assumed that the relationship between the continuous covariates aidrm, bord, gdpcm, ginicm, eduyear, hhs, magebirth, mbmi, stuntingrm, and surveyyear and the variable of interest, $dead_{it}$, is potentially non-linear. The non-linear functions f of these covariates in Equation 5 are modeled with a penalized spline approach as linear combinations of basis functions (Lang et al. 2014). To detect differences between Asian and Sub-Saharan African countries, we include a varying coefficient term for each continuous covariate, in which the continent-specific effect varies smoothly over the range of the continuous variable.
Varying coefficients were first introduced by Hastie & Tibshirani (1993).

Modelling the non linear function f

As mentioned previously, the potentially non-linear terms are modeled through non-linear functions f that can be estimated using penalized splines (P-splines) (Fahrmeir et al. 2009, Chapter 7). In this setting each function is approximated by a linear combination of basis functions of the form

$$
f(x) = \sum_{j=1}^{d} \gamma_j B_j(x). \tag{6}
$$

The basic idea behind P-splines is to divide the range of the data into a relatively large number of intervals. The boundaries between two intervals are called knots; we use ten knots as the default choice. At the same time a penalty term is included to prevent overfitting. In a frequentist setup this model could be estimated with a standard maximum likelihood procedure. In a Bayesian setup the model is estimated by MCMC simulation. For a more thorough description of the estimation see for instance Fahrmeir et al. (2009, Chapter 7 and Chapter 8) or Harttgen et al. (2015).

Capturing differences between Asian and Sub-Saharan African countries with varying coefficients

To determine potential differences between Asian and Sub-Saharan African countries, a varying coefficient term for the geographic location is included in the estimation. Varying coefficients were first proposed by Hastie & Tibshirani (1993); applications can be found in Lang & Sunder (2003) or Lang et al. (2014), where determinants of children's malnutrition are analyzed with respect to potential gender differences using the second National Family Health Survey in India. In our context, the effect of a covariate, e.g. the BMI of the mother, can vary with the binary indicator for Asian versus Sub-Saharan African countries. This is modeled as follows:

$$
\eta_{it}^{add} = \dots + f_{12}(\text{mbmi}_{it}) + f_{\text{mbmi}\mid\text{continent}}(\text{mbmi}_{it})\,\text{continent} + \dots + x_{it}'\gamma + \varepsilon_{it},
$$

where the categorical covariate continent is the so-called interaction variable and the continuous covariate mbmi is the effect modifier. The varying coefficient term allows the effect of continent to vary smoothly across the range of mbmi. Including this varying coefficient in the estimation should detect potential differences in the effect of a specific covariate between Asian and African countries.
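The P-spline and varying coefficient ideas above can be sketched with NumPy alone. Several things here are assumptions rather than the paper's setup: the basis uses ten equidistant inner intervals and cubic B-splines, the penalty is a second-order difference penalty, and the coefficients are obtained by penalized least squares on a noise-free Gaussian toy response. The paper instead estimates a probit model by MCMC in BayesX, so this is only a frequentist analogue of the design. All function names and toy values are hypothetical.

```python
import numpy as np

def bspline_basis(x, n_intervals=10, degree=3):
    """B-spline design matrix on equidistant knots spanning the range of x
    (Cox-de Boor recursion); returns n_intervals + degree columns."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    step = (hi - lo) / n_intervals
    # Extend the knot sequence by `degree` knots on each side of the range.
    knots = lo + step * np.arange(-degree, n_intervals + degree + 1)
    # Degree 0: indicator functions of the knot intervals.
    B = ((x[:, None] >= knots[None, :-1]) &
         (x[:, None] < knots[None, 1:])).astype(float)
    for d in range(1, degree + 1):
        left = (x[:, None] - knots[None, :-d - 1]) / (knots[d:-1] - knots[:-d - 1])
        right = (knots[None, d + 1:] - x[:, None]) / (knots[d + 1:] - knots[1:-d])
        B = left * B[:, :-1] + right * B[:, 1:]
    return B

def difference_penalty(n_basis, order=2):
    """Penalty matrix D'D built from differences of adjacent coefficients."""
    D = np.diff(np.eye(n_basis), n=order, axis=0)
    return D.T @ D

# Hypothetical toy data: a BMI-like covariate and an effect-coded continent.
x = np.linspace(15.0, 40.0, 200)
continent = np.where(np.arange(x.size) % 2 == 0, 1.0, -1.0)
f_main = np.sin(x / 4.0)          # common smooth effect
f_diff = 0.1 * (x - 27.0)         # continent-specific deviation
y = f_main + f_diff * continent   # noise-free for illustration

B = bspline_basis(x)
n_basis = B.shape[1]
# Varying coefficient term: the same basis multiplied by the interaction variable.
X = np.hstack([B, B * continent[:, None]])
K = difference_penalty(n_basis)
P = np.zeros((2 * n_basis, 2 * n_basis))
P[:n_basis, :n_basis] = K
P[n_basis:, n_basis:] = K

lam = 1e-4  # smoothing parameter, fixed here instead of being estimated
coef = np.linalg.solve(X.T @ X + lam * P, X.T @ y)
fit = X @ coef
f_diff_hat = B @ coef[n_basis:]
```

With effect coding, the estimated effect is `f_main + f_diff` for Sub-Saharan Africa and `f_main - f_diff` for Asia, so `f_diff_hat` is the curve to inspect when looking for continent differences.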
The covariate continent is used as the interaction variable, with Asian countries being the reference category.

Multilevel STAR framework

To account for the hierarchical structure of the data and to allow for heterogeneity between countries, a multilevel STAR model is used. The 825,076 observed children are nested in 348 regions, which are in turn nested in 46 countries in Asia and Sub-Saharan Africa. This aggregation of the observations into regions and countries is called clustering (Jain et al. 1999, Korenromp et al. 2004). Lang et al. (2014) developed a multilevel approach that accounts for heterogeneity at the regional and country level and also allows non-linear covariate effects to be included. Expanding the additive model of Equation 5 beyond the individual level and including country-specific information in

the level 2 equation yields the following model.

$$
\begin{aligned}
\text{Level 1:}\quad \eta_{it} &= \eta^{\text{individual}}_{it} + f_{16}(\text{stuntingrm}_{it}) + f_{17}(\text{stuntingrm}_{it})\,\text{continent} + f_{18}(\text{country}_{it}) + \varepsilon_{it} \\
\text{Level 2:}\quad f_{18}(\text{country}) &= f_{18,1}(\text{GDP}_{\text{country}}) + f_{18,2}(\text{GDP}_{\text{country}})\,\text{continent} \\
&\quad + f_{18,3}(\text{gini}_{\text{country}}) + f_{18,4}(\text{gini}_{\text{country}})\,\text{continent} \\
&\quad + f_{18,5}(\text{country}) + \gamma_{18,1}\,\text{continent} + \varepsilon_{18}
\end{aligned}
\tag{7}
$$

One can extend Equation 7 by accounting for the regional structure; doing so yields the model formulated in Equation 8. This model accounts for the hierarchical structure of the data and for the fact that some variables are only observable as regional or country aggregates. In the end, this gives the following hierarchical STAR model, which includes a varying coefficient to account for the differences between Asian and Sub-Saharan African countries.

$$
\begin{aligned}
\text{Level 1:}\quad \eta_{it} &= \eta^{\text{individual}}_{it} + f_{18}(\text{region}_{it}) + \varepsilon_{it} \\
\text{Level 2:}\quad f_{18}(\text{region}) &= f_{18,1}(\text{stunting}_{\text{region}}) + f_{18,2}(\text{stunting}_{\text{region}})\,\text{continent} \\
&\quad + f_{18,3}(\text{country}) + f_{18,4}(\text{region}) + \varepsilon_{18} \\
\text{Level 3:}\quad f_{18,3}(\text{country}) &= f_{18,3,1}(\text{GDP}_{\text{country}}) + f_{18,3,2}(\text{GDP}_{\text{country}})\,\text{continent} \\
&\quad + f_{18,3,3}(\text{gini}_{\text{country}}) + f_{18,3,4}(\text{gini}_{\text{country}})\,\text{continent} \\
&\quad + f_{18,5}(\text{country}) + \gamma_{18,3,1}\,\text{continent} + \varepsilon_{18,3}
\end{aligned}
\tag{8}
$$

Accounting for heterogeneity

Up to this point the baseline model captures cluster-specific heterogeneity only by including i.i.d. Gaussian random effects of the region, $f_{18,4}(\text{region}) \sim N(0, \sigma^2_{18,4})$, and of the country, $f_{18,5}(\text{country}) \sim N(0, \sigma^2_{18,5})$. To capture potential heterogeneity of the individual-specific non-linear effects $f_q$ and the individual-specific linear effects $\gamma_q$, a country-specific random effect is included for each individual-specific covariate. Models that include this type of interaction with an unordered grouping factor, such as countries, are also called models with random slopes (Belitz et al. 2015). This can be of particular use when accounting for country- or region-specific heterogeneity.
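The nesting in Equation 8 can be sketched as a simple composition of effects. The snippet below is only an illustration under strong simplifications: the covariate terms at levels 2 and 3 (stunting, GDP, Gini, continent) and the level 1 error are omitted, so only the i.i.d. Gaussian random effects remain, and all names, standard deviations and group sizes are hypothetical.

```python
import random


def simulate_effects(n_countries, regions_per_country, sd_country, sd_region, seed=0):
    """Draw nested random effects: regions inherit the effect of their country."""
    rng = random.Random(seed)
    country_effect, region_effect = {}, {}
    for c in range(n_countries):
        # Level 3: country effect, here reduced to its Gaussian random effect.
        country_effect[c] = rng.gauss(0.0, sd_country)
        for r in range(regions_per_country):
            # Level 2: region effect = country effect + region random effect.
            region_effect[(c, r)] = country_effect[c] + rng.gauss(0.0, sd_region)
    return country_effect, region_effect


def linear_predictor(eta_individual, country, region, region_effect):
    """Level 1: individual-level predictor plus the effect of the child's region."""
    return eta_individual + region_effect[(country, region)]


country_effect, region_effect = simulate_effects(
    n_countries=3, regions_per_country=2, sd_country=0.3, sd_region=0.2, seed=1)
print(linear_predictor(-1.5, country=0, region=1, region_effect=region_effect))
```

Because each region effect contains the effect of its country, children from the same country share part of their predictor even when they live in different regions.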
The basic idea is then to add random slopes to the individual-specific covariates of Equation 7. For example, for the effect of the BMI of the mother, the model is rewritten as follows:

$$
\eta_{it} = \dots + f_{12}(\text{mbmi}_{it}) + f_{12}(\text{country})(\text{mbmi}_{it}) + f_{13}(\text{mbmi}_{it})(\text{continent}) + \dots + \varepsilon_{it}, \tag{9}
$$

where $f_{12}(\text{country}) \sim N(0, \sigma^2_{11})$ is i.i.d. normally distributed with inverse-Gamma distributed variance $\sigma^2_{11}$. A more recent alternative for modeling the cluster-specific variation of the non-linear terms is to use multiplicative country-specific random effects, as applied in recent papers; see for instance Wechselberger et al. (2008), Brunauer et al. (2013), Lang et al. (2015). In this variant the country-specific random effect is written as follows,

$$
\eta_{it} = \dots + \bigl(1 + f_{12}(\text{country}_{it})\bigr)\, f_{12}(\text{mbmi}_{it}) + f_{13}(\text{mbmi}_{it})(\text{continent}) + \dots + \varepsilon_{it}. \tag{10}
$$

A random effect smaller than zero ($f_q(\text{country}_i) < 0$) scales the main effect down, whereas a random effect greater than zero ($f_q(\text{country}_i) > 0$) scales the main effect up. For another application of multiplicative country-specific random effects see Harttgen et al. (2016).

4. Expected effects

4.1. Expected effects discrete covariates

The covariates entering the model will affect the outcome in a specific way. Effect-coded discrete variables influence the outcome either positively or negatively. For example, Klasen (2000) and Misselhorn & Harttgen (2006) mention that the mortality rate in African countries is almost twice as high as in Asian countries. Given this, the effect of the variable continent is assumed to be positive, which means that a child born in Sub-Saharan Africa faces a higher mortality risk than a child born in an Asian country.

Expected effects of non-linear terms

Individual level

The first level of the hierarchical model includes the following covariates: the household's asset index, measured as deviation from the regional mean; the birth order; the mother's years of education; the number of people living in the household; the age of the mother at birth; the mother's BMI; and, to capture the time trend, the year in which the survey was conducted. Black et al. (2013) mention, for instance, that a mother being undernourished is associated with a higher mortality risk of the child. Additionally, they found that the same relation holds when the mother is obese. This relation can be described as U-shaped: the child's risk of dying is lowest in the middle of the range of the mother's BMI. The same U-shaped pattern is assumed for the age of the mother, following a similar line of reasoning.
Child mortality is assumed to decrease monotonically with the indicator for the household's wealth, the asset index. The same relationship is assumed for the mother's years of education. Additionally, the birth order is included. We assume that this indicator is positively related to the child's mortality risk: the higher a child is in the birth order, the higher the child's risk of dying.

Regional level

Region-specific heterogeneity is modeled through the equation at the second level of the model. The average stunting z-score of the region is included as a covariate. It is expected that the variable is negatively associated with child mortality. This means

that in regions with a lower average z-score, the probability for a child to die within the first 60 months after birth is higher than in regions with higher values of this indicator.

Country level

To account for heterogeneity at the country level, the model in Equation 7 includes a country-specific effect $f_{18}(\text{country})$, which is modeled at the country level of the model. At the country level the only variable which is included is the GDP per capita. A recent study by Baird et al. (2010) found a negative relation between a country's per capita income and under-5 mortality. In particular, a decline of the country's GDP caused by a recession resulted in an increase in mortality rates. The increase is found to be notably larger for female children (Baird et al. 2010).

Tab. 4: Expected effect of the covariates on under-5 mortality

| Definition | Variable | Classification | Level | Expected effect |
|---|---|---|---|---|
| Child has dead siblings | deadsib | discrete | Level 1 | positive |
| Household has female head | f hh | discrete | Level 1 | positive |
| Sex of child | male | discrete | Level 1 | positive |
| Place of living | urban | discrete | Level 1 | negative |
| Asset index deviation regional mean | aidrm | continuous | Level 1 | decreasing |
| Birth order within household | bord | continuous | Level 1 | U-shaped |
| Years of education mother | eduyear | continuous | Level 1 | decreasing |
| Number of people in household | hhs | continuous | Level 1 | increasing |
| Age of mother at birth | magebirth | continuous | Level 1 | U-shaped |
| BMI mother | mbmi | continuous | Level 1 | U-shaped |
| Year of the survey | surveyyear | continuous | Level 1 | decreasing |
| Z stunting regional mean | stuntingrm | continuous | Level 2 | decreasing |
| Continent | continent | discrete | Level 3 | positive |
| GDP per capita country mean | gdpcm | continuous | Level 3 | decreasing |
| Gini coefficient country mean | ginicm | continuous | Level 3 | increasing |

Categorical covariates: The expected effects of the discrete variables on child mortality are assumed to be positive, negative or neutral.
Continuous covariates: The expected effects of the continuous variables on child mortality are assumed to be increasing, decreasing or U-shaped.

5. Results

The final model was selected using Bayesian credible bands (Krivobokova et al. 2010) to identify the significance of covariates, and the Deviance Information Criterion (DIC) (Spiegelhalter et al. 2002) to compare the different specifications. In the model building phase, the MCMC sampler was set to 45,000 iterations

and a burn-in period of 5,000 iterations was used. Depending on the complexity of the model, estimation at this stage takes approximately 900 to 1,800 minutes on a Windows laptop with an Intel i7-5600U 2.60 GHz processor. In the final model we used 110,000 iterations and a burn-in phase of 10,000 iterations for the MCMC sampler. Before taking a closer look at the results of the estimated models, which are rather similar across all specifications, the model selection is discussed. Table 5 shows the difference between the DIC of each estimated model and that of the baseline model of Equation 5.

Model selection

Simultaneous Bayesian credible bands (Krivobokova et al. 2010) together with the Deviance Information Criterion (Spiegelhalter et al. 2002) are used for model selection and for analyzing the significance of the covariates. The DIC is considered to be the generalization of the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) for hierarchical models. Moreover, Spiegelhalter et al. (2014) point out that the AIC cannot be calculated in hierarchical models, whereas it is feasible to calculate the DIC. When Bayesian models became feasible in the late 1990s, a model comparison criterion was still missing, and the DIC was proposed by Spiegelhalter et al. (2002).

Selection process

The DIC strikes a trade-off between the fit to the data and the complexity of the model. The lower the DIC, the better the model describes the data. Spiegelhalter et al. (2002, 2014) suggest, as a rule of thumb, favoring the model with the lower DIC if the difference is at least 10. If the difference is between 5 and 10, it is of a substantial nature, and if the difference is smaller than 5, both models should be considered. In a further step, a variable that enters the model linearly is considered significant if the 95% credible interval does not include zero.
If the covariate is modeled as a non-linear effect, the variable is called significant if the 95% credible band does not include the zero line over the complete range of the covariate. Simultaneous Bayesian credible bands for an effect $f_i$ are obtained from the effect's posterior distribution, where $f_i$ is estimated using MCMC simulation techniques (Krivobokova et al. 2010). Intuitively, the wider the credible bands in a certain range of the data, the less precisely $f_i$ is estimated there, because the data provide only scarce information. The narrower the credible bands, the better the information on the curve of $f_i$. To discriminate between the estimated models and to select the model that offers the best trade-off between model complexity and model fit, the DIC is consulted, since it can be calculated for hierarchical models. Table 5 shows the differences between the DIC of the estimated models and that of the baseline model of Equation 5.
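How a DIC value arises from MCMC output, and how the rule of thumb above is applied, can be sketched as follows. The sketch uses the standard definition DIC = mean deviance + effective number of parameters, with the latter computed as mean deviance minus the deviance at the posterior mean. The toy model (i.i.d. normal data with known variance and a flat prior on the mean), the simulated data and all function names are assumptions for illustration only, not the model estimated in this paper.

```python
import math
import random


def deviance(mu, data, sigma=1.0):
    """D(mu) = -2 * log-likelihood of i.i.d. N(mu, sigma^2) observations."""
    n = len(data)
    ss = sum((y - mu) ** 2 for y in data)
    loglik = -0.5 * n * math.log(2.0 * math.pi * sigma ** 2) - ss / (2.0 * sigma ** 2)
    return -2.0 * loglik


def dic(samples, data):
    """Return (DIC, p_D) from posterior draws of the mean."""
    d_bar = sum(deviance(m, data) for m in samples) / len(samples)
    mu_bar = sum(samples) / len(samples)
    p_d = d_bar - deviance(mu_bar, data)   # effective number of parameters
    return d_bar + p_d, p_d


def compare_dic(dic_a, dic_b):
    """Rule of thumb described in the text for two competing models A and B."""
    diff = abs(dic_a - dic_b)
    lower = "A" if dic_a < dic_b else "B"
    if diff >= 10:
        return f"prefer model {lower}"
    if diff >= 5:
        return f"substantial difference, model {lower} has the lower DIC"
    return "consider both models"


rng = random.Random(42)
data = [rng.gauss(1.0, 1.0) for _ in range(50)]           # toy data
y_bar = sum(data) / len(data)
# Under a flat prior the posterior of the mean is N(y_bar, 1 / n); draw from it directly.
samples = [rng.gauss(y_bar, 1.0 / math.sqrt(len(data))) for _ in range(5000)]
dic_value, p_d = dic(samples, data)
print(dic_value, p_d)
print(compare_dic(-8390, -7410))
```

In this toy model there is a single free parameter, so the effective number of parameters should come out close to one; in hierarchical models it typically lies between the number of fixed parameters and the total number of random effects, which is why the DIC remains usable where a simple parameter count is not.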

Selection of the baseline model

In selecting between a purely additive model and multilevel models with either two or three levels, the DIC clearly favors the multilevel models of Equation 7 and Equation 8 over the simple additive model of Equation 5. The DIC of the additive model is roughly 1,950 to 3,750 points higher than that of the two multilevel models with two or three levels and without random effects. Comparing these two multilevel models shows that accounting not only for the country-specific structure of the data but also for the fact that individuals are nested in regions, which are in turn nested in countries, yields an improvement: the DIC is approximately 1,800 points lower than for the model of Equation 7.

Allowing for different effect sizes of countries

So far, cluster-specific heterogeneity is only considered through the country- and region-specific random intercepts. However, as mentioned earlier, it is possible to allow for cluster-specific effect sizes for all covariates. Including country random slopes, as described for instance in Brezger et al. (2003), in Equation 7 and Equation 8 yields a clear decrease of the DIC of more than 4,000 and 3,500 points, respectively, compared to the models without random slopes. Additionally, modeling country-specific heterogeneity of the continuous covariates through multiplicative random effects yields a further improvement. The DIC decreases by around 900 points compared to the models capturing heterogeneity with random slopes. This clearly emphasizes the advantage of modeling country-specific heterogeneity with multiplicative random effects. Following the rule of thumb for the DIC by Spiegelhalter et al.
(2002), the multilevel model that includes covariates at the individual, regional and country level is always preferable to models where only the individual level, or the individual and country level, is considered. Furthermore, accounting for heterogeneity yields further improvements of the model. Accounting for heterogeneity is most effective when multiplicative random effects are used. Following this decision rule, results are discussed for the baseline model of Equation 8 including multiplicative country random effects. However, the results are rather similar with respect to shape and magnitude of the effects for all other specified models.

Tab. 5: Model selection: Differences of the DIC to the baseline model of Equation 5

| Model | Difference in DIC |
|---|---|
| Additive model (AM), Equation 5 | 0 |
| Multilevel AM (MAM), Equation 7 | -1,960 |
| MAM, Equation 8 | -3,750 |
| MAM, Equation 7, including country random slopes | -6,030 |
| MAM, Equation 8, including country random slopes | -7,410 |
| MAM, Equation 7, including multiplicative country random effects (MRE) | -7,000 |
| MAM, Equation 8, including MRE | -8,390 |
| MAM, Equation 8, including MRE, and interaction gender × continent | -8,320 |
| MAM, Equation 8, including MRE, and interaction urban × continent | -8,310 |
| MAM, Equation 8, including MRE, varying coefficient SSA C. ASIA S. ASIA | -8,820 |

Source: DHS data; calculation by authors

Results linear terms

Table 6 summarizes the results for the linear covariates; it shows the posterior mean together with the 95% credible interval. The only effect found to be insignificant is whether the household is led by a female or a male; all other linear covariates are found to be significant. Keep in mind that, as mentioned earlier, a linear effect is considered significant if zero is not included in the 95% credible interval. We begin with the covariate showing the largest effect, the indicator of whether an older sibling died in the household of observation i. Growing up in a household where an older sibling of observation i died is associated with a higher mortality risk in the first 60 months, compared to a household where no such fatality occurred. The estimated posterior mean is in line with our expectations as well as the literature, see for instance Hobcraft et al. (1985), Harttgen et al. (2016). The estimated posterior mean for the gender of the child is in line with our expectations and significant. Male children tend to have a slightly higher risk of dying in the first five years of life than female children. This can also be illustrated by plotting the Nelson-Aalen cumulative hazard for the individual waves. In Figure 2 the cumulative hazard rate is plotted separately for male and female observations from Asia, and for male and female observations from Sub-Saharan African countries. It can be seen that the hazard rate for girls is always slightly below the hazard rate for boys. Children living in urban areas have higher survival probabilities than their counterparts in rural areas; the estimated posterior mean is significant and negative, although of low magnitude. These estimates indicate that children below five years of age living in rural areas face a higher mortality risk.
This gap between urban and rural areas might be best explained by a generally poorer health status and a poorer basic health infrastructure in rural areas (Harttgen & Günther 2011). This pattern and its possible explanations have already been reported in previous research by the above-mentioned authors. However, the estimated results for the gender of the child and the area of living should be interpreted with great care. Especially when large samples are analyzed, even relatively small effects tend to become significant. Both effects are significant but of little practical importance, considering the usual predictor range of probit models between -2 and 2. It can be assumed that the effect size of the categorical covariates is not homogeneous across countries, so we included country random slopes for the categorical covariates. Figure 4 to Figure 6 show the results of the random slopes of the categorical covariates. Especially for the covariate deadsibling, the place of residence, and the children's gender


More information

One Economist s Perspective on Some Important Estimation Issues

One Economist s Perspective on Some Important Estimation Issues One Economist s Perspective on Some Important Estimation Issues Jere R. Behrman W.R. Kenan Jr. Professor of Economics & Sociology University of Pennsylvania SRCD Seattle Preconference on Interventions

More information

IInfant mortality rate (IMR) is the number of deaths

IInfant mortality rate (IMR) is the number of deaths Proceedings of the World Congress on Engineering 217 Vol II, July 5-7, 217, London, U.K. Infant Mortality and Economic Growth: Modeling by Increasing Returns and Least Squares I. C. Demetriou and P. Tzitziris

More information

x3,..., Multiple Regression β q α, β 1, β 2, β 3,..., β q in the model can all be estimated by least square estimators

x3,..., Multiple Regression β q α, β 1, β 2, β 3,..., β q in the model can all be estimated by least square estimators Multiple Regression Relating a response (dependent, input) y to a set of explanatory (independent, output, predictor) variables x, x 2, x 3,, x q. A technique for modeling the relationship between variables.

More information

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models Thomas Kneib Institute of Statistics and Econometrics Georg-August-University Göttingen Department of Statistics

More information

Income elasticity of human development in ASEAN countries

Income elasticity of human development in ASEAN countries The Empirical Econometrics and Quantitative Economics Letters ISSN 2286 7147 EEQEL all rights reserved Volume 2, Number 4 (December 2013), pp. 13-20. Income elasticity of human development in ASEAN countries

More information

BIOS 312: Precision of Statistical Inference

BIOS 312: Precision of Statistical Inference and Power/Sample Size and Standard Errors BIOS 312: of Statistical Inference Chris Slaughter Department of Biostatistics, Vanderbilt University School of Medicine January 3, 2013 Outline Overview and Power/Sample

More information

A Meta-Analysis of the Urban Wage Premium

A Meta-Analysis of the Urban Wage Premium A Meta-Analysis of the Urban Wage Premium Ayoung Kim Dept. of Agricultural Economics, Purdue University kim1426@purdue.edu November 21, 2014 SHaPE seminar 2014 November 21, 2014 1 / 16 Urban Wage Premium

More information

VII APPROACHES IN SELECTING A CORE SET OF INDICATORS

VII APPROACHES IN SELECTING A CORE SET OF INDICATORS HANDBOOK ON RURAL HOUSEHOLDS LIVELIHOOD AND WELL-BEING VII APPROACHES IN SELECTING A CORE SET OF INDICATORS VII.1 Introduction In Chapters III to VI of this Handbook, and in associated annexes, numerous

More information

Introducing Generalized Linear Models: Logistic Regression

Introducing Generalized Linear Models: Logistic Regression Ron Heck, Summer 2012 Seminars 1 Multilevel Regression Models and Their Applications Seminar Introducing Generalized Linear Models: Logistic Regression The generalized linear model (GLM) represents and

More information

Field Course Descriptions

Field Course Descriptions Field Course Descriptions Ph.D. Field Requirements 12 credit hours with 6 credit hours in each of two fields selected from the following fields. Each class can count towards only one field. Course descriptions

More information

Chapter 9: Looking Beyond Poverty: The Development Continuum

Chapter 9: Looking Beyond Poverty: The Development Continuum Chapter 9: Looking Beyond Poverty: The Development Continuum Using measures such as Gross Domestic Product (GDP), Gross National Income (GNI), and more recently the Human Development Index (HDI), various

More information

University of California at Berkeley Fall Introductory Applied Econometrics Final examination. Scores add up to 125 points

University of California at Berkeley Fall Introductory Applied Econometrics Final examination. Scores add up to 125 points EEP 118 / IAS 118 Elisabeth Sadoulet and Kelly Jones University of California at Berkeley Fall 2008 Introductory Applied Econometrics Final examination Scores add up to 125 points Your name: SID: 1 1.

More information

Ron Heck, Fall Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October 20, 2011)

Ron Heck, Fall Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October 20, 2011) Ron Heck, Fall 2011 1 EDEP 768E: Seminar in Multilevel Modeling rev. January 3, 2012 (see footnote) Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October

More information

A Joint Tour-Based Model of Vehicle Type Choice and Tour Length

A Joint Tour-Based Model of Vehicle Type Choice and Tour Length A Joint Tour-Based Model of Vehicle Type Choice and Tour Length Ram M. Pendyala School of Sustainable Engineering & the Built Environment Arizona State University Tempe, AZ Northwestern University, Evanston,

More information

A (Brief) Introduction to Crossed Random Effects Models for Repeated Measures Data

A (Brief) Introduction to Crossed Random Effects Models for Repeated Measures Data A (Brief) Introduction to Crossed Random Effects Models for Repeated Measures Data Today s Class: Review of concepts in multivariate data Introduction to random intercepts Crossed random effects models

More information

Table B1. Full Sample Results OLS/Probit

Table B1. Full Sample Results OLS/Probit Table B1. Full Sample Results OLS/Probit School Propensity Score Fixed Effects Matching (1) (2) (3) (4) I. BMI: Levels School 0.351* 0.196* 0.180* 0.392* Breakfast (0.088) (0.054) (0.060) (0.119) School

More information

7.1 The Hazard and Survival Functions

7.1 The Hazard and Survival Functions Chapter 7 Survival Models Our final chapter concerns models for the analysis of data which have three main characteristics: (1) the dependent variable or response is the waiting time until the occurrence

More information

A STUDY OF HUMAN DEVELOPMENT APPROACH TO THE DEVELOPMENT OF NORTH EASTERN REGION OF INDIA

A STUDY OF HUMAN DEVELOPMENT APPROACH TO THE DEVELOPMENT OF NORTH EASTERN REGION OF INDIA ABSTRACT A STUDY OF HUMAN DEVELOPMENT APPROACH TO THE DEVELOPMENT OF NORTH EASTERN REGION OF INDIA Human development by emphasizing on capability approach differs crucially from the traditional approaches

More information

Has the Family Planning Policy Improved the Quality of the Chinese New. Generation? Yingyao Hu University of Texas at Austin

Has the Family Planning Policy Improved the Quality of the Chinese New. Generation? Yingyao Hu University of Texas at Austin Very preliminary and incomplete Has the Family Planning Policy Improved the Quality of the Chinese New Generation? Yingyao Hu University of Texas at Austin Zhong Zhao Institute for the Study of Labor (IZA)

More information

Summary prepared by Amie Gaye: UNDP Human Development Report Office

Summary prepared by Amie Gaye: UNDP Human Development Report Office Contribution to Beyond Gross Domestic Product (GDP) Name of the indicator/method: The Human Development Index (HDI) Summary prepared by Amie Gaye: UNDP Human Development Report Office Date: August, 2011

More information

Growth is good for health - in democracies

Growth is good for health - in democracies Growth is good for health - in democracies Andreas Kammerlander* Günther G. Schulze University of Freiburg, Germany Institute of Economics Department of International Economic Policy Platz der Alten Synagoge

More information

Table 1. Answers to income and consumption adequacy questions Percentage of responses: less than adequate more than adequate adequate Total income 68.7% 30.6% 0.7% Food consumption 46.6% 51.4% 2.0% Clothing

More information

CHAPTER 2: KEY ISSUE 1 Where Is the World s Population Distributed? p

CHAPTER 2: KEY ISSUE 1 Where Is the World s Population Distributed? p CHAPTER 2: KEY ISSUE 1 Where Is the World s Population Distributed? p. 45-49 Always keep your vocabulary packet out whenever you take notes. As the term comes up in the text, add to your examples for the

More information

NELS 88. Latent Response Variable Formulation Versus Probability Curve Formulation

NELS 88. Latent Response Variable Formulation Versus Probability Curve Formulation NELS 88 Table 2.3 Adjusted odds ratios of eighth-grade students in 988 performing below basic levels of reading and mathematics in 988 and dropping out of school, 988 to 990, by basic demographics Variable

More information

Adjusting the HDI for Inequality: An Overview of Different Approaches, Data Issues, and Interpretations

Adjusting the HDI for Inequality: An Overview of Different Approaches, Data Issues, and Interpretations Adjusting the HDI for Inequality: An Overview of Different Approaches, Data Issues, and Interpretations Milorad Kovacevic Statistical Unit, HDRO HDRO Brown bag Seminar, September 28, 2009 1/33 OBJECTIVES:

More information

1Department of Demography and Organization Studies, University of Texas at San Antonio, One UTSA Circle, San Antonio, TX

1Department of Demography and Organization Studies, University of Texas at San Antonio, One UTSA Circle, San Antonio, TX Well, it depends on where you're born: A practical application of geographically weighted regression to the study of infant mortality in the U.S. P. Johnelle Sparks and Corey S. Sparks 1 Introduction Infant

More information

Predicting bond returns using the output gap in expansions and recessions

Predicting bond returns using the output gap in expansions and recessions Erasmus university Rotterdam Erasmus school of economics Bachelor Thesis Quantitative finance Predicting bond returns using the output gap in expansions and recessions Author: Martijn Eertman Studentnumber:

More information

Truck prices - linear model? Truck prices - log transform of the response variable. Interpreting models with log transformation

Truck prices - linear model? Truck prices - log transform of the response variable. Interpreting models with log transformation Background Regression so far... Lecture 23 - Sta 111 Colin Rundel June 17, 2014 At this point we have covered: Simple linear regression Relationship between numerical response and a numerical or categorical

More information

A General Multilevel Multistate Competing Risks Model for Event History Data, with. an Application to a Study of Contraceptive Use Dynamics

A General Multilevel Multistate Competing Risks Model for Event History Data, with. an Application to a Study of Contraceptive Use Dynamics A General Multilevel Multistate Competing Risks Model for Event History Data, with an Application to a Study of Contraceptive Use Dynamics Published in Journal of Statistical Modelling, 4(): 145-159. Fiona

More information

Technical Track Session I:

Technical Track Session I: Impact Evaluation Technical Track Session I: Click to edit Master title style Causal Inference Damien de Walque Amman, Jordan March 8-12, 2009 Click to edit Master subtitle style Human Development Human

More information

MN 400: Research Methods. CHAPTER 7 Sample Design

MN 400: Research Methods. CHAPTER 7 Sample Design MN 400: Research Methods CHAPTER 7 Sample Design 1 Some fundamental terminology Population the entire group of objects about which information is wanted Unit, object any individual member of the population

More information

Regression so far... Lecture 21 - Logistic Regression. Odds. Recap of what you should know how to do... At this point we have covered: Sta102 / BME102

Regression so far... Lecture 21 - Logistic Regression. Odds. Recap of what you should know how to do... At this point we have covered: Sta102 / BME102 Background Regression so far... Lecture 21 - Sta102 / BME102 Colin Rundel November 18, 2014 At this point we have covered: Simple linear regression Relationship between numerical response and a numerical

More information

Applied Economics. Regression with a Binary Dependent Variable. Department of Economics Universidad Carlos III de Madrid

Applied Economics. Regression with a Binary Dependent Variable. Department of Economics Universidad Carlos III de Madrid Applied Economics Regression with a Binary Dependent Variable Department of Economics Universidad Carlos III de Madrid See Stock and Watson (chapter 11) 1 / 28 Binary Dependent Variables: What is Different?

More information

Final Exam - Solutions

Final Exam - Solutions Ecn 102 - Analysis of Economic Data University of California - Davis March 19, 2010 Instructor: John Parman Final Exam - Solutions You have until 5:30pm to complete this exam. Please remember to put your

More information

Part 8: GLMs and Hierarchical LMs and GLMs

Part 8: GLMs and Hierarchical LMs and GLMs Part 8: GLMs and Hierarchical LMs and GLMs 1 Example: Song sparrow reproductive success Arcese et al., (1992) provide data on a sample from a population of 52 female song sparrows studied over the course

More information

Ch 7: Dummy (binary, indicator) variables

Ch 7: Dummy (binary, indicator) variables Ch 7: Dummy (binary, indicator) variables :Examples Dummy variable are used to indicate the presence or absence of a characteristic. For example, define female i 1 if obs i is female 0 otherwise or male

More information

The Index of Human Insecurity

The Index of Human Insecurity The Index of Human Insecurity A Project of the Global Environmental Change and Human Security Program (GECHS) Steve Lonergan, Kent Gustavson, and Brian Carter Department of Geography University of Victoria

More information

Applied Microeconometrics (L5): Panel Data-Basics

Applied Microeconometrics (L5): Panel Data-Basics Applied Microeconometrics (L5): Panel Data-Basics Nicholas Giannakopoulos University of Patras Department of Economics ngias@upatras.gr November 10, 2015 Nicholas Giannakopoulos (UPatras) MSc Applied Economics

More information

C/W Qu: How is development measured? 13/6/12 Aim: To understand how development is typically measured/classified and the pros/cons of these

C/W Qu: How is development measured? 13/6/12 Aim: To understand how development is typically measured/classified and the pros/cons of these C/W Qu: How is development measured? 13/6/12 Aim: To understand how development is typically measured/classified and the pros/cons of these Starter: Comment on this image Did you spot these? Rubbish truck

More information

EMERGING MARKETS - Lecture 2: Methodology refresher

EMERGING MARKETS - Lecture 2: Methodology refresher EMERGING MARKETS - Lecture 2: Methodology refresher Maria Perrotta April 4, 2013 SITE http://www.hhs.se/site/pages/default.aspx My contact: maria.perrotta@hhs.se Aim of this class There are many different

More information

Exploring the Association Between Family Planning and Developing Telecommunications Infrastructure in Rural Peru

Exploring the Association Between Family Planning and Developing Telecommunications Infrastructure in Rural Peru Exploring the Association Between Family Planning and Developing Telecommunications Infrastructure in Rural Peru Heide Jackson, University of Wisconsin-Madison September 21, 2011 Abstract This paper explores

More information

Time-series small area estimation for unemployment based on a rotating panel survey

Time-series small area estimation for unemployment based on a rotating panel survey Discussion Paper Time-series small area estimation for unemployment based on a rotating panel survey The views expressed in this paper are those of the author and do not necessarily relect the policies

More information

Generalized Linear Models for Non-Normal Data

Generalized Linear Models for Non-Normal Data Generalized Linear Models for Non-Normal Data Today s Class: 3 parts of a generalized model Models for binary outcomes Complications for generalized multivariate or multilevel models SPLH 861: Lecture

More information

Gibbs Sampling in Latent Variable Models #1

Gibbs Sampling in Latent Variable Models #1 Gibbs Sampling in Latent Variable Models #1 Econ 690 Purdue University Outline 1 Data augmentation 2 Probit Model Probit Application A Panel Probit Panel Probit 3 The Tobit Model Example: Female Labor

More information

Apéndice 1: Figuras y Tablas del Marco Teórico

Apéndice 1: Figuras y Tablas del Marco Teórico Apéndice 1: Figuras y Tablas del Marco Teórico FIGURA A.1.1 Manufacture poles and manufacture regions Poles: Share of employment in manufacture at least 12% and population of 250,000 or more. Regions:

More information

Generalized linear models

Generalized linear models Generalized linear models Douglas Bates November 01, 2010 Contents 1 Definition 1 2 Links 2 3 Estimating parameters 5 4 Example 6 5 Model building 8 6 Conclusions 8 7 Summary 9 1 Generalized Linear Models

More information

Poverty, Inequality and Growth: Empirical Issues

Poverty, Inequality and Growth: Empirical Issues Poverty, Inequality and Growth: Empirical Issues Start with a SWF V (x 1,x 2,...,x N ). Axiomatic approaches are commen, and axioms often include 1. V is non-decreasing 2. V is symmetric (anonymous) 3.

More information

More on Roy Model of Self-Selection

More on Roy Model of Self-Selection V. J. Hotz Rev. May 26, 2007 More on Roy Model of Self-Selection Results drawn on Heckman and Sedlacek JPE, 1985 and Heckman and Honoré, Econometrica, 1986. Two-sector model in which: Agents are income

More information

Cornelia F.A. van Wesenbeeck Amsterdam Centre for World Food Studies, VU University, Amsterdam. Study for SWAC/OECD

Cornelia F.A. van Wesenbeeck Amsterdam Centre for World Food Studies, VU University, Amsterdam. Study for SWAC/OECD Cornelia F.A. van Wesenbeeck Amsterdam Centre for World Food Studies, VU University, Amsterdam Study for SWAC/OECD Policies to improve FNS require solid empirical base At least headcounts of people below/above

More information

Data Integration Model for Air Quality: A Hierarchical Approach to the Global Estimation of Exposures to Ambient Air Pollution

Data Integration Model for Air Quality: A Hierarchical Approach to the Global Estimation of Exposures to Ambient Air Pollution Data Integration Model for Air Quality: A Hierarchical Approach to the Global Estimation of Exposures to Ambient Air Pollution Matthew Thomas 9 th January 07 / 0 OUTLINE Introduction Previous methods for

More information

Impact Evaluation Technical Workshop:

Impact Evaluation Technical Workshop: Impact Evaluation Technical Workshop: Asian Development Bank Sept 1 3, 2014 Manila, Philippines Session 19(b) Quantile Treatment Effects I. Quantile Treatment Effects Most of the evaluation literature

More information

Income Distribution Dynamics with Endogenous Fertility. By Michael Kremer and Daniel Chen

Income Distribution Dynamics with Endogenous Fertility. By Michael Kremer and Daniel Chen Income Distribution Dynamics with Endogenous Fertility By Michael Kremer and Daniel Chen I. Introduction II. III. IV. Theory Empirical Evidence A More General Utility Function V. Conclusions Introduction

More information

Introduction to Development. Indicators and Models

Introduction to Development. Indicators and Models Introduction to Development Indicators and Models First World vs. Third World Refers to economic development Diversity and complexity of economy High per capita income Developed during the Cold War First

More information

Presentation by Thangavel Palanivel Senior Strategic Advisor and Chief Economist UNDP Regional Bureau for Asia-Pacific

Presentation by Thangavel Palanivel Senior Strategic Advisor and Chief Economist UNDP Regional Bureau for Asia-Pacific Presentation by Thangavel Palanivel Senior Strategic Advisor and Chief Economist UNDP Regional Bureau for Asia-Pacific The High-Level Euro-Asia Regional Meeting on Improving Cooperation on Transit, Trade

More information

The World Bank Health System Performance Reinforcement Project (P156679)

The World Bank Health System Performance Reinforcement Project (P156679) Public Disclosure Authorized AFRICA Cameroon Health, Nutrition & Population Global Practice IBRD/IDA Investment Project Financing FY 2016 Seq No: 3 ARCHIVED on 14-Apr-2017 ISR27518 Implementing Agencies:

More information

ESRI 2008 Health GIS Conference

ESRI 2008 Health GIS Conference ESRI 2008 Health GIS Conference An Exploration of Geographically Weighted Regression on Spatial Non- Stationarity and Principal Component Extraction of Determinative Information from Robust Datasets A

More information

GCSE 4231/01 GEOGRAPHY (Specification A) FOUNDATION TIER UNIT 1: Core Geography

GCSE 4231/01 GEOGRAPHY (Specification A) FOUNDATION TIER UNIT 1: Core Geography Surname Centre Number Candidate Number Other Names 0 GCSE 4231/01 GEOGRAPHY (Specification A) FOUNDATION TIER UNIT 1: Core Geography S16-4231-01 P.M. TUESDAY, 24 May 2016 1 hour 45 minutes For s use Question

More information