The Use of Random Geographic Cluster Sampling to Survey Pastoralists
Kristen Himelein, World Bank
Addis Ababa, Ethiopia
January 23, 2013
Paper
This presentation is based on a paper by Kristen Himelein (World Bank), Stephanie Eckman (Institute for Employment Research, Germany), and Siobhan Murray (World Bank). The paper has been submitted to the Journal of Official Statistics and is available for comment upon request.
Note on language
Throughout the presentation and the paper, the word "nomad" denotes a person who does not have a fixed dwelling. It is not a pejorative term in English (though it can be translated as one into Amharic). It is used to distinguish such people from "pastoralists," a term that denotes anyone who owns or raises livestock, regardless of dwelling status.
Background
- In many parts of the world, livestock play an integral role in the livelihoods of vulnerable populations:
  - main source of food and transportation
  - store of wealth
  - coping mechanism in response to shocks
- But the populations most reliant on livestock are also the most difficult to measure accurately, because of their pastoralist or semi-pastoralist lifestyle
- Traditional census-based sampling frames may not be sufficient
Location: Afar, Ethiopia
- The Afar region (northeastern Ethiopia) is highly pastoralist
- More than 40 percent of respondents reported owning 10 or more cattle in the last AgSS
- High density of camels and goats
- Bounded by national borders to the north and east and by mountains to the west (and on all sides by ethnic differences)
Research Objectives
- Can we collect better data than a standard census frame provides by using an alternative sampling frame and data collection field plan?
- Are we able to capture populations missed by a dwelling-based sampling frame?
- How do our figures compare with other sources of information?
Location: Afar, Ethiopia
- Livestock movements vary by season, with different responses to changing conditions.
- As the map shows, the general pattern is movement toward a river or other water source (or the Amhara border) during the dry season, and out to pasture after the rains start.
- The survey was timed to take advantage of the congregation of herds around water sources during the dry season.
Map reproduced from Ahmed Endris thesis, 2007.
Random Geographic Cluster Sampling (RGCS)
- 1st stage: select random geographic points
- 2nd stage: survey all eligible respondents within a given radius of each point
RGCS or similar designs have been used in:
- agricultural statistics agencies (such as USDA)
- livestock studies in the developing world (Soumarea et al., 2007; von Hagen, 2002)
- surveys of forests (Winrock International)
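The two stages can be sketched in Python as follows. This is an illustration, not the authors' implementation, and it rests on several assumptions. The stratum is treated as a simple polygon in a projected coordinate system measured in kilometres. Points are drawn by rejection sampling. The household coordinates are hypothetical. In the field, the second stage was carried out by walking the circle, not by filtering a list.

```python
import math
import random

def point_in_polygon(p, polygon):
    """Ray-casting test: p is inside if a ray to the right crosses an odd number of edges."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def sample_points(polygon, n, rng):
    """Stage 1: draw n points uniformly inside the polygon by rejection sampling."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    points = []
    while len(points) < n:
        p = (rng.uniform(min(xs), max(xs)), rng.uniform(min(ys), max(ys)))
        if point_in_polygon(p, polygon):
            points.append(p)
    return points

def households_in_circle(center, radius_km, households):
    """Stage 2: every household within radius_km of the selected point is in the cluster."""
    cx, cy = center
    return [h for h in households
            if math.hypot(h["x"] - cx, h["y"] - cy) <= radius_km]

# Hypothetical example: a 10 km x 10 km stratum and 3 selected points
rng = random.Random(42)
stratum = [(0, 0), (10, 0), (10, 10), (0, 10)]
centers = sample_points(stratum, 3, rng)
```

Rejection sampling is simple and exactly uniform. For long, thin strata such as river buffers, however, most candidate points would be rejected, so sampling directly within polygons in a GIS is usually more practical.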
Stratification (1)
- Strata inputs: distance to water, vegetation index, and land cover (towns and settled agriculture)
- Circle radius varies by stratum
Stratification (2)
- Stratum 1 (high likelihood): towns
- Stratum 2 (almost no possibility): settled agricultural areas / commercial farms
- Stratum 3 (high likelihood): within 2 km of a major river or swamp
- Stratum 4 (medium likelihood): within 10 km of a major river or swamp
- Stratum 5 (low likelihood): all other land
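A rule-based version of this classification might look like the sketch below. It assumes each location has already been tagged in a GIS with town and settled-agriculture land cover and with a distance to the nearest major river or swamp. Several details are assumptions, not taken from the slides:

- the precedence order (towns first, then agriculture, then the distance bands)
- the hypothetical `Location` fields
- the omission of the vegetation index, because the slides do not say how it enters the strata

The radii are taken from Table 1 in the appendix.

```python
from dataclasses import dataclass

# Circle radius (km) per stratum, as listed in Table 1 of the appendix
RADIUS_KM = {1: 0.1, 2: 0.5, 3: 1.0, 4: 2.0, 5: 5.0}

@dataclass
class Location:
    # Hypothetical GIS attributes for one location
    in_town: bool
    settled_agriculture: bool
    km_to_water: float  # distance to the nearest major river or swamp

def assign_stratum(loc: Location) -> int:
    """Apply the stratum rules in order; the first matching rule wins (assumed order)."""
    if loc.in_town:
        return 1
    if loc.settled_agriculture:
        return 2
    if loc.km_to_water <= 2:
        return 3
    if loc.km_to_water <= 10:
        return 4
    return 5

site = Location(in_town=False, settled_agriculture=False, km_to_water=6.5)
stratum = assign_stratum(site)   # 4
radius = RADIUS_KM[stratum]      # 2.0
```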
[Map: Stratum 1, likely (1 km radius)]
[Map: Stratum 3, unlikely (5 km radius)]
[Map: Stratum 5, towns (0.5 km radius)]
Field Work
- Interviewers drove as close to each point as possible, then covered the remaining distance on foot with a GPS device
- Selected points were pre-loaded on the device; an alarm indicated when the interviewer was inside the radius
- Interviewers interviewed all eligible respondents within the radius
- Only households with livestock were eligible
- Livestock questions covered cattle, camels, and goats: ownership, vaccination, theft, death, etc.
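The in-radius alarm amounts to comparing the great-circle distance between the current GPS fix and the selected point with the stratum radius. The sketch below uses the haversine formula on a spherical Earth. The function names and coordinates are hypothetical, and the slides do not say how the devices actually computed distance.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (spherical approximation)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def inside_circle(fix, center, radius_km):
    """True when the GPS fix (lat, lon) lies within radius_km of the selected point."""
    return haversine_km(*fix, *center) <= radius_km

center = (11.50, 41.00)   # hypothetical selected point (lat, lon)
fix = (11.505, 41.004)    # hypothetical current GPS reading
if inside_circle(fix, center, radius_km=1.0):
    print("Inside the circle: interview all eligible households")
```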
Implementation Challenges
- Early start to the rainy season
- Field workers unaccustomed to the technique
- Unexpected challenges:
  - ethnic conflict / kidnapping
  - volcanoes
  - river crossings
  - trouble with vehicles
Results of Data Collection (1)
- 125 circles selected, of which 102 were visited
- 39% contained at least one household with livestock
- 793 households with livestock interviewed
[Map: total livestock found per circle]
Results of Data Collection (2)
Stratum | Description | Selected points | Visited circles | Households in circles | Circles without livestock
1 | High likelihood: towns | 10 | 10 | 69 | 4
2 | Almost no possibility: settled agricultural areas / commercial farms | 15 | 14 | 113 | 8
3 | High likelihood: within 2 km of major river or swamps | 60 | 49 | 232 | 24
4 | Medium likelihood: within 10 km of major river or swamps | 30 | 22 | 188 | 6
5 | Low likelihood: all land not in another stratum | 10 | 7 | 191 | 1
Total | | 125 | 102 | 793 | 43
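The table's totals can be recomputed directly from its rows. So can the share of selected points that teams actually reached in each stratum, and the share of visited circles that contained no livestock. This is a small arithmetic check on the reported figures, not part of the original analysis.

```python
# Rows of the table: (stratum, selected points, visited circles,
#                     households in circles, circles without livestock)
ROWS = [
    (1, 10, 10, 69, 4),
    (2, 15, 14, 113, 8),
    (3, 60, 49, 232, 24),
    (4, 30, 22, 188, 6),
    (5, 10, 7, 191, 1),
]

totals = [sum(row[i] for row in ROWS) for i in range(1, 5)]
print(totals)  # [125, 102, 793, 43]

for stratum, selected, visited, households, empty in ROWS:
    visit_rate = visited / selected   # share of selected points reached
    empty_share = empty / visited     # visited circles with no livestock
    print(f"Stratum {stratum}: visited {visit_rate:.0%}, empty {empty_share:.0%}")
```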
Results of Data Collection (3)
Of the 793 households, 25 reported that no member has a permanent dwelling, and an additional 4 households have at least one member without a permanent dwelling. This gives a weighted estimate of approximately 5,000 pastoralists without a permanent dwelling.
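A weighted estimate of this kind sums, over the interviewed households, each household's survey weight times the quantity of interest. Here that quantity is the number of members without a permanent dwelling. The records and field names below are hypothetical and only illustrate the arithmetic.

```python
def weighted_total(households, field):
    """Estimate a population total as sum(weight * value) over sampled households."""
    return sum(h["weight"] * h[field] for h in households)

# Hypothetical records; the survey's actual weights are not shown in the slides
sample = [
    {"weight": 40.0, "members_without_dwelling": 6},
    {"weight": 25.0, "members_without_dwelling": 0},
    {"weight": 12.5, "members_without_dwelling": 2},
]
estimate = weighted_total(sample, "members_without_dwelling")  # 265.0
```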
Weight Development (1)
In its simplest form, the probability of selection for an RGCS is the area selected (and visited) in a given stratum divided by the total area of the stratum:
$$\Pr(\text{selection}) = \frac{c\,\pi r^2}{\text{total area}}$$
where c is the number of points visited and r is the radius.
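Applying this formula to one stratum gives the base probability of selection and its inverse, the design weight. The sketch takes c from the visited-circle counts in the results table, and r and the total area from Table 1 in the appendix. Like the simple formula, it ignores overlap between circles and proximity to stratum boundaries.

```python
import math

def selection_probability(c, r_km, area_km2):
    """Pr(selection) = c * pi * r^2 / total area of the stratum."""
    return c * math.pi * r_km ** 2 / area_km2

def base_weight(c, r_km, area_km2):
    """Design weight is the inverse of the selection probability."""
    return 1.0 / selection_probability(c, r_km, area_km2)

# Stratum 3: 49 circles visited, 1 km radius, 3,538 km^2 of land
p3 = selection_probability(49, 1.0, 3538)
w3 = base_weight(49, 1.0, 3538)
print(f"p = {p3:.4f}, weight = {w3:.1f}")  # p = 0.0435, weight = 23.0
```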
Weight Development (2)
In reality, however, many locations are close enough to stratum boundaries that they could have been selected from more than one stratum, so the weights must also account for this overlap.
[Diagram: strata A and B with circle radii r_A and r_B, and households i and j]
Weight Development (3)
Formally, the weight formula becomes:
In practice, a point is generally selectable from only one or two strata, so the other terms drop out. For example, a point in stratum 1 that is selectable from both stratum 1 and stratum 5 has a probability of selection equal to:
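One way to write the general case is sketched below under stated assumptions; it is not the paper's exact formula. Suppose the $c_h$ points in stratum $h$ are drawn independently and uniformly over its area $A_h$. Let $a_{ih}$ be the part of the disk of radius $r_h$ around household $i$ that lies inside stratum $h$. Household $i$ is reached through stratum $h$ if any of those points lands in that region, so its selection probability is $1 - \prod_h (1 - a_{ih}/A_h)^{c_h}$. When each term is small, this is approximately $\sum_h c_h a_{ih}/A_h$.

Two special cases follow:

- Strata too far away to reach the household have $a_{ih} = 0$ and drop out.
- A household deep inside a single stratum has $a_{ih} = \pi r_h^2$, which recovers the simple formula.

The example household and its overlap areas are hypothetical.

```python
import math

def inclusion_probability(terms, exact=True):
    """terms: (c_h, a_ih, A_h) for every stratum h that can reach household i.

    c_h  -- points visited in stratum h
    a_ih -- area (km^2) of the radius-r_h disk around i lying inside stratum h
    A_h  -- total area (km^2) of stratum h
    """
    if exact:
        miss = 1.0
        for c, a, A in terms:
            miss *= (1.0 - a / A) ** c   # no point from stratum h lands near i
        return 1.0 - miss
    return sum(c * a / A for c, a, A in terms)  # first-order approximation

# Hypothetical town household near the stratum 5 boundary: its whole 0.1 km
# disk lies in stratum 1, and half of its 5 km disk lies in stratum 5.
terms = [
    (10, math.pi * 0.1 ** 2, 33),          # stratum 1
    (7, 0.5 * math.pi * 5 ** 2, 45152),    # stratum 5
]
p_exact = inclusion_probability(terms)
p_approx = inclusion_probability(terms, exact=False)
weight = 1.0 / p_exact
```

The additive approximation can never be smaller than the exact value (it is a union bound), so weights based on it are slightly too small. The gap is negligible when each $c_h a_{ih}/A_h$ is small, as in this example.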
Coverage (1)
Even the selected sites that teams visited may not have been fully covered. Interviewers must walk each circle to find all eligible respondents, and in every circle there may be areas the team was unable to observe. The supervisor questionnaire asks supervisors to report the percentage of the circle they observed.
Coverage (2)
In addition, Viewshed analysis can be applied by overlaying all of the GPS tracks on a map. A computer program can then estimate how much of the circle a team was actually able to observe, given the paths it walked.
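A true Viewshed analysis uses a digital elevation model and line-of-sight calculations. The sketch below is a much cruder, flat-terrain stand-in, offered only to show how GPS tracks can be turned into an observed share of the circle. It treats every grid cell within a fixed sight distance of some track point as observed. The sight distance, grid resolution, function name and example track are all assumptions.

```python
import math

def observed_share(center, radius_km, track, sight_km, cells_per_radius=50):
    """Share of the circle's grid cells lying within sight_km of any track point.

    Flat-terrain proxy for a viewshed: ignores elevation, vegetation, and obstacles.
    center and track points are (x, y) in km in a projected coordinate system.
    """
    cx, cy = center
    step = radius_km / cells_per_radius
    total = observed = 0
    for i in range(-cells_per_radius, cells_per_radius + 1):
        for j in range(-cells_per_radius, cells_per_radius + 1):
            x, y = cx + i * step, cy + j * step
            if math.hypot(x - cx, y - cy) > radius_km:
                continue  # cell centre lies outside the circle
            total += 1
            if any(math.hypot(x - tx, y - ty) <= sight_km for tx, ty in track):
                observed += 1
    return observed / total

# Hypothetical walk straight across a 1 km circle, seeing 0.3 km to each side
track = [(-1.0 + 0.05 * k, 0.0) for k in range(41)]
share = observed_share((0.0, 0.0), 1.0, track, sight_km=0.3)
```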
Coverage (3)
The white lines indicate the tracks walked by the teams. Light gray and brown areas were observable by the team; black areas were not observable.
Coverage (4)
Coverage (5)
Comparison between the supervisor-reported and Viewshed-predicted area covered.
Correlation: 58.4
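The slide does not say which correlation coefficient was used or how 58.4 is scaled. Assuming it is a Pearson correlation reported as a percentage (r = 0.584), the calculation over paired circle-level coverage figures looks like this. The input values are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical percent-covered figures for five circles
supervisor = [90, 75, 60, 100, 80]
viewshed = [70, 72, 40, 85, 50]
r = pearson_r(supervisor, viewshed)
```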
Coverage (6)
The weights may need to be adjusted depending on the Viewshed analysis.
- If the unobserved areas were just like the observed areas within the circle, livestock would have been missed because the circles were not completely covered, and the weights would need to be adjusted.
- If the areas were not visited because they were flooded, or because the vegetation was too thick for either humans or livestock to pass, the uncovered areas would contain no livestock, and the weights would not need to be adjusted.
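The two cases can be expressed as a per-circle adjustment. The slides do not give the adjustment formula. The sketch below assumes the simplest option for the first case, inflating the weight by the inverse of the observed share, and leaves the weight unchanged in the second case. The function and argument names are hypothetical.

```python
def adjusted_weight(base_weight, observed_share, unobserved_is_empty):
    """Adjust a circle's weight for incomplete coverage.

    observed_share: fraction of the circle the team observed (0 < share <= 1).
    unobserved_is_empty: True if the unobserved area could not hold livestock
    (e.g. flooded or impassable vegetation), so nothing was missed there.
    """
    if not 0 < observed_share <= 1:
        raise ValueError("observed_share must be in (0, 1]")
    if unobserved_is_empty:
        return base_weight                 # no livestock missed: keep the weight
    return base_weight / observed_share    # scale up for the unseen, similar area
```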
Comparison of Means (1)
Mean (SE) | RGCS (unadjusted weights) | RGCS (adjusted weights) | ERSS
Cattle | 10.5 (1.6) | 10.9 (1.8) | 15.1 (3.2)
Camels | 8.0 (1.5) | 7.7 (1.4) | 6.1 (1.9)
Goats | 20.2 (3.1) | 19.7 (3.0) | 20.7 (2.7)
Comparison of Means (2)
There are no statistically significant differences between either set of RGCS means (adjusted or unadjusted) and the ERSS means. For cattle and camels, the standard errors, and therefore the confidence intervals, are smaller with the RGCS, reflecting its larger sample size; for goats, the RGCS standard errors are slightly larger than the ERSS standard error.
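A quick check of this statement treats the RGCS and ERSS estimates as independent and approximately normal. Under those assumptions, the difference in means divided by the combined standard error gives a z statistic. The test itself is an assumption for illustration; the slides do not say which test the authors used. The figures are the unadjusted RGCS and ERSS columns of the means table.

```python
import math

def z_test(m1, se1, m2, se2):
    """Two-sided z test for the difference between two independent estimates."""
    z = (m1 - m2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p

# (RGCS unadjusted mean, SE, ERSS mean, SE)
MEANS = {
    "Cattle": (10.5, 1.6, 15.1, 3.2),
    "Camels": (8.0, 1.5, 6.1, 1.9),
    "Goats": (20.2, 3.1, 20.7, 2.7),
}
results = {animal: z_test(*row) for animal, row in MEANS.items()}
for animal, (z, p) in results.items():
    print(f"{animal}: z = {z:.2f}, p = {p:.2f}")
```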
Comparison of Totals (1)
Total (SE) | RGCS (unadjusted weights) | RGCS (adjusted weights) | ERSS
Cattle | 156,779 (36,544) | 189,495 (51,068) | 1,025,298 (365,859)
Camels | 92,094 (27,413) | 139,532 (37,005) | 214,270 (101,703)
Goats | 568,608 (153,805) | 816,871 (221,527) | 2,143,533 (571,410)
Comparison of Totals (2)
There are huge discrepancies between the RGCS and ERSS estimates of total livestock. Two hypotheses may explain the gap:
- Low effort among interviewer teams led to missed circles, and to missed livestock within observed circles.
- Errors in the calculation of the ERSS weights substantially overestimate the totals.
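The size of the gap can be expressed as the ratio of the ERSS total to each RGCS total. The calculation uses the point estimates from the table of totals and is simple arithmetic on the reported figures.

```python
# Point estimates from the totals table: (RGCS unadjusted, RGCS adjusted, ERSS)
TOTALS = {
    "Cattle": (156_779, 189_495, 1_025_298),
    "Camels": (92_094, 139_532, 214_270),
    "Goats": (568_608, 816_871, 2_143_533),
}

for animal, (unadj, adj, erss) in TOTALS.items():
    print(f"{animal}: ERSS is {erss / unadj:.1f}x the unadjusted "
          f"and {erss / adj:.1f}x the adjusted RGCS total")
```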
Low Effort (1)
Selected site visited (logit model)
- Significant: kilometers to main road (negative, ***); circle radius (negative, *)
- Not significant: kilometers to nearest locality, kilometers to river, relief roughness, historical mean EVI value
Number of households interviewed, conditional on visiting (OLS model)
- Significant: kilometers to nearest locality (negative, **); circle radius (positive, **); relief roughness (positive, *)
- Not significant: kilometers to main road, kilometers to river, total rainfall in week prior to survey, current mean EVI value, historical mean EVI value, percentage of circle covered by teams
Percent of circle observed (OLS model)
- Significant: circle radius (negative, ***)
- Not significant: kilometers to main road, kilometers to nearest locality, total rainfall in week prior to survey, relief roughness, current mean EVI value
Note: *** p<0.01, ** p<0.05, * p<0.1
Low Effort (2)
In addition, there are substantial team fixed effects and an oversight effect.
Low Effort (3)
The findings generally support the low-effort hypothesis, at least for some teams:
- Difficult circles, such as those farther from the main road or with a larger radius, are less likely to be covered.
- There is strong evidence of team effects.
- Oversight by the survey coordinator has a strong effect.
ERSS weights (1)
The second possible explanation for the difference between the RGCS and ERSS totals is a miscalculation in the ERSS weights. In the ERSS, the rural sample was selected from the holder listing, and the small-town sample from the household listing regardless of holder status. In rural areas, 10 of the 12 households are drawn from the list of holders and 2 are drawn randomly from non-holders (if available); otherwise, all 12 are chosen from holders.
ERSS weights (2)
Under this design, in clusters containing both holders and non-holders, the two groups would have different probabilities of selection, so the weights would vary within those clusters. In the Afar ERSS data, however, the weights do not vary within clusters, even though not all respondents are livestock holders.
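The argument can be illustrated with the within-cluster weights the described design would imply. The sketch makes three assumptions:

- the cluster's own selection probability is known
- non-holders are sampled only when present, and all of them are taken if there are fewer than two
- each weight is the inverse of the overall selection probability

The function names are hypothetical. A second helper flags clusters whose weights never vary, which is the pattern observed in the Afar ERSS data.

```python
from collections import defaultdict

def within_cluster_weights(p_cluster, n_holders, n_nonholders):
    """Weights implied by selecting 10 holders + 2 non-holders (or 12 holders).

    Returns (holder weight, non-holder weight or None if none were sampled).
    Assumes n_holders > 0.
    """
    if n_nonholders > 0:
        k_holders, k_non = 10, min(2, n_nonholders)
    else:
        k_holders, k_non = 12, 0
    k_holders = min(k_holders, n_holders)  # small clusters: take all holders
    w_holder = 1 / (p_cluster * k_holders / n_holders)
    w_non = 1 / (p_cluster * k_non / n_nonholders) if k_non else None
    return w_holder, w_non

def clusters_with_constant_weights(records):
    """records: iterable of (cluster_id, weight); return clusters with one distinct weight."""
    weights = defaultdict(set)
    for cluster, w in records:
        weights[cluster].add(round(w, 6))
    return sorted(c for c, ws in weights.items() if len(ws) == 1)

# Hypothetical cluster: selected with probability 0.1, 50 holders, 20 non-holders
w_h, w_n = within_cluster_weights(0.1, 50, 20)  # about 50 and 100: they differ
```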
ERSS weights (3)
This indicates that the weights are almost certainly miscalculated, contributing an upward bias to the totals in the comparison table.
Remaining questions
- Which set of totals (ERSS, RGCS adjusted, RGCS unadjusted) is most reasonable given auxiliary data?
- Are there other hypotheses that could explain the discrepancy?
- Can this methodology be adapted or implemented differently in Ethiopia (or elsewhere) in the future?
Thank you.
Contact: Kristen Himelein - khimelein@worldbank.org
Stratification
Table 1: Stratification of Afar Region
Stratum | Description | Radius (km) | Points selected | Total area (km²) | Percent of total landscape
1 | High likelihood: towns | 0.1 | 10 | 33 | <1%
2 | Almost no possibility: settled agricultural areas / commercial farms | 0.5 | 15 | 930 | 2%
3 | High likelihood: within 2 km of major river or swamps | 1 | 60 | 3,538 | 6%
4 | Medium likelihood: within 10 km of major river or swamps | 2 | 30 | 6,921 | 12%
5 | Low likelihood: all land not in another stratum | 5 | 10 | 45,152 | 80%
Total | | | 125 | 56,574 (a) | 100%