The Geography of Social Change

Alessandra Fogli
Stefania Marcassa

VERY PRELIMINARY DRAFT

Abstract

We investigate how and when social change arises. We use data on the spatial diffusion of the fertility transition across US counties to identify the contribution of coordination and learning in the emergence of a new family model. We provide several measures of local and global spatial correlation to establish the existence of a significant geographic pattern in the data. We propose a mechanism in which cultural assimilation is the engine of the fertility transition. Using Census data starting from 1850, we estimate the speed of fertility assimilation for the different ethnic groups to show that their process of convergence is a crucial channel to explain the aggregate decline of the fertility rate over time and across space.

JEL Classification: C33, J11, J13, N32, O15, R23
Keywords: Fertility rate, fertility transition, State Economic Areas.

Alessandra Fogli (corresponding author): Federal Reserve Bank of Minneapolis, Minneapolis, MN, US. Email: afogli00@gmail.com.
Stefania Marcassa: Université de Cergy-Pontoise THEMA (UMR CNRS 8184), 33 boulevard du Port, 95011 Cergy-Pontoise cedex, FR. Email: stefania.marcassa@u-cergy.fr.

1 Introduction

In this paper, we investigate the transition process of the fertility rate in the U.S., and estimate the effect of diffusion on geographic variations in fertility. We provide several measures of local and global spatial correlation to establish the existence of a significant geographic pattern in the data. Moreover, we use a spatial-diffusion model to assess the effect of diffusion in shaping fertility variation across about 400 state economic areas (SEAs) from 1870 to 1930. We measure the variation in fertility levels and the fertility potential of each state economic area. Fertility potential is a spatial-effects variable that summarizes each state economic area's geographic proximity to the influence of other high or low fertility areas.

The empirical findings support a diffusionist model of fertility. Even when controlling for demographic and economic variables, fertility levels remain sensitive to the fertility levels of other SEAs, especially proximate ones. This is consistent with the operation of diffusion processes like those described by Watkins and Coale (1986) and Tolnay (1995). That is, spatial similarity in fertility can result from the spread of fertility-related knowledge, or from the diffusion of changing norms related to family size within marriage.

Moreover, we explore the role of ethnic heterogeneity in the decline of the fertility rate over the last century in the US. At the beginning of the 1900s, immigrants were arriving from European countries characterized by very different fertility rates. In particular, some of these countries had not yet experienced the fertility transition and others had. The fertility rate of women in the former group decreased at double the rate of the rest of the population, accounting for a substantial part of the decline in the aggregate fertility rate. We propose a mechanism in which cultural assimilation is the engine of the fertility transition.
The speed of cultural assimilation is affected by the geographical location and the density of the ethnic groups. Using Census data starting from 1850, we analyze about 500 state economic areas that differ in their initial fertility rates and in their degree of ethnic heterogeneity. We then estimate the speed of fertility assimilation for the different ethnic groups and find that their process of convergence is a crucial channel to explain the aggregate decline of the fertility rate over time and across space.

2 Data

The data used throughout this paper are based on a 1% sample of the Integrated Public Use Microdata Series (IPUMS), produced by the Minnesota Population Center. [1] A close measure of fertility is Children Ever Born (CEB). The age-census combinations are summarized in Table 1.

Table 1: Census Year and Age Group Used for Each Cohort [2]

Birth Cohort   Cohort Label   Age     Census Year   CEB    St.Dev.   Obs.
1845-1855      1850           45-54   1900          5.39   1.21      358
1855-1865      1860           45-54   1910          5.10   1.27      418
1865-1875      1870           35-44   1910          4.23   1.02      437
1875-1885      1880           55-64   1940          3.15   1.05      451
1885-1895      1890           45-54   1940          2.75   0.78      467
1895-1905      1900           45-54   1950          2.91   0.78      460
1905-1915      1910           35-44   1950          2.49   0.60      467
1915-1925      1920           55-64   1980          2.82   0.53      401
1925-1935      1930           45-54   1980          3.24   0.64      406

For the years 1900 to 1990, [3] one of the questions asked on the census was, for each woman, how many children she had had during her lifetime. This is probably the best source of actual fertility decisions made by women. Since the age of the respondent is also available, this allows us to obtain estimates of realized fertility for women by birth cohort. For the purpose of our analysis, these data are most useful for women who have completed their planned fertility. We define a cohort as ten consecutive birth years. Data on some cohorts are available from multiple censuses. For each cohort, the mean number of children ever born (CEB) is computed as the average of CEB over all women in that cohort who were part of households (and not in group quarters) when answering the census survey.

[1] The data are available for download at http://usa.ipums.org/usa/ (King et al. (2004)).
[2] The source is authors' computations from IPUMS data.
[3] We only use this variable up to 1980, as the geographic unit we consider, i.e. SEA, is available only up to 1980. To be precise, SEA is available up to 1950, missing in 1960, and can be constructed from CNTYGP97 and CNTYGP98 for 1970 and 1980.

Figure 1 shows the time series of the average CEB, standard deviation, and coefficient of variation by birth cohort. The coefficient of variation shows the increase in dispersion of the CEB from 1880 to 1910.

[Figure 1: Inverse U-Shaped Coefficient of Variation. Series: mean and standard deviation of CEB (left axis) and coefficient of variation (right axis), by birth cohort, 1850 to 1930. Source: Authors' computations.]

We explore the geographic patterns of CEB using State Economic Area (SEA)-level U.S. data. The data source is still the Integrated Public Use Microdata Series, produced by the Minnesota Population Center. SEAs are generally either single counties or groups of contiguous counties within the same state that had similar economic characteristics when they were originally defined, just prior to the 1950 census. SEA boundaries are based upon the economic characteristics of counties in 1950. Counties within a particular SEA may or may not have been as economically homogeneous in previous years. Even when they were not, the SEA is still useful for consistently identifying geographical units smaller than states and larger than counties. The SEA is the smallest geographic unit that can be consistently identified over the century, as the county identifier stops in 1930. The county-level dataset (see Haines (2004)), on the other hand, has no information on children ever born, only the aggregate population by age group.

The SEA identifiers are available up to 1950, and are not available for 1960. For 1970, we use the variable CNTYGP97, which identifies the 1970 county group, a geographic area with at least 250,000 residents. In particular, we use the list of counties included in the county groups provided by the IPUMS [4] and the detailed composition of the SEAs [5] to determine the county groups located in each SEA. We use a similar procedure for 1980. [6]

There are 470 SEAs in the U.S. After excluding Alaska, Hawaii, and Indian Territories, 467 SEAs remain. Furthermore, we restrict the sample to the SEAs that have at least 10 observations (that is, 10 women in the cohort considered), and we drop observations with missing CEB.

2.1 Empirical Analysis

The geographic predictions of our theory are distinctive: the decline of CEB started in a few locations and gradually spread to nearby areas. The maps in the Appendix show the CEB for each SEA in every decade. Darker colors indicate higher levels of CEB. There are three salient features of the data.
First, the levels of CEB are not uniform. In particular, Northeastern SEAs are characterized by an average CEB level that is about half that of the Southern SEAs. Second, the changes in the CEB are not uniform. While some areas decreased their CEB dramatically from the 1850 cohort to the 1880 cohort (for example, the Northeastern SEAs), some Southeastern SEAs show persistently high fertility levels. Third, there is spatial clustering. SEAs where the CEB is lower than 2 (or higher than 4) tend to be geographically close to other such SEAs.

[4] See http://usa.ipums.org/usa/volii/1970cgcct.shtml for more details.
[5] See http://usa.ipums.org/usa/volii/seacodes.shtml for more details.
[6] See http://usa.ipums.org/usa/volii/ctygrp.shtml for more details.

2.2 Spatial Diffusion

In this part, we provide empirical evidence of the spatial diffusion of the fertility decline. To quantify the spatial features of the data, we first compute the global and local spatial correlation of the CEB for each decade. The global spatial correlation measure is also known as Moran's I (Moran 1950) and summarizes the spatial pattern over the entire study area. This measure can identify whether spatial structure (i.e. clustering, autocorrelation, uniformity) exists, but it cannot identify where the clusters are, nor does it quantify how spatial dependence varies from one place to another. To overcome these limits, we also compute the local Moran's I, which detects local spatial autocorrelation. It can be used to identify local clusters (regions where adjacent areas have similar values) or spatial outliers (areas distinct from their neighbors). The local Moran's I statistic decomposes the global Moran's I into contributions for each SEA. The sum of the local Moran's I over all observations is proportional to the global Moran's I, an indicator of global pattern. Thus, there are two interpretations of local Moran's statistics: as indicators of local spatial clusters and as a diagnostic for outliers in global spatial patterns.

All of the correlation measures are computed using a distance matrix for the 400 SEAs in the sample. Each element in the matrix is equal to 1 for SEAs that are within 80 miles of the centroid of the considered SEA, and to 0 otherwise.
The matrix is symmetric and, by convention, has all zeros on the main diagonal. Moreover, we transform it so that each row sums to one. We refer to the result as the standardized distance matrix.
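The construction of the standardized distance matrix described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: distances are taken to be great-circle (haversine) distances between SEA centroids given as (latitude, longitude) pairs, the function names are hypothetical, and the four centroids are made up. An area with no neighbor within the cutoff simply keeps a row of zeros.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(p, q):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def distance_band_weights(centroids, cutoff_miles=80.0):
    """Binary matrix: 1 if two distinct areas lie within the cutoff, else 0."""
    n = len(centroids)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and haversine_miles(centroids[i], centroids[j]) <= cutoff_miles:
                w[i][j] = 1.0
    return w


def row_standardize(w):
    """Divide each row by its sum; rows without neighbors are left as zeros."""
    out = []
    for row in w:
        total = sum(row)
        out.append([v / total for v in row] if total > 0 else list(row))
    return out


# Hypothetical centroids (latitude, longitude); not taken from the paper's data.
centroids = [(40.0, -75.0), (40.5, -75.0), (41.3, -75.0), (33.0, -84.0)]
binary = distance_band_weights(centroids)
weights = row_standardize(binary)
```

In this toy example the binary matrix is symmetric while the row-standardized one is not: the second area has two neighbors, each weighted 0.5, whereas each of those neighbors gives it a weight of 1.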

The formula of the global Moran's I is:

$$
I = \frac{n}{S_0}\,\frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,(x_i-\bar{x})(x_j-\bar{x})}{\sum_{i=1}^{n}(x_i-\bar{x})^{2}} \qquad (1)
$$

where $w_{ij}$ is the weight between observations $i$ and $j$, and $S_0$ is the sum of all the $w_{ij}$:

$$
S_0 = \sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}. \qquad (2)
$$

[Figure 2: Global Moran's I by birth cohort, 1850 to 1930. Source: Authors' computations.]

Figure 2 shows the pattern of the global Moran's I. First, it is highly significant in all of the decades, indicating that adjacent observations are highly correlated.

Second, it shows a first increasing pattern corresponding to the initial decline of the CEB rate (see Figure 1, from cohort 1850 to cohort 1870), and a second increasing pattern corresponding to the increase in the CEB rate observed from the 1910 birth cohort onward. To identify the locations of clusters of points with values similar in magnitude and of clusters of points with very heterogeneous values, we compute the local Moran's I. [7] The values for the 1870 birth cohort are shown in Figure 3.

[Figure 3: Local Moran's I, Cohort 1870. Map of SEAs shaded by local Moran's I Z score, in classes: < -2.58; -2.58 to -1.96; -1.96 to -1.65; -1.65 to 1.65; 1.65 to 1.96; 1.96 to 2.58; > 2.58 standard deviations. Source: Authors' computations.]

The local Moran's index can only be interpreted within the context of the computed Z score. High index values are associated with clusters of similar values; low index values are associated with spatial outliers. The Z scores are measures of standard deviation. [8] For example, a Z score of +2.5 is interpreted as 2.5 standard deviations above the mean. Darker colors in Figure 3 are evidence of local non-random clustering, that is, they identify the hot spots. We can see that they are concentrated in the Northeastern SEAs, characterized by the lowest level of the CEB rate, and in the Southern SEAs, characterized by the highest level of the CEB rate.

[7] This is done using ArcGIS spatial statistics tools.

Having established the existence of spatial correlation in the CEB rate levels, we compute a measure of geographic heterogeneity not attributable to observable economic features. That is, for each SEA $i$ and cohort $t$ we compute Moran's I statistic for spatial correlation in the residuals of the following regression model:

$$
CEB_{it} = \beta_{1t} + \beta_{2t}\,controls_{it} + u_{it} \qquad (3)
$$

As control variables, we use industrial composition, race, education level (or percentage of illiterate population), urbanization, income, and farm status. Summary statistics are reported in the Appendix. The control variables refer to the decennial Census in which the women of the cohort considered were making fertility decisions. We assume that each woman decided on having her first child when she was twenty years old. For example, for women in the 1850 cohort, we use control variables from the 1870 Census.

In Figure 4 we show the pattern of the Moran's I on the residuals. The measure is positive and significant, and provides support for the idea of spatial diffusion. The least-squares relationship in (3) ignores the spatial contiguity information. To capture the strength of the potential interaction between locations, we estimate a spatial autoregressive (SAR) model that allows for this type of variation.
The specification is the following:

$$
CEB_{it} = \rho \sum_{j} w_{ij}\,CEB_{jt} + \beta\,controls_{it} + u_{it} \qquad (4)
$$

where $u_t \sim N(0, \sigma^2 I_n)$ is the vector of errors $u_{it}$, the $w_{ij}$ are the elements of the standardized distance matrix, and the controls are the same as in (3). Moreover, the term $\sum_{j} w_{ij}\,CEB_{jt}$ is a weighted sum of the CEB rates at time $t$ of the neighboring SEAs, with all neighbors receiving the same weight. We estimate equation (4) on each cohort's data, including all the SEAs for which we have complete information. The value of the coefficient $\rho$ is shown in Figure 4, and its pattern is very similar to that of the Moran's I on the residuals.

[8] The null hypothesis for pattern analysis essentially states that there is no pattern; the expected pattern is one of hypothetical random chance. The Z score is a test of statistical significance that helps you decide whether or not to reject the null hypothesis.

[Figure 4: Moran's I on Residuals. Series: Moran's I, Moran's I on residuals, and SAR coefficient, by birth cohort, 1850 to 1930. Source: Authors' computations.]
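As an illustration of equation (1), the following minimal Python sketch computes the global Moran's I for any vector of values, which could be CEB levels or the residuals of regression (3). The function name, the four-area chain of neighbors, and the numbers are all made-up assumptions for the example; nothing here reproduces the paper's data or estimates.

```python
def morans_i(x, w):
    """Global Moran's I: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2,
    where z_i = x_i - mean(x) and S0 is the sum of all weights."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    s0 = sum(sum(row) for row in w)
    den = sum(v * v for v in z)
    if s0 == 0 or den == 0:
        raise ValueError("Moran's I is undefined for zero weights or constant x")
    num = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * num / den


# Hypothetical row-standardized weights for four areas arranged in a chain.
w_chain = [
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
]

clustered = [5.0, 5.0, 2.0, 2.0]    # similar values sit next to each other
alternating = [5.0, 2.0, 5.0, 2.0]  # neighbors always differ

i_clustered = morans_i(clustered, w_chain)      # 0.5
i_alternating = morans_i(alternating, w_chain)  # -1.0
```

Positive values indicate that neighboring areas tend to have similar values, and negative values that they tend to differ, which is the sense in which a positive Moran's I on the residuals is read as spatial clustering left unexplained by the controls.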

Table 2: The dependent variable is CEB

Columns (1) to (5). Each entry is a coefficient followed by its standard error in parentheses. Entries follow the column order; a variable with fewer than five entries is omitted from the remaining specifications.

ρ: 0.773*** (0.018); 0.775*** (0.034); 0.861*** (0.028); 0.603*** (0.068)
λ: 1.181*** (0.013); 0.963*** (0.004)
φ: -24.859*** (0.102)
CEB t-1: 0.078*** (0.014); 0.068*** (0.014)
Ethnolinguistic Fractionalization Index %: -0.471*** (0.121); 0.674*** (0.110); 0.677*** (0.115); 0.663*** (0.141); 0.464*** (0.131)
Farm population %: 0.541*** (0.066); -0.602*** (0.068); -0.537*** (0.069); -0.517*** (0.066); -0.566*** (0.064)
Illiterates %: 2.448*** (0.136); 2.588*** (0.203); 2.773*** (0.247); 2.562*** (0.220); 2.523*** (0.224)
Occupational Income Score: 0.007*** (0.007); -0.008 (0.007); -0.013* (0.007); -0.030*** (0.011); -0.022*** (0.007)
Women in Labor Force %: -1.060** (0.524); -0.064 (0.595); 0.079 (0.664); -1.000 (0.630); -0.553 (0.602)
Men in Labor Force %: -1.995*** (0.403); -2.537*** (0.606); -2.282*** (0.622); -1.722*** (0.323); -1.925*** (0.388)
Manufacturing establishments: -74.302*** (5.344); -82.682*** (6.398); -105.399*** (12.181); -92.516*** (9.776); -96.218*** (9.273)
Constant: 1.104*** (0.017); 0.854*** (0.034); 0.217 (0.570); 4.068*** (0.297)

Spatially Lagged Regressors:
Urban population %: 0.144 (0.420)
White population %: 0.779 (0.531)
Illiterates %: -1.988*** (0.473)
Occupational Income Score: -0.022 (0.057)
Women in Labor Force %: 0.644 (1.750)
Men in Labor Force %: -0.865 (2.675)
Manufacturing establishments: 74.826*** (18.555)

Observations: 3519 in each column
Number of sea: 391 in each column

Standard errors in parentheses. *** p<0.01, ** p<0.05, * p<0.1

2.3 Spatial and Temporal Diffusion

In this section we add a temporal component to the model estimated above. That is, we consider the possible effect of temporal transmission in addition to the effect of spatial diffusion. Hence, we combine the focus on behavior over time within a given SEA with the tendency for what happens in one SEA to influence what happens in others. The complete specification is the following:

$$
CEB_{it} = \rho\,CEB_{it-1} + \delta \sum_{j} w_{ij}\,CEB_{jt-1} + \beta\,controls_{it} + u_{it} \qquad (5)
$$

where we allow for county-specific time fixed effects. [9]

Table 3: The dependent variable is CEB

VARIABLES                      (1)                  (2)
CEB t-1                        0.129*** (0.0128)    0.0551*** (0.0178)
Potential CEB                  0.793*** (0.0170)    0.320*** (0.0278)
Urban population %                                  0.516*** (0.159)
White population %                                  1.035*** (0.323)
Illiterates %                                       2.175*** (0.225)
Farm population %                                   -0.124 (0.179)
Occupational Income Score                           -0.000157 (0.000169)
Manufacturing establishments                        -59.19*** (8.804) (0.0923)
Constant                       0.236*** (0.0366)    2.168*** (0.348)
Observations                   3071                 3071
R^2                            0.741                0.781
Number of sea                  417                  417

Standard errors in parentheses. *** p<0.01, ** p<0.05, * p<0.1

[9] Year fixed effects are always highly significant and we do not report the coefficients.
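To make the two lagged terms of equation (5) concrete, the sketch below builds, for each SEA and cohort, the temporally lagged CEB and the spatially weighted average of neighbors' lagged CEB. It is only an illustration under assumptions: the function names are hypothetical, the three-area weights and CEB values are made up, the panel is assumed balanced (the paper's is unbalanced), and reading the spatially weighted term as the "Potential CEB" of Table 3 is an interpretation rather than something the text states explicitly.

```python
def spatial_lag(w, values):
    """Return the vector whose i-th entry is sum_j w[i][j] * values[j]."""
    return [sum(wij * vj for wij, vj in zip(row, values)) for row in w]


def build_regressors(ceb_by_cohort, w):
    """For every cohort t except the first, return one row per SEA:
    (CEB_it, CEB_i,t-1, sum_j w_ij * CEB_j,t-1)."""
    cohorts = sorted(ceb_by_cohort)
    rows = {}
    for prev, curr in zip(cohorts, cohorts[1:]):
        lagged = ceb_by_cohort[prev]
        neighbors_lagged = spatial_lag(w, lagged)
        rows[curr] = list(zip(ceb_by_cohort[curr], lagged, neighbors_lagged))
    return rows


# Hypothetical row-standardized weights for three areas in a chain.
w = [
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 1.0, 0.0],
]

# Made-up CEB values by cohort label; one entry per area, same order as w.
ceb_by_cohort = {
    1850: [5.0, 4.0, 6.0],
    1860: [4.5, 3.5, 5.5],
}

rows = build_regressors(ceb_by_cohort, w)
# rows[1860][1] is (3.5, 4.0, 5.5): own CEB, own lagged CEB, neighbors' lagged average
```

The first cohort drops out because it has no predecessor, which is one reason a panel with lagged terms has fewer usable observations than the raw data.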

We estimate equation (5) using the complete (but unbalanced) panel. The summary statistics are in the Appendix. The panel includes SEAs over 9 cohorts (1850-1930), for a total of 3071 observations. The results are in Table 3. As we can see, the measure of potential CEB is statistically significant and positive even after controlling for all other variables. Moreover, the temporally lagged CEB rate is also highly significant in both specifications.

3 Concluding Remarks

We document the spatial evolution of fertility in the U.S. We show that there is a geographic pattern in the decline of the fertility rate. The fertility decline started in a few locations and gradually spread to nearby areas. We do not draw inferences about the role of spatial diffusion in promoting fertility decline. It seems reasonable that the cross-sectional effect of diffusion on fertility variation may reflect past influences of diffusion on fertility change. Also, it might be argued that the significant effect of diffusion identified in the analysis is due to the influence of some omitted variables.

A possible social explanation that we plan to explore in the future focuses on the differences in fertility between natives and immigrants at the turn of the century. Previous literature on fertility, summarized in Watkins (1994), provides evidence of fertility behaviors that differed across ethnic groups. A comparison of the timing of fertility declines in Europe and the U.S. leads one to expect that fertility would be higher among the foreign-born in the U.S. in 1910 (the Census year on which that analysis focuses) than among native whites with native parents. Many of the foreign-born came from areas or social groups in which the long-term decline in the fertility of married couples had not yet begun at the time of their departure.
Fertility decline was evident at the national level in Ireland, Poland, and Italy only after the turn of the century, and even later among groups that supplied many migrants at the turn of the century. Moreover, there are differences between ethnic groups. In particular, the fertility rate is higher among new immigrants from Eastern, Central, and Southern Europe than among the old immigrants from Britain, Germany, and Scandinavia. The fertility decline occurred later in Eastern, Central, and Southern Europe, as well as in Ireland, than in Northwestern Europe. Most of the immigrants from Eastern, Central, and Southern Europe were recent arrivals, and thus would have come before the onset of fertility decline at the national level.

4 Appendix

4.1 Summary Statistics

The data source for the variables CEB, White population %, Illiterates %, Urban population %, Farm population %, and Occupational Income Score is King et al. (2004), across different decennial censuses. The source for the variable Manufacturing establishments is Haines (2004). This variable is computed as the ratio of the number of manufacturing establishments to total population in each SEA, as no other data on industrial composition are available for the years considered.

Table 4: Summary statistics

                                 Mean     Std. Dev.   N

Cohort 1850 (Year 1870)
CEB                              5.423    1.177       412
Potential CEB                    5.437    0.404       412
White population %               0.853    0.211       412
Illiterates %                    0.257    0.226       412
Urban population %               0.173    0.281       412
Non Farm population %            0.532    0.217       412
Manufacturing establishments     0.006    0.004       412
Occupational Income Score        6.128    2.667       412

Cohort 1860 (Year 1880)
CEB                              5.148    1.228       439
Potential CEB                    5.138    0.419       439
White population %               0.853    0.208       439
Illiterates %                    0.204    0.194       439
Urban population %               0.195    0.229       439
Non Farm population %            0.513    0.213       439
Manufacturing establishments     0.004    0.003       439
Occupational Income Score        6.560    2.012       439

Cohort 1870 (Year 1890)
CEB                              4.265    1.032       448
Potential CEB                    4.256    0.312       448
White population %               0.859    0.200       448
Illiterates %                    0.170    0.163       448
Urban population %               0.266    0.258       448
Non Farm population %            0.534    0.208       448
Manufacturing establishments     0.004    0.003       448
Occupational Income Score                             448

Cohort 1880 (Year 1900)
CEB                              3.104    0.899       419
Potential CEB                    3.104    0.235       419
White population %               0.874    0.192       419
Illiterates %                    0.125    0.131       419
Urban population %               0.299    0.26        419
Non Farm population %            0.551    0.229       419
Manufacturing establishments     0.006    0.003       419
Occupational Income Score        7.194    1.711       419

Cohort 1890 (Year 1910)
CEB                              2.735    0.776       450
Potential CEB                    2.739    0.159       450
White population %               0.867    0.191       450
Illiterates %                    0.095    0.096       450
Urban population %               0.301    0.254       450
Non Farm population %            0.601    0.22        450
Manufacturing establishments     0.004    0.002       450
Occupational Income Score        7.845    1.939       450

Cohort 1900 (Year 1920)
CEB                              2.908    0.755       457
Potential CEB                    2.920    0.136       457
White population %               0.881    0.174       457
Illiterates %                    0.069    0.071       457
Urban population %               0.386    0.27        457
Farm population %                0.606    0.232       457
Manufacturing establishments     0.002    0.001       457
Occupational Income Score        7.774    2.069       457

Cohort 1910 (Year 1930)
CEB                              2.479    0.578       462
Potential CEB                    2.493    0.111       462
White population %               0.886    0.158       462
Illiterates %                    0.047    0.05        462
Urban population %               0.413    0.262       462
Non Farm population %            0.656    0.22        462
Manufacturing establishments     0.001    0.001       462
Occupational Income Score        7.957    1.802       462

Cohort 1920 (Year 1940)
CEB                              2.844    0.339       417
Potential CEB                    2.842    0.069       417
White population %               0.885    0.158       417
Illiterates %                    0.144    0.043       417
Urban population %               0.418    0.243       417
Non Farm population %            0.667    0.204       417
Manufacturing establishments     0.001    0.001       417
Occupational Income Score        8.025    1.675       417

Cohort 1930 (Year 1950)
CEB                              3.205    0.338       416
Potential CEB                    3.192    0.092       416
White population %               0.892    0.143       416
Illiterates %                    0.041    0.007       416
Urban population %               0.481    0.241       416
Non Farm population %            0.728    0.184       416
Manufacturing establishments     0.001    0.001       416
Occupational Income Score        7.691    1.639       416

References

Haines, M. R. (2004): Historical, Demographic, Economic, and Social Data: The United States, 1790-1970 [Computer file]. Inter-university Consortium for Political and Social Research [producer and distributor], Ann Arbor, MI.

King, M., S. Ruggles, T. Alexander, D. Leicach, and M. Sobek (2004): Integrated Public Use Microdata Series, Current Population Survey: Version 2.0 [Machine-readable database], Discussion paper, Minneapolis, MN.

Tolnay, S. E. (1995): The Spatial Diffusion of Fertility: A Cross-Sectional Analysis of Counties in the American South, 1940, American Sociological Review, 60.

Watkins, S. C. (1994): After Ellis Island: Newcomers and Natives in the 1910 Census. Russell Sage Foundation, New York, NY.

Watkins, S. C., and A. J. Coale (1986): The Decline of Fertility in Europe. Princeton University Press, Princeton, NJ.