Transferability of Household Travel Data Across Geographic Areas Using NHTS 2001
Jane Lin, PhD
Assistant Professor
Department of Civil and Materials Engineering
Institute for Environmental Science and Policy
University of Illinois at Chicago
Northwestern University Transportation Seminar, May 04, 2006
Background
- Household travel surveys are expensive instruments
- Data become outdated
- Small MPOs lack resources
- The general approach of data transferability studies involves data clustering and updating, e.g. Greaves and Stopher (2000), Wilmot and Stopher (2001), Reuscher et al. (2002)
- Debates over model transferability have never stopped
  - Spatial and temporal transferability
  - Level of aggregation (e.g. Wilmot 1995)
  - Model specification (e.g. Koppelman and Wilmot 1986)
Objective
- Focus on data transferability
- Examine the transferability of household travel survey data across geographic areas
- The impact of neighborhood type on household travel is of particular interest
  - Urban versus suburban and rural
  - Land use (e.g. density, roads, intersections)
  - City low income versus suburban wealthy
  - Hispanic versus White or Black, etc.
Definition of transferability
We define travel data transferability as the possibility of transferring household travel attributes across geographic areas, as well as the trip generation models built on the transferred data, which have the following form:

$$ y_{smi} = \beta_{sm0} + \sum_{k=1}^{K_s} \beta_{smk} x_{smik} + \varepsilon_{smi} \qquad (1) $$

where
- $y_{smi}$ = dependent variable for the ith household (i = 1, ..., $N_{sm}$) in homogeneous group s (s = 1, ..., S) and geographic area m (m = 1, ..., M)
- $\beta_{smk}$ = kth model coefficient: the intercept (k = 0) or the coefficient of a household attribute (k = 1, 2, ..., $K_s$)
- $x_{smik}$ = kth continuous independent variable (household attribute) for household i in homogeneous group s and geographic area m
- $\varepsilon_{smi}$ = random error term for household i in homogeneous group s and geographic area m
Research questions
Suppose we have already clustered households into homogeneous groups across geographic areas, including the area of interest. Aggregated information at the group level is available for all of the areas; we call these the background attributes of each group in an area.
- Can we associate the dependent variable $y_{smi}$ with the background attributes?
- Can we estimate the coefficients $\beta_{smk}$ from the area's background attributes, given that individual household attributes are not available?
If the predictions are accurate, we say the travel data (and the model coefficients $\beta_{smk}$) are transferable.
In mathematical language

$$ \beta_{smk} = \gamma_{sk0} + \sum_{q=1}^{Q} \gamma_{skq} w_{smkq} + \upsilon_{smk} \qquad (2) $$

where
- $w_{smkq}$ = qth background attribute associated with coefficient $\beta_{smk}$ (k = 1, 2, ..., $K_s$) of group s and geographic area m
- $\gamma_{skq}$ = qth weight: the intercept (q = 0) or the weight of a background attribute (q = 1, 2, ..., Q)
- $\upsilon_{smk}$ = random error term associated with coefficient $\beta_{smk}$ of group s and geographic area m
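Final model
For a given group s (index s dropped):

Level 1 (household):

$$ y_{mi} = \beta_{m0} + \sum_{k=1}^{K_s} \beta_{mk} x_{mik} + \sum_{j} \alpha_{mj} z_{mij} + \varepsilon_{mi} $$

Level 2 (area):

$$ \beta_{mk} = \gamma_{k0} + \sum_{q=1}^{Q} \gamma_{kq} w_{mkq} + \upsilon_{mk} $$

Combined:

$$ y_{mi} = \gamma_{00} + \sum_{q} \gamma_{0q} w_{m0q} + \sum_{k} \gamma_{k0} x_{mik} + \sum_{k} \sum_{q} \gamma_{kq} w_{mkq} x_{mik} + \sum_{j} \alpha_{mj} z_{mij} + \upsilon_{m0} + \sum_{k} \upsilon_{mk} x_{mik} + \varepsilon_{mi} $$

Or in matrix form:

$$ Y_m = X_m \beta_m + Z_m \alpha_m + \varepsilon_m $$

$$ \beta_m = W_m \gamma + \upsilon_m $$

$$ Y_m = X_m W_m \gamma + Z_m \alpha_m + (X_m \upsilon_m + \varepsilon_m) $$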
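As a minimal sketch of how the two levels combine, the Python below evaluates the fixed part of the model twice: once by first computing the area coefficients from the background attributes and then applying them to a household, and once from the expanded form with cross-level interaction terms. The function names and all numeric values are hypothetical, chosen only for illustration, and the random terms are set to zero.

```python
def level2_coefficients(gamma, w):
    """Level 2: beta_mk = gamma_k0 + sum_q gamma_kq * w_mkq (random term set to 0)."""
    return [g[0] + sum(gq * wq for gq, wq in zip(g[1:], wk))
            for g, wk in zip(gamma, w)]


def two_step_prediction(gamma, w, x, alpha, z):
    """Level 1 evaluated with the coefficients produced by level 2."""
    beta = level2_coefficients(gamma, w)
    return (beta[0]
            + sum(b * xk for b, xk in zip(beta[1:], x))
            + sum(a * zj for a, zj in zip(alpha, z)))


def combined_prediction(gamma, w, x, alpha, z):
    """Expanded form: main effects plus cross-level interactions w * x."""
    x_full = [1.0] + list(x)  # x_0 = 1 carries the intercept row (k = 0)
    total = 0.0
    for g, wk, xk in zip(gamma, w, x_full):
        total += g[0] * xk
        total += sum(gq * wq * xk for gq, wq in zip(g[1:], wk))
    return total + sum(a * zj for a, zj in zip(alpha, z))


# Hypothetical values: K = 2 household attributes, Q = 2 background attributes.
gamma = [[1.0, 0.5, -0.2],    # row k = 0 (intercept)
         [0.3, 0.1, 0.0],     # row k = 1
         [-0.4, 0.0, 0.25]]   # row k = 2
w = [[2.0, 1.0]] * 3          # same background attributes used for every k
x = [3.0, 1.0]                # continuous household attributes
alpha = [-0.1]                # coefficient of one indicator covariate
z = [1.0]

y_two_step = two_step_prediction(gamma, w, x, alpha, z)
y_combined = combined_prediction(gamma, w, x, alpha, z)
print(y_two_step, y_combined)
```

Both routes give the same number, which is the point of the combined equation: the level-2 weights enter the household model as main effects of w and as w-by-x interactions.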
This is a two-level hierarchical modeling problem. For any group s:
- Level 2: Area 1, Area 2, ..., Area M
- Level 1: Households 1, 2, ..., $N_1$ in Area 1; 1, 2, ..., $N_2$ in Area 2; ...; 1, 2, ..., $N_M$ in Area M
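Properties of random errors

$$ \varepsilon_m \sim N(0, R), \quad R = \sigma^2 I \qquad (7) $$

$$ \upsilon_m \sim N(0, G), \quad G = \begin{pmatrix} \tau_{00} & \cdots & \tau_{0k} & \cdots & \tau_{0K_s} \\ \vdots & & \vdots & & \vdots \\ \tau_{k0} & \cdots & \tau_{kk} & \cdots & \tau_{kK_s} \\ \vdots & & \vdots & & \vdots \\ \tau_{K_s 0} & \cdots & \tau_{K_s k} & \cdots & \tau_{K_s K_s} \end{pmatrix} \qquad (8) $$

$$ \operatorname{cov}(\varepsilon_m, \upsilon_m) = 0 \qquad (9) $$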
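One consequence of these assumptions, written out here as a standard mixed-model step rather than something stated on the slide: because the composite error in the matrix form is $X_m \upsilon_m + \varepsilon_m$ and the two error terms are uncorrelated, the mean and covariance of the household responses in area m would be

$$ E[Y_m] = X_m W_m \gamma + Z_m \alpha_m $$

$$ \operatorname{Var}(Y_m) = X_m G X_m^{\top} + R = X_m G X_m^{\top} + \sigma^2 I $$

So households in the same area are correlated through G, and if all $\tau_{kl}$ are zero the covariance reduces to $\sigma^2 I$, the ordinary regression case.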
Hypotheses
- Fixed effects ($\gamma_{kq}$ and $\alpha_{mj}$): $H_0$: $\gamma_{kq} = 0$ and $\alpha_{mj} = 0$ for all m, k, q. If the null hypothesis is not rejected, the covariates have no fixed effects on the dependent variable.
- Random effects ($\upsilon_{mk}$): $H_0$: $\upsilon_{mk} = 0$ for all m, k. If the null hypothesis is not rejected, there are no deviations from the fixed effects across areas. We then say the household travel data (and the model coefficients) are transferable.
- Covariance components (R and G): $H_0$: $\tau_{kl} = 0$ tests the goodness-of-fit of the hierarchical model.
Data
- National Household Travel Survey 2001 (NHTS 2001)
  - 69817 household observations
  - Five datasets: households, persons, travel day trips, vehicles, and long-distance trips
- Census Transportation Planning Package 2000 (CTPP 2000)
  - Geographic levels: state, county, census tract, block group, MSA/CMSA, and TAZ
  - Three parts: place of residence, place of work, and journey to work
  - 65315 census tracts
Define neighborhood types
- Clustering at the census tract level to define neighborhood types
- Data: CTPP 2000 Part I
  - Sixty-four variables extracted
- Method:
  - Two-step clustering method
  - GIS spatial analysis
(Lin and Long 2006, TRB CD-ROM)
Final ten neighborhood types
Distribution of neighborhood type

| Neighborhood type | Number of census tracts | % of total | Number of households | % of total |
|---|---|---|---|---|
| Urban elite | 2327 | 3.6 | 4279280 | 4.1 |
| Urban/2nd city poor non-Hispanic Black dominant | 5263 | 8.1 | 5974185 | 5.7 |
| City low income primarily minority | 2141 | 3.3 | 3394940 | 3.2 |
| Suburban mid-income working class | 10388 | 15.9 | 19056350 | 18.1 |
| Suburban mid-age wealthy | 8419 | 12.9 | 14666800 | 13.9 |
| Suburban young | 8876 | 13.6 | 15279555 | 14.5 |
| Suburban retired | 7469 | 11.4 | 13746620 | 13.0 |
| Rural | 14103 | 21.6 | 21779625 | 20.6 |
| Natural scenic | 1537 | 2.4 | 1134052 | 1.1 |
| Non-Black Hispanic dominant | 4354 | 6.7 | 6226205 | 5.9 |
| Valid cases | 64877 | 99.3 | 105537612 | 100 |
| Excluded cases | 438 | 0.7 | | |
| Total | 65315 | 100.0 | 105537612 | 100 |
Visualization of ten neighborhood types
Clustering of 11 states at (a) census tract (CT) and (b) block group (BG) level
Assign NHTS households to neighborhood types

| Neighborhood type | Number of households | % of valid cases | % of total |
|---|---|---|---|
| Urban elite | 2388 | 3.4 | 3.4 |
| Urban/2nd city poor non-Hispanic Black dominant | 2403 | 3.5 | 3.4 |
| City low income primarily minority | 1994 | 2.9 | 2.9 |
| Suburban mid-income working class | 18531 | 26.8 | 26.5 |
| Suburban mid-age wealthy | 9304 | 13.4 | 13.3 |
| Suburban young | 10006 | 14.5 | 14.3 |
| Suburban retired | 8636 | 12.5 | 12.4 |
| Rural | 12977 | 18.7 | 18.6 |
| Natural scenic | 981 | 1.4 | 1.4 |
| Non-Black Hispanic dominant | 2022 | 2.9 | 2.9 |
| Valid cases | 69242 | 100.0 | 99.2 |
| Excluded cases | 575 | | 0.8 |
| Total | 69817 | | 100.0 |
[Figure: Household trip rate (number of trips per household, axis from 0 to 20) by household type, with one series per neighborhood type.
Household types: one adult, no children; 2+ adults, no children; one adult, retired, no children; 2+ adults, retired, no children; one adult, youngest child 0-5; one adult, youngest child 6-15; one adult, youngest child 16-21; 2+ adults, youngest child 0-5; 2+ adults, youngest child 6-15; 2+ adults, youngest child 16-21.
Neighborhood types: Urban elite; Urban/2nd city poor non-Hispanic Black dominant; City low income primarily minority; Suburban mid-income working class; Suburban mid-age wealthy; Suburban young; Suburban retired; Rural; Non-Black Hispanic dominant; Natural scenic.]
Household mode share

| Neighborhood type | Vehicles per person | Auto (%) | Walk (%) | Local transit (%) | Bicycle (%) | Other (%) |
|---|---|---|---|---|---|---|
| Urban elite | 0.8 | 70.8 | 18.9 | 5.3 | 1.5 | 3.6 |
| Urban/2nd city poor non-Hispanic Black dominant | 0.6 | 77.0 | 13.8 | 5.1 | 0.4 | 3.6 |
| City low income primarily minority | 0.3 | 33.2 | 38.4 | 22.4 | 0.8 | 5.0 |
| Suburban mid-income working class | 1.0 | 89.8 | 5.7 | 0.2 | 0.8 | 3.6 |
| Suburban mid-age wealthy | 1.0 | 87.7 | 7.5 | 0.6 | 0.8 | 3.5 |
| Suburban young | 0.9 | 87.4 | 8.4 | 1.0 | 0.9 | 2.3 |
| Suburban retired | 0.9 | 88.2 | 7.9 | 0.6 | 0.8 | 2.5 |
| Rural | 1.0 | 89.6 | 5.6 | 0.1 | 0.7 | 3.9 |
| Non-Black Hispanic dominant | 0.7 | 82.9 | 11.6 | 2.0 | 0.9 | 2.6 |
| Natural scenic | 0.8 | 78.5 | 15.3 | 2.0 | 1.1 | 3.2 |
Travel time and distance

| Neighborhood type | To work: travel time (minutes) | To work: trip distance (miles) | To school: travel time (minutes) | To school: trip distance (miles) | Shopping: travel time (minutes) | Shopping: trip distance (miles) |
|---|---|---|---|---|---|---|
| Urban elite | 25 | 8.1 | 18 | 3.7 | 14 | 3.9 |
| Urban/2nd city poor non-Hispanic Black | 23 | 7.6 | 19 | 3.2 | 17 | 6.7 |
| City low income primarily minority | 36 | 5.6 | 22 | 1.7 | 17 | 2.5 |
| Suburban mid-income working class | 25 | 14.8 | 17 | 5.9 | 15 | 7.5 |
| Suburban mid-age wealthy | 27 | 16.9 | 16 | 5.1 | 14 | 6.6 |
| Suburban young | 21 | 10.5 | 16 | 4.8 | 13 | 5.0 |
| Suburban retired | 21 | 10.4 | 16 | 5.2 | 13 | 6.0 |
| Rural | 22 | 14.7 | 19 | 7.4 | 16 | 8.6 |
| Non-Black Hispanic dominant | 27 | 9.8 | 19 | 3.1 | 16 | 4.4 |
| Natural scenic | 17 | 8.6 | 16 | 4.7 | 14 | 6.8 |
Hierarchical modeling using NHTS (1)
14 MSAs/CMSAs with population greater than 3 million

| NHTS MSA/CMSA code | MSA/CMSA name |
|---|---|
| 0520 | Atlanta, GA |
| 1122 | Boston-Worcester-Lawrence, MA-NH-ME-CT |
| 1602 | Chicago-Gary-Kenosha, IL-IN-WI |
| 1922 | Dallas-Fort Worth, TX |
| 2162 | Detroit-Ann Arbor-Flint, MI |
| 3362 | Houston-Galveston-Brazoria, TX |
| 4472 | Los Angeles-Riverside-Orange County, CA |
| 4992 | Miami-Fort Lauderdale, FL |
| 5602 | New York-Northern New Jersey-Long Island, NY-NJ-CT-PA |
| 6162 | Philadelphia-Wilmington-Atlantic City, PA-NJ-DE-MD |
| 6200 | Phoenix-Mesa, AZ |
| 7362 | San Francisco-Oakland-San Jose, CA |
| 7602 | Seattle-Tacoma-Bremerton, WA |
| 8872 | Washington-Baltimore, DC-MD-VA-WV |
Hierarchical modeling using NHTS (2)
- Dependent variable: square root of household auto work trips
- Neighborhood background attributes: sixty-four variables
- Household variables:

| Name of the variable | Description |
|---|---|
| HHSIZE | Household size |
| HHVEHCNT | Household number of vehicles |
| DRVRCNT | Household number of drivers |
| NUMKID | Household number of kids |
| NUMADLT | Household number of adults |
| LOWINC | Household is a low income household (LOWINC=1, i.e. household income less than $45000 a year), otherwise 0 |
| WHITE | Household head is White (WHITE=1), otherwise 0 |
| COLLEGE | Household head has college degree (COLLEGE=1), otherwise 0 |
| N_HOWN | Household owns a house (N_HOWN=1), otherwise 0 |
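A small sketch of how the dependent variable and the LOWINC indicator could be coded from a raw household record. The function name and its two arguments are hypothetical and are not NHTS field names; only the $45000 threshold and the square-root transform come from the definitions above.

```python
import math

LOW_INCOME_THRESHOLD = 45000  # dollars per year, from the LOWINC definition


def code_household(annual_income, auto_work_trips):
    """Return the LOWINC indicator and the square-root-transformed trip count.

    Both argument names are illustrative assumptions, not survey variables.
    """
    lowinc = 1 if annual_income < LOW_INCOME_THRESHOLD else 0
    y = math.sqrt(auto_work_trips)  # dependent variable used in the models
    return {"LOWINC": lowinc, "y": y}


example = code_household(annual_income=40000, auto_work_trips=4)
print(example)
```

A square-root transform is a common choice for count-like responses because it damps the right tail, though the slides do not state the reason for using it here.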
Hierarchical modeling results
- Models were fitted across the 14 MSAs/CMSAs by neighborhood type
- Only three are presented here:
  - Suburban retired
  - Urban elite
  - Suburban wealthy
Fixed effects (1)

$$ \hat{y}_{mi} = \hat{\gamma}_{00} + \sum_{q} \hat{\gamma}_{0q} w_{m0q} + \sum_{k} \left( \hat{\gamma}_{k0} + \sum_{q} \hat{\gamma}_{kq} w_{mkq} \right) x_{mik} + \sum_{j} \hat{\alpha}_{mj} z_{mij} $$

a. Suburban retired (obs = 759), -2 Res Log Likelihood = 1284.0

| Effect | Estimate | Standard error | t value | Pr > \|t\| |
|---|---|---|---|---|
| Intercept | 1.3343 | 0.04332 | 30.80 | <0.0001 |
| Housing density (w1) | -0.00004 | 0.000011 | -3.87 | 0.0001 |
| Percent population with age 75 years and older (w2) | -0.3331 | 0.2129 | -1.56 | 0.1181 |
| Number of kids (x1) | 0.03053 | 0.03817 | 0.80 | 0.4240 |
| Housing density * number of kids (w1*x1) | 1.018E-6 | 0.000010 | 0.10 | 0.9216 |
| Percent population with age 75 years and older * number of kids (w2*x1) | 0.5909 | 0.1551 | 3.81 | 0.0002 |
| Low income (z1) | -0.07807 | 0.02852 | -2.74 | 0.0063 |
| Owned house (z2) | 0.1005 | 0.02758 | 3.64 | 0.0003 |
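To illustrate how the suburban retired estimates would be applied, the sketch below plugs them into the fixed-effects prediction equation for one household. The coefficient values are the ones reported in the table; the input values, the function name, and the treatment of w2 as a fraction between 0 and 1 are assumptions made only for this example, since the slides do not give the units of the background attributes.

```python
# Fixed-effect estimates for the suburban retired model (from the table above).
COEF = {
    "intercept": 1.3343,
    "w1": -0.00004,     # housing density
    "w2": -0.3331,      # population aged 75 and older
    "x1": 0.03053,      # number of kids
    "w1_x1": 1.018e-6,  # housing density * number of kids
    "w2_x1": 0.5909,    # population aged 75 and older * number of kids
    "z1": -0.07807,     # low income indicator
    "z2": 0.1005,       # owned house indicator
}


def predict_sqrt_auto_work_trips(w1, w2, x1, z1, z2, coef=COEF):
    """Fixed part of the prediction, on the square-root scale of the response."""
    slope_x1 = coef["x1"] + coef["w1_x1"] * w1 + coef["w2_x1"] * w2
    return (coef["intercept"]
            + coef["w1"] * w1
            + coef["w2"] * w2
            + slope_x1 * x1
            + coef["z1"] * z1
            + coef["z2"] * z2)


# Hypothetical household and tract: units of w1 and w2 are assumed, not sourced.
y_hat = predict_sqrt_auto_work_trips(w1=1000.0, w2=0.10, x1=1, z1=0, z2=1)
trips_naive = y_hat ** 2  # naive back-transform; ignores retransformation bias
print(y_hat, trips_naive)
```

Note that the slope on number of kids is not a single number: it depends on the area's background attributes through the two interaction terms, which is how neighborhood context enters the household model.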
Fixed effects (2)
b. Urban elite (obs = 552), -2 Res Log Likelihood = 1434.2

| Effect | Estimate | Standard error | t value | Pr > \|t\| |
|---|---|---|---|---|
| Intercept | 0.9062 | 0.1394 | 6.50 | <0.0001 |
| Worker density (w1) | 0.000017 | 9.671E-6 | 1.79 | 0.0747 |
| Intersection density (w2) | -0.00083 | 0.000341 | -2.43 | 0.0152 |
| Average auto work trip travel time (w3) | -0.00231 | 0.004555 | -0.51 | 0.6129 |
| Household size (x1) | 0.09148 | 0.04486 | 2.04 | 0.0419 |
| Household number of vehicles (x2) | -0.01946 | 0.06705 | -0.29 | 0.7718 |
| Worker density * household size (w1*x1) | 6.297E-6 | 3.35E-6 | 1.88 | 0.0607 |
| Intersection density * household size (w2*x1) | 0.000203 | 0.000131 | 1.55 | 0.1213 |
| Average auto work trip travel time * household size (w3*x1) | -0.00419 | 0.001410 | -2.97 | 0.0031 |
| Worker density * household number of vehicles (w1*x2) | -0.00002 | 6.085E-6 | -3.16 | 0.0017 |
| Intersection density * household number of vehicles (w2*x2) | 0.000418 | 0.000175 | 2.39 | 0.0172 |
| Average auto work trip travel time * household number of vehicles (w3*x2) | 0.008392 | 0.002111 | 3.97 | <0.0001 |
Fixed effects (3)
c. Suburban mid-age wealthy (obs = 2669), -2 Res Log Likelihood = 5306.6

| Effect | Estimate | Standard error | t value | Pr > \|t\| |
|---|---|---|---|---|
| Intercept | 0.9333 | 0.06618 | 14.10 | <0.0001 |
| Average auto work trip travel time (w1) | 0.005130 | 0.002119 | 2.42 | 0.0156 |
| Number of kids (x1) | 0.05595 | 0.02397 | 2.33 | 0.0197 |
| Number of household vehicles (x2) | 0.1454 | 0.02648 | 5.49 | <0.0001 |
| Average auto work trip travel time * number of kids (w1*x1) | 0.000752 | 0.000683 | 1.10 | 0.2710 |
| Average auto work trip travel time * number of household vehicles (w1*x2) | -0.00185 | 0.000855 | -2.16 | 0.0307 |
| Low income (z1) | -0.07940 | 0.02160 | -3.68 | 0.0002 |
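The t values in these fixed-effects tables appear to be the usual ratio of each estimate to its standard error. As a quick check under that assumption, the sketch below recomputes the ratio for the suburban mid-age wealthy model and compares it with the reported value; the tuple layout and variable names are illustrative only.

```python
# (effect, estimate, standard error, reported t value) from the
# suburban mid-age wealthy table.
ROWS = [
    ("Intercept", 0.9333, 0.06618, 14.10),
    ("w1", 0.005130, 0.002119, 2.42),
    ("x1", 0.05595, 0.02397, 2.33),
    ("x2", 0.1454, 0.02648, 5.49),
    ("w1*x1", 0.000752, 0.000683, 1.10),
    ("w1*x2", -0.00185, 0.000855, -2.16),
    ("z1", -0.07940, 0.02160, -3.68),
]

# Recompute t = estimate / standard error for every row.
recomputed = {name: est / se for name, est, se, _ in ROWS}

# Largest absolute gap between recomputed and reported t values.
max_gap = max(abs(recomputed[name] - t) for name, _, _, t in ROWS)

for name, est, se, t in ROWS:
    print(f"{name:10s} reported t = {t:6.2f}  estimate/SE = {recomputed[name]:7.3f}")
```

Rows whose estimate or standard error is printed with very few significant digits (for example -0.00004 in the suburban retired table) will not reproduce as closely, because of rounding in the displayed values.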
[Figure: Random effects: suburban retired]
[Figure: Random effects: urban elite]
[Figure: Random effects: suburban wealthy]
Conclusions
- We have shown that transferability can be formulated as a two-level random coefficient model, and thus that transferability can be statistically tested.
- Transferability is affected not only by internal household characteristics but also by the surrounding environment. The neighborhood approach allows us to test that.
- The case study of transferability across the fourteen MSAs/CMSAs showed that, in general, the coefficient variability across geographic areas can be ignored. But there are exceptions.
- The model results also confirm the influence of neighborhood-specific features on the travel behavior of the households living within them.
Limitations
- We assembled the set of household and neighborhood variables that was available to us for the modeling. Do they meet the minimum model specification?
- When interpreting the model results, it is important to keep in mind that the model does not explain causal relationships between the dependent variable and the covariates.
- Only household auto work trip rates were examined. Different conclusions may be drawn for other measures, such as mode share and number of shopping trips.
Acknowledgement
This study is part of the transferability study funded by the Federal Highway Administration (FHWA).
Thank you!
Cluster Selection Criteria

| Number of clusters | Schwarz's Bayesian Criterion (BIC) | BIC change (a) | Ratio of BIC changes (b) | Ratio of distance measures (c) |
|---|---|---|---|---|
| 1 | 2879422.1 | | | |
| 2 | 2462489.4 | -416932.7 | 1.000 | 2.677 |
| 3 | 2307613.6 | -154875.8 | 0.371 | 1.155 |
| 4 | 2173654.9 | -133958.7 | 0.321 | 1.323 |
| 5 | 2072734.7 | -100920.2 | 0.242 | 1.428 |
| 6 | 2002463.1 | -70271.6 | 0.169 | 1.319 |
| 7 | 1949518.2 | -52944.9 | 0.127 | 1.244 |
| 8 | 1907246.5 | -42271.7 | 0.101 | 1.206 |
| 9 | 1872451.7 | -34794.8 | 0.083 | 1.401 |
| 10 | 1848029.9 | -24421.8 | 0.059 | 1.007 |
| 11 | 1823775.1 | -24254.8 | 0.058 | 1.054 |
| 12 | 1800835.1 | -22940.0 | 0.055 | 1.117 |
| 13 | 1780446.6 | -20388.5 | 0.049 | 1.164 |
| 14 | 1763124.7 | -17321.9 | 0.042 | 1.014 |
| 15 | 1746056.6 | -17068.1 | 0.041 | 1.032 |
| 16 | 1729567.5 | -16489.0 | 0.040 | 1.159 |
| 17 | 1715530.0 | -14037.5 | 0.034 | 1.297 |
| 18 | 1705034.1 | -10495.9 | 0.025 | 1.025 |
| 19 | 1694825.4 | -10208.7 | 0.024 | 1.000 |
| 20 | 1684621.9 | -10203.5 | 0.024 | 1.038 |
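The footnotes to this table did not survive, but the values are consistent with the BIC change being the difference from the previous number of clusters, and the ratio of BIC changes being each change divided by the change for the two-cluster solution. A minimal sketch under that reading, using the first five BIC values from the table (variable names are illustrative):

```python
# BIC for 1 to 5 clusters, copied from the table.
bic = [2879422.1, 2462489.4, 2307613.6, 2173654.9, 2072734.7]

# BIC change: difference from the solution with one fewer cluster.
changes = [curr - prev for prev, curr in zip(bic, bic[1:])]

# Ratio of BIC changes: each change relative to the two-cluster change
# (assumed definition; it reproduces the tabulated column).
ratios = [change / changes[0] for change in changes]

for k, (change, ratio) in enumerate(zip(changes, ratios), start=2):
    print(f"{k} clusters: BIC change = {change:.1f}, ratio = {ratio:.3f}")
```

The ratio of distance measures cannot be recomputed this way, since it depends on cluster distances that are not shown on the slide.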
Travel time and distance (mean and standard deviation)

| Neighborhood type | To work: travel time (minutes), mean | Std dev | To work: trip distance (miles), mean | Std dev | To school: travel time (minutes), mean | Std dev | To school: trip distance (miles), mean | Std dev | Shopping: travel time (minutes), mean | Std dev | Shopping: trip distance (miles), mean | Std dev |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Urban elite | 25 | 24 | 8.1 | 19.3 | 18 | 14 | 3.7 | 5.9 | 14 | 17 | 3.9 | 6.5 |
| Urban/2nd city poor non-Hispanic Black | 23 | 25 | 7.6 | 10.2 | 19 | 17 | 3.2 | 6.4 | 17 | 32 | 6.7 | 29.9 |
| City low income primarily minority | 36 | 37 | 5.6 | 9.8 | 22 | 23 | 1.7 | 8.8 | 17 | 21 | 2.5 | 15.4 |
| Suburban mid-income working class | 25 | 26 | 14.8 | 24.9 | 17 | 15 | 5.9 | 7.5 | 15 | 21 | 7.5 | 20.3 |
| Suburban mid-age wealthy | 27 | 28 | 16.9 | 75.0 | 16 | 14 | 5.1 | 7.4 | 14 | 18 | 6.6 | 15.9 |
| Suburban young | 21 | 19 | 10.5 | 14.1 | 16 | 15 | 4.8 | 8.4 | 13 | 14 | 5.0 | 13.5 |
| Suburban retired | 21 | 19 | 10.4 | 14.3 | 16 | 15 | 5.2 | 8.8 | 13 | 19 | 6.0 | 17.9 |
| Rural | 22 | 31 | 14.7 | 22.4 | 19 | 19 | 7.4 | 11.3 | 16 | 20 | 8.6 | 17.5 |
| Non-Black Hispanic dominant | 27 | 39 | 9.8 | 18.2 | 19 | 19 | 3.1 | 6.3 | 16 | 20 | 4.4 | 10.0 |
| Natural scenic | 17 | 18 | 8.6 | 13.3 | 16 | 19 | 4.7 | 13.5 | 14 | 18 | 6.8 | 19.2 |