S.M. Shiva Nagendra, M. Khare / Transportation Research Part D 8 (2003) 285-297

Principal component analysis of urban traffic characteristics and meteorological data

S.M. Shiva Nagendra, Mukesh Khare *
Department of Civil Engineering, Indian Institute of Technology Delhi, Hauz Khas, New Delhi 110 016, India

Abstract

Principal component analysis (PCA) is used to analyze one year of traffic, emission and meteorological data for an urban intersection in Delhi. The 1997 data include meteorological, traffic and emission variables. At urban intersections the complexities of site, traffic and meteorological characteristics may result in a high cross-correlation among the variables. In such situations, PCA can provide independent linear combinations of the variables. Here it is used to analyze 1, 8 and 24 h average emission, traffic and meteorological data. It shows that four principal components for the 24 h average have the highest loadings for traffic and emission variables, with a strong correlation between them. PC loadings for the 1 and 8 h data indicate the least variation among them.

Keywords: Factor analysis; Data reduction; Air quality control region; Autocorrelation

1. Introduction

Statistical analyses of air quality data range from simple contingency tables through regression and multiple regression models to time series techniques (Benarie, 1980). Merz et al. (1972), Chock et al. (1975) and Roch and Pellerin (1982), however, report substantial evidence of high autocorrelation in the series of atmospheric pollutant concentrations and weather data. As a result, atmospheric scientists often prefer linear stochastic (Box-Jenkins) models (Chock et al., 1975). However, this approach has disadvantages; e.g., the values as well as the statistical significance of the intervention or transfer function coefficients are sensitive to the choice of univariate model. In addition, Box-Jenkins models that include deterministic trends lack parsimony, which is important when a large multivariate data set is used (Milionis and Davis, 1994).

Henry and Hidy (1979) recognized the advantages of principal component analysis (PCA) techniques for data reduction and interpretation in these circumstances. They applied it to identify representative independent variables for regression modelling of large meteorological and air quality data sets from Los Angeles and New York. Roscoe et al. (1982) used the technique to identify the types of error and interrelations present in large monitored air quality and meteorological data sets, and Poissant et al. (1996) used it when interpreting air quality and meteorological data from a rural site in Canada. Statheropoulos et al. (1998) found that air pollution in the city of Athens is highly correlated with humidity and low wind speeds.

Here, one year of meteorological and traffic data, recorded at an air quality control region (AQCR) in Delhi, is analyzed using PCA to determine underlying components, physical interpretations and interrelationships among traffic, emission and meteorological variables. Principal components (PCs) for 1, 8 and 24 h time intervals are extracted from a large data set comprising traffic, emission and meteorological information.

2. Methodology

Fig. 1 shows the location of the AQCR, adjacent to the kerbside of the highly trafficked Bahadur Shah Zafar Marg, known as the income tax office intersection (ITO intersection). This intersection has a number of governmental and non-governmental office buildings, along with reputed educational institutions, adjacent to it.
The site has a complex road geometry involving intense activity, and its air quality has been graded as the 'worst' in Delhi city (Sharma, 1998).

The meteorological data (i.e., hourly observations of cloud cover, pressure, mixing height, sunshine hours, visibility, temperature, humidity, rainfall, wind speed and direction) were obtained from the Indian Meteorological Department, New Delhi, for January 1997-December 1997. The Pasquill-Gifford stability scheme is used to determine hourly stability classes (Schnelle and Dey, 2000); see Table 1. Hourly traffic data were collected from the Central Road Research Institute, New Delhi, with vehicles classified as two wheeler, three wheeler, four wheeler gasoline powered and four wheeler diesel powered. Emission factors developed by the Indian Petroleum Corporation (Pundir et al., 1994) are used to estimate CO and NO2 source strengths. Table 2 provides hourly maximum, minimum and daily mean monthly traffic volumes, and Fig. 2 shows the diurnal and weekly cycles of traffic and emission variables at the ITO intersection.

The non-cyclic wind direction variable is converted into a cyclic variable using

$$\text{Wind direction} = 1 + \sin\left(\theta + \pi/4\right) \qquad (1)$$

where $\theta$ is the wind direction expressed in radians (Ziomass et al., 1995). Rainfall data have not been included. All the variables are standardized so that each is given equal importance in the analysis (Henry and Hidy, 1979).
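The two preprocessing steps described above can be sketched in a few lines. This is a minimal illustration rather than the authors' code: the function names `wind_direction_index` and `standardize`, the use of NumPy, the assumption that wind direction is recorded in degrees (as in Table 1), the choice of the sample standard deviation, and the sample values are all assumptions made for the example.

```python
import numpy as np

def wind_direction_index(theta_deg):
    """Eq. (1): 1 + sin(theta + pi/4), with theta converted from degrees to radians."""
    theta = np.deg2rad(np.asarray(theta_deg, dtype=float))
    return 1.0 + np.sin(theta + np.pi / 4.0)

def standardize(X):
    """Column-wise z-scores: zero mean and unit sample standard deviation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Hypothetical wind directions in degrees; 0 and 360 describe the same direction.
wd = wind_direction_index([0.0, 45.0, 225.0, 360.0])

# Hypothetical observations: rows are hours, columns are two variables.
X = np.array([[1.0, 120.0], [2.0, 150.0], [3.0, 90.0], [4.0, 200.0]])
Z = standardize(X)
```

After the transformation, 0 and 360 degrees map to the same value, so the discontinuity at north disappears, and after standardization every column has the same scale regardless of its original units.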
Fig. 1. Location of the AQCR.

Table 1
Seasonal daily mean values of meteorological parameters, 1st January-31st December 1997

Sl. no.  Parameter                                Annual average  Winter  Summer  Monsoon  Post monsoon
1        Cloud cover, okta                        3.28            2.8     2.68    5.1      3.0
2        Humidity, %                              66.79           71.4    48      73.4     77.8
3        Pasquill stability category (A-F: 1-6)   3.8             3.9     3.7     3.5      3.8
4        Pressure, mba                            484.5           490.5   480.1   476.3    487.3
5        Rainfall, mm/day                         1.9             0.84    2.23    4.12     1.07
6        Sunshine hours, h/day                    5.4             3.7     8.2     4.8      6.3
7        Temperature, °C                          23.6            16.2    29.5    30.2     25.4
8        Visibility (a)                           94.76           94      95.4    95.3     95.1
9        Wind direction, degree                   132.21          118.8   172.0   145.8    112.5
10       Wind speed, m s^-1                       1.3             1.0     1.8     1.6      0.9
11       Mixing height, m                         278.7           281.1   279.7   299.8    233.8

(a) Indian Meteorological Department synoptic codes for representing visibility:

Code        90      91     92     93     94      95      96      97     98     99
Visibility  <50 m   50 m   200 m  500 m  1000 m  2000 m  4000 m  10 km  20 km  >=50 km

3. Principal component analysis

PCA is a descriptive tool that reduces the dimensionality of a set of interrelated variables while retaining as much as possible of the variability present in the data. This is done by transforming the data into a new set of orthogonal variables (PCs), arranged in decreasing order of importance, that can be computed from covariance or correlation matrices. A PC represents nothing more than a pattern of association and, once determined, that pattern is removed from the data set. The residual data are re-examined to determine whether any remaining association among variables exists. Here, PCs are determined from the Spearman correlation matrix, because this does not require a normal distribution of the data (Poissant et al., 1996). A set of intercorrelated variables is transformed into a set of uncorrelated variables by means of an orthogonal transformation. If the variables are independent, the correlation matrix becomes an identity (diagonal) matrix.

The procedure is to diagonalize the correlation matrix (Hair et al., 1987) by finding its eigenvalues and eigenvectors. The PC associated with the largest eigenvalue is called the first PC (PC1) and represents the linear combination of the variables accounting for the maximum total variability in the data. The second PC (PC2) explains the maximum of the variability not accounted for by PC1, and so on. Varimax rotation redistributes the PC loadings so that the variance of the squared loadings within each component is maximized, which pushes loadings towards either large or near-zero values and reduces the number of intermediate ones (Richman, 1986). This helps in adjusting the PC axes to achieve a more meaningful interpretation of the data. PCs with eigenvalues greater than one are retained (Eder, 1989). Eq. (2) gives the estimation procedure for the PC scores (Verbeke et al., 1984).
$$PC_{ij} = \sum_{k=1}^{m} w_{ik}\, x_{kj} \qquad (2)$$

where $PC_{ij}$ is the PC score for the jth object on the ith component, $w_{ik}$ is the loading of the kth variable on the ith component, and $x_{kj}$ is the standardized value of the kth variable for the jth observation; the sum runs over the $m$ variables.
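The procedure of this section (Spearman correlation matrix, eigen-decomposition, retention of components with eigenvalue greater than one, varimax rotation, and scores from Eq. (2)) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the data are synthetic, all function names are hypothetical, the Spearman matrix is computed as the Pearson correlation of ranks, the rotation uses a commonly used SVD-based varimax iteration, and the scores are computed from the rotated loadings.

```python
import numpy as np

def average_ranks(col):
    """Rank a 1-D array (1 = smallest); tied values share their mean rank."""
    order = np.argsort(col, kind="mergesort")
    ranks = np.empty(len(col), dtype=float)
    ranks[order] = np.arange(1, len(col) + 1)
    _, inverse, counts = np.unique(col, return_inverse=True, return_counts=True)
    inverse = inverse.ravel()
    rank_sums = np.bincount(inverse, weights=ranks)
    return rank_sums[inverse] / counts[inverse]

def spearman_matrix(X):
    """Spearman correlation matrix: Pearson correlation of column-wise ranks."""
    ranked = np.column_stack([average_ranks(X[:, k]) for k in range(X.shape[1])])
    return np.corrcoef(ranked, rowvar=False)

def retained_loadings(R):
    """Diagonalize R and keep the components whose eigenvalue exceeds one."""
    eigvals, eigvecs = np.linalg.eigh(R)      # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]         # reorder so PC1 comes first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0
    # Loadings: eigenvectors scaled by the square root of their eigenvalues.
    return eigvals, eigvecs[:, keep] * np.sqrt(eigvals[keep])

def varimax(loadings, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation via a standard SVD-based iteration."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated**3 - rotated @ np.diag((rotated**2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        new_criterion = s.sum()
        if criterion > 0 and new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation

# Synthetic stand-in data: 200 observations of 6 variables forming two
# correlated groups (hypothetical; not the Delhi data set).
rng = np.random.default_rng(0)
n = 200
f1, f2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([f1 + 0.3 * rng.normal(size=n) for _ in range(3)] +
                    [f2 + 0.3 * rng.normal(size=n) for _ in range(3)])

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardized variables
R = spearman_matrix(X)
eigvals, loadings = retained_loadings(R)
rotated = varimax(loadings)
scores = Z @ rotated    # Eq. (2): row j, column i holds sum_k w_ik * x_kj
```

Because the rotation is orthogonal, it changes how the explained variance is shared among the retained components but leaves each variable's communality (the row sum of squared loadings) unchanged.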
Table 2
Hourly maximum, minimum and daily mean monthly traffic counts, 1st January-31st December 1997 (traffic flow, ITO intersection)

Month      Maximum   Minimum  Average (x 10^5)
January    14,038    326      1.2585
February   12,660    295      1.135
March      14,329    332      1.285
April      13,155    305      1.18
May        15,134    352      1.357
June       15,6      376      1.453
July       14,60516  372      1.4385
August     13,15     340      1.31
September  16,224    376      1.455
October    15,906    369      1.4265
November   15,642    364      1.403
December   13,305    310      1.193
Fig. 2. Diurnal and weekly cycles of traffic and emission variables during the period 27th January to 2nd February 1997 (horizontal axis: time, hour).
4. Results and discussion

Three types of analysis are used to assess the behavior of meteorological, emission and traffic characteristics over the range of averaging periods (1, 8 and 24 h) that follow ambient air quality standards. Table 3 shows the results of PCA on the hourly means of the meteorological, traffic and emission variables from 27th January to 2nd February 1997. Four PCs with eigenvalues greater than 1 are extracted and account for 83% of the variance. Parameters corresponding to maximum loadings are selected for further analysis (Jolliffe, 1986). PC1 indicates significant correlation among traffic and emission variables, which may be attributed to their matching diurnal cycles (Fig. 2), resulting in high PC loadings. PC2 shows a high correlation among meteorological variables, i.e., mixing height, Pasquill-Gifford stability, sunshine hours, wind speed and direction. This may be due to their matching diurnal behavior, elevated during daytime and depleted during nighttime (Fig. 3). PC3 correlates humidity, temperature and visibility with high loadings, possibly due to marginal variations in their diurnal behavior. PC4 shows a strong association between cloud cover and pressure, which have constant diurnal values.

Table 3
PCA of hourly mean meteorological and traffic characteristics, 27th January-2nd February 1997

Sl. no.  Parameters                         PC1      PC2      PC3      PC4
1        Cloud cover                        -0.056   0.019    -0.263   0.836
2        Humidity                           -0.559   -0.369   -0.636   -0.058
3        Mixing height                      0.291    0.693    0.242    0.254
4        Pressure                           0.0986   0.046    -0.381   -0.725
5        Pasquill stability                 -0.43    -0.786   0.053    -0.0197
6        Sunshine hours                     0.241    0.724    -0.077   -0.270
7        Temperature                        0.534    0.424    0.631    0.2
8        Visibility                         0.09     -0.0106  0.917    -0.0486
9        Wind direction                     -0.0341  -0.66    -0.0606  0.517
10       Wind speed                         0.156    0.8      0.255    0.0636
11       Two wheeler                        0.896    0.233    0.116    -0.182
12       Three wheeler                      0.892    0.255    0.295    -0.0709
13       Four wheeler (gasoline powered)    0.932    0.153    0.229    -0.0573
14       Four wheeler (diesel powered)      0.917    0.216    -0.0589  0.0429
15       Source strength for CO             0.948    0.207    0.188    -0.0876
16       Source strength for NO2            0.95     0.211    0.0117   0.0155
         Eigenvalue                         6.084    3.293    2.185    1.729
         Percentage of total variance       38.03    20.58    13.66    10.81
         Cumulative percentage              38.03    58.61    72.27    83.08

Fig. 3. Diurnal and weekly cycles of meteorological variables, 27th January-2nd February 1997: (a) mixing height, (b) Pasquill-Gifford stability, (c) sunshine hours, (d) wind speed, (e) wind direction, (f) humidity, (g) temperature, (h) visibility, (i) cloud cover, (j) pressure (horizontal axis: time, hour).

Table 4 gives results of PCA based on 8-h mean values of meteorological and traffic variables between 14th January and 14th March 1997. Four PCs are extracted, accounting for 84.3% of the total variance. These show correlations among variables similar to those observed in the 1-h PCA. Hence, it may be concluded that for short-term averaging periods, the PCA results will provide similar correlations among variables.

Table 4
PCA of 8-hourly mean meteorological and traffic characteristics, 14th January-14th March 1997

Sl. no.  Parameters                         PC1      PC2      PC3      PC4
1        Cloud cover                        0.208    -0.0444  -0.137   0.806
2        Humidity                           -0.523   -0.218   -0.748   0.07
3        Mixing height                      0.564    0.652    0.147    0.184
4        Pressure                           0.128    0.172    -0.33    -0.677
5        Pasquill stability                 -0.512   -0.774   0.0836   -0.207
6        Sunshine hours                     0.129    0.897    0.0196   -0.154
7        Temperature                        0.456    0.176    0.678    0.354
8        Visibility                         0.088    0.0968   0.933    -0.0027
9        Wind direction                     -0.23    -0.58    -0.245   0.32
10       Wind speed                         0.234    0.669    0.304    -0.164
11       Two wheeler                        0.939    0.225    0.102    -0.031
12       Three wheeler                      0.956    0.192    0.19     0.0154
13       Four wheeler (gasoline powered)    0.954    0.181    0.212    0.0234
14       Four wheeler (diesel powered)      0.886    0.374    0.16     0.0809
15       Source strength for CO             0.953    0.218    0.187    0.0199
16       Source strength for NO2            0.914    0.324    0.167    0.0664
         Eigenvalue                         6.48     3.15     2.38     1.48
         Percentage of total variance       40.55    19.65    14.86    9.26
         Cumulative percentage              40.55    60.2     75.06    84.32

Table 5
PCA of daily mean meteorological and traffic characteristics, 1st January-31st December 1997

Sl. no.  Parameters                         PC1      PC2      PC3      PC4
1        Cloud cover                        0.078    0.468    -0.801   0.065
2        Humidity                           0.104    -0.189   -0.753   -0.364
3        Mixing height                      0.036    0.456    0.535    0.0713
4        Pressure                           -0.15    -0.816   -0.0628  -0.292
5        Pasquill stability                 -0.101   -0.761   0.223    0.149
6        Sunshine hours                     -0.049   0.17     0.831    0.218
7        Temperature                        0.23     0.864    0.284    0.16
8        Visibility                         0.15     0.716    0.388    0.346
9        Wind direction                     -0.044   -0.057   -0.229   -0.829
10       Wind speed                         -0.074   0.218    0.127    0.868
11       Two wheeler                        0.955    -0.054   -0.012   0.069
12       Three wheeler                      0.993    0.093    -0.028   0.007
13       Four wheeler (gasoline powered)    0.99     0.126    -0.032   -0.0064
14       Four wheeler (diesel powered)      0.906    0.288    -0.052   -0.073
15       Source strength for CO             0.991    0.066    -0.025   0.0199
16       Source strength for NO2            0.953    0.233    -0.046   -0.048
         Eigenvalue                         5.72     3.22     2.55     1.89
         Percentage of total variance       35.78    20.09    15.91    11.86
         Cumulative percentage              35.78    55.87    71.78    83.64

Fig. 4. Daily average time series of meteorological variables, 1st January-31st December 1997: (a) temperature, (b) pressure, (c) Pasquill-Gifford stability, (d) visibility (horizontal axis: time, day).

Table 5 shows results based on daily mean values of traffic and meteorological variables from January to December 1997. Four PCs are identified, accounting for 83.6% of the variance. PC1 predominantly measures association among the traffic and emission variables. This may be due to their marginal seasonal variations, as shown in Table 2. PC2 shows significant loadings for pressure, Pasquill-Gifford stability, temperature and visibility, indicating a strong correlation among these variables. The elements in Fig. 4 also support the PCA results; e.g., during the summer (corresponding to the 170th day) temperatures are at a maximum, the pressure is at a minimum, the atmosphere is highly unstable and the visibility, on the basis of the visual observation technique, is 96 km. Further, there is correlation between sunshine hours, humidity, cloud cover and mixing height, as is evident from the loadings in PC3. Fig. 5 shows a typical relationship among them. PC4 shows a strong association between wind speed and direction, supporting the normal meteorological behavior of wind patterns, i.e., as variations in wind speed increase, fluctuations in wind direction also increase (Fig. 6).
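The percentage-of-variance rows in Tables 3-5 can be checked by hand. When PCA is run on a correlation matrix of 16 standardized variables, the total variance equals the trace of that matrix, i.e. 16, so each PC explains 100 x eigenvalue / 16 percent. The short sketch below reproduces this arithmetic for the Table 3 eigenvalues; the variable names are illustrative only.

```python
# Eigenvalues of the four retained PCs, copied from Table 3 (hourly data).
eigenvalues = [6.084, 3.293, 2.185, 1.729]
n_variables = 16  # number of variables = trace of the 16 x 16 correlation matrix

# Share of the total variance carried by each retained PC, in percent.
percent = [100.0 * ev / n_variables for ev in eigenvalues]

# Running total, corresponding to the "Cumulative percentage" row.
cumulative = []
running = 0.0
for p in percent:
    running += p
    cumulative.append(running)

for i, (p, c) in enumerate(zip(percent, cumulative), start=1):
    print(f"PC{i}: {p:.2f}% of total variance, {c:.2f}% cumulative")
```

The computed shares agree with the tabulated percentages to within the rounding of the printed eigenvalues.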
Fig. 5. Daily average time series of meteorological variables, 1st January-31st December 1997: (a) sunshine hours, (b) mixing height, (c) cloud cover, (d) humidity.

Fig. 6. Daily average time series of meteorological variables, 1st January-31st December 1997: (a) wind speed, (b) wind direction.
5. Conclusions

The study shows the usefulness of PCA as a tool for analyzing a large multivariate data set. It concludes that for short-term averaging periods, PCA provides similar correlations among the variables. The analysis also reveals the domination of traffic and emission variables in the first PC, irrespective of the averaging period, while the remaining PCs are found to be significantly loaded by meteorological variables. Further, we find a weak correlation of traffic and emission variables with meteorological variables for all three averaging periods.

Acknowledgements

We thank the Indian Meteorological Department and Central Road Research Institute, New Delhi, for providing meteorological and traffic data.

References

Benarie, M.M., 1980. Urban Air Pollution Modelling. MIT Press, London.
Chock, D.P., Terrel, T.R., Levitt, S.B., 1975. Time series analysis of Riverside, California, air quality data. Atmospheric Environment 20, 989-993.
Eder, B.K., 1989. A principal component analysis of SO4 precipitation concentrations over the eastern United States. Atmospheric Environment 23, 2739-2750.
Hair, J.F., Anderson, R.E., Tatham, R.L., 1987. Multivariate Data Analysis, second ed. Macmillan Publishing Company, New York.
Henry, R.C., Hidy, G.M., 1979. Multivariate analysis of particulate sulfate and other air quality variables by principal components part 1. Annual data from Los Angeles and New York. Atmospheric Environment 13, 1581-1596.
Jolliffe, I.T., 1986. Principal Component Analysis. Springer, New York.
Merz, P.H., Painter, H.J., Ryason, P.R., 1972. Aerometric data analysis-time series analysis and forecast and an atmospheric smog diagram. Atmospheric Environment 6, 319-342.
Milionis, A.E., Davis, T.D., 1994. Regression and stochastic models for air pollution I: review comments and suggestion. Atmospheric Environment 28 (17), 2801-2810.
Poissant, L., Bottenheim, J.W., Roussel, P., Reid, N.W., Niki, H., 1996. Multivariate analysis of a 1992 SONTOS data subset. Atmospheric Environment 30 (12), 2133-2144.
Pundir, P.P., Jain, A.K., Gogia, D.K., 1994. Vehicle Emissions and Control Perspective in India. Indian Institute of Petroleum, Dehradun, India.
Richman, M.B., 1986. Rotation of principal components. Journal of Climatology 6, 293-335.
Roch, R., Pellerin, J., 1982. On long term air quality trends and intervention analysis. Atmospheric Environment 16, 161-169.
Roscoe, B.A., Hopke, P.K., Dattner, S.L., Jenks, J.M., 1982. The use of principal component factor analysis to interpret particle compositional data sets. Journal of Air Pollution Control Association 32 (6), 637-642.
Schnelle, K.B., Dey, P.R., 2000. Atmospheric Dispersion Modelling Compliance Guide. McGraw Hill Inc., New York.
Sharma, P., 1998. Air quality modelling for an urban intersection of Delhi city. Ph.D. thesis, Department of Civil Engineering, Indian Institute of Technology Delhi, India.
Statheropoulos, M., Vassiliadis, N., Pappa, A., 1998. Principal component and canonical correlation analysis for examining air pollution and meteorological data. Atmospheric Environment 36 (6), 1087-1095.
Verbeke, J.S., Hartog, J.C.D., Dekker, W.H., Coomans, D., Buydens, L., Massart, D.L., 1984. The use of principal components analysis for the investigation of an organic air pollutants data set. Atmospheric Environment 18 (11), 2471-2478.
Ziomass, I.C., Melas, D., Zerefas, C.S., Bais, A.F., 1995. Forecasting peak pollutant levels for meteorological variables. Atmospheric Environment 29, 3703-3711.