Combining Logistic Regression with Kriging for Mapping the Risk of Occurrence of Unexploded Ordnance (UXO)

H. Saito (1), P. Goovaerts (1), S. A. McKenna (2)

(1) Environmental and Water Resources Engineering, Department of Civil and Environmental Engineering, The University of Michigan, 1351 Beal Ave., Ann Arbor, Michigan, 48109-2125, U.S.A., hirotaka@engin.umich.edu, (2) Geohydrology Department, Sandia National Laboratories, PO Box 5800 MS 0735, Albuquerque, NM, 87185-0735, U.S.A.

Abstract

This paper presents a methodology that combines logistic regression with kriging for incorporating exhaustive secondary information into the mapping of the risk of occurrence of unexploded ordnance (UXO). Logistic regression, which is appropriate for the analysis of binary data (indicators), is used to derive the trend component in simple kriging with varying local means. The technique is illustrated using two types of information: 1) hard indicators sampled along transects on a hypothetical UXO site generated using a doubly stochastic Poisson process, 2) exhaustive soft information obtained through the processing of a series of realizations generated by the same point process. After risks are mapped, pixels are flagged for further investigation if the estimated probability exceeds a given threshold. This classification is used to compare the performance of the proposed technique with traditional cokriging (collocated cokriging). Fewer misclassifications and smaller false positive rates are obtained for simple kriging with varying local means derived from logistic regression. The proportion of false negatives is below 5% for both techniques.

2. Introduction

Mapping the risk of occurrence of unexploded ordnance (UXO) at military sites is important, especially as these sites are prepared for return to the public sector. Efficient and precise site characterization is necessary. Geostatistical techniques should be preferred to classical statistical approaches because of their ability to take into account spatial correlation and many different kinds of ancillary information.
It is well recognized that site characterization improves, especially when the primary variable, which is often sampled sparsely because of cost and time constraints, is supplemented with abundant (exhaustive) additional information (GOOVAERTS 2000). A number of geostatistical techniques have been developed to incorporate exhaustive secondary data (GOOVAERTS 1997). Among available algorithms, simple kriging with varying local means provides flexibility in trend modeling and mathematical simplicity. The basic idea of this technique is the combination of deterministic trend modeling with geostatistical interpolation of residuals. Residuals are usually modeled using a stationary random function so that simple kriging can be applied. The remaining question is then what kind of deterministic function should be used for the trend modeling. Linear regression is straightforward but is not appropriate if the primary data are binary indicators, because several assumptions of classical linear regression are violated (ALLISON 1999). The logical choice is logistic regression, which has been specifically developed for binary data. To date, logistic regression has never been directly incorporated into geostatistical techniques. This paper presents a new methodology to combine logistic regression with kriging for mapping the risk of occurrence of UXO. The technique is illustrated using a hypothetical site contaminated with UXO, and classification performances are compared with cokriging results.

3. Stochastic simulation of UXO distribution

The spatial distribution of UXO should be viewed as a point process since the location of UXO is the variable of interest, and the stochastic simulation of a Poisson process can be used to model the spatial distribution of UXO.
Poisson processes provide a common class of models for objects distributed in space according to a uniform intensity (homogeneous Poisson process). In reality, however, the spatial distribution of UXO is not uniform since its intensity changes spatially because of the existence of specific targets. In such a case, one of its variants (the doubly stochastic Poisson process or DSPP) is used to model the spatial distribution of UXO. A simulator has been developed to generate non-conditional UXO realizations as the sum of two processes:

1. A homogeneous Poisson process describing the background objects that display a uniform intensity across the spatial domain.
2. A DSPP describing the spatially varying mean (e.g. higher intensity around targets).

Two types of bombs can be considered, airborne and mortar bombs, and the spatial distribution of both fragments and UXO is simulated. For airborne ordnance, target-specific parameters can be entered by the user, such as 1) target coordinates, 2) ordnance size, 3) orientation, and 4) intensity. For mortar ordnance, three zones are simulated: a firing zone, a target zone, and a fan zone. One of the realizations generated by the Poisson simulator is used as the hypothetical (true) UXO site, and the number of objects (both UXO and fragments) using a pixel size of 5 x 5 is mapped in Figure 1 (left).

Figure 1: Map of the number of objects (UXO and fragments) at the hypothetical site (left panel, "# of objects") and map of E-type estimates (right panel, "E-type estimate").

A series of non-conditional realizations (UXO and fragments) is generated by the simulator, each of them being converted into an intensity map according to a given pixel or block size. Then, for each pixel, the conditional cumulative distribution function (ccdf) of the UXO intensity is numerically approximated from the series of simulated values. The mean (E-type) and variance of the ccdfs, as well as the probability of exceeding an action level, are computed and used as exhaustive secondary information.
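The simulation and post-processing described above can be illustrated with a minimal Python sketch. It is not the simulator used in this study: the function names (`simulate_counts`, `summarize`), the domain extent, the background rate, the target coordinates, the gamma-distributed target intensity, and the Gaussian scatter around targets are all hypothetical choices made only to show the structure (a homogeneous background process plus a process whose intensity is itself random, followed by per-pixel statistics over many realizations).

```python
import numpy as np

def simulate_counts(rng, n_pix=20, extent=100.0, bg_rate=0.002,
                    targets=((30.0, 40.0), (70.0, 60.0)),
                    mean_hits=40.0, spread=8.0):
    """One non-conditional realization, returned as object counts per pixel.

    All parameter values are illustrative assumptions, not values from the paper.
    """
    # Background objects: homogeneous Poisson process over the whole domain.
    n_bg = rng.poisson(bg_rate * extent * extent)
    points = [rng.uniform(0.0, extent, size=(n_bg, 2))]
    for tx, ty in targets:
        # Doubly stochastic part: the expected number of objects around a
        # target is itself random (gamma-distributed in this sketch).
        expected = rng.gamma(shape=5.0, scale=mean_hits / 5.0)
        n_t = rng.poisson(expected)
        points.append(rng.normal((tx, ty), spread, size=(n_t, 2)))
    xy = np.vstack(points)
    # Convert the point pattern into an intensity map (counts per pixel);
    # points falling outside the domain are ignored by histogram2d.
    edges = np.linspace(0.0, extent, n_pix + 1)
    counts, _, _ = np.histogram2d(xy[:, 0], xy[:, 1], bins=(edges, edges))
    return counts

def summarize(realizations, action_level=0):
    """Per-pixel E-type mean, variance, and probability of exceeding a level."""
    stack = np.stack(realizations)              # (n_real, n_pix, n_pix)
    e_type = stack.mean(axis=0)
    variance = stack.var(axis=0)
    p_exceed = (stack > action_level).mean(axis=0)
    return e_type, variance, p_exceed

rng = np.random.default_rng(0)
realizations = [simulate_counts(rng) for _ in range(200)]
e_type, variance, p_exceed = summarize(realizations)
```

In this sketch the empirical distribution of the simulated counts at each pixel plays the role of the numerically approximated ccdf, so the three returned maps correspond to the three kinds of exhaustive secondary information listed above.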
Figure 1 (right) shows the E-type estimate map obtained from a series of realizations using a pixel size of 5 x 5.

4. Geostatistical interpolation

The risk of occurrence of unexploded ordnance (UXO) at any location within the study area is mapped using geostatistical interpolation techniques. The basic approach is to estimate the probabilities of occurrence of UXO at unsampled locations using hard data sampled from the UXO site. Since hard data are never exhaustive nor 100% certain, secondary information can help improve the site characterization. At the UXO site, sampling locations are excavated to find out whether or not any UXO is present. Thus, the primary data are indicators of occurrence of at least one UXO at each excavated location (1 = at least one UXO, 0 = no UXO). Traditionally, probabilities at unsampled locations are estimated from these hard data using indicator kriging (JOURNEL 1983). However, if exhaustive secondary data are available, variants of indicator kriging can be used.

Consider the problem of estimating the probability of occurrence of an attribute at an unsampled location $\mathbf{u}$, where $\mathbf{u}$ is a vector of spatial coordinates. The information available consists of hard indicator values (binary data) at $n$ locations $\mathbf{u}_{\alpha}$, $i(\mathbf{u}_{\alpha})$, $\alpha = 1, 2, \ldots, n$, and different types of secondary data $y_j(\mathbf{u})$, $j = 1, 2, \ldots, S$, at all estimation grid nodes (exhaustive information). The secondary variable considered in this study is the exhaustive map of E-type estimates.

The most commonly used approach to incorporate secondary information is cokriging. While a number of variants of cokriging algorithms have been developed, only collocated cokriging is considered here because of its numerical stability and simplicity. The basic idea is to incorporate only the secondary datum co-located with the location being estimated, that is, the estimated probability $p^{*}(\mathbf{u})$ is:

$$p^{*}(\mathbf{u}) = \sum_{\alpha=1}^{n(\mathbf{u})} \lambda_{\alpha}(\mathbf{u})\, i(\mathbf{u}_{\alpha}) + \lambda_{Y}(\mathbf{u})\,[\,y(\mathbf{u}) - m_{Y} + m_{I}\,] \qquad (1)$$

where $m_{I}$ and $m_{Y}$ are the global means of the primary and secondary variables. The second term of equation (1) corresponds to a rescaling of the secondary variable to the mean of the primary variable to ensure unbiased estimation.

Another approach consists of predicting the probability as a function of only the co-located secondary datum (e.g. a linear relation). This type of regression, however, assumes that the residual values are spatially uncorrelated, which is not always true. Simple kriging with varying local means allows one to take the spatial correlation of residuals into account. It amounts to replacing the known stationary mean in the simple kriging estimate by known varying means $m(\mathbf{u})$ derived from the secondary information:

$$p^{*}(\mathbf{u}) - m(\mathbf{u}) = \sum_{\alpha=1}^{n(\mathbf{u})} \lambda_{\alpha}(\mathbf{u})\,[\,i(\mathbf{u}_{\alpha}) - m(\mathbf{u}_{\alpha})\,] \qquad (2)$$

The local means $m(\mathbf{u})$ are often derived from linear regression using independent variables.
However, linear regression is not appropriate when binary data are used as dependent variables because several underlying assumptions are violated:

1. Prediction errors are not normally distributed because the data take only two values.
2. The errors are heteroscedastic, which occurs when the variance of the dependent variable varies with the values of the independent variables.
3. The predicted probabilities can be greater than 1 or less than 0 if the linear regression model, which is inherently unbounded, is used. Usually those values are arbitrarily set to either 1 or 0, which may lead to non-optimal estimates.

Logistic regression overcomes these problems by using the odds $O$, which are defined as:

$$O = \frac{p}{1 - p} \qquad (3)$$

where $p$ is the probability that the event occurs. The logistic model is then expressed as:

$$\ln\left(\frac{p}{1 - p}\right) = \beta_{0} + \boldsymbol{\beta}\mathbf{X} \qquad (4)$$

where $\mathbf{X}$ is the vector of independent variables, $\beta_{0}$ is the intercept, and $\boldsymbol{\beta}$ is the vector of regression coefficients. The log odds are not bounded, but the estimated probabilities lie between 0 and 1 after the back-transform of the estimated log odds:

$$p = \frac{1}{1 + \exp[-(\beta_{0} + \boldsymbol{\beta}\mathbf{X})]} \qquad (5)$$
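As a hedged illustration of how equations (2) and (5) fit together, the following Python sketch fits a logistic model to binary indicators, uses the back-transformed probabilities as local means, and then applies simple kriging to the residuals. It is a sketch under stated assumptions, not the implementation used in this study: the function names, the synthetic secondary map, the exponential covariance model with its sill and range, the use of all data instead of a local search neighborhood $n(\mathbf{u})$, and the final clipping to [0, 1] are all illustrative choices. The coefficients are obtained by Newton-Raphson maximization of the likelihood.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood fit of ln(p/(1-p)) = b0 + b*X by Newton-Raphson."""
    A = np.column_stack([np.ones(len(X)), X])    # intercept + covariates
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-A @ beta))      # back-transform, eq. (5)
        w = p * (1.0 - p)                        # Bernoulli variances
        grad = A.T @ (y - p)                     # gradient of log-likelihood
        hess = A.T @ (A * w[:, None])            # negative Hessian
        # Tiny ridge term (an assumption) keeps the solve numerically stable.
        beta += np.linalg.solve(hess + 1e-8 * np.eye(len(beta)), grad)
    return beta

def predict_prob(beta, X):
    """Local means m(u): probabilities guaranteed to lie in (0, 1)."""
    A = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-A @ beta))

def exp_cov(h, sill, a):
    """Exponential covariance with practical range a (assumed residual model)."""
    return sill * np.exp(-3.0 * h / a)

def sklm(coords, ind, m_data, grid, m_grid, sill, a):
    """Eq. (2): simple kriging of residuals i(u_a) - m(u_a), added back to m(u)."""
    resid = ind - m_data
    d_dd = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    d_dg = np.linalg.norm(coords[:, None, :] - grid[None, :, :], axis=2)
    # One column of simple kriging weights per grid node (global neighborhood).
    lam = np.linalg.solve(exp_cov(d_dd, sill, a), exp_cov(d_dg, sill, a))
    est = m_grid + lam.T @ resid
    return np.clip(est, 0.0, 1.0)                # simple post-hoc correction

# Synthetic example: 15 x 15 grid with a smooth stand-in secondary map.
rng = np.random.default_rng(1)
gx, gy = np.meshgrid(np.arange(15.0), np.arange(15.0))
grid = np.column_stack([gx.ravel(), gy.ravel()])
sec = np.exp(-((grid[:, 0] - 7.0) ** 2 + (grid[:, 1] - 7.0) ** 2) / 30.0)
true_p = 1.0 / (1.0 + np.exp(-(-3.0 + 5.0 * sec)))
truth = (rng.random(len(grid)) < true_p).astype(float)
idx = rng.choice(len(grid), size=60, replace=False)   # sampled pixels

beta = fit_logistic(sec[idx], truth[idx])
m_grid = predict_prob(beta, sec)
prob = sklm(grid[idx], truth[idx], m_grid[idx], grid, m_grid, sill=0.1, a=4.0)
```

Because no nugget effect is included in the assumed covariance model, the estimate honors the hard indicators at the sampled pixels, while away from the data it reverts to the logistic-regression local mean.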
Unlike traditional linear regression, which minimizes the error variance, the parameters β in logistic regression are chosen to maximize the likelihood function (maximum likelihood estimation). Logistic regression is then used to derive the local means m(u) in simple kriging with varying local means.

5. Site classification

Mapping of probabilities is not a goal per se, but a preliminary step towards the delineation of the area where at least one UXO exists. The impact of different probability thresholds and interpolation techniques (collocated cokriging and simple kriging with varying local means) on decision-making was investigated using the following procedure:

1. The primary data are indicators of presence of at least one UXO for the pixels of transects positioned according to prior information. These indicators (1 = at least one UXO, 0 = no UXO) are computed from the true UXO distribution created using the UXO simulator. The secondary information is the exhaustive map of E-type estimates.
2. Probabilities of occurrence of at least one UXO are estimated at every pixel using both collocated cokriging and simple kriging with varying local means derived from logistic regression.
3. Pixels are flagged for further geophysical survey if the estimated probability exceeds a given threshold. If the probability is below the threshold, no further action is taken for the pixel.
4. Comparison of the classification achieved at the previous step with the true UXO distribution allows the computation of the proportions of correct classifications, false negatives, and false positives. This is done for a series of probability thresholds.

Figure 2: Location map of hard data ("Hard indicators") obtained along transects. Closed circles indicate that at least one UXO was found and open circles indicate that no UXO was found.

Figure 3: Maps of the probability of occurrence of at least one UXO estimated using two kriging algorithms: collocated cokriging (left), simple kriging with varying local means (right). The exhaustive map of E-type estimates was used as secondary information.

6. Results and discussions

Figure 2 shows the locations where hard data are collected from the hypothetical UXO site. Six transects are positioned according to the prior information available (e.g. locations of targets). Figure 3 shows the probability maps produced by collocated cokriging and simple kriging with varying local means. Both techniques reproduce well the higher probabilities around targets and the lower probabilities in surrounding areas. These
probability maps are then used for site classification. The proportions of correct classifications, false negatives, and false positives are computed for a series of probability thresholds, see Figure 4. The design reliability, R_D, is defined as 1 - P_UXO, where P_UXO corresponds to any probability threshold. Up to a design reliability of 0.95, simple kriging with varying local means leads to a larger proportion of correct decisions and fewer false positives than collocated cokriging.

Figure 4: Proportions of correct classification (solid) and false positive (dash) as a function of probability threshold. Results are obtained for two kriging algorithms: collocated cokriging (left) and simple kriging with varying local means derived using logistic regression (right). The vertical axis of each panel is the proportion of decisions.

Since the ultimate goal of UXO site remediation is to leave zero UXO after remediation, priority should be given to minimizing the proportion of false negatives. Figure 5 depicts the proportion of false negatives over a range of design reliability values. The proportions of false negatives are kept very low (less than 5%). These proportions are relatively higher for one technique than for the other for a design reliability below 0.95.

Figure 5: The impact of design reliability on the proportion of false negatives produced by the two kriging algorithms (collocated cokriging and simple kriging with varying local means).

In this paper, the combination of logistic regression with geostatistical characterization of a UXO site was investigated. Logistic regression was coupled with simple kriging to map the probability of occurrence of at least one UXO. Results indicate the benefit of logistic regression in terms of correct classifications and false positives. The technique can be easily expanded to incorporate more than two additional variables.

7. Acknowledgment

This work was supported by the Strategic Environmental Research and Development Program (SERDP), UXO Cleanup program under grant UX-2. Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy under contract DE-AC04-94AL85000.

8. References

Goovaerts, P., 2000: Geostatistical approach for incorporating elevation into the spatial interpolation of rainfall. Journal of Hydrology, 228, pp. 113-129.
Goovaerts, P., 1997: Geostatistics for Natural Resources Evaluation. New York (Oxford University Press), p. 483.
Allison, P.D., 1999: Logistic Regression Using the SAS System: Theory and Application. Cary, NC (SAS Institute), p. 288.
Journel, A.G., 1983: Non-parametric estimation of spatial distributions. Mathematical Geology, 15, pp. 445-468.