Combining Logistic Regression with Kriging for Mapping the Risk of Occurrence of Unexploded Ordnance (UXO)

H. Saito (1), P. Goovaerts (1), S. A. McKenna (2)

(1) Environmental and Water Resources Engineering, Department of Civil and Environmental Engineering, The University of Michigan, 1351 Beal Ave., Ann Arbor, Michigan, 48109-2125, U.S.A., hirotaka@engin.umich.edu
(2) Geohydrology Department, Sandia National Laboratories, PO Box 5800 MS 0735, Albuquerque, NM, 87185-0735, U.S.A.

Abstract

This paper presents a methodology that combines logistic regression with kriging for incorporating exhaustive secondary information into the mapping of the risk of occurrence of unexploded ordnance (UXO). Logistic regression, which is appropriate for the analysis of binary data (indicators), is used to derive the trend component in simple kriging with varying local means (SKlm). The technique is illustrated using two types of information: 1) hard indicators sampled along transects on a hypothetical UXO site generated using a doubly stochastic Poisson process, and 2) exhaustive soft information obtained through the processing of a series of realizations generated by the same point process. After risks are mapped, pixels are flagged for further investigation if the estimated probability exceeds a given threshold. This classification is used to compare the performance of the proposed technique with traditional cokriging (collocated cokriging). Fewer misclassifications and smaller false positive rates are obtained for SKlm derived from logistic regression. The proportion of false negatives is below 5% for both techniques.

2. Introduction

Mapping the risk of occurrence of unexploded ordnance (UXO) at military sites is important, especially as these sites are prepared for return to the public sector. Efficient and precise site characterization is necessary. Geostatistical techniques should be preferred over classical statistical approaches because of their ability to take into account spatial correlation and many different kinds of ancillary information.
It is well recognized that site characterization improves, especially when the primary variable, which is often sampled sparsely because of cost and time constraints, is supplemented with abundant (exhaustive) additional information (GOOVAERTS 2000). A number of geostatistical techniques have been developed to incorporate exhaustive secondary data (GOOVAERTS 1997). Among available algorithms, simple kriging with varying local means (SKlm) provides flexibility in trend modeling and mathematical simplicity. The basic idea of SKlm is the combination of deterministic trend modeling with geostatistical interpolation of residuals. Residuals are usually modeled using a stationary random function so that simple kriging can be applied. The remaining question is then what kind of deterministic function should be used for the trend modeling. Linear regression is straightforward but is not appropriate if the primary data are binary indicators, because several assumptions of classical linear regression are violated (ALLISON 1999). The logical choice is logistic regression, which has been specifically developed for binary data. To date, logistic regression has never been directly incorporated into geostatistical techniques. This paper presents a new methodology to combine logistic regression with kriging for mapping the risk of occurrence of UXO. The technique is illustrated using a hypothetical site contaminated with UXO, and classification performances are compared with cokriging results.

3. Stochastic simulation of UXO distribution

The spatial distribution of UXO should be viewed as a point process since the location of UXO is the variable of interest, and the stochastic simulation of a Poisson process can be used to model the spatial distribution of UXO.

Poisson processes provide a common class of models for objects distributed in space according to a uniform intensity (homogeneous Poisson process). In reality, however, the spatial distribution of UXO is not uniform: its intensity changes spatially because of the existence of specific targets. In such a case, one of its variants, the doubly stochastic Poisson process (DSPP), is used to model the spatial distribution of UXO. A simulator has been developed to generate non-conditional UXO realizations as the sum of two processes:

1. A homogeneous Poisson process describing the background objects that display a uniform intensity across the spatial domain.
2. A DSPP describing the spatially varying mean (e.g. higher intensity around targets).

Two types of bombs can be considered, airborne and mortar bombs, and the spatial distribution of both fragments and UXO is simulated. For airborne ordnance, target-specific parameters can be entered by the user, such as 1) target coordinates, 2) ordnance size, 3) orientation, and 4) intensity. For mortar ordnance, three zones are simulated: a firing zone, a target zone, and a fan zone. One of the realizations generated by the Poisson simulator is used as the hypothetical (true) UXO site, and the number of objects (UXO and fragments) per pixel, using a pixel size of 5 x 5, is mapped in Figure 1 (left).

Figure 1: Map of the number of objects (UXO and fragments) at the hypothetical site (left) and map of E-type estimates (right).

A series of non-conditional realizations (UXO and fragments) is generated by the simulator, each of them being converted into an intensity map according to a given pixel or block size. Then, for each pixel, the conditional cumulative distribution function (ccdf) of the UXO intensity is numerically approximated from the series of simulated values. The mean (E-type) and variance of the ccdfs, as well as the probability of exceeding an action level, are computed and used as exhaustive secondary information.
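The simulation and post-processing steps described above can be sketched roughly as follows. This is a minimal illustration, not the authors' simulator: the gamma mixing of the target intensity, the Gaussian scatter around a target, and every numeric value (site size, rates, target parameters, number of realizations, action level) are assumptions chosen only to make the example run.

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson count with mean lam (Knuth's method, fine for modest lam)."""
    limit, count, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count


def simulate(rng, size, background_rate, targets):
    """One non-conditional realization: background plus target-centred objects."""
    # Homogeneous background: Poisson count, uniformly scattered locations.
    points = [(rng.uniform(0, size), rng.uniform(0, size))
              for _ in range(poisson(rng, background_rate * size * size))]
    for tx, ty, mean_count, spread in targets:
        # Doubly stochastic step: the expected count is itself random
        # (gamma mixing with mean mean_count is an assumption of this sketch).
        random_mean = rng.gammavariate(2.0, mean_count / 2.0)
        for _ in range(poisson(rng, random_mean)):
            points.append((rng.gauss(tx, spread), rng.gauss(ty, spread)))
    return points


def pixel_counts(points, size, pixel):
    """Convert point locations into a grid of object counts per pixel."""
    n = int(size // pixel)
    grid = [[0] * n for _ in range(n)]
    for x, y in points:
        if 0 <= x < size and 0 <= y < size:  # drop objects outside the site
            grid[int(y // pixel)][int(x // pixel)] += 1
    return grid


def summarize(grids, action_level):
    """Per-pixel E-type mean, variance, and probability of exceeding action_level."""
    n_real, n = len(grids), len(grids[0])
    etype = [[0.0] * n for _ in range(n)]
    variance = [[0.0] * n for _ in range(n)]
    p_exceed = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(n):
            values = [g[r][c] for g in grids]
            mean = sum(values) / n_real
            etype[r][c] = mean
            variance[r][c] = sum((v - mean) ** 2 for v in values) / n_real
            p_exceed[r][c] = sum(v > action_level for v in values) / n_real
    return etype, variance, p_exceed


rng = random.Random(0)
targets = [(25.0, 25.0, 40.0, 3.0)]  # hypothetical target: x, y, mean count, spread
grids = [pixel_counts(simulate(rng, 50.0, 0.01, targets), 50.0, 5.0)
         for _ in range(50)]
etype, variance, p_exceed = summarize(grids, action_level=0)
```

Each entry of `etype`, `variance` and `p_exceed` summarizes the empirical distribution of simulated counts for one pixel, which is the role the ccdf statistics play as exhaustive secondary information.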
Figure 1 (right) shows the E-type estimate map obtained from a series of realizations using a pixel size of 5 x 5.

4. Geostatistical interpolation

The risk of occurrence of UXO at any location within the study area is mapped using geostatistical interpolation techniques. The basic approach is to estimate the probabilities of occurrence of UXO at unsampled locations using hard data sampled from the UXO site. Since hard data are never exhaustive nor 100% certain, secondary information can help improve the site characterization. At the UXO site, sampling locations are dug to find out whether or not any UXO is present. Thus, the primary data are indicators of occurrence of at least one UXO at each location dug (1 = at least one UXO, 0 = no UXO). Traditionally, probabilities at unsampled locations are estimated from these hard data using indicator kriging (JOURNEL 1983). However, if exhaustive secondary data are available, variants of indicator kriging can be used.

Consider the problem of estimating the probability of occurrence of an attribute at an unsampled location $\mathbf{u}$, where $\mathbf{u}$ is a vector of spatial coordinates. The information available consists of hard indicator values (binary data) at $n$ locations $\mathbf{u}_\alpha$, $i(\mathbf{u}_\alpha)$, $\alpha = 1, 2, \ldots, n$, and different types of secondary data $y_j(\mathbf{u})$, $j = 1, 2, \ldots, S$, at all estimation grid nodes (exhaustive information). The secondary variable considered in this study is the exhaustive map of E-type estimates.

The most commonly used approach to incorporate secondary information is cokriging. While a number of variants of cokriging algorithms have been developed, only collocated cokriging is considered here because of its numerical stability and simplicity. The basic idea is to incorporate only the secondary datum co-located with the location being estimated, that is:

$$ p^{*}(\mathbf{u}) = \sum_{\alpha=1}^{n(\mathbf{u})} \lambda_{\alpha}(\mathbf{u})\, i(\mathbf{u}_{\alpha}) + \lambda_{Y}(\mathbf{u}) \left[ y(\mathbf{u}) - m_{Y} + m_{I} \right] \qquad (1) $$

where $m_I$ and $m_Y$ are the global means of the primary and secondary variables. The second term of equation (1) corresponds to a rescaling of the secondary variable to the mean of the primary variable to ensure unbiased estimation.

Another approach consists of predicting the probability as a function of only the co-located secondary datum (e.g. linear relation). This type of regression, however, assumes that the residual values are spatially uncorrelated, which is not always true. Simple kriging with varying local means (SKlm) allows one to take the spatial correlation of residuals into account. It amounts to replacing the known stationary mean in the simple kriging estimate by known varying means $m(\mathbf{u})$ derived from the secondary information:

$$ p^{*}(\mathbf{u}) - m(\mathbf{u}) = \sum_{\alpha=1}^{n(\mathbf{u})} \lambda_{\alpha}(\mathbf{u}) \left[ i(\mathbf{u}_{\alpha}) - m(\mathbf{u}_{\alpha}) \right] \qquad (2) $$

The local means $m(\mathbf{u})$ are often derived from linear regression using independent variables.
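To make the estimator of equation (2) concrete, the sketch below adds a simple-kriged residual to a known local mean. It is only an illustration under stated assumptions: the exponential covariance model, its sill and range, the helper names (`exp_cov`, `solve`, `sklm`) and the three data points are hypothetical, and in practice the residual covariance would be inferred from the residuals themselves.

```python
import math


def exp_cov(h, sill=0.2, practical_range=10.0):
    """Exponential covariance of the residuals (model and values are assumed)."""
    return sill * math.exp(-3.0 * h / practical_range)


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        partial = sum(a[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (a[r][n] - partial) / a[r][r]
    return x


def sklm(target, mean_at_target, data, cov=exp_cov):
    """Local mean at the target plus the simple-kriged residual.

    data holds tuples (x, y, indicator, local_mean) for the hard data.
    """
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    k_matrix = [[cov(dist(a, b)) for b in data] for a in data]
    k_vector = [cov(dist(a, target)) for a in data]
    weights = solve(k_matrix, k_vector)  # simple kriging weights
    residual = sum(w * (d[2] - d[3]) for w, d in zip(weights, data))
    return mean_at_target + residual, weights


# Hypothetical hard data: (x, y, indicator, local mean at that location).
data = [(2.0, 3.0, 1, 0.6), (5.0, 5.0, 0, 0.3), (8.0, 2.0, 1, 0.7)]
estimate, weights = sklm((4.0, 4.0), 0.4, data)
```

Two properties follow from the simple kriging system and are easy to check with this sketch: at a data location the estimate returns the observed indicator, and far from all data the weights vanish so the estimate falls back to the local mean. The raw estimate is not clipped to [0, 1] here; any such correction would be an additional choice.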
However, linear regression is not appropriate when binary data are used as dependent variables because several of its underlying assumptions are violated:

1. Prediction errors are not normally distributed because the data take only two values.
2. The errors are heteroscedastic, that is, the variance of the dependent variable varies with the values of the independent variables.
3. The predicted probabilities can be greater than 1 or less than 0 because the linear regression model is inherently unbounded. Usually those values are arbitrarily reset to either 0 or 1, which may lead to non-optimal estimates.

Logistic regression overcomes these problems by using the odds $O$, which are defined as:

$$ O = \frac{p}{1-p} \qquad (3) $$

where $p$ is the probability that the event occurs. The logistic model is then expressed as:

$$ \ln\left(\frac{p}{1-p}\right) = \beta_0 + \boldsymbol{\beta}\mathbf{X} \qquad (4) $$

where $\mathbf{X}$ is the vector of independent variables, $\boldsymbol{\beta}$ the vector of regression coefficients, and $\beta_0$ the intercept. The log odds are not bounded, but estimated probabilities lie between 0 and 1 after the back-transform of the estimated log odds:

$$ p = \frac{1}{1 + \exp\left[-(\beta_0 + \boldsymbol{\beta}\mathbf{X})\right]} \qquad (5) $$
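As a small illustration of the logistic model and its back-transform, the sketch below fits a single-predictor model by maximum likelihood using Newton-Raphson iterations and converts the fitted log odds back to probabilities. The names `b0` and `b1` (intercept and slope), the function names and the toy data are assumptions made for this example; a statistical package would normally be used instead.

```python
import math


def logistic(z):
    """Back-transform of the log odds, written to avoid overflow in exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, n_iter=25):
    """Maximum likelihood fit of ln(p / (1 - p)) = b0 + b1 * x by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = logistic(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p          # score for the intercept
            g1 += (y - p) * x    # score for the slope
            h00 += w             # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Toy data: secondary value (e.g. an E-type estimate) and hard indicator.
secondary = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
indicator = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(secondary, indicator)
local_means = [logistic(b0 + b1 * x) for x in secondary]
```

The values in `local_means` always lie strictly between 0 and 1, which is the property that makes the logistic trend suitable as a probability-valued local mean.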

Unlike traditional linear regression, which minimizes the sum of squared errors, the parameters $\boldsymbol{\beta}$ in logistic regression are chosen to maximize the likelihood function (maximum likelihood estimator). Logistic regression is then used to derive the local means $m(\mathbf{u})$ in SKlm.

5. Site classification

Mapping of probabilities is not a goal per se, but a preliminary step towards the delineation of the area where at least one UXO exists. The impact of different probability thresholds and interpolation techniques (collocated cokriging and SKlm) on decision-making was investigated using the following procedure:

1. The primary data are indicators of presence of at least one UXO for the pixels of transects positioned according to prior information. These indicators (1 = at least one UXO, 0 = no UXO) are computed from the true UXO distribution created using the UXO simulator. The secondary information is the exhaustive map of E-type estimates.
2. Probabilities of occurrence of at least one UXO are estimated at every pixel using both collocated cokriging and simple kriging with varying local means derived from logistic regression.
3. Pixels are flagged for further geophysical survey if the estimated probability exceeds a given threshold. If the probability is below the threshold, the pixel is left with no further action.
4. Comparison of the classification achieved at the previous step with the true UXO distribution allows the computation of the proportions of correct classification, false negatives, and false positives. This is done for a series of probability thresholds.

Figure 2: Location map of hard data obtained along transects. Closed circles indicate that at least one UXO was found and open circles indicate that no UXO was found.

Figure 3: Maps of the probability of occurrence of at least one UXO estimated using two kriging algorithms: collocated cokriging (left) and simple kriging with varying local means (right). The exhaustive map of E-type estimates was used as secondary information.
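Steps 3 and 4 of the procedure above amount to thresholding a probability map and tallying the outcomes against the truth. A minimal sketch follows; the function name, the flattened four-pixel maps and the two thresholds are hypothetical, and the proportions are taken over all pixels, which is an assumption about how the proportions are normalized.

```python
def classify(probabilities, truth, threshold):
    """Proportions of correct, false-positive and false-negative pixels."""
    n = len(truth)
    correct = false_pos = false_neg = 0
    for p, has_uxo in zip(probabilities, truth):
        flagged = p > threshold  # flagged for further geophysical survey
        if flagged == bool(has_uxo):
            correct += 1
        elif flagged:
            false_pos += 1       # flagged, but no UXO present
        else:
            false_neg += 1       # left alone, but UXO present
    return {
        "correct": correct / n,
        "false_positive": false_pos / n,
        "false_negative": false_neg / n,
    }


# Hypothetical estimated probabilities and true indicators for four pixels.
probabilities = [0.9, 0.2, 0.6, 0.1]
truth = [1, 1, 0, 0]
results = {t: classify(probabilities, truth, t) for t in (0.05, 0.5)}
```

Lowering the threshold flags more pixels, which trades false negatives for false positives; repeating the call over a series of thresholds gives the kind of curves used to compare the two interpolation techniques.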
6. Results and discussions

Figure 2 shows the locations where hard data are collected from the hypothetical UXO site. Six transects are positioned according to the prior information available (e.g. locations of targets). Figure 3 shows the probability maps produced by collocated cokriging and SKlm. Both techniques reproduce well the higher probabilities around targets and the lower probabilities in surrounding areas. These probability maps are then used for site classification. The proportions of correct classification, false negatives and false positives are computed for a series of probability thresholds (Figure 4). The design reliability $R_D$ is defined as $1 - P_{UXO}$, where $P_{UXO}$ is the probability threshold. Up to a design reliability of 0.95, SKlm leads to a larger proportion of correct decisions and fewer false positives than collocated cokriging.

Figure 4: Proportions of correct classification (solid) and false positives (dashed) as a function of probability threshold. Results are obtained for two kriging algorithms: collocated cokriging (left) and simple kriging with varying local means derived using logistic regression (right).

Since the ultimate goal of UXO site remediation is to leave zero UXO after remediation, priority should be given to minimizing the proportion of false negatives. Figure 5 depicts the proportion of false negatives over a range of design reliability values. The proportions of false negatives are kept very low (less than 5%). These proportions are relatively higher for one of the two techniques than for the other for a design reliability below 0.95.

Figure 5: The impact of design reliability on the proportion of false negatives produced by the two kriging algorithms (collocated cokriging and SKlm).

In this paper, the combination of logistic regression with geostatistical characterization of a UXO site was investigated. Logistic regression was coupled with simple kriging to map the probability of occurrence of at least one UXO. Results indicate the benefit of logistic regression in terms of correct classification and false positives. The technique can be easily expanded to incorporate more than two additional variables.

7. Acknowledgment

This work was supported by the Strategic Environmental Research and Development Program (SERDP), UXO Cleanup program, under grant UX-2. Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy under contract DE-AC04-94-AL-85000.

8. References

Goovaerts, P., 2000: Geostatistical approaches for incorporating elevation into the spatial interpolation of rainfall. Journal of Hydrology, 228, pp. 113-129.
Goovaerts, P., 1997: Geostatistics for Natural Resources Evaluation. New York (Oxford University Press), p. 483.
Allison, P. D., 1999: Logistic Regression Using the SAS System: Theory and Application. Cary, NC (SAS Institute), p. 288.
Journel, A. G., 1983: Non-parametric estimation of spatial distributions. Mathematical Geology, 15, pp. 445-468.