GEOG4340 Geographic Information System
Q. Cheng

Lecture Nine: Data Integration

- Spatial Decision Support System (SDSS)
- Mapping areas for drilling in the mining industry
- Multivariate Logistic Regression
- 3D Simulation and Decision Making

Various Types of Data Integration
Combining multiple layers of geoinformation for decision making:
- Multi-source
- Multi-scale
- Multi-format
- Multi-owner
- Multi-temporal captures

Spatial Decision Support System: Multiple Map Modeling
Decision theory is concerned with the logic by which one arrives at a choice between alternatives:
- Alternative actions
- Alternative hypotheses
- Alternative objects
- and so on

Potential Applications
- Site selection
- Suitability assessment
- Favorability assessment
- Probability assessment

Spatial Decision Support System (SDSS): GIS Data Integration for Prediction
[Figure: suitability map for planning a school]

[Figure: SDSS flow diagram]
GIS data sources (DBMS) → Data preprocessing (interpreting, information extraction, processing) → Evidential layers X (geophysical, geochemical, geographical, geological, remote sensing) → Modeling F (integration) → Output data S (potential)

A General Spatial Modeling Process
1. Stating the problem
2. Breaking the problem down
3. Exploring input datasets
4. Determining analysis processes
5. Verifying the model's result
6. Implementing the result and reporting

Model of processes for finding distance from recreational facilities
  Recreational sites → Buffer → Rec. site buffer → Reclassify → Distance to rec. sites

Model of processes for finding distance beyond existing schools
  Schools → Buffer → School buffer → Reclassify → Distance to schools

Model of processes for finding relatively flat areas
  Elevation → Slope → Slope map → Reclassify slope classes

Model of processes for finding suitable land-use types
  Land-use map → Reclassify land-use classes

Model of processes for combining diverse maps
  Dist. to rec. sites, Dist. to schools, Land use, Slope → Calculator → Suitability map

Model Constraints
1. Normalization: convert maps into comparable units, either
   $x_i = 1, 2, \dots, 10$ (ranked classes), or
   $x_i = 1$ (yes, present, suitable) and $x_i = 0$ (no, absent, not suitable)
2. Assign a weight to each map as a percentage:
   $w_1 + w_2 + \dots + w_n = 1, \quad 0 \le w_i \le 1$
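
The Reclassify steps above bring every map onto a common 1 to 10 scale before the maps are combined. The slides do not say how class breaks are chosen, so the sketch below assumes equal-interval classes; the function name `reclassify`, its parameters, and the distance values are hypothetical.

```python
def reclassify(values, n_classes=10, reverse=False):
    """Map raw cell values to integer classes 1..n_classes (equal intervals).

    reverse=True gives the highest class to the smallest raw value,
    e.g. when shorter distances are more suitable.
    """
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant map: everything in class 1
        return [1] * len(values)
    width = (hi - lo) / n_classes
    classes = []
    for v in values:
        # The maximum value would land in class n_classes + 1, so clamp it.
        k = min(int((v - lo) / width), n_classes - 1) + 1
        classes.append(n_classes + 1 - k if reverse else k)
    return classes


# Hypothetical distances (metres) from four cells to the nearest school
dist_school_m = [0, 250, 500, 1000]
far_is_good = reclassify(dist_school_m)                 # farther = higher class
near_is_good = reclassify(dist_school_m, reverse=True)  # closer = higher class
print(far_is_good, near_is_good)
```

Whether a larger raw value should receive a higher or lower class depends on the criterion: being far from existing schools is favourable, while being close to recreation sites is favourable, which is what the `reverse` flag is for.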

Combine Maps
Grid calculator with equation
  S = 0.50 × rec_site + 0.25 × dist_school + 0.125 × rec_landuse + 0.125 × rec_slope
Map S has values between 1 and 10.

Model Validation
- Are the criteria reasonable?
- Is the model valid?
- Does the result meet the requirement?
- Are there errors related to the result?
- Are all data used necessary?

General Data Integration Model for SDSS
$S = F(x_1, x_2, \dots, x_n)$
- S: index map showing suitability or probability
- $x_i$: maps or evidences
- $w_i$: weights

Simple Linear Model
$S = w_1 x_1 + w_2 x_2 + \dots + w_n x_n$
- S: index map showing suitability or probability
- $x_i$: maps or evidences
- $w_i$: weights

Model Constraints
1. Normalization: convert maps into comparable units, using one of
   - binary: $x_i = 1$ (yes), $x_i = 0$ (no)
   - ternary: $x_i = 1$ (yes), $x_i = 0.5$ (unknown), $x_i = 0$ (no)
   - ranked: $x_i = 1, 2, \dots, 10$
2. Weights showing the relative importance of maps:
   $w_1 + w_2 + \dots + w_n = 1, \quad 0 \le w_i \le 1$
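
The simple linear model $S = w_1 x_1 + \dots + w_n x_n$ can be sketched cell by cell as follows. This is an illustration only: the 2×2 rasters are hypothetical, the weights 0.50, 0.25, 0.125, 0.125 are assumed so that they sum to 1, and `weighted_overlay` is a name introduced here, not a grid calculator function.

```python
def weighted_overlay(layers, weights):
    """Return S = sum_i w_i * x_i, computed cell by cell.

    layers  : list of equally sized 2-D lists, already normalized
    weights : one weight per layer, each in [0, 1], summing to 1
    """
    if len(layers) != len(weights):
        raise ValueError("need one weight per layer")
    if any(w < 0 or w > 1 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must lie in [0, 1] and sum to 1")
    rows, cols = len(layers[0]), len(layers[0][0])
    return [[sum(w * layer[r][c] for w, layer in zip(weights, layers))
             for c in range(cols)]
            for r in range(rows)]


# Hypothetical rasters, each already reclassified to the 1..10 scale
rec_site    = [[10, 8], [3, 1]]
dist_school = [[2, 9], [5, 1]]
rec_landuse = [[6, 6], [4, 1]]
rec_slope   = [[10, 2], [8, 1]]

S = weighted_overlay(
    [rec_site, dist_school, rec_landuse, rec_slope],
    [0.50, 0.25, 0.125, 0.125],
)
print(S)
```

Because the weights sum to 1 and every input lies between 1 and 10, each cell of S also lies between 1 and 10, which is why the normalization and weight constraints are stated together.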

Methods for Calculating Weights for Data Integration
Data-driven methods:
- Weights of evidence
- Logistic regression
- Artificial neural network
Knowledge-driven methods:
- Fuzzy logic
Hybrid methods:
- Fuzzy weights of evidence

Model Types
1. Probabilistic: S is a random variable showing probability, $0 \le S \le 1$, with uncertainty
2. Deterministic: S is a score, $0 \le S \le 1$

Relationships Between Different Models
- Simple overlay model (Union, Intersect, Identity)
- Linear model (adding weights)
- Logistic model (Weights of Evidence, Logistic Regression)
- Fuzzy logic model (various operators)

Spatial Data Modeler Extension: Arc-SDM
- Weights of Evidence
- Logistic Regression
- Fuzzy Logic
- Neural Network

Concept of Prior Probability and Posterior Probability
[Figure: study area divided by two binary patterns, A / not A and B / not B, with point occurrences]

ID   Area   PolyA   PolyB   Points   Points/area
1      7    A       B         4      0.57
2     13    A       not B     2      0.15
3     35    not A   not B     1      0.02
4     45    not A   B         3      0.06

Prior probability: total number of points / total area = 10/100 = 0.10 (10%)
Posterior probability: number of points / pattern area (density of points per unit area), $P(D \mid A)$
Percentage of points: number of points on pattern / total number of points, $P(A \mid D)$
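
The prior and posterior probabilities defined above are simple ratios of counts to areas. The sketch below assumes the four overlap classes have areas 7, 13, 35, 45 (total 100) and contain 4, 2, 1, 3 points (total 10), with pattern A covering classes 1 and 2 and pattern B covering classes 1 and 4; the data structure and function names are hypothetical.

```python
# Overlap classes of patterns A and B (values as assumed in the text above)
polygons = [
    {"id": 1, "A": True,  "B": True,  "area": 7,  "points": 4},
    {"id": 2, "A": True,  "B": False, "area": 13, "points": 2},
    {"id": 3, "A": False, "B": False, "area": 35, "points": 1},
    {"id": 4, "A": False, "B": True,  "area": 45, "points": 3},
]

total_area = sum(p["area"] for p in polygons)
total_points = sum(p["points"] for p in polygons)

# Prior probability P(D): point density over the whole study area
prior = total_points / total_area


def posterior(pattern, present=True):
    """P(D | pattern): points on the pattern divided by the pattern's area."""
    sel = [p for p in polygons if p[pattern] == present]
    return sum(p["points"] for p in sel) / sum(p["area"] for p in sel)


def point_share(pattern, present=True):
    """P(pattern | D): share of all points that fall on the pattern."""
    sel = [p for p in polygons if p[pattern] == present]
    return sum(p["points"] for p in sel) / total_points


# Density per overlap class, as in the Points/area column
density = [p["points"] / p["area"] for p in polygons]

print(prior, posterior("A"), point_share("A"))
print([round(d, 3) for d in density])
```

Under these assumptions the density on A (6 points on 20 area units) is three times the prior, while the share of points on A is 6 out of 10. The per-class densities printed here are rounded, whereas the slide's Points/area column appears to truncate to two decimals.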

Three patterns: trees, lake and road buffer
[Figure: overlay of patterns A, B and C and their complements, with point occurrences]

Percentage of points (values expected under independence in parentheses)

          B             not B         Total
A         0.4 (0.42)    0.2 (0.18)    0.6
not A     0.3 (0.28)    0.1 (0.12)    0.4
Total     0.7           0.3

Joint probabilities (with $\bar{A}$ denoting not A): $P(AB \mid D) = 0.4$, $P(A\bar{B} \mid D) = 0.2$, $P(\bar{A}B \mid D) = 0.3$, $P(\bar{A}\bar{B} \mid D) = 0.1$
Marginal probabilities: $P(A \mid D) = 0.6$, $P(\bar{A} \mid D) = 0.4$, $P(B \mid D) = 0.7$, $P(\bar{B} \mid D) = 0.3$

Percentage of points of independent events
% points on A and B = % points on A × % points on B
$P(AB \mid D) = P(A \mid D)\,P(B \mid D)$
(joint probability = product of marginal probabilities)
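
The independence check above amounts to comparing each observed joint proportion with the product of its row and column marginals. A minimal sketch, assuming point counts of 4, 2, 3, 1 in the classes (A, B), (A, not B), (not A, B), (not A, not B); the function name is hypothetical.

```python
def expected_if_independent(counts):
    """Expected joint proportions for a 2x2 table if rows and columns were independent.

    counts[i][j]: number of points with row pattern state i and column state j.
    Returns (observed, expected) as nested lists of proportions.
    """
    total = sum(sum(row) for row in counts)
    row_marg = [sum(row) / total for row in counts]
    col_marg = [sum(col) / total for col in zip(*counts)]
    observed = [[c / total for c in row] for row in counts]
    expected = [[r * c for c in col_marg] for r in row_marg]
    return observed, expected


# Rows: A, not A; columns: B, not B (counts assumed as stated above)
point_counts = [[4, 2],
                [3, 1]]
observed, expected = expected_if_independent(point_counts)

for obs_row, exp_row in zip(observed, expected):
    print([f"{o:.2f} ({e:.2f})" for o, e in zip(obs_row, exp_row)])
```

Small gaps between observed and expected values, such as 0.40 against 0.42, suggest that the two patterns are close to conditionally independent given the points, which is the assumption the log-probability model relies on.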
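
[Figure: overlay of patterns A and B with the percentage of area in each overlap class]

Percentage of areas (values expected under independence in parentheses)

          B              not B          Total
A         0.07 (0.10)    0.13 (0.10)    0.20
not A     0.45 (0.42)    0.35 (0.38)    0.80
Total     0.52           0.48

Joint probabilities (with $\bar{A}$ denoting not A): $P(AB) = 0.07$, $P(A\bar{B}) = 0.13$, $P(\bar{A}B) = 0.45$, $P(\bar{A}\bar{B}) = 0.35$
Marginal probabilities: $P(A) = 0.20$, $P(\bar{A}) = 0.80$, $P(B) = 0.52$, $P(\bar{B}) = 0.48$

Percentage of areas of independent events
% area of A and B = % area of A × % area of B
$P(AB) = P(A)\,P(B)$
(joint probability = product of marginal probabilities)

Bayes's Rule: Probability Map
For a single pattern:
$P(D \mid A) = P(D)\,\dfrac{P(A \mid D)}{P(A)}$
$P(D \mid \bar{A}) = P(D)\,\dfrac{P(\bar{A} \mid D)}{P(\bar{A})}$

For two patterns, treating A and B as independent:
$P(D \mid AB) = P(D)\,\dfrac{P(A \mid D)}{P(A)}\,\dfrac{P(B \mid D)}{P(B)}$
$P(D \mid A\bar{B}) = P(D)\,\dfrac{P(A \mid D)}{P(A)}\,\dfrac{P(\bar{B} \mid D)}{P(\bar{B})}$
$P(D \mid \bar{A}B) = P(D)\,\dfrac{P(\bar{A} \mid D)}{P(\bar{A})}\,\dfrac{P(B \mid D)}{P(B)}$
$P(D \mid \bar{A}\bar{B}) = P(D)\,\dfrac{P(\bar{A} \mid D)}{P(\bar{A})}\,\dfrac{P(\bar{B} \mid D)}{P(\bar{B})}$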
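
The probability-map formulas above multiply the prior by one ratio per pattern. The sketch below evaluates them under assumed inputs: prior P(D) = 0.10, P(A) = 0.20 with P(A | D) = 0.6, and P(B) = 0.52 with P(B | D) = 0.7. The helper names are hypothetical, and the two-pattern product is only an approximation when A and B are not exactly independent.

```python
P_D = 0.10                        # prior probability (assumed)
P_A, P_A_GIVEN_D = 0.20, 0.60     # area share and point share of pattern A
P_B, P_B_GIVEN_D = 0.52, 0.70     # area share and point share of pattern B


def factor(present, p, p_given_d):
    """Ratio P(X | D) / P(X) for X = pattern (present) or its complement."""
    if present:
        return p_given_d / p
    return (1.0 - p_given_d) / (1.0 - p)


def posterior(a_present, b_present):
    """P(D | A-state, B-state) from the product form of Bayes's rule."""
    return (P_D
            * factor(a_present, P_A, P_A_GIVEN_D)
            * factor(b_present, P_B, P_B_GIVEN_D))


# Single-pattern posterior P(D | A)
p_d_given_a = P_D * factor(True, P_A, P_A_GIVEN_D)

for a in (True, False):
    for b in (True, False):
        label = ("A" if a else "not A") + ", " + ("B" if b else "not B")
        print(f"{label:14s} {posterior(a, b):.4f}")
```

With these inputs the single-pattern result for A is 0.30, three times the prior. The two-pattern estimates can differ from the point densities counted directly in each overlap class, and the size of that difference indicates how far the patterns depart from independence.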

Bayes's Rule: Log (Probability)
If A, B, C are conditionally independent, then (with $\bar{A}$ denoting not A):

$\log P(D \mid A) = \log P(D) + \log\dfrac{P(A \mid D)}{P(A)} = \log P(D) + W_A^{+}$
$\log P(D \mid \bar{A}) = \log P(D) + \log\dfrac{P(\bar{A} \mid D)}{P(\bar{A})} = \log P(D) + W_A^{-}$

where
$W_A^{+} = \log\dfrac{P(A \mid D)}{P(A)}, \qquad W_A^{-} = \log\dfrac{P(\bar{A} \mid D)}{P(\bar{A})}$

For two patterns:
$\log P(D \mid AB) = \log P(D) + W_A^{+} + W_B^{+}$
$\log P(D \mid A\bar{B}) = \log P(D) + W_A^{+} + W_B^{-}$
$\log P(D \mid \bar{A}B) = \log P(D) + W_A^{-} + W_B^{+}$
$\log P(D \mid \bar{A}\bar{B}) = \log P(D) + W_A^{-} + W_B^{-}$

Interpretation of the weight
$W_A^{+} = \log\dfrac{P(A \mid D)}{P(A)} = \log\dfrac{\%\ \text{points on } A}{\%\ \text{area of } A}$
- $W_A^{+} > 0$: positive correlation between A and points
- $W_A^{+} = 0$: no correlation between A and points
- $W_A^{+} < 0$: negative correlation between A and points

Spatial Association Index: Contrast
$C = W_A^{+} - W_A^{-}$
(1) $-\infty < C < \infty$
(2) $C = 0$: A and D are independent
(3) $C > 0$: positive correlation between D and A
(4) $C < 0$: negative correlation between D and A
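
The weights and the contrast follow directly from the share of points and the share of area on a pattern. The sketch below uses the definitions given on this slide, with P(A) in the denominator (weights-of-evidence literature more commonly uses odds-based ratios with P(A | not D)). The natural logarithm is assumed because the slide does not state a base, and the input proportions and function name are hypothetical.

```python
import math


def weights_and_contrast(p_pattern_given_d, p_pattern):
    """Return (W+, W-, C) for one binary pattern.

    p_pattern_given_d : share of points on the pattern, P(A | D)
    p_pattern         : share of area covered by the pattern, P(A)
    """
    w_plus = math.log(p_pattern_given_d / p_pattern)
    w_minus = math.log((1.0 - p_pattern_given_d) / (1.0 - p_pattern))
    return w_plus, w_minus, w_plus - w_minus


# Assumed example: 60% of the points fall on a pattern covering 20% of the area
w_plus_a, w_minus_a, contrast_a = weights_and_contrast(0.60, 0.20)

# A pattern holding the same share of points as of area carries no information
w_plus_n, w_minus_n, contrast_n = weights_and_contrast(0.30, 0.30)

print(round(w_plus_a, 3), round(w_minus_a, 3), round(contrast_a, 3))
print(w_plus_n, w_minus_n, contrast_n)
```

A pattern that concentrates points gets a positive W+ and a negative W-, so its contrast is positive; a pattern with equal point and area shares gets zero for all three values, matching the case C = 0.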

Logistic Model for SDSS
Prior probability: $P(D)$
Posterior probability: $P(D \mid A)$, $P(D \mid B)$, $P(D \mid AB \dots)$
$\operatorname{logit}\{D \mid AB \dots\} = W_0 + W_A + W_B + \dots$

[Figure: map example]
#(D) = 20, Area(T) = 7780, P(D) = 0.0026
Area(A) = 3065, #(D ∩ A) = 15, P(D | A) = 0.0049
Area(B) = 4175, #(D ∩ B) = 19, P(D | B) = 0.0045

[Figure: map example]
Area(A) = 1624.27, #(D ∩ A) = 13, P(D | A) = 0.008

[Figure: maps for Au, W, As and Au-Sn-W-As (multiple elements)]

Spatial Data Modeler Extension: Arc-SDM
- Weights of Evidence
- Logistic Regression
- Fuzzy Logic
- Neural Network

Logistic Model for SDSS
Posterior probability: $P(D \mid AB \dots)$
$\operatorname{logit}\{D \mid AB \dots\} = W_0 + W_A + W_B + \dots$

Prediction of Potential Flowing Wells in the ORM
[Figure: map of flowing wells and springs]
[Figure: flowing wells vs. distance from the ORM; spatial correlation plotted against distance]
[Figure: flowing wells vs. distance from the high-slope zone; spatial correlation plotted against distance]
[Figure: flowing wells vs. thickness of drift]
[Figure: flowing wells vs. distance from thick drift; spatial correlation plotted against distance]
[Figure: potential locations of flowing wells by SDSS, weights of evidence (Cheng, 200)]

Results obtained by Weights of Evidence and Logistic Regression Methods

Theme | Area % | Points % | Contrast | t-value | LR coeff. | LR std.
Buffer zone (~km) around steep slope | 63 | 80 | 0.89 | 6.54 | 0.88 | 0.4
Buffer zone (~5km) around the ORM | 40.3 | 55.3 | 0.62 | 5.68 | 0.28 | 0.2
Buffer zone around steep slope of lower sand/gravel top | 0. | 52.7 | 0.62 | 5.68 | .77 | 0.2
Ratio of sand/gravel unit cumulative thickness in well depth (6~25%) | 43.5 | 65. | 0.90 | 7.88 | 0.49 | 0.2
Buffer zone (~2.5km) of thick drift area | 6.2 | 29.3 | 0.79 | 6.56 | 0.45 | 0.3
Elevation of the upper confined aquifers at 356~375 (m a.s.l.) | 6.8 | 38.6 | .8 | 0.44 | 0.47 | 0.3
Elevation of the lower confined aquifers at 3~347 (m a.s.l.) | 40.5 | 58.2 | 0.73 | 6.60 | 0.28 | 0.3
Steep slope of confined aquifer surface | 22.9 | 54.7 | .45 | 3.9 | 0.97 | 0.2
Buffer zone (0~2km) around the small ponds | 6. | 74.9 | 0.67 | 5.3 | 0.43 | 0.3
Intercept constant | | | | | -.62 | 0.52

Multivariate Logistic Regression
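
A fitted logistic regression model turns a set of binary evidence layers into a posterior probability by adding the intercept to the coefficients of the layers that are present and passing the resulting logit through the logistic function. The sketch below shows only that step; the intercept and coefficients are hypothetical placeholders, not the values from the results table, and the function name is introduced here for illustration.

```python
import math


def logistic_posterior(intercept, coefficients, evidence):
    """P(D | evidence) for a logistic model with binary (0/1) evidence layers.

    logit = intercept + sum_i coefficient_i * evidence_i
    P     = 1 / (1 + exp(-logit))
    """
    if len(coefficients) != len(evidence):
        raise ValueError("need one coefficient per evidence layer")
    logit = intercept + sum(b * x for b, x in zip(coefficients, evidence))
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical intercept and coefficients for three evidence layers
b0 = -6.0
b = [0.9, 0.5, 1.2]

p_none = logistic_posterior(b0, b, [0, 0, 0])   # no favourable evidence
p_two = logistic_posterior(b0, b, [1, 0, 1])    # first and third layers present
p_all = logistic_posterior(b0, b, [1, 1, 1])    # all layers present
print(round(p_none, 4), round(p_two, 4), round(p_all, 4))
```

With positive coefficients, each additional layer that is present raises the logit and therefore the posterior probability, while the intercept sets the baseline for cells where no evidence is present.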