Bivariate Sample Statistics
Geog 210C Introduction to Spatial Data Analysis
Chris Funk
Lecture 7
Overview
Real statistical application: remote monitoring of the East Africa long rains
Lead-up to Labs 5-6
Review of bivariate/multivariate relationships
Definition of variance
Definition of covariance
Extension to the multivariate case
C. Funk Geog 210C Spring 2011
Food Insecurity
East Africa Food Insecurity
Nairobi Prices
Kenya Livelihoods
Child Stunting
Kenya Population and Number of Rain Days, past 30 days as of April 17th
The image on the left shows LandScan population density overlain with a blue mask. Regions not masked are in Kenya and had fewer than 9 rain days during the past month. Halfway through the season, a large population center appears to be at risk?
http://earlywarning.usgs.gov:8080/EWX/index.html
Percent of March-April Rainfall
Concerns for Central & Eastern Kenya
If the rest of the season is normal (which is probably unlikely), central Kenya will have seasonal totals 24% below normal. If the season plays out like 2009, central Kenya might receive ~40% below normal.
If the rest of the season is normal (which is probably unlikely), central Kenya will have seasonal totals 23% below normal. If the season plays out like 2009, central Kenya might receive ~50% of normal.
[Figure: seasonal rainfall for Central Province and Eastern Province; curves: short-term mean, 2011, 2009]
So far, the season matches 2009 exactly. By the 1st dekad of May, rainfall rates drop rapidly.
http://earlywarning.usgs.gov:8080/EWX/index.html
Anomalies for March
[Figure panels: March RFE2 Anomalies; Population Density; March LST Anomalies]
The area SE of Nairobi has had +6-7 °C LST anomalies and roughly -50 to -75 mm March rainfall anomalies.
The area NE of Nairobi has had +6-7 °C LST anomalies and roughly -50 mm March rainfall anomalies.
http://earlywarning.usgs.gov:8080/EWX/index.html
Number of Rain Days
Max Consecutive Dry Days
Setting
Data pairs of two attributes X & Y, measured at N sampling units: there are N pairs of attribute values $\{(x_n, y_n),\ n = 1, \ldots, N\}$.
Scatter plot: graph of y- versus x-values in attribute space. The y-values serve as coordinates on the vertical axis and the x-values as coordinates on the horizontal axis; the n-th point in the scatter plot has coordinates $(x_n, y_n)$.
Objective: provide a quantitative summary of the above scatter plot as a measure of association between x- and y-values.
Scatter Plot Quadrants
Scatter plot center: the point $(\bar{x}, \bar{y})$ with coordinates equal to the data means:
$$\bar{x} = \frac{1}{N}\sum_{n=1}^{N} x_n, \qquad \bar{y} = \frac{1}{N}\sum_{n=1}^{N} y_n$$
Scatter plot quadrants: the line extending from the mean $\bar{x}$ parallel to the y-axis and the line extending from the mean $\bar{y}$ parallel to the x-axis define 4 quadrants in the scatter plot.
Deviations from the mean: any measure of association between X and Y should be independent of where the sample scatter plot is centered. Consequently, we'll be looking at deviations of the data from their respective means, $(x_n - \bar{x},\ y_n - \bar{y})$:
quadrant I: $x_n - \bar{x} > 0,\ y_n - \bar{y} > 0$
quadrant II: $x_n - \bar{x} < 0,\ y_n - \bar{y} > 0$
quadrant III: $x_n - \bar{x} < 0,\ y_n - \bar{y} < 0$
quadrant IV: $x_n - \bar{x} > 0,\ y_n - \bar{y} < 0$
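Products of Data Deviations from their Means
Since we are after a measure of association, we compute products of data deviations from their means. A large positive product indicates large x- and y-deviations of the same sign. A large negative product indicates large x- and y-deviations of different sign.
Product signs in the different quadrants:
quadrant I: $x_n - \bar{x} > 0$ & $y_n - \bar{y} > 0 \Rightarrow (x_n - \bar{x})(y_n - \bar{y}) > 0$
quadrant II: $x_n - \bar{x} < 0$ & $y_n - \bar{y} > 0 \Rightarrow (x_n - \bar{x})(y_n - \bar{y}) < 0$
quadrant III: $x_n - \bar{x} < 0$ & $y_n - \bar{y} < 0 \Rightarrow (x_n - \bar{x})(y_n - \bar{y}) > 0$
quadrant IV: $x_n - \bar{x} > 0$ & $y_n - \bar{y} < 0 \Rightarrow (x_n - \bar{x})(y_n - \bar{y}) < 0$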
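Sample Covariance of a Scatter Plot
Products of deviations from means:
$$(x_n - \bar{x})(y_n - \bar{y}), \quad n = 1, \ldots, N$$
Average of N products:
$$\frac{1}{N}\sum_{n=1}^{N} (x_n - \bar{x})(y_n - \bar{y})$$
Sample covariance between data of attributes X and Y:
$$\hat{\sigma}_{XY} = \frac{1}{N}\sum_{n=1}^{N} (x_n - \bar{x})(y_n - \bar{y}) = \frac{1}{N}\sum_{n=1}^{N} x_n y_n - \bar{x}\,\bar{y}$$
Sample variance = covariance of an attribute with itself:
$$\hat{\sigma}_X^2 = \hat{\sigma}_{XX} = \frac{1}{N}\sum_{n=1}^{N} (x_n - \bar{x})^2$$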
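As a hedged sketch of the definitions on this slide, the following Python snippet computes the sample covariance as the average of N products of deviations from the means, assuming the 1/N normalization implied by "average of N products". The data values and the function name `sample_cov` are made up for illustration; NumPy's `np.cov(..., bias=True)` uses the same 1/N convention and can serve as a cross-check.

```python
import numpy as np

def sample_cov(x, y):
    """Average product of deviations from the means (1/N normalization)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.mean((x - x.mean()) * (y - y.mean()))

# Hypothetical data pairs (illustrative values only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 6.0])

cov_xy = sample_cov(x, y)   # covariance between x and y
var_x = sample_cov(x, x)    # variance = covariance of an attribute with itself

# Cross-check against NumPy (bias=True selects the 1/N normalization)
cov_np = np.cov(x, y, bias=True)[0, 1]

print(cov_xy, var_x, cov_np)
```

Note that `np.cov` defaults to the 1/(N-1) normalization, so `bias=True` is needed to match the convention assumed here.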
Interpreting the Sample Covariance
Sample covariance between data of attributes X and Y:
$$\hat{\sigma}_{XY} = \frac{1}{N}\sum_{n=1}^{N} (x_n - \bar{x})(y_n - \bar{y})$$
Interpretation:
large positive covariance indicates data pairs predominantly lying in quadrants I and III
large negative covariance indicates data pairs predominantly lying in quadrants II and IV
small covariance indicates data pairs lying in all quadrants, in which case positive and negative products cancel out when one computes their mean
NOTE: The covariance is a measure of linear association between X and Y, and just a summary measure of the actual scatter plot.
Sample Covariance and Correlation Coefficient
Problems with sample covariance:
not easily interpretable, since x- and y-values can have different units and sample variances
sensitive to outliers
quantifies only linear relationships
Sample correlation coefficient (Pearson's product moment correlation):
$$\hat{\rho}_{XY} = \frac{\hat{\sigma}_{XY}}{\hat{\sigma}_X\,\hat{\sigma}_Y}$$
lies in [-1, +1]; sensitive to outliers; quantifies only linear relationships
Sample rank correlation coefficient (Spearman's correlation):
rank transform each sample data set, by assigning a rank of 1 to the smallest value and a rank of N to the largest one
transform each data pair $\{x_n, y_n\}$ into a rank pair $\{r(x_n), r(y_n)\}$, where $r(x_n)$ and $r(y_n)$ are the ranks of $x_n$ and $y_n$
compute the correlation coefficient of the rank pairs, as:
$$\hat{\rho}^{\,r}_{XY} = \frac{\hat{\sigma}_{r(X)\,r(Y)}}{\hat{\sigma}_{r(X)}\,\hat{\sigma}_{r(Y)}}$$
can detect non-linear monotonic relationships
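To illustrate the difference between the two coefficients, here is a minimal sketch that computes Pearson's correlation directly from deviations and Spearman's correlation as the Pearson correlation of the ranks. The helper names (`pearson`, `ranks`, `spearman`) and the data are hypothetical, and the simple ranking shown assumes there are no tied values.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation: covariance divided by the product of std. deviations."""
    xd = x - x.mean()
    yd = y - y.mean()
    return np.sum(xd * yd) / np.sqrt(np.sum(xd**2) * np.sum(yd**2))

def ranks(v):
    """Rank 1 for the smallest value, N for the largest (assumes no ties)."""
    order = np.argsort(v)
    r = np.empty(len(v), dtype=float)
    r[order] = np.arange(1, len(v) + 1)
    return r

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank pairs."""
    return pearson(ranks(x), ranks(y))

# Hypothetical data: y increases monotonically but non-linearly with x
x = np.array([0.5, 1.0, 2.0, 3.0, 4.5, 6.0])
y = np.exp(x)

r_pearson = pearson(x, y)
r_spearman = spearman(x, y)

print(r_pearson, r_spearman)
```

Because the relationship is monotonic, the rank pairs line up perfectly and the Spearman coefficient reaches 1, while the Pearson coefficient stays below 1 because the relationship is not linear.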
Moment of Inertia of a Scatter Plot
Motivation: instead of looking at the average product of deviations from the mean, we could look at the moment of inertia of a scatter plot; that is, the average squared distance between any pair $(x_n, y_n)$ and the 45° line.
Note: such a line does not always make sense, but so be it for now.
Moment of inertia = average squared deviation of scatter plot points from the 45° line:
$$\hat{\gamma}_{XY} = \frac{1}{2N}\sum_{n=1}^{N} (x_n - y_n)^2$$
Note: the moment of inertia for a scatter plot aligned with the 45° line is always 0; that is, $\hat{\gamma}_{XX} = \hat{\gamma}_{YY} = 0$. Alternatively, the dissimilarity of an attribute with itself is 0.
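Link Between Covariance and Moment of Inertia
Recall:
$$\hat{\sigma}_{XY} = \frac{1}{N}\sum_{n=1}^{N} x_n y_n - \bar{x}\,\bar{y}, \qquad \hat{\gamma}_{XY} = \frac{1}{2N}\sum_{n=1}^{N} (x_n - y_n)^2$$
Expanding:
$$\hat{\gamma}_{XY} = \frac{1}{2N}\left[\sum_{n=1}^{N} x_n^2 + \sum_{n=1}^{N} y_n^2 - 2\sum_{n=1}^{N} x_n y_n\right] = \frac{1}{2}\left[\hat{\sigma}_X^2 + \hat{\sigma}_Y^2 + (\bar{x} - \bar{y})^2\right] - \hat{\sigma}_{XY}$$
What's the difference: to estimate the moment of inertia $\gamma_{XY}$ you do not need to know the mean values $\mu_X$ and $\mu_Y$; these two mean values are required for estimating the covariance $\sigma_{XY}$.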
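The relationship between the two measures can be checked numerically. The sketch below assumes the moment of inertia is defined as half the average squared difference between paired values (the squared distance to the 45° line) and uses 1/N-normalized variances and covariance; the synthetic data and variable names are illustrative only.

```python
import numpy as np

# Synthetic paired data (illustrative only)
rng = np.random.default_rng(0)
x = rng.normal(10.0, 2.0, size=200)
y = 0.5 * x + rng.normal(0.0, 1.0, size=200)
N = len(x)

# Moment of inertia: no mean values are needed
gamma_xy = np.sum((x - y) ** 2) / (2 * N)

# Covariance and variances: mean values are required
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
var_x = np.mean((x - x.mean()) ** 2)
var_y = np.mean((y - y.mean()) ** 2)

# Identity obtained by expanding (x_n - y_n)^2:
# gamma_xy = 0.5 * [var_x + var_y + (mean_x - mean_y)^2] - cov_xy
rhs = 0.5 * (var_x + var_y + (x.mean() - y.mean()) ** 2) - cov_xy

# An attribute compared with itself lies on the 45-degree line
gamma_xx = np.sum((x - x) ** 2) / (2 * N)

print(gamma_xy, rhs, gamma_xx)
```

The two printed values for the moment of inertia should agree to floating-point precision, and the self-dissimilarity is exactly zero.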
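Geometric Interpretation (I)
Vector length: the length of $\mathbf{x} = [x_1, \ldots, x_N]^T$ is the distance of the point with coordinates $\{x_1, \ldots, x_N\}$ from the origin:
$$\|\mathbf{x}\| = \sqrt{\sum_{n=1}^{N} x_n^2}$$
Vector-scalar multiplication: multiplication of a vector by a scalar c changes its length (and direction, depending on the sign of c):
$$\|c\,\mathbf{x}\| = |c|\,\|\mathbf{x}\|$$
Inner product of two vectors: a scalar quantity (could be negative, zero or positive):
$$\mathbf{x}^T\mathbf{y} = \sum_{n=1}^{N} x_n y_n$$
Vector length as the inner product of a vector with itself:
$$\|\mathbf{x}\| = \sqrt{\mathbf{x}^T\mathbf{x}}$$
Angle θ between two vectors $\mathbf{y}$ and $\mathbf{x}$:
$$\cos\theta = \frac{\mathbf{x}^T\mathbf{y}}{\|\mathbf{x}\|\,\|\mathbf{y}\|}$$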
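Geometric Interpretation (II)
Let $\tilde{\mathbf{x}} = \mathbf{x} - \bar{x}\,\mathbf{1}$ denote the vector of deviations (centered vector), where $\mathbf{1}$ is the (N × 1) vector of ones.
Variance:
$$\hat{\sigma}_X^2 = \frac{1}{N}\,\tilde{\mathbf{x}}^T\tilde{\mathbf{x}} = \frac{1}{N}\,\|\tilde{\mathbf{x}}\|^2$$
Covariance:
$$\hat{\sigma}_{XY} = \frac{1}{N}\,\tilde{\mathbf{x}}^T\tilde{\mathbf{y}}$$
Correlation coefficient:
$$\hat{\rho}_{XY} = \frac{\tilde{\mathbf{x}}^T\tilde{\mathbf{y}}}{\|\tilde{\mathbf{x}}\|\,\|\tilde{\mathbf{y}}\|} = \cos\theta$$
Interpretation: the correlation coefficient is the cosine of the angle θ between the two centered vectors.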
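The geometric reading of the correlation coefficient can be sketched in a few lines: center both data vectors, then take the cosine of the angle between them. The data values are hypothetical, and `np.corrcoef` is used only as an independent cross-check.

```python
import numpy as np

# Hypothetical data vectors (illustrative values only)
x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y = np.array([1.0, 3.0, 2.0, 5.0, 4.0])
N = len(x)

# Centered vectors (deviations from the means)
xc = x - x.mean()
yc = y - y.mean()

# Variance and covariance as scaled inner products (1/N normalization assumed)
var_x = xc @ xc / N
cov_xy = xc @ yc / N

# Correlation = cosine of the angle between the centered vectors
cos_theta = (xc @ yc) / (np.linalg.norm(xc) * np.linalg.norm(yc))
theta_deg = np.degrees(np.arccos(cos_theta))

# Cross-check against NumPy's correlation coefficient
r = np.corrcoef(x, y)[0, 1]

print(cos_theta, r, theta_deg)
```

A correlation of +1 corresponds to an angle of 0° between the centered vectors, 0 to 90° (orthogonal vectors), and -1 to 180°.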
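Geometric Interpretation (III)
Projection vector ("shadow") of vector $\mathbf{y}$ onto vector $\mathbf{x}$:
$$\mathrm{proj}_{\mathbf{x}}(\mathbf{y}) = \frac{\mathbf{x}^T\mathbf{y}}{\mathbf{x}^T\mathbf{x}}\,\mathbf{x}$$
Projection length:
$$\|\mathrm{proj}_{\mathbf{x}}(\mathbf{y})\| = \frac{|\mathbf{x}^T\mathbf{y}|}{\|\mathbf{x}\|} = \|\mathbf{y}\|\,|\cos\theta|$$
Unit vector:
$$\mathbf{u} = \frac{\mathbf{x}}{\|\mathbf{x}\|}, \qquad \|\mathbf{u}\| = 1$$
The sample mean vector: projection of $\mathbf{x}$ onto the vector of ones $\mathbf{1}$:
$$\frac{\mathbf{1}^T\mathbf{x}}{\mathbf{1}^T\mathbf{1}}\,\mathbf{1} = \bar{x}\,\mathbf{1}$$
Regression = projection: projecting the centered vector $\tilde{\mathbf{y}}$ onto the centered vector $\tilde{\mathbf{x}}$ gives
$$\frac{\tilde{\mathbf{x}}^T\tilde{\mathbf{y}}}{\tilde{\mathbf{x}}^T\tilde{\mathbf{x}}}\,\tilde{\mathbf{x}} = \frac{\hat{\sigma}_{XY}}{\hat{\sigma}_X^2}\,\tilde{\mathbf{x}}$$
so the regression slope is $\hat{\sigma}_{XY} / \hat{\sigma}_X^2$.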
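The "regression = projection" idea can be sketched as follows, assuming simple linear regression of y on x with an intercept: the slope is the coefficient obtained by projecting the centered y-vector onto the centered x-vector. The data are hypothetical, and `np.polyfit` is used only to confirm that ordinary least squares gives the same line.

```python
import numpy as np

# Hypothetical data vectors (illustrative values only)
x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y = np.array([1.0, 3.0, 2.0, 5.0, 4.0])

# The sample mean as a projection of x onto the vector of ones
ones = np.ones(len(x))
mean_vector = (ones @ x) / (ones @ ones) * ones   # equals x.mean() * ones

# Centered vectors
xc = x - x.mean()
yc = y - y.mean()

# Projection of yc onto xc: the coefficient is the regression slope
slope = (xc @ yc) / (xc @ xc)
projection = slope * xc
intercept = y.mean() - slope * x.mean()

# The residual is orthogonal to the centered x-vector
residual = yc - projection
orthogonality = residual @ xc

# Cross-check with an ordinary least-squares line fit
slope_ref, intercept_ref = np.polyfit(x, y, 1)

print(slope, intercept, orthogonality)
```

The residual being orthogonal to the centered x-vector is what makes the projection the closest point, in the least-squares sense, to the centered y-vector along the direction of x.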
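Computing Multivariate Sample Statistics (I)
Multivariate data set: N measurements on K attributes $\{X_1, \ldots, X_K\}$ made at N sampling units and arranged in an (N × K) matrix $\mathbf{X}$:
$$\mathbf{X} = \begin{bmatrix} x_{11} & \cdots & x_{1k} & \cdots & x_{1K} \\ \vdots & & \vdots & & \vdots \\ x_{n1} & \cdots & x_{nk} & \cdots & x_{nK} \\ \vdots & & \vdots & & \vdots \\ x_{N1} & \cdots & x_{Nk} & \cdots & x_{NK} \end{bmatrix}$$
$x_{nk}$ = n-th measurement for the k-th variable $X_k$
the n-th row contains K measurements of different attributes at a single sampling unit
the k-th column contains N measurements of a single attribute at all N sampling units
Multivariate sample mean: the (K × 1) vector of mean values for all K attributes:
$$\bar{\mathbf{x}} = \frac{1}{N}\,\mathbf{X}^T\mathbf{1}_N = [\bar{x}_1, \ldots, \bar{x}_K]^T$$
Conditional multivariate mean vector: (K × 1) vector of mean values for all K attributes, computed only from those rows of $\mathbf{X}$ whose entries satisfy some condition (or query)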
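A minimal sketch of the multivariate mean and a conditional mean, assuming the data sit in an (N × K) NumPy array with rows as sampling units and columns as attributes. The array values and the query condition (first attribute greater than 2) are hypothetical.

```python
import numpy as np

# Hypothetical (N x K) data matrix: 4 sampling units, 3 attributes
X = np.array([
    [1.0, 10.0, 100.0],
    [2.0, 20.0, 200.0],
    [3.0, 30.0, 300.0],
    [4.0, 40.0, 400.0],
])
N, K = X.shape

# Multivariate sample mean as a matrix-vector product with a vector of ones
ones = np.ones(N)
mean_vec = X.T @ ones / N          # same result as X.mean(axis=0)

# Conditional mean: keep only rows that satisfy a query (hypothetical condition)
mask = X[:, 0] > 2.0
cond_mean = X[mask].mean(axis=0)

print(mean_vec)   # mean of each of the K attributes over all rows
print(cond_mean)  # mean of each attribute over the selected rows only
```

The boolean mask plays the role of the query: any condition on the rows of the matrix can be substituted without changing the rest of the computation.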
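Computing Multivariate Sample Statistics (II)
Matrix of means: the (N × K) matrix whose every row is the mean vector $\bar{\mathbf{x}}^T$:
$$\bar{\mathbf{X}} = \mathbf{1}_N\,\bar{\mathbf{x}}^T$$
Matrix of deviations from means:
$$\tilde{\mathbf{X}} = \mathbf{X} - \mathbf{1}_N\,\bar{\mathbf{x}}^T$$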
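Computing Multivariate Sample Statistics (III)
Matrix of squares and cross-products: the (K × K) matrix
$$\tilde{\mathbf{X}}^T\tilde{\mathbf{X}}$$
where $\tilde{\mathbf{X}}$ is the (N × K) matrix of deviations of $\mathbf{X}$ from the column means.
Sample covariance matrix:
$$\hat{\boldsymbol{\Sigma}} = \frac{1}{N}\,\tilde{\mathbf{X}}^T\tilde{\mathbf{X}}$$
Note: in the presence of missing values, one should compute all variance and covariance values only from those N' < N rows of matrix $\mathbf{X}$ with no missing values. This ensures that the resulting covariance matrix $\hat{\boldsymbol{\Sigma}}$ is a valid (positive semidefinite) one.
Conditional covariance matrix: (K × K) covariance matrix between all $K^2$ pairs of attributes, computed only from those rows of $\mathbf{X}$ whose entries satisfy some condition (or query)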
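The steps for the covariance matrix can be sketched as below, assuming missing values are coded as NaN and the 1/N normalization is used. Only rows with no missing entries are kept before computing the matrix of means, the deviations, and the cross-products. The data values are hypothetical.

```python
import numpy as np

# Hypothetical (N x K) data matrix with one missing value (NaN)
X = np.array([
    [1.0, 2.0, 0.5],
    [2.0, np.nan, 1.5],
    [3.0, 6.0, 1.0],
    [4.0, 8.0, 3.0],
    [5.0, 9.0, 2.5],
])

# Keep only the N' rows with no missing values
complete = ~np.isnan(X).any(axis=1)
Xc = X[complete]
N_prime, K = Xc.shape

# Matrix of means: every row equals the vector of column means
means = np.ones((N_prime, 1)) @ Xc.mean(axis=0, keepdims=True)

# Matrix of deviations from means
D = Xc - means

# Matrix of squares and cross-products, then the sample covariance matrix
S = D.T @ D
Sigma = S / N_prime

# A valid covariance matrix is symmetric with non-negative eigenvalues
eigvals = np.linalg.eigvalsh(Sigma)

print(Sigma)
print(eigvals)
```

Computing each covariance from a different subset of rows (pairwise deletion) can produce a matrix with negative eigenvalues, which is why the same complete rows are used for every entry here.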