Bivariate Sample Statistics. Geog 210C: Introduction to Spatial Data Analysis. Chris Funk. Lecture 7

Overview
- Real statistical application: remote monitoring of the East Africa long rains
- Lead-up to Lab 5-6
- Review of bivariate/multivariate relationships
- Definition of variance
- Definition of covariance
- Extension to the multivariate case
C. Funk, Geog 210C, Spring 2011

Food Insecurity [figure]

East Africa Food Insecurity [figure]

Nairobi Prices [figure]

Kenya Livelihoods [figure]

Child Stunting [figure]

Kenya Population and Number of Rain Days, Past 30 Days, as of April 17th
The image on the left shows LandScan population density overlain with a blue mask. Regions not masked are in Kenya and had fewer than 9 rain days during the past month. Halfway through the season, is a large population center at risk?
http://earlywarning.usgs.gov:8080/ewx/index.html

Percent of March-April Rainfall [figure]

Concerns for Central & Eastern Kenya
If the rest of the season is normal (which is unlikely), central Kenya will have seasonal totals 24% below normal. If the season plays out like 2009, central Kenya might receive ~40% below normal.
If the rest of the season is normal (which is unlikely), central Kenya will have seasonal totals 23% below normal. If the season plays out like 2009, central Kenya might receive ~50% of normal.
[Figure panels: Central Province, Eastern Province; series: short-term mean, 2011, 2009.]
So far, the season matches 2009 exactly. By the 1st dekad of May, rainfall rates drop rapidly.
http://earlywarning.usgs.gov:8080/ewx/index.html

Anomalies for March
[Figure panels: March RFE2 anomalies, population density, March LST anomalies.]
The area SE of Nairobi has had +6 to +7 °C LST anomalies and about -50 to -75 mm March rainfall anomalies. The area NE of Nairobi has had +6 to +7 °C LST anomalies and about -50 mm March rainfall anomalies.
http://earlywarning.usgs.gov:8080/ewx/index.html

Number of Rain Days [figure]

Max Consecutive Dry Days [figure]

Setting
Data: pairs of two attributes X and Y, measured at N sampling units: there are N pairs of attribute values $\{(x_n, y_n),\ n = 1, \dots, N\}$.
Scatter plot: graph of y-values versus x-values in attribute space. The y-values serve as coordinates on the vertical axis and the x-values as coordinates on the horizontal axis; the n-th point in the scatter plot has coordinates $(x_n, y_n)$.
Objective: provide a quantitative summary of the scatter plot as a measure of association between x- and y-values.

Scatter Plot Quadrants
Scatter plot center: the point $(\bar{x}, \bar{y})$ whose coordinates equal the data means:
$\bar{x} = \frac{1}{N}\sum_{n=1}^{N} x_n, \qquad \bar{y} = \frac{1}{N}\sum_{n=1}^{N} y_n$
Scatter plot quadrants: the line extending from the mean $\bar{x}$ parallel to the y-axis and the line extending from the mean $\bar{y}$ parallel to the x-axis define 4 quadrants in the scatter plot.
Deviations from the mean: any measure of association between X and Y should be independent of where the sample scatter plot is centered. Consequently, we look at the deviations of the data from their respective means, $(x_n - \bar{x},\ y_n - \bar{y})$:
quadrant I: $x_n - \bar{x} > 0,\ y_n - \bar{y} > 0$
quadrant II: $x_n - \bar{x} < 0,\ y_n - \bar{y} > 0$
quadrant III: $x_n - \bar{x} < 0,\ y_n - \bar{y} < 0$
quadrant IV: $x_n - \bar{x} > 0,\ y_n - \bar{y} < 0$
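A minimal Python sketch (standard library only) of the scatter plot center and the quadrant rule. The data values are made up for illustration, and the choice to report points lying exactly on a dividing line as quadrant 0 is an assumption of this sketch, not something the slide specifies.

```python
def means(xs, ys):
    """Scatter plot center: (x_bar, y_bar), the means of the two attributes."""
    n = len(xs)
    return sum(xs) / n, sum(ys) / n

def quadrant(x, y, x_bar, y_bar):
    """Quadrant (1-4) of (x, y) relative to the center; 0 if on a dividing line."""
    dx, dy = x - x_bar, y - y_bar
    if dx == 0 or dy == 0:
        return 0
    if dx > 0:
        return 1 if dy > 0 else 4
    return 2 if dy > 0 else 3

xs = [1, 2, 3, 6, 8]
ys = [2, 6, 3, 7, 7]
x_bar, y_bar = means(xs, ys)
print(x_bar, y_bar)                                            # 4.0 5.0
print([quadrant(x, y, x_bar, y_bar) for x, y in zip(xs, ys)])  # [3, 2, 3, 1, 1]
```

Working with the deviations `dx` and `dy` rather than the raw values is what makes the classification independent of where the scatter plot is centered.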

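Products of Data Deviations from Their Means
Since we are after a measure of association, we compute products of the data deviations from their means. A large positive product indicates large x- and y-deviations of the same sign; a large negative product indicates large x- and y-deviations of different signs. Product signs in the different quadrants:
quadrant I: $x_n - \bar{x} > 0$ and $y_n - \bar{y} > 0$, so $(x_n - \bar{x})(y_n - \bar{y}) > 0$
quadrant II: $x_n - \bar{x} < 0$ and $y_n - \bar{y} > 0$, so $(x_n - \bar{x})(y_n - \bar{y}) < 0$
quadrant III: $x_n - \bar{x} < 0$ and $y_n - \bar{y} < 0$, so $(x_n - \bar{x})(y_n - \bar{y}) > 0$
quadrant IV: $x_n - \bar{x} > 0$ and $y_n - \bar{y} < 0$, so $(x_n - \bar{x})(y_n - \bar{y}) < 0$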

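Sample Covariance of a Scatter Plot
Products of deviations from means: $(x_n - \bar{x})(y_n - \bar{y}),\ n = 1, \dots, N$.
Average of the N products, i.e. the sample covariance between the data of attributes X and Y:
$\sigma_{XY} = \frac{1}{N}\sum_{n=1}^{N}(x_n - \bar{x})(y_n - \bar{y})$
Sample variance = covariance of an attribute with itself:
$\sigma_X^2 = \sigma_{XX} = \frac{1}{N}\sum_{n=1}^{N}(x_n - \bar{x})^2$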
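The two definitions translate directly into code. This is a sketch on illustrative data that uses the divisor N from the slide; note that Python's `statistics.covariance` (3.10+) divides by N - 1 instead, so its result would differ by a factor N/(N - 1).

```python
def covariance(xs, ys):
    """Sample covariance: average of the N products of deviations (divisor N)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n

def variance(xs):
    """Sample variance = covariance of an attribute with itself."""
    return covariance(xs, xs)

xs = [1, 2, 3, 6, 8]
ys = [2, 6, 3, 7, 7]
print(covariance(xs, ys))          # 4.2
print(variance(xs), variance(ys))  # 6.8 4.4
```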

Interpreting the Sample Covariance
Sample covariance between the data of attributes X and Y: $\sigma_{XY} = \frac{1}{N}\sum_{n=1}^{N}(x_n - \bar{x})(y_n - \bar{y})$.
Interpretation:
- a large positive covariance indicates data pairs predominantly lying in quadrants I and III;
- a large negative covariance indicates data pairs predominantly lying in quadrants II and IV;
- a small covariance indicates data pairs lying in all quadrants, in which case positive and negative products cancel out when one computes their mean.
NOTE: The covariance is a measure of linear association between X and Y, and only a summary measure of the actual scatter plot.
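A small sketch of the three cases, on made-up data. The last line shows why the NOTE matters: $y = x^2$ on a symmetric set of x-values is an exact, but nonlinear, dependence, yet the positive and negative products cancel and the covariance is 0.

```python
def covariance(xs, ys):
    """Sample covariance with divisor N."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n

xs = [-2, -1, 0, 1, 2]
print(covariance(xs, [2 * x for x in xs]))   # 4.0: points in quadrants I and III
print(covariance(xs, [-2 * x for x in xs]))  # -4.0: points in quadrants II and IV
print(covariance(xs, [x * x for x in xs]))   # 0.0: exact but nonlinear dependence
```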

Sample Covariance and Correlation Coefficient
Problems with the sample covariance: not easily interpretable, since x- and y-values can have different units and sample variances; sensitive to outliers; quantifies only linear relationships.
Sample correlation coefficient (Pearson's product-moment correlation): $r_{XY} = \frac{\sigma_{XY}}{\sigma_X\,\sigma_Y}$; lies in $[-1, +1]$; sensitive to outliers; quantifies only linear relationships.
Sample rank correlation coefficient (Spearman's correlation):
- rank-transform each sample data set, by assigning a rank of 1 to the smallest value and a rank of N to the largest one;
- transform each data pair $(x_n, y_n)$ into a rank pair $(r(x_n), r(y_n))$, where $r(x_n)$ and $r(y_n)$ are the ranks of $x_n$ and $y_n$;
- compute the correlation coefficient of the rank pairs as $r_S = \frac{\sigma_{r(X)\,r(Y)}}{\sigma_{r(X)}\,\sigma_{r(Y)}}$;
- can detect nonlinear monotonic relationships.
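A sketch of both coefficients, following the recipe above: Spearman's coefficient is simply Pearson's coefficient applied to the rank pairs. The `ranks` helper assumes there are no tied values (ties would need average ranks, which this sketch does not handle), and the exponential test data are illustrative.

```python
import math

def pearson(xs, ys):
    """Pearson's r = sigma_XY / (sigma_X * sigma_Y); the 1/N factors cancel."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank 1 for the smallest value, N for the largest (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(xs, ys):
    """Spearman's rank correlation: Pearson's r of the rank pairs."""
    return pearson(ranks(xs), ranks(ys))

xs = [1, 2, 3, 4, 5, 6]
ys = [math.exp(x) for x in xs]    # monotonic but strongly nonlinear
print(round(pearson(xs, ys), 3))  # clearly below 1
print(spearman(xs, ys))           # 1.0
```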

Moment of Inertia of a Scatter Plot
Motivation: instead of looking at the average product of deviations from the mean, we could look at the moment of inertia of a scatter plot, that is, the average squared distance between any pair $(x_n, y_n)$ and the 45° line. Note: such a line does not always make sense, but so be it for now.
Moment of inertia = average squared deviation of the scatter plot points from the 45° line. Since the squared distance from $(x_n, y_n)$ to the line $y = x$ is $(x_n - y_n)^2/2$:
$\gamma_{XY} = \frac{1}{2N}\sum_{n=1}^{N}(x_n - y_n)^2$
Note: the moment of inertia of a scatter plot aligned with the 45° line is always 0; that is, if $y_n = x_n$ for all $n$, then $\gamma_{XY} = \gamma_{XX} = 0$. Alternatively, the dissimilarity of an attribute with itself is 0.
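A direct sketch of $\gamma_{XY}$ on illustrative data; the second call checks the note that an attribute has zero dissimilarity with itself.

```python
def moment_of_inertia(xs, ys):
    """gamma_XY = (1 / 2N) * sum (x_n - y_n)^2.

    (x_n - y_n)^2 / 2 is the squared distance from (x_n, y_n) to the
    45-degree line y = x, so this is the average squared distance to that line.
    """
    n = len(xs)
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / (2 * n)

xs = [1, 2, 3, 6, 8]
ys = [2, 6, 3, 7, 7]
print(moment_of_inertia(xs, ys))  # 1.9
print(moment_of_inertia(xs, xs))  # 0.0
```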

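Link Between Covariance and Moment of Inertia
Recall: $\gamma_{XY} = \frac{1}{2N}\sum_{n=1}^{N}(x_n - y_n)^2$.
Expanding, with $x_n - y_n = (x_n - \bar{x}) - (y_n - \bar{y}) + (\bar{x} - \bar{y})$, and using the fact that deviations from the mean sum to zero:
$\gamma_{XY} = \frac{1}{2}\left(\sigma_X^2 + \sigma_Y^2\right) + \frac{1}{2}(\bar{x} - \bar{y})^2 - \sigma_{XY}$
What's the difference: to estimate the moment of inertia $\gamma_{XY}$ you do not need to know the mean values ($\bar{x}$ and $\bar{y}$, the estimates of $\mu_X$ and $\mu_Y$); these two mean values are required for estimating the covariance $\sigma_{XY}$.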
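A quick numerical check of the expansion, on illustrative data: the left side is computed from the differences $x_n - y_n$ alone, while the right side needs the means, variances and covariance.

```python
def mean(v):
    return sum(v) / len(v)

def cov(xs, ys):
    """Sample covariance with divisor N."""
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])

def gamma(xs, ys):
    """Moment of inertia: no means needed, only the differences x_n - y_n."""
    return mean([(x - y) ** 2 for x, y in zip(xs, ys)]) / 2

xs = [1, 2, 3, 6, 8]
ys = [2, 6, 3, 7, 7]
lhs = gamma(xs, ys)
rhs = 0.5 * (cov(xs, xs) + cov(ys, ys)) + 0.5 * (mean(xs) - mean(ys)) ** 2 - cov(xs, ys)
print(lhs, round(rhs, 10))  # 1.9 1.9
```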

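Geometric Interpretation (I)
Vector length: the length of $\mathbf{x} = (x_1, \dots, x_N)^T$ is the distance of the point with coordinates $\{x_1, \dots, x_N\}$ from the origin: $\|\mathbf{x}\| = \sqrt{\sum_{n=1}^{N} x_n^2}$.
Vector-scalar multiplication: multiplying a vector by a scalar c changes its length (and reverses its direction if c < 0): $\|c\,\mathbf{x}\| = |c|\,\|\mathbf{x}\|$.
Inner product of two vectors: a scalar quantity (negative, zero or positive): $\langle \mathbf{x}, \mathbf{y} \rangle = \mathbf{x}^T\mathbf{y} = \sum_{n=1}^{N} x_n y_n$.
Vector length from the inner product of a vector with itself: $\|\mathbf{x}\|^2 = \langle \mathbf{x}, \mathbf{x} \rangle$.
Angle θ between two vectors y and x: $\cos\theta = \frac{\langle \mathbf{x}, \mathbf{y} \rangle}{\|\mathbf{x}\|\,\|\mathbf{y}\|}$.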
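A sketch of these vector operations on plain Python lists, with illustrative 2-D vectors so the results are easy to check by hand.

```python
import math

def inner(u, v):
    """Inner product <u, v> = sum u_n v_n (may be negative, zero or positive)."""
    return sum(a * b for a, b in zip(u, v))

def length(u):
    """||u|| = sqrt(<u, u>): distance of the point u from the origin."""
    return math.sqrt(inner(u, u))

def angle(u, v):
    """Angle theta in radians, from cos(theta) = <u, v> / (||u|| ||v||)."""
    return math.acos(inner(u, v) / (length(u) * length(v)))

x = [3.0, 4.0]
y = [4.0, -3.0]
print(length(x))                                # 5.0
print(length([-2 * a for a in x]))              # 10.0 = |c| * ||x|| with c = -2
print(round(math.degrees(angle(x, y)), 6))      # 90.0: orthogonal vectors
```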

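Geometric Interpretation (II)
Let $\tilde{\mathbf{x}} = \mathbf{x} - \bar{x}\,\mathbf{1}$ denote the vector of deviations (centered vector), where $\mathbf{1}$ is the vector of N ones.
Variance: $\sigma_X^2 = \frac{1}{N}\|\tilde{\mathbf{x}}\|^2$
Covariance: $\sigma_{XY} = \frac{1}{N}\langle \tilde{\mathbf{x}}, \tilde{\mathbf{y}} \rangle$
Correlation coefficient: $r_{XY} = \frac{\sigma_{XY}}{\sigma_X\,\sigma_Y} = \frac{\langle \tilde{\mathbf{x}}, \tilde{\mathbf{y}} \rangle}{\|\tilde{\mathbf{x}}\|\,\|\tilde{\mathbf{y}}\|} = \cos\theta$
Interpretation: the correlation coefficient is the cosine of the angle θ between the two centered vectors: $r_{XY} = 1$ when they point in the same direction, $r_{XY} = -1$ when they point in opposite directions, and $r_{XY} = 0$ when they are orthogonal.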
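A numerical check, on illustrative data, that the correlation computed from variances and covariance equals the cosine of the angle between the centered vectors (the 1/N factors cancel).

```python
import math

def center(v):
    """Centered vector: deviations of each entry from the mean."""
    m = sum(v) / len(v)
    return [a - m for a in v]

def inner(u, v):
    """Inner product <u, v>."""
    return sum(a * b for a, b in zip(u, v))

xs = [1, 2, 3, 6, 8]
ys = [2, 6, 3, 7, 7]
n = len(xs)
xt, yt = center(xs), center(ys)

var_x = inner(xt, xt) / n    # sigma_X^2 = ||x~||^2 / N
var_y = inner(yt, yt) / n    # sigma_Y^2 = ||y~||^2 / N
cov_xy = inner(xt, yt) / n   # sigma_XY  = <x~, y~> / N
r = cov_xy / math.sqrt(var_x * var_y)

# cos(theta) between the centered vectors.
cos_theta = inner(xt, yt) / (math.sqrt(inner(xt, xt)) * math.sqrt(inner(yt, yt)))
print(round(r, 4), round(cos_theta, 4))  # 0.7678 0.7678
```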

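Geometric Interpretation (III)
Projection vector ("shadow") of vector $\mathbf{y}$ onto vector $\mathbf{x}$: $\mathrm{proj}_{\mathbf{x}}\mathbf{y} = \frac{\langle \mathbf{x}, \mathbf{y} \rangle}{\langle \mathbf{x}, \mathbf{x} \rangle}\,\mathbf{x}$
Projection length (signed): $\frac{\langle \mathbf{x}, \mathbf{y} \rangle}{\|\mathbf{x}\|} = \|\mathbf{y}\|\cos\theta$
Unit vector: $\mathbf{1} = (1, \dots, 1)^T$, the vector of N ones.
The sample mean vector: $\bar{y}\,\mathbf{1} = \frac{\langle \mathbf{1}, \mathbf{y} \rangle}{\langle \mathbf{1}, \mathbf{1} \rangle}\,\mathbf{1}$, i.e. the projection of $\mathbf{y}$ onto $\mathbf{1}$.
Regression = projection: the least-squares fit of the centered vector $\tilde{\mathbf{y}}$ on the centered vector $\tilde{\mathbf{x}}$ is the projection $b\,\tilde{\mathbf{x}}$, with slope $b = \frac{\langle \tilde{\mathbf{x}}, \tilde{\mathbf{y}} \rangle}{\langle \tilde{\mathbf{x}}, \tilde{\mathbf{x}} \rangle} = \frac{\sigma_{XY}}{\sigma_X^2}$.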
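A sketch of "regression = projection" on illustrative data: projecting $\mathbf{y}$ onto the vector of ones gives the sample mean vector, and projecting the centered y onto the centered x gives the least-squares slope. The intercept line is the usual $\bar{y} - b\,\bar{x}$, added here only to complete the fitted line.

```python
def inner(u, v):
    """Inner product <u, v>."""
    return sum(a * b for a, b in zip(u, v))

def project(y, x):
    """Projection ('shadow') of y onto x: (<x, y> / <x, x>) x."""
    c = inner(x, y) / inner(x, x)
    return [c * a for a in x]

xs = [1, 2, 3, 6, 8]
ys = [2, 6, 3, 7, 7]
ones = [1.0] * len(ys)

# Projecting y onto the vector of ones gives the sample mean vector y_bar * 1.
print(project(ys, ones))  # [5.0, 5.0, 5.0, 5.0, 5.0]

# Regression = projection of centered y onto centered x.
x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
xt = [a - x_bar for a in xs]
yt = [b - y_bar for b in ys]
slope = inner(xt, yt) / inner(xt, xt)   # = sigma_XY / sigma_X^2
intercept = y_bar - slope * x_bar
print(round(slope, 4), round(intercept, 4))  # 0.6176 2.5294
```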

Computing Multivariate Sample Statistics (I)
Multivariate data set: N measurements on K attributes $\{X_1, \dots, X_K\}$ made at N sampling units and arranged in an $(N \times K)$ matrix $\mathbf{X}$:
- $x_{nk}$ = n-th measurement of the k-th variable $X_k$;
- the n-th row contains K measurements of different attributes at a single sampling unit;
- the k-th column contains N measurements of a single attribute at all N sampling units.
Multivariate sample mean: the $(K \times 1)$ vector $\mathbf{m} = \frac{1}{N}\mathbf{X}^T\mathbf{1}$, with entries $m_k = \frac{1}{N}\sum_{n=1}^{N} x_{nk}$, where $\mathbf{1}$ is the $(N \times 1)$ vector of ones.
Conditional multivariate mean vector: $(K \times 1)$ vector of mean values for all K attributes, computed only from those rows of $\mathbf{X}$ whose entries satisfy some condition (or query).
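A sketch of the multivariate mean and the conditional mean vector, storing the $(N \times K)$ matrix as a list of rows. The matrix values and the query (third attribute greater than 1000) are hypothetical, chosen only to illustrate the row filtering.

```python
def column_means(rows):
    """(K x 1) multivariate sample mean: the average of each column of the N x K matrix."""
    n, k = len(rows), len(rows[0])
    return [sum(row[j] for row in rows) / n for j in range(k)]

def conditional_means(rows, condition):
    """Column means computed only from the rows that satisfy a query."""
    return column_means([row for row in rows if condition(row)])

# Hypothetical data: N = 4 sampling units (rows), K = 3 attributes (columns).
X = [
    [120.0, 24.0, 1500.0],
    [80.0, 27.0, 900.0],
    [200.0, 21.0, 2100.0],
    [60.0, 29.0, 600.0],
]
print(column_means(X))                                  # [115.0, 25.25, 1275.0]
print(conditional_means(X, lambda row: row[2] > 1000))  # [160.0, 22.5, 1800.0]
```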

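Computing Multivariate Sample Statistics (II)
Matrix of means: the $(N \times K)$ matrix $\mathbf{M} = \mathbf{1}\,\mathbf{m}^T$, each row of which equals $\mathbf{m}^T$, where $\mathbf{m}$ is the $(K \times 1)$ multivariate sample mean and $\mathbf{1}$ the $(N \times 1)$ vector of ones.
Matrix of deviations from means: $\tilde{\mathbf{X}} = \mathbf{X} - \mathbf{M}$, with entries $\tilde{x}_{nk} = x_{nk} - m_k$.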
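A sketch of $\mathbf{M}$ and $\tilde{\mathbf{X}}$ with lists of rows, on illustrative data. The final check reflects a property of the deviation matrix: each of its columns sums to zero.

```python
def mean_matrix(X):
    """N x K matrix 1 m^T: every row equals the vector of column means."""
    n, k = len(X), len(X[0])
    m = [sum(row[j] for row in X) / n for j in range(k)]
    return [m[:] for _ in range(n)]

def deviation_matrix(X):
    """X~ = X - M: each entry minus its column mean."""
    M = mean_matrix(X)
    return [[x - mu for x, mu in zip(row, mrow)] for row, mrow in zip(X, M)]

X = [
    [1.0, 2.0],
    [2.0, 6.0],
    [3.0, 3.0],
    [6.0, 7.0],
    [8.0, 7.0],
]
D = deviation_matrix(X)
print(D)  # [[-3.0, -3.0], [-2.0, 1.0], [-1.0, -2.0], [2.0, 2.0], [4.0, 2.0]]
print([sum(row[j] for row in D) for j in range(2)])  # [0.0, 0.0]
```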

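Computing Multivariate Sample Statistics (III)
Matrix of squares and cross-products: the $(K \times K)$ matrix $\tilde{\mathbf{X}}^T\tilde{\mathbf{X}}$, with entries $\sum_{n=1}^{N}\tilde{x}_{nk}\,\tilde{x}_{nl}$, where $\tilde{\mathbf{X}}$ is the matrix of deviations from the column means.
Sample covariance matrix: $\boldsymbol{\Sigma} = \frac{1}{N}\tilde{\mathbf{X}}^T\tilde{\mathbf{X}}$; its diagonal entries are the sample variances and its off-diagonal entries the sample covariances.
Note: in the presence of missing values, one should compute all variance and covariance values only from those $N' < N$ rows of matrix $\mathbf{X}$ with no missing values. This ensures that the resulting covariance matrix $\boldsymbol{\Sigma}$ is a valid (positive semidefinite) one.
Conditional covariance matrix: $(K \times K)$ covariance matrix between all $K^2$ pairs of attributes, computed only from those rows of $\mathbf{X}$ whose entries satisfy some condition (or query).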
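A sketch of the sample covariance matrix that follows the note on missing values: rows containing a missing entry are dropped first, and every variance and covariance is then computed from the same $N'$ complete rows. Representing missing values as `None` is an assumption of this sketch; a conditional covariance matrix would be obtained the same way, by filtering rows with a query before calling the function.

```python
def covariance_matrix(X):
    """Sigma = (1/N') X~^T X~, using only the N' rows with no missing values (None)."""
    complete = [row for row in X if all(v is not None for v in row)]
    n, k = len(complete), len(complete[0])
    m = [sum(row[j] for row in complete) / n for j in range(k)]
    D = [[row[j] - m[j] for j in range(k)] for row in complete]  # deviations from means
    # Matrix of squares and cross-products, divided by N'.
    return [[sum(d[i] * d[j] for d in D) / n for j in range(k)] for i in range(k)]

X = [
    [1.0, 2.0],
    [2.0, 6.0],
    [3.0, 3.0],
    [None, 4.0],  # dropped: contains a missing value
    [6.0, 7.0],
    [8.0, 7.0],
]
S = covariance_matrix(X)
print(S)  # [[6.8, 4.2], [4.2, 4.4]]
```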