Statistics: A review. Why statistics?

What statistical concepts should we know?

Why statistics?
  To summarize, to explore, to look for relations, to predict
What kinds of data exist?
  Nominal, ordinal, interval and ratio
  Populations and samples (sampling frames) (support)
  Sampling frames: random, systematic, stratified, other?
How to summarize?
  Measures of central tendency
  Measures of dispersion
Looking for relations?
  Visualizations: tables, graphics (scatterplots)
  Quantitative approaches: correlations, regression

How to summarize a distribution using a single number?

Example: Grouping analysis

Z-score
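The slide gives only the heading, so the following is a minimal sketch of the usual definition, z = (x - mean) / sd. It assumes the population standard deviation is intended; the function name `z_scores` and the data are made up for illustration.

```python
import statistics

def z_scores(values):
    """Return (x - mean) / sd for each value, using the population sd."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # assumption: population, not sample, sd
    return [(x - mean) / sd for x in values]

# Made-up data with mean 5 and population sd 2.
data = [2, 4, 4, 4, 5, 5, 7, 9]
scores = z_scores(data)
print(scores)
```

A z-score of 2 for the value 9 says that observation lies two standard deviations above the mean, which makes values from differently scaled variables comparable.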

Crosstabulation (chi-square statistics)
Correlation
  Pearson's r (interval/ratio data)
  Spearman's r (ordinal data)
Measures of association
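To make the Pearson/Spearman distinction concrete, here is a small sketch in plain Python. It assumes Spearman's coefficient is computed as Pearson's r on average ranks (the common textbook definition); the helper names and the data are illustrative only.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share this rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_r(x, y):
    """Spearman rank correlation: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Made-up monotone but non-linear relation: y = x squared.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
r_pearson = pearson_r(x, y)
r_spearman = spearman_r(x, y)
print(r_pearson, r_spearman)
```

Because the relation is perfectly monotone, the rank-based coefficient equals 1 while Pearson's r falls slightly short of 1, since it measures linear association only.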

Regression
  What are some of the considerations we should make?
  Terminology?

Regression
  Variables considered as or
  Relations amongst variables? (Think r)
  How many cases?
  Too many variables?
  Residuals should be

Regression modelling

Simple regression: a statistical perspective
  One (or more) dependent (response) variables
  One or more independent (predictor) variables
  Linear regression is linear in coefficients:
    $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots$, or $y = x\beta$
  Vector/matrix form often used
  Over-determined equations & least squares

Ordinary Least Squares (OLS) model
  $y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \dots + \varepsilon_i$, or $y = X\beta + \varepsilon$
  Minimise sum of squared errors (or residuals)
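One way to see what minimising the sum of squared residuals amounts to is to solve the normal equations (X'X)b = X'y directly. The sketch below does this in plain Python with Gaussian elimination; it is an illustration under that assumption, not how production software estimates the model (libraries typically use QR or SVD). The function names and data are made up.

```python
def solve(a, b):
    """Solve the square system a * beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - s) / m[r][r]
    return beta

def ols(rows, y):
    """Least-squares coefficients [b0, b1, ...] for y = X b + e, intercept added."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    xtx = [[sum(xi[j] * xi[k] for xi in X) for k in range(p)] for j in range(p)]
    xty = [sum(xi[j] * yi for xi, yi in zip(X, y)) for j in range(p)]
    return solve(xtx, xty)

# Made-up, noise-free data generated from y = 1 + 2*x1 + 3*x2.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
beta = ols(rows, y)
print(beta)  # close to [1, 2, 3]
```

With five cases and three coefficients the system is over-determined; because the data here contain no noise, the least-squares solution recovers the generating coefficients up to rounding.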

OLS models and assumptions
  Model simplicity and parsimony
  Model over-determination, multi-collinearity and variance inflation
  Typical assumptions
    Data are independent random samples from an underlying population
    Model is valid and meaningful (in form and statistically)
    Errors are iid: independent, no heteroscedasticity, common distribution
    Errors are distributed $N(0, \sigma^2)$
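Variance inflation can be illustrated with the variance inflation factor, VIF = 1 / (1 - R²), where R² comes from regressing one predictor on the others. The sketch below assumes the simplest case of exactly two predictors, where that R² is just the squared correlation between them. The names and data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor model."""
    r2 = pearson_r(x1, x2) ** 2
    return 1.0 / (1.0 - r2)

# Made-up predictors that are moderately correlated.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
vif = vif_two_predictors(x1, x2)
print(vif)
```

A VIF of 1 means no collinearity. Larger values mean the coefficient variances are inflated by that factor relative to uncorrelated predictors.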

Spatial modelling and OLS
  Positive spatial autocorrelation is the norm, hence dependence between samples exists
  Datasets often non-normal >> transformations may be required (Log, Box-Cox, Logistic) (show income)
  Samples are often clustered >> spatial declustering may be required
  Heteroscedasticity is common
  Spatial coordinates (x,y) can form part of the modelling process
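As an illustration of the transformations mentioned above, the following sketch implements the one-parameter Box-Cox transform, (y^λ - 1) / λ for λ ≠ 0 and ln(y) for λ = 0, which contains the log transform as a special case. The function name and the income figures are made up, and choosing λ (normally done by maximum likelihood) is not shown.

```python
import math

def box_cox(y, lam):
    """One-parameter Box-Cox transform of a strictly positive value."""
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam

# Hypothetical right-skewed incomes; the log (lam = 0) pulls in the long tail.
incomes = [18_000, 22_000, 25_000, 31_000, 250_000]
logged = [box_cox(v, 0) for v in incomes]
print(logged)
```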

Choosing between models
  Information content perspective and AIC
    $\mathrm{AIC} = -2\ln(L) + 2k$
    $\mathrm{AICc} = -2\ln(L) + 2k\,\dfrac{n}{n-k-1}$
  where n is the sample size, k (and p below) is the number of parameters used in the model, and L is the likelihood function (ln is the natural log)
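The two criteria are simple to compute once a model's maximised log-likelihood is known. The sketch below takes ln(L) as an input; the log-likelihood values and model labels are invented for illustration.

```python
def aic(log_lik, k):
    """Akaike information criterion: -2 ln(L) + 2k."""
    return -2.0 * log_lik + 2.0 * k

def aicc(log_lik, k, n):
    """Small-sample corrected AIC: -2 ln(L) + 2k * n / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return -2.0 * log_lik + 2.0 * k * n / (n - k - 1)

# Hypothetical fits to the same n = 30 observations.
n = 30
candidates = {"3-parameter model": (-100.0, 3), "6-parameter model": (-98.5, 6)}
for name, (log_lik, k) in candidates.items():
    print(name, aic(log_lik, k), round(aicc(log_lik, k, n), 3))
```

Lower values are preferred. In this made-up comparison the extra three parameters raise the log-likelihood only slightly, so both criteria favour the smaller model, and AICc penalises the larger one more heavily because n is small.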

Some regression terminology
  Simple linear
  Multiple (multiple independent vars)
  Multivariate (multiple dependent and independent vars; canonical analysis)
  SAR (Spatial Autoregressive model)
  CAR (Conditional Autoregressive model)
  Logistic -- Binary data
  Poisson -- Count data
  Ecological
  Hedonic
  Analysis of variance (aka analysis of differences between means)
  Analysis of covariance

Regression & spatial autocorrelation (SA)
  Analyse the data for SA
  If SA significant then
    Proceed and ignore SA, or
    Permit the coefficients, β, to vary spatially (GWR), or
    Modify the regression model to incorporate the SA (e.g., include the Y values of the neighbours as predictor variables in the OLS expression; or, identify the missing variable that would explain the spatial autocorrelation).
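The slide does not name a statistic for the "analyse the data for SA" step. One widely used choice is global Moran's I, sketched here on that assumption. The weights matrix is a made-up binary contiguity matrix for four locations along a line, and significance testing (for example by permutation) is left out.

```python
def morans_i(values, w):
    """Global Moran's I for values under spatial weights matrix w (w[i][i] = 0)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in w)  # sum of all weights
    num = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * num / den

# Four locations in a row; each is a neighbour of the adjacent ones.
w = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
trend = [1, 2, 3, 4]        # similar values sit next to each other
alternating = [1, 4, 1, 4]  # neighbours are as different as possible
i_trend = morans_i(trend, w)
i_alt = morans_i(alternating, w)
print(i_trend, i_alt)
```

The smooth trend gives a positive value and the alternating pattern a negative one, matching the usual reading of positive versus negative spatial autocorrelation. In a regression setting the statistic would normally be applied to the OLS residuals.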

Geographically Weighted Regression (GWR)
  Coefficients, β, allowed to vary spatially, β(t)
  Model: y = Xβ(t) + ε
  Coefficients determined by examining neighbourhoods of points, t, using distance decay functions (fixed or adaptive bandwidths)
  Weighting matrix, W(t), defined for each point
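The local fitting idea can be sketched as a weighted least-squares fit repeated at each regression point t. The example below makes several simplifying assumptions that are not in the slides: a single predictor, a Gaussian distance-decay kernel, and a fixed bandwidth. The function name `gwr_local_fit` and the data are hypothetical.

```python
import math

def gwr_local_fit(coords, x, y, target, bandwidth):
    """Weighted least-squares fit of y = b0 + b1*x centred on `target`.

    Weights follow a Gaussian kernel, exp(-0.5 * (d / bandwidth)^2),
    where d is the distance from each observation to `target`.
    """
    w = [math.exp(-0.5 * (math.dist(c, target) / bandwidth) ** 2) for c in coords]
    sw = sum(w)
    xw = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted means
    yw = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xw) * (yi - yw) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xw) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    return yw - b1 * xw, b1

# Made-up transect of 10 sites: the true slope is 1 in the west, 3 in the east.
coords = [(float(s), 0.0) for s in range(10)]
x = [float(s % 3 + 1) for s in range(10)]
y = [(1.0 if s < 5 else 3.0) * xi for s, xi in zip(range(10), x)]

b0_west, b1_west = gwr_local_fit(coords, x, y, coords[0], bandwidth=1.0)
b0_east, b1_east = gwr_local_fit(coords, x, y, coords[9], bandwidth=1.0)
print(b1_west, b1_east)
```

A single global OLS fit would average the two regimes, whereas the local fits recover a slope near 1 at the western end and near 3 at the eastern end. Widening the bandwidth pulls both estimates towards the global value, which is why bandwidth selection matters.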

Geographically Weighted Regression
  Sensitivity: model, decay function, bandwidth, point/centroid selection
  ESDA: mapping of surface, residuals, parameters
  Significance testing
    Increased apparent explanation of variance
    Effective number of parameters
    AICc computations

Regression & spatial autocorrelation (SA)
  Modify the regression model to incorporate the SA, i.e. produce a Spatial Autoregressive model (SAR)
  Many approaches including:
    SAR: e.g. pure spatial lag model, mixed model, spatial error model, etc.
    CAR: a range of models that assume the expected value of the dependent variable is conditional on the (distance weighted) values of neighbouring points (e.g., GWR)
    Spatial filtering: e.g. OLS on spatially filtered data (e.g., declustering)
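The building block of a spatial lag model is the lagged variable Wy, the weighted average of each location's neighbouring y values. The sketch below computes it under the assumption of a row-standardised binary contiguity matrix; the matrix, the values, and the function names are made up. Estimating the full SAR model is not shown; it is usually done by maximum likelihood rather than by adding Wy to an OLS fit, because Wy is correlated with the error term.

```python
def row_standardise(w):
    """Scale each row of w to sum to 1 (rows with no neighbours stay zero)."""
    out = []
    for row in w:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out

def spatial_lag(w, y):
    """Return Wy: for each location, the weighted sum of the other locations' y."""
    return [sum(wij * yj for wij, yj in zip(row, y)) for row in w]

# Four locations in a row; each is a neighbour of the adjacent ones.
w = row_standardise([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
y = [10.0, 20.0, 30.0, 40.0]
wy = spatial_lag(w, y)
print(wy)  # [20.0, 20.0, 30.0, 30.0]
```

With row standardisation each entry of Wy is the mean of the neighbours' values, so the end locations simply take the value of their single neighbour.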