Week 2 Video 4. Metrics for Regressors


Metrics for Regressors: Linear Correlation, MAE/RMSE, Information Criteria

Linear correlation (Pearson's correlation): r(A,B) = cov(A,B) / (σ_A σ_B). When A's value changes, does B change in the same direction? Assumes a linear relationship.
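
Here is a minimal sketch of Pearson's r in Python (assuming NumPy; the arrays are made-up illustration data, not from the lecture):

```python
import numpy as np

def pearson_r(a, b):
    # Covariance of A and B divided by the product of their standard deviations
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    cov = np.mean((a - a.mean()) * (b - b.mean()))
    return cov / (a.std() * b.std())

a = [1, 2, 3, 4, 5]
b = [2, 4, 5, 4, 6]             # mostly rises as a rises
print(pearson_r(a, b))          # ~0.85: strong positive, but not perfectly linear
print(np.corrcoef(a, b)[0, 1])  # NumPy's built-in agrees
```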

What is a good correlation? 1.0 = perfect; 0.0 = none; -1.0 = perfectly negatively correlated. In between, it depends on the field: in physics, a correlation of 0.8 is weak! In education, a correlation of 0.3 is good.

Why are small correlations OK in education? Lots and lots of factors contribute to just about any dependent measure

Examples of correlation values From Denis Boigelot, available on Wikipedia

Same correlation, different functions From John Behrens, Pearson

r²: the correlation, squared. Also a measure of what percentage of variance in the dependent measure is explained by a model (e.g., r = 0.5 gives r² = 0.25, or 25% of variance explained). If you are predicting A with B, C, D, E, r² is often used as the measure of model goodness rather than r (depends on the community).

RMSE/MAE

Mean Absolute Error (MAE): the average of the absolute value of (actual value minus predicted value).

Root Mean Squared Error (RMSE): the square root of the average of (actual value minus predicted value)².

MAE vs. RMSE: MAE tells you the average amount by which the predictions deviate from the actual values. Very interpretable. RMSE can be interpreted the same way (mostly), but penalizes large deviations more than small ones.
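
A minimal sketch of both metrics in Python (assuming NumPy; the actual and predicted values are made up for illustration):

```python
import numpy as np

def mae(actual, predicted):
    # Average of the absolute value of (actual value minus predicted value)
    return np.mean(np.abs(actual - predicted))

def rmse(actual, predicted):
    # Square root of the average of (actual value minus predicted value) squared
    return np.sqrt(np.mean((actual - predicted) ** 2))

actual = np.array([3.0, 5.0, 2.0, 7.0])
predicted = np.array([2.5, 5.0, 4.0, 8.0])
print(mae(actual, predicted))   # 0.875
print(rmse(actual, predicted))  # ~1.146 -- the single large miss (2 vs. 4) weighs more
```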

However, RMSE is largely preferred to MAE. The example that follows is courtesy of Radek Pelanek, Masaryk University.

Radek's Example: take a student who makes correct responses 70% of the time, and two models. Model A predicts 70% correctness; Model B predicts 100% correctness.

In other words: 70% of the time the student gets it right (Response = 1), and 30% of the time the student gets it wrong (Response = 0). Model A's prediction = 0.7; Model B's prediction = 1.0. Which of these seems more reasonable?

MAE: 70% of the time the student gets it right (Response = 1): Model A (0.7) absolute error = 0.3; Model B (1.0) absolute error = 0. 30% of the time the student gets it wrong (Response = 0): Model A (0.7) absolute error = 0.7; Model B (1.0) absolute error = 1.

MAE: Model A = (0.7)(0.3) + (0.3)(0.7) = 0.21 + 0.21 = 0.42; Model B = (0.7)(0) + (0.3)(1) = 0 + 0.3 = 0.3. Model B is better, according to MAE. Do you believe it?

RMSE: 70% of the time the student gets it right (Response = 1): Model A (0.7) squared error = 0.09; Model B (1.0) squared error = 0. 30% of the time the student gets it wrong (Response = 0): Model A (0.7) squared error = 0.49; Model B (1.0) squared error = 1.

RMSE: Model A's mean squared error = (0.7)(0.09) + (0.3)(0.49) = 0.063 + 0.147 = 0.21, giving RMSE = √0.21 ≈ 0.46. Model B's mean squared error = (0.7)(0) + (0.3)(1) = 0 + 0.3 = 0.3, giving RMSE = √0.3 ≈ 0.55. Model A is better, according to RMSE. Does this seem more reasonable?
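
The whole example as a quick check in Python (a sketch; the 70/30 weighting follows the slides):

```python
# Student: correct (Response = 1) 70% of the time, incorrect (Response = 0) 30%.
# Model A predicts 0.7; Model B predicts 1.0.
P_RIGHT = 0.7

def expected_mae(pred):
    return P_RIGHT * abs(1 - pred) + (1 - P_RIGHT) * abs(0 - pred)

def expected_rmse(pred):
    mse = P_RIGHT * (1 - pred) ** 2 + (1 - P_RIGHT) * (0 - pred) ** 2
    return mse ** 0.5

print(expected_mae(0.7), expected_mae(1.0))    # 0.42 vs. 0.30 -> MAE prefers Model B
print(expected_rmse(0.7), expected_rmse(1.0))  # ~0.458 vs. ~0.548 -> RMSE prefers Model A
```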

Note: low RMSE is good; high correlation is good.

What does it mean? Low RMSE/MAE, High Correlation = Good model High RMSE/MAE, Low Correlation = Bad model

What does it mean? High RMSE/MAE, high correlation = the model goes in the right direction, but is systematically biased. For example, a model that says that adults are taller than children, but that adults are 8 feet tall and children are 6 feet tall.
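
A small demonstration of that case in Python (a sketch with invented heights echoing the slide's example; NumPy assumed):

```python
import numpy as np

actual = np.array([6.1, 5.9, 6.0, 4.1, 3.9, 4.0])  # adults ~6 ft, children ~4 ft
predicted = actual + 2.0                           # says adults ~8 ft, children ~6 ft

r = np.corrcoef(actual, predicted)[0, 1]
rmse = np.sqrt(np.mean((actual - predicted) ** 2))
print(r)     # 1.0 -- perfectly captures who is taller than whom
print(rmse)  # 2.0 -- but every prediction is off by two feet: systematic bias
```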

What does it mean? Low RMSE/MAE, low correlation = model values are in the right range, but the model doesn't capture relative change. Particularly common if there's not much variation in the data.

Information Criteria

BiC: Bayesian Information Criterion (Raftery, 1995). Makes a trade-off between goodness of fit and flexibility of fit (number of parameters). Formula for linear regression: BiC = n log(1 − r²) + p log(n), where n is the number of students and p is the number of variables.
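
That formula as a small Python function (a sketch: natural log is assumed here, and the example numbers are hypothetical):

```python
import math

def bic_linear_regression(n, p, r2):
    # BiC = n log(1 - r^2) + p log(n)   (Raftery, 1995)
    # n = number of students, p = number of variables, r2 = model's r-squared
    # Natural log assumed; lower (more negative) values are better.
    return n * math.log(1 - r2) + p * math.log(n)

# Hypothetical: 500 students, 4 variables, r^2 = 0.3
print(bic_linear_regression(500, 4, 0.3))  # ~ -153.5, i.e., under 0: better than expected
```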

BiC: values over 0 = worse than expected given the number of variables; values under 0 = better than expected given the number of variables. Can be used to understand the significance of the difference between models (Raftery, 1995).

BiC is said to be statistically equivalent to k-fold cross-validation for optimal k. The derivation is somewhat complex. BiC is easier to compute than cross-validation, but different formulas must be used for different modeling frameworks, and no BiC formula is available for many modeling frameworks.

AIC: an alternative to BiC. Stands for "An Information Criterion" (Akaike, 1971), though commonly read as "Akaike's Information Criterion" (Akaike, 1974). Makes a slightly different trade-off between goodness of fit and flexibility of fit (number of parameters).
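
The lecture doesn't give AIC's formula; for reference, the general definition (a standard fact, not from the slides) is AIC = 2p − 2 ln(L̂), where L̂ is the maximized likelihood. A minimal sketch:

```python
def aic(p, log_likelihood):
    # AIC = 2p - 2 ln(L-hat); lower is better, as with BiC.
    return 2 * p - 2 * log_likelihood

print(aic(4, -250.0))  # hypothetical maximized log-likelihood of -250 -> 508
```

This makes the "slightly different trade-off" concrete: AIC's per-parameter penalty is a flat 2, while BiC's is log(n), which exceeds 2 once n > e² ≈ 7.4, so BiC favors smaller models as the dataset grows.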

AIC is said to be statistically equivalent to leave-one-out cross-validation (LOOCV).

AIC or BiC: which one should you use? <shrug>

All the metrics: which one should you use? "The idea of looking for a single best measure to choose between classifiers is wrongheaded." (Powers, 2012)

Next Lecture Cross-validation and over-fitting