The Transactional. Understanding. with Attribute Data. Smita Skrivanek August 26, Copyright 2010 MoreSteam.com

Similar documents
Powering Up Your Lean Six Sigma Projects

Procedia - Social and Behavioral Sciences 109 ( 2014 )

Basic Business Statistics, 10/e

Chapter 9 Regression with a Binary Dependent Variable. Multiple Choice. 1) The binary dependent variable model is an example of a

Linear Regression With Special Variables

Marketing Research Session 10 Hypothesis Testing with Simple Random samples (Chapter 12)

Unit 11: Multiple Linear Regression

STAT 7030: Categorical Data Analysis

Chapter 10 Logistic Regression

ECONOMETRIC MODEL WITH QUALITATIVE VARIABLES

STATISTICS-STAT (STAT)

2/26/2017. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2

LOGISTIC REGRESSION Joseph M. Hilbe

CLUe Training An Introduction to Machine Learning in R with an example from handwritten digit recognition

More Statistics tutorial at Logistic Regression and the new:

Binary Logistic Regression

LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R. Liang (Sally) Shan Nov. 4, 2014

Chapter 11. Regression with a Binary Dependent Variable

(Where does Ch. 7 on comparing 2 means or 2 proportions fit into this?)

Ninth ARTNeT Capacity Building Workshop for Trade Research "Trade Flows and Trade Policy Analysis"

General Regression Model

Lecture #11: Classification & Logistic Regression

Multivariate Methods. Multivariate Methods: Topics of the Day

McGill University. Faculty of Science. Department of Mathematics and Statistics. Statistics Part A Comprehensive Exam Methodology Paper

STA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3

Analysis of Categorical Data. Nick Jackson University of Southern California Department of Psychology 10/11/2013

Quiz 1. Name: Instructions: Closed book, notes, and no electronic devices.

Train the model with a subset of the data. Test the model on the remaining data (the validation set) What data to choose for training vs. test?

Administrative Data Research Facility Linked HMDA and ACS Database

LOGISTIC REGRESSION. Lalmohan Bhar Indian Agricultural Statistics Research Institute, New Delhi

Introduction To Logistic Regression

ECON Introductory Econometrics. Lecture 11: Binary dependent variables

CS6220: DATA MINING TECHNIQUES

Logistic regression: Miscellaneous topics

Introducing Generalized Linear Models: Logistic Regression

Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model

Categorical data analysis Chapter 5

1. Logistic Regression, One Predictor 2. Inference: Estimating the Parameters 3. Multiple Logistic Regression 4. AIC and BIC in Logistic Regression

Immigration attitudes (opposes immigration or supports it) it may seriously misestimate the magnitude of the effects of IVs

Ron Heck, Fall Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October 20, 2011)

BANA 7046 Data Mining I Lecture 4. Logistic Regression and Classications 1

Generalized logit models for nominal multinomial responses. Local odds ratios

Lecture 2: Categorical Variable. A nice book about categorical variable is An Introduction to Categorical Data Analysis authored by Alan Agresti

Econometric Modelling Prof. Rudra P. Pradhan Department of Management Indian Institute of Technology, Kharagpur

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

Model generation and model selection in credit scoring

ECONOMICS TRIPOS PART I. Friday 15 June to 12. Paper 3 QUANTITATIVE METHODS IN ECONOMICS

CS6220: DATA MINING TECHNIQUES

15: CHI SQUARED TESTS

CIVL 7012/8012. Simple Linear Regression. Lecture 3

26:010:557 / 26:620:557 Social Science Research Methods

Finding the Gold in Your Data

Math for Machine Learning Open Doors to Data Science and Artificial Intelligence. Richard Han

Logistic Regression: Regression with a Binary Dependent Variable

Applied Econometrics Lecture 1

Applied Economics. Regression with a Binary Dependent Variable. Department of Economics Universidad Carlos III de Madrid

Multiple Regression: Chapter 13. July 24, 2015

Statistics for IT Managers

Lecture 4 Logistic Regression

Module 03 Lecture 14 Inferential Statistics ANOVA and TOI

Conditional Probability Solutions STAT-UB.0103 Statistics for Business Control and Regression Models

Estimated Precision for Predictions from Generalized Linear Models in Sociological Research

Single-level Models for Binary Responses

Convergence with International Financial Reporting Standards: The Experience of Taiwan

Chapter 14 Logistic Regression, Poisson Regression, and Generalized Linear Models

BIOS 625 Fall 2015 Homework Set 3 Solutions

Beyond GLM and likelihood

Logistic Regression. Fitting the Logistic Regression Model BAL040-A.A.-10-MAJ

Review of Multinomial Distribution If n trials are performed: in each trial there are J > 2 possible outcomes (categories) Multicategory Logit Models

Mass Fare Adjustment Applied Big Data. Mark Langmead. Director Compass Operations, TransLink Vancouver, British Columbia

Estimating Explained Variation of a Latent Scale Dependent Variable Underlying a Binary Indicator of Event Occurrence

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

WISE International Masters

Binary Dependent Variable. Regression with a

Basic Medical Statistics Course

Multiple Regression. Dr. Frank Wood. Frank Wood, Linear Regression Models Lecture 12, Slide 1

Impact Evaluation of Mindspark Centres

WISE MA/PhD Programs Econometrics Instructor: Brett Graham Spring Semester, Academic Year Exam Version: A

NELS 88. Latent Response Variable Formulation Versus Probability Curve Formulation

13.1 Categorical Data and the Multinomial Experiment

ISQS 5349 Spring 2013 Final Exam

Logistic Regression Review Fall 2012 Recitation. September 25, 2012 TA: Selen Uguroglu

8 Nominal and Ordinal Logistic Regression

Statistics and Data Analysis

Big Data Analytics Linear models for Classification (Often called Logistic Regression) April 16, 2018 Prof. R Bohn

LI EAR REGRESSIO A D CORRELATIO

12 Generalized linear models

Advancing Green Chemistry Practices in Business

WISE MA/PhD Programs Econometrics Instructor: Brett Graham Spring Semester, Academic Year Exam Version: A

Chapter 3 Multiple Regression Complete Example

Access to Quality Education Project (P145323)

(c) Interpret the estimated effect of temperature on the odds of thermal distress.

ISQS 5349 Final Exam, Spring 2017.

Solutions to the Spring 2018 CAS Exam MAS-1

9. Linear Regression and Correlation

SYMBIOSIS CENTRE FOR DISTANCE LEARNING (SCDL) Subject: production and operations management

Just a Few Keystrokes Away

Topic 2: Probability & Distributions. Road Map Probability & Distributions. ECO220Y5Y: Quantitative Methods in Economics. Dr.

Introduction to Linear Regression Analysis

Chapter 14. Multiple Regression Models. Multiple Regression Models. Multiple Regression Models

Transcription:

The Transactional Dilemma: Understanding Regression with Attribute Data Smita Skrivanek August 26, 2010

Agenda Welcome Introduction of MBB Webcast Series Larry Goldman, MoreSteam.com The Transactional Dilemma: Understanding Regression with Attribute Data Smita Skrivanek, MoreSteam.com Open Discussion and Questions s 2

MoreSteam.com Company Background Founded 2000 Select Customers: Over 250,000 Lean Six Sigma professionals trained Serving 45% of the Fortune 500 First firm to offer the complete Black Belt curriculum online Courses reviewed and approved by ASQ Registered education provider of Project Management Institute (PMI) 3

Master Black Belt Program Offered in partnership with Fisher College of Business at The Ohio State University Employs a Blended Learning model with world-class instruction delivered in both the classroom and online Covers the MBB Body of Knowledge with topics ranging from advanced DOE to Leading Change to Finance for MBBs Go to http:///master-black-belt.cfm for more information about curriculum, prerequisites, and schedule 4

Today s Presenter Smita Skrivanek Principal Statistician, MoreSteam LLC Develops content, software functions, exam question banks and simulation games for MoreSteam s diverse client base EngineRoom Product Manager Masters in Applied Statistics from The Ohio State University and a MS from Mumbai University, India 5

The Dilemma Examples of categorical responses: Delinquent payments Return purchases Billing errors Brand preferences Delayed shipments Customer satisfaction ratings It is unnecessary (and often inappropriate) to use continuous data methods on categorical responses. Logistic regression is a more intuitive and powerful method in such cases. 6

Objectives What is binary logistic regression (BLR) When is a logistic approach appropriate (and why) Probabilities, Odds and Odds Ratios Logistic model interpretation Methods used to estimate t model coefficients, i evaluate model fit and compare alternative models How to approach the teaching of logistic regression to students 7

The Regression Model Ordinary Least Squares (OLS) Regression: Y = α+ βx+ε Dependent, Continuous Constant Regression Coefficient Independent Error E(Y x) = α + βx - infinity < < infinity Logistic/Logit Regression: Y = α + βx + ε Dependent, Binary 0 < E(Y x) = P(Y x) = < α 1+ βx -infinity < α + βx < infinity 8

OLS vs. BLR the OLS Model infinity Y = α + βx + ε Y Expenditure 0 X = Income 9

OLS vs. BLR Where We Go Wrong 1 o o oo o oooooooooooo o oooo o o o Reality Model P(Y) Y Buy House Logit(Y) = α + βx 0 o o oo oooooooooooo o o ooooo o o oo o X = Income P(Y) = α + βx e + 1+e α βx 10

OLS vs. BLR Initial Comparison OrdinaryLeast Squares (OLS) Independent data Binary Logistic Regression (BLR) Independent data Errors are normal, with Errors are bernoulli, with Constant variance (σ 2 ) Non-constant variance [p i (1-p i )] Y is linear in the predictors Logit(Y) is linear in the predictors 11

Probabilities and Odds: Logit(Y) = α + βx+ ε P(Event) = # Events #T Total Odds(Event) = # Events #N Non-Events P(Non-Event) = #NonEvents Non-Events # Total Odds(Non-Event) = # Non-Events # Events P(Event) + P(Non-Event) = 1 Odds(Event) * Odds(Non-Event) = 1 Odds(Event) = # Events # Non-Events = #Events #Totals #Non-Events #Totals = P(Event) P(Non-Event) 12

Probabilities and Odds: Logit(Y) = α + βx+ ε Own Car? Gender Yes No Total Male Female Total 62 48 110 157 185 342 219 233 452 P(Own) = 110 Odds(Own) = 110 = 0.24 = 0.24 452 342 0.76 = 0.32 P(Don t own) = 342 342 0.76 = 0.76 Odds(Don t own) = = 452 110 0.24 = 3.1 0.24 + 0.76 = 1 0.32* 3.1 = 1 13

The Odds Ratio: A Measure of Association Odds(Event Group 1) = Odds(Event Group 2) = Odds Ratio (Event) = P(Event in Group 1) P(Non-Event in Group 1) P(Event in Group 2) P(Non-Event in Group 2) Odds(Event/Group 1) Odds(Event/Group 2) X = Categorical: Odds ratio = the increase/decrease in the odds of the event in group 1 relative to group 2 X = Continuous: Odds ratio = the increase/decrease in the odds of the event for a unit increase in X 14

Odds Ratio of Owning: Logit(Y) = α + βx+ ε Own Car? Gender Yes No Total Male Female Total 62 48 110 157 185 342 219 233 452 Odds(Own/Male) = 62 = 0.39 Note: 157 Log(1.52) = 0.418 Odds(Own/Female) = 48 185 = 0.26 Odds Ratio(Own) = 0.39 0.26 = 1.52 Males have 1.52 times greater odds of owning a car than females. 15

Odds and Odds Ratios: Logit(Y) = α + βx+ ε Event Success: Y = 1 Non-Event Failure: Y = 0 X = x: Logit(Y =1) = Log-odds(Y = 1) = α + βx Odds ( Y = 1) = e ( α + βx ) X = Binary (0, 1): X = 1: Odds ( Y = 1 X = 1) = e = e ( α + β *1) α + β X = 0: ( α + β *0 ) Odds ( Y = 1 X = 0) = e = e α Odds Ratio (Y=1 X) = Odds(Y=1 X=1) Odds(Y=1 X=0) = e α e α +β = e α e β = α e β e 16

OLS vs. BLR : Logit(Y) = α + βx + ε OrdinaryLeast Squares (OLS) BinaryLogisticRegression (BLR) -infinity < β < infinity 0 < Odds ratio = e β < infinity β < 0 negative association Odds ratio = e β < 1 decreasing odds β > 0 positive association Odds ratio = e β > 1 increasing odds 17

Βeta vs. Exp(Beta): Logit(Y) = α + βx + ε 10 Odds Ratio(Y) = 1 Odds (Y Group 1) = Odds (Y Group 2) = 0.5 9 8 7 6 5 4 3 2 1 0 3 2 1 0 1 2 3 18

Odds Ratio of Owning: Multiple Predictors Own car Coeff (β) z P(Z> z ) constant 4.683 3.18 0.001 income age male 0.0102 0.246 0.418 0.02 3.55 202 2.02 0.986 0.000 0.044044 Odds Ratio = e β Income 0.99 A unit increase in income does not change the odds of owning a car. Age 1.28 Male 1.52 A unit increase in age increases the odds of owning a car by 28%. Males have a 52% higher odds of owning a car than females 19

Estimating the Parameters OLS Regression uses Minimum Least Squares method When applied to a logistic regression model, the estimators lose their desirable statistical properties. Logistic regression uses the Maximum Likelihood method Find values of the parameters α and β which make the probability of observing Y, i.e., P(Y = y) as large as possible. Best parameters to explain the observed data. 20

Assessing Fit and Comparing Models Comparing alternative models Does the model which includes the selected variables tell us more about the response variable than a model that does not include those variables? Assessing Goodness of Fit How well does our model fit the observed data (describe the response variable Y)? 21

Another Example: Late Debt Payments Do Age Category and/or Home Ownership affect P(Default) and if so, how? Default Coeff (β) Odds ratio (e β ) constant 0.4214 homeowner 0.2672 0.76 age (<35) age (35 64) 0.1512 0.2704 1.16 1.31 Qstn: What is the estimated probability that a renter aged 30 years will default on a loan payment? Log-Odds (Default) = 0.4214 0.2672*(0) + 0.1512*(1) + 0.2704*(0) = 0.5726 Odds (Default) = e 0.5726 = 1.773 P e Default) = 1+ + e 0.5726 ( 0. 5726 = 0.64 22

How to Teach Logistic Regression Keep it Simple. Use analogies between ordinary least squares (OLS) regression and binary logistic regression (BLR). Introduce BLR with a single independent variable, as is used to teach OLS. Illustrate concepts with contingency tables. Link logistic regression concepts to the interpretation of statistical computer outputs. 23

References Logistic Regression Models: Joseph M. Hilbe Applied Logistic Regression: David W. Hosmer, Stanley Lemeshow Teaching, Understanding and Interpretation of Logit Regression: Anthony Walsh (Teaching Sociology, Vol. 5, No. 2) Using and Interpreting Logistic Regression: Ilsa L. Lottes, Alfred DeMaris, Marina A. Adler (Teaching Sociology, Vol. 24, No. 3 ) 24

Thank you for joining us 25

Resource Links and Contacts Questions? Comments? We d love to hear from you. Smita Skrivanek, Principal Statistician - MoreSteam.com sskrivanek@moresteam.com Larry Goldman, Vice President Marketing - MoreSteam.com lgoldman@moresteam.com Additional Resources: Archived presentation, slides and other materials: http:///presentations/webcast-regression-analysis-attribute-data.cfm regression analysis attribute Master Black Belt Program: http:///master-black-belt.cfm 26