Applied Regression Modeling


Applied Regression Modeling: A Business Approach

Iain Pardoe
University of Oregon, Charles H. Lundquist College of Business, Eugene, Oregon

WILEY-INTERSCIENCE
A JOHN WILEY & SONS, INC., PUBLICATION

Copyright © 2006 by John Wiley & Sons, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic format. For information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data:

Pardoe, Iain
Applied regression modeling: a business approach / Iain Pardoe.
p. cm.
Includes bibliographical references and index.
ISBN-13: 978-0-471-97033-0 (alk. paper)
ISBN-10: 0-471-97033-6 (alk. paper)
1. Regression analysis. 2. Statistics. I. Title.
QA278.2.P363 2006
519.5'36 dc22
2006044262

10 9 8 7 6 5 4 3 2 1

To Tanya, Bethany, and Sierra

CONTENTS

Preface
Acknowledgments
Introduction
    I.1 Statistics in business
    I.2 Learning statistics

1 Foundations
    1.1 Identifying and summarizing data
    1.2 Population distributions
    1.3 Selecting individuals at random: probability
    1.4 Random sampling
        1.4.1 Central limit theorem: normal version
        1.4.2 Student's t-distribution
        1.4.3 Central limit theorem: t version
    1.5 Interval estimation
    1.6 Hypothesis testing
        1.6.1 The rejection region method
        1.6.2 The p-value method
        1.6.3 Hypothesis test errors
    1.7 Random errors and prediction
    1.8 Chapter summary
    Problems

2 Simple linear regression
    2.1 Probability model for X and Y
    2.2 Least squares criterion
    2.3 Model evaluation
        2.3.1 Regression standard error
        2.3.2 Coefficient of determination: R²
        2.3.3 Slope parameter
    2.4 Model assumptions
        2.4.1 Checking the model assumptions
    2.5 Model interpretation
    2.6 Estimation and prediction
        2.6.1 Confidence interval for the population mean, E(Y)
        2.6.2 Prediction interval for an individual Y-value
    2.7 Chapter summary
        2.7.1 Review example
    Problems

3 Multiple linear regression
    3.1 Probability model for (X1, X2, ...) and Y
    3.2 Least squares criterion
    3.3 Model evaluation
        3.3.1 Regression standard error
        3.3.2 Coefficient of determination: R²
        3.3.3 Regression parameters: global usefulness test
        3.3.4 Regression parameters: nested model test
        3.3.5 Regression parameters: individual tests
    3.4 Model assumptions
        3.4.1 Checking the model assumptions
    3.5 Model interpretation
    3.6 Estimation and prediction
        3.6.1 Confidence interval for the population mean, E(Y)
        3.6.2 Prediction interval for an individual Y-value
    3.7 Chapter summary
    Problems

4 Regression model building I
    4.1 Transformations
        4.1.1 Natural logarithm transformation for predictors
        4.1.2 Polynomial transformation for predictors
        4.1.3 Reciprocal transformation for predictors
        4.1.4 Natural logarithm transformation for the response
        4.1.5 Transformations for the response and predictors
    4.2 Interactions
    4.3 Qualitative predictors
        4.3.1 Qualitative predictors with two levels
        4.3.2 Qualitative predictors with three or more levels
    4.4 Chapter summary
    Problems

5 Regression model building II
    5.1 Influential points
        5.1.1 Outliers
        5.1.2 Leverage
        5.1.3 Cook's distance
    5.2 Regression pitfalls
        5.2.1 Autocorrelation
        5.2.2 Multicollinearity
        5.2.3 Excluding important predictor variables
        5.2.4 Overfitting
        5.2.5 Extrapolation
        5.2.6 Missing data
    5.3 Model building guidelines
    5.4 Model interpretation using graphics
    5.5 Chapter summary
    Problems

6 Case studies
    6.1 Home prices
        6.1.1 Data description
        6.1.2 Exploratory data analysis
        6.1.3 Regression model building
        6.1.4 Results and conclusions
        6.1.5 Further questions
    6.2 Vehicle fuel efficiency
        6.2.1 Data description
        6.2.2 Exploratory data analysis
        6.2.3 Regression model building
        6.2.4 Results and conclusions
        6.2.5 Further questions

7 Extensions
    7.1 Generalized linear models
        7.1.1 Logistic regression
        7.1.2 Poisson regression
    7.2 Discrete choice models
    7.3 Multilevel models
    7.4 Bayesian modeling
        7.4.1 Frequentist inference
        7.4.2 Bayesian inference

Appendix A: Computer software help
    A.1 SPSS
        A.1.1 Getting started and summarizing univariate data
        A.1.2 Simple linear regression
        A.1.3 Multiple linear regression
    A.2 Minitab
        A.2.1 Getting started and summarizing univariate data
        A.2.2 Simple linear regression
        A.2.3 Multiple linear regression
    A.3 SAS
        A.3.1 Getting started and summarizing univariate data
        A.3.2 Simple linear regression
        A.3.3 Multiple linear regression
    A.4 R and S-PLUS
        A.4.1 Getting started and summarizing univariate data
        A.4.2 Simple linear regression
        A.4.3 Multiple linear regression
    A.5 Excel
        A.5.1 Getting started and summarizing univariate data
        A.5.2 Simple linear regression
        A.5.3 Multiple linear regression
    Problems

Appendix B: Critical values for t-distributions

Appendix C: Notation and formulas
    C.1 Univariate data
    C.2 Simple linear regression
    C.3 Multiple linear regression

Appendix D: Mathematics refresher
    D.1 The natural logarithm and exponential functions
    D.2 Rounding and accuracy

Appendix E: Brief answers to selected problems

References
Glossary
Index

PREFACE

This book has developed from class notes written for the "Business Statistics" course taken primarily by undergraduate business majors in their junior year at the University of Oregon. This course is essentially an applied regression course, and incoming students have already taken an introductory probability and statistics course. The book is suitable for any undergraduate second statistics course in which regression analysis is the main focus. It would also be suitable for use in an applied regression course for nonstatistics major graduate students, including MBAs.

Mathematical details have deliberately been kept to a minimum, and the book does not contain any calculus. Instead, emphasis is placed on applying regression analysis to data using statistical software, and understanding and interpreting results.

Chapter 1 reviews essential introductory statistics material, while Chapter 2 covers simple linear regression. Chapter 3 introduces multiple linear regression, while Chapters 4 and 5 provide guidance on building regression models, including transforming variables, using interactions, incorporating qualitative information, and using regression diagnostics. Each of these chapters includes homework problems, mostly based on analyzing real datasets provided with the book. Chapter 6 contains two in-depth case studies, while Chapter 7 introduces extensions to linear regression and outlines some related topics. The appendices contain instructions on using statistical software (SPSS, Minitab, SAS, and R/S-PLUS) to carry out all the analyses covered in the book, a table of critical values for the t-distribution, notation and formulas used throughout the book, a glossary of important terms, a short mathematics refresher, and brief answers to selected homework problems.

The first five chapters of the book have been successfully used in quarter-length courses over the last several years. An alternative approach for a quarter-length course would be to skip some of the material in Chapters 4 and 5 and substitute one or both of the case studies