Sociology 301. Bivariate Regression. Clarification. Regression. Liying Luo Last exam (Exam #4) is on May 17, in class.

Transcription:

Sociology 301
Bivariate Regression
Liying Luo
04.28

Clarification
Last exam (Exam #4) is on May 17, in class. No exam in the final exam week (May 24).

Regression
Regression analysis: the procedure for estimating and testing the relationship between continuous variables.
Bivariate regression (simple linear regression): two continuous variables.
Multivariate regression: three or more continuous variables.

Scatterplot
A diagram that visually displays the covariation of two continuous variables as a set of points on X-Y coordinates.
[Figure: two scatterplots, United States and Sierra Leone. Vertical axis: Wife's Age - Husband's Age. Horizontal axis: Husband's Age.]

Scatterplot
The basic shape of the relationship: a straight line? a curve? a blob?
The direction of the relationship: positive? negative? uncertain?
The amount of variability: how many points deviate from the basic pattern?
Outliers or unusual observations?

Scatterplot: Shape
[Figure: four panels titled Straight Line, Curved Line, Football Shape, Shapeless Blob.]

Scatterplot: Direction
[Figure: four panels titled Positive, Negative, Positive, Negative.]

Scatterplot: Variability
[Figure: four panels titled No Variability, Some Variability, More Variability, Almost a Shapeless Blob.]

Scatterplot: Outlier
[Figure: four panels titled No Outliers, Maybe an Outlier, One Clear Outlier, One Likely Outlier.]

Bivariate Regression
Conceptually, bivariate regression involves drawing a line through the points on the scatterplot that comes closest to the points on the Y dimension.

Bivariate Regression
Regression analysis involves estimating an equation that
... describes how, on average, the response variable (Y) is related to the predictor variable (X)
... allows us to estimate the value of the response variable (Y) given a specified value of the predictor variable (X)
When we regress Y on X we produce a model that estimates Y on the basis of X.

Bivariate Regression
Theoretical regression model for population: $Y_i = \alpha + \beta X_i + \varepsilon_i$ (intercept $\alpha$, slope $\beta$, error $\varepsilon_i$)
Theoretical regression model for sample: $Y_i = a + bX_i + e_i$ (intercept $a$, slope $b$, error $e_i$)
Fitted regression model for sample: $\hat{Y}_i = a + bX_i$ (intercept $a$, slope $b$)

Bivariate Regression
Fitted regression model for sample: $\hat{Y}_i = a + bX_i$ (intercept $a$, slope $b$)
Using the regression equation we can
... estimate the average value of Y for a given value of X
... (to a lesser extent) predict an individual's value of Y for a given value of X

Bivariate Regression
Fitted regression model for sample: $\hat{Y}_i = a + bX_i$ (intercept $a$, slope $b$)
Learning objectives
Understand the basic idea of estimating regression coefficients
Be able to interpret and test regression coefficients
Be able to compute and interpret the coefficient of determination

Estimating Bivariate Regression Coefficient
Fitted regression model for sample: $\hat{Y}_i = a + bX_i$ (intercept $a$, slope $b$)
Estimating bivariate regression coefficient:
a intercept: ?
b slope: ?
[Figure: scatterplot titled "Regress Exam2 on Exam1". Vertical axis: Exam 2 Score. Horizontal axis: Exam 1 Score.]

Estimating Bivariate Regression Coefficient
Estimating bivariate regression coefficient:
The line that we draw (that is, the values of intercept a and slope b that we choose) maximizes our ability to predict the value of Y and thus minimizes the prediction errors.
Mathematically, we choose the line for which $\sum_{i=1}^{N} (Y_i - \hat{Y}_i)^2$ is smallest.
That is, the sum of the squared vertical distances, $\sum_{i=1}^{N} e_i^2$, is smallest.
This least squares error sum criterion produces ordinary least squares (OLS) estimates of a and b.
[Figure: scatterplot titled "Regressing Exam 2 Scores on Exam 1 Scores". Vertical axis: Exam 2 Score. Horizontal axis: Exam 1 Score.]
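The least squares criterion can be made concrete with a small sketch. The exam scores and the three candidate lines below are hypothetical (they are not the lecture's data); the point is only that each candidate line gets a sum of squared vertical distances, and the preferred line is the one with the smallest sum.

```python
def sse(a, b, xs, ys):
    """Sum of squared vertical distances between each Y and the line a + b*X."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical exam scores, for illustration only.
exam1 = [55, 62, 70, 78, 85, 93]
exam2 = [50, 66, 68, 80, 82, 95]

# Three hypothetical candidate lines, written as (intercept a, slope b).
candidates = {
    "flat": (73.5, 0.0),
    "steep": (-40.0, 1.5),
    "moderate": (-7.0, 1.1),
}

# Score every candidate with the least squares criterion.
errors = {name: sse(a, b, exam1, exam2) for name, (a, b) in candidates.items()}

# The candidate with the smallest sum of squared errors fits best.
best = min(errors, key=errors.get)
print(errors, best)
```

OLS does not try a handful of lines like this; it solves directly for the single pair (a, b) with the smallest possible sum among all lines.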

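Estimating Bivariate Regression Coefficient
Estimating bivariate regression coefficient:
$b = \frac{\text{Covariance of X and Y}}{\text{Variance of X}} = \frac{s_{YX}}{s_X^2} = \frac{\sum_{i=1}^{N} (Y_i - \bar{Y})(X_i - \bar{X})}{\sum_{i=1}^{N} (X_i - \bar{X})^2}$
[Figure: a large b vs. a small b, illustrated with $s_{XY}$, $s_X^2$, and $s_Y^2$.]

Estimating Bivariate Regression Coefficient
For the baseball example, the prediction equation is $\hat{Y} = -4.76 + 0.11X$
[Figure: scatterplot for the baseball example. Vertical axis: Games Won.]

Interpreting Bivariate Regression Coefficient
Fitted regression model for sample: $\hat{Y}_i = a + bX_i$ (intercept $a$, slope $b$)
Interpreting bivariate regression coefficient:
a intercept: the fitted value of Y when X = 0
b slope: the amount of change in Y for every one-unit change in X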
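A minimal sketch of the slope computation, covariance of X and Y divided by the variance of X. The intercept line uses the standard OLS result a = Ȳ - bX̄, which is not visible in this transcript and is included here as an assumption about what the course uses. The function name `ols_fit` and the data are hypothetical.

```python
def ols_fit(xs, ys):
    """Return (a, b): OLS intercept and slope for regressing Y on X."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample covariance of X and Y, and sample variance of X.
    s_yx = sum((y - y_bar) * (x - x_bar) for x, y in zip(xs, ys)) / (n - 1)
    s_xx = sum((x - x_bar) ** 2 for x in xs) / (n - 1)
    b = s_yx / s_xx          # slope: covariance over variance
    a = y_bar - b * x_bar    # intercept: the line passes through (x_bar, y_bar)
    return a, b


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = ols_fit(x, y)
print(f"Y-hat = {a:.2f} + {b:.2f}X")
```

The (n - 1) divisors cancel in the ratio, which is why the slope can equally be written with plain sums of cross-products and squares.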

Interpreting Bivariate Regression Coefficient
$\hat{Y} = -4.76 + 0.11X$
[Figure: scatterplot for the baseball example. Vertical axis: Games Won.]
How do we interpret this regression equation?
It says that for every one-unit increase in X (runs) we should observe a 0.11-unit increase in Y (wins).
It also literally says that if a team were to score zero runs in a season, such that X = 0, we should observe that the team would win -4.76 games (more on this later).

Interpreting Bivariate Regression Coefficient
Example 1: An economist is interested in the relationship between annual salary (Y) and height in inches (X). He regressed annual salary on height and found the estimated intercept a = 30,000 and slope b = 350. Write down the fitted regression model and interpret the intercept and the slope.

Interpreting Bivariate Regression Coefficient
Example 2: A criminologist is interested in the relationship between the number of homicides (Y) and median household income (X) in neighborhoods. She regressed the number of homicides on median household income. She found the estimated intercept a = 0. and slope b = -0.5. Write down the fitted regression model and interpret the intercept and the slope.

Calculating Expected Values
For the baseball example, the prediction equation is: $\hat{Y} = -4.76 + 0.11X$
[Figure: scatterplot for the baseball example. Vertical axis: Games Won.]
Using the regression equation we can
estimate the average value of Y for a given value of X
predict an individual's value of Y for a given value of X

Calculating Expected Values
The equation $\hat{Y} = -4.76 + 0.11X$ means (in English) that
Expected Number of Wins = -4.76 + 0.11(Runs)
How many wins would we predict a team to win if they scored 801 runs?
Expected Number of Wins = -4.76 + 0.11(801) = 83.35
What is the average number of wins among teams that score 750 runs?
Expected Number of Wins = -4.76 + 0.11(750) = 77.74

Sociology 38 ~ 3/3/205

Worksheet
A financial analyst would like to know the relationship between mutual fund fees (X, in %) and annual yield (Y, in %). She regressed annual yield on fees and found in her sample a = 4 and b = -0.
1. Write down the fitted regression model.
2. Interpret the estimated intercept and slope.
3. What is the expected return for a mutual fund charging a 1% fee? A 5% fee?
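The two expected-value calculations can be checked with a short sketch. It assumes the baseball equation is Ŷ = -4.76 + 0.11X, the coefficients that reproduce both worked answers (83.35 and 77.74), with 801 runs in the first question; the function name `expected_wins` is hypothetical.

```python
def expected_wins(runs, a=-4.76, b=0.11):
    """Expected number of wins for a team scoring `runs` runs in a season.

    The default intercept and slope are assumed values for the baseball
    example, chosen because they reproduce the worked answers.
    """
    return a + b * runs


# Prediction for a single team that scored 801 runs.
wins_801 = round(expected_wins(801), 2)   # 83.35

# Average wins among teams that score 750 runs (same calculation).
wins_750 = round(expected_wins(750), 2)   # 77.74

print(wins_801, wins_750)
```

The same fitted value answers both questions: it is the estimated average of Y at that X, and it doubles as the best single guess for one team, though an individual prediction carries more uncertainty.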

Testing Bivariate Regression Coefficient
Fitted regression model for sample: $\hat{Y}_i = a + bX_i$ (intercept $a$, slope $b$)
Testing bivariate regression coefficient:
a intercept: We don't usually care
b slope: Is the predictor (X) related to the response (Y)?

Testing Bivariate Regression Coefficient
Typical null hypothesis about the population slope β: the predictor variable X has no linear relationship with the response variable Y.
Given the estimated slope b from sample data, how likely is it that the population slope β equals 0?
Most common: two-tailed test: $H_0: \beta = 0$ vs $H_1: \beta \neq 0$
(Far) less common: one-tailed test: $H_0: \beta \geq 0$ vs $H_1: \beta < 0$, or $H_0: \beta \leq 0$ vs $H_1: \beta > 0$

Testing Bivariate Regression Coefficient
Example 1: An economist is interested in the relationship between annual salary (Y) and height in inches (X). He regressed annual salary on height and found the estimated intercept a = 30,000 and slope b = 350. State the null and research hypotheses about the slope.

Testing Bivariate Regression Coefficient
Example 2: A criminologist is interested in the relationship between the number of homicides (Y) and median household income (X) in neighborhoods. She regressed the number of homicides on median household income. She found the estimated intercept a = 0. and slope b = -0.5. State the null and research hypotheses about the slope.

Testing Bivariate Regression Coefficient
The Central Limit Theorem: b, a sample statistic, is normally distributed with mean β and a standard error (under some circumstances, of course).
It means that we can use Z-statistics to test whether β is different from the hypothesized value (usually 0).
However, we usually don't know that standard error and have to rely on sample information, so we use t-statistics instead.
For this class, $s_b$ (standard error of b) will be provided.

Testing Bivariate Regression Coefficient
Example 1: A civil engineer is interested in whether anxiety level (X, a numeric measure ranging anywhere from 0 to 20, with 20 being extremely anxious) is associated with driving speed on the highway (Y, in miles per hour). He regressed speed on anxiety level and found the estimated intercept a = 75, the estimated slope b = 2., and the estimated standard error for the slope $s_b$ = .2.
1. Write down the fitted regression model.
2. Interpret the estimated intercept and slope.
3. State the null and research hypotheses about the slope.
4. Decide the alpha level and critical value(s).
5. Compute the test statistic.
6. Compare the test statistic to the critical value to make a decision.
7. State a technical decision and a substantive conclusion.
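Steps 5 and 6 of the testing procedure can be sketched as follows, using the usual t-statistic for a slope, t = (b - β0) / s_b, where β0 is the hypothesized value (usually 0). All numbers below are hypothetical and are not taken from the examples above. The critical value 1.96 is the large-sample normal approximation for a two-tailed test at alpha = .05; in a bivariate regression the exact t critical value depends on N - 2 degrees of freedom and should be read from a t table.

```python
def slope_t_statistic(b, s_b, beta_0=0.0):
    """t-statistic for testing H0: beta = beta_0, given slope b and its standard error s_b."""
    return (b - beta_0) / s_b


# Hypothetical estimates, for illustration only.
b = 0.8      # estimated slope
s_b = 0.25   # estimated standard error of the slope

# Assumed critical value: two-tailed, alpha = .05, large-sample approximation.
critical_value = 1.96

t = slope_t_statistic(b, s_b)

# Two-tailed decision rule: reject H0 when |t| exceeds the critical value.
reject_null = abs(t) > critical_value
print(f"t = {t:.2f}, reject H0: {reject_null}")
```

Rejecting the null is the technical decision; the substantive conclusion restates it in terms of the variables, for example that the predictor has a linear relationship with the response in the population.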

Testing Bivariate Regression Coefficient
Example 2: A sociologist is interested in inter-generational mobility, defined as the relationship between father's education and child's education. He sampled n = 1,485 individuals, and below is the SPSS output.
[Table: SPSS output]
1. Write down the fitted regression model.
2. Interpret the estimated intercept and slope.
3. State the null and research hypotheses about the slope.
4. Decide the alpha level and critical value(s).
5. Compute the test statistic.
6. Compare the test statistic to the critical value to make a decision.
7. State a technical decision and a substantive conclusion.

Coefficient of Determination
How well does X do in estimating/predicting Y?
A strong association/relationship allows good estimation/prediction.
A weak association/relationship means bad estimation/prediction.
To measure this, we examine how much of the variation in Y can be attributed to X and how much is random error.
That is, we can partition the variance in Y into the part attributable to X and the part attributable to error.

Coefficient of Determination
Partition the variance in Y into the part attributable to X and the part attributable to error.
$Y_i - \bar{Y} = (\hat{Y}_i - \bar{Y}) + (Y_i - \hat{Y}_i)$
$(\hat{Y}_i - \bar{Y})$: portion of the deviation from the mean that is attributable to X
$(Y_i - \hat{Y}_i)$: portion of the deviation from the mean that is attributable to random error
Deviations from the mean can be expressed as the sum of (1) deviations of the predicted value from the mean and (2) individual deviations from the predicted value.

Coefficient of Determination
If we square each side and then sum across cases we get:
$\sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$
Total Sum of Squares = Regression Sum of Squares + Error Sum of Squares
$SS_{TOTAL} = SS_{REGRESSION} + SS_{ERROR}$
When X is not related to Y, knowing X does not help estimate or predict Y. In this case, our best guess about $\hat{Y}$ is $\bar{Y}$, so $SS_{REGRESSION} = 0$ and $SS_{TOTAL} = SS_{ERROR}$.

Coefficient of Determination: How Well Does X Predict Y?
The coefficient of determination ($R^2_{YX}$) indicates the proportion of the total variation in Y that is determined by its linear relationship with X.
$R^2_{YX} = \frac{SS_{TOTAL} - SS_{ERROR}}{SS_{TOTAL}} = \frac{SS_{REGRESSION}}{SS_{TOTAL}}$
If $R^2_{YX} = 0$, then $SS_{REGRESSION} = 0$, which suggests that there is no association between Y and X.
If $R^2_{YX} = 1$, then $SS_{REGRESSION} = SS_{TOTAL}$, which suggests that there is no error variation and that we can perfectly predict Y based on X.

Coefficient of Determination: How Well Does X Predict Y?
[Figure: two scatterplots, one with $R^2_{YX} = 1.0$ and one with $R^2_{YX} = 0.25$.]
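The partition of the total sum of squares, and the coefficient of determination built from it, can be verified numerically. This is a sketch on hypothetical data (not the baseball data); the helper name `sums_of_squares` is an assumption, and the intercept uses the standard OLS result a = Ȳ - bX̄.

```python
def sums_of_squares(xs, ys):
    """Return (SS_total, SS_regression, SS_error) for the OLS line of Y on X."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # OLS slope and intercept.
    b = (sum((y - y_bar) * (x - x_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    y_hat = [a + b * x for x in xs]
    ss_total = sum((y - y_bar) ** 2 for y in ys)                 # deviations from the mean
    ss_regression = sum((yh - y_bar) ** 2 for yh in y_hat)       # part attributable to X
    ss_error = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))    # part attributable to error
    return ss_total, ss_regression, ss_error


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

ss_total, ss_regression, ss_error = sums_of_squares(x, y)

# Coefficient of determination: share of the variation in Y determined by X.
r_squared = ss_regression / ss_total
print(ss_total, ss_regression, ss_error, r_squared)
```

The partition holds exactly only for the OLS line, because the cross-product term that appears when each side is squared sums to zero for least squares estimates.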

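Coefficient of Determination
For the baseball example:
$R^2_{YX} = \frac{SS_{REGRESSION}}{SS_{TOTAL}} = \frac{1128.63}{2948.967} = 0.383$
[Figure: scatterplot for the baseball example. Vertical axis: Games Won.]

Coefficient of Determination
An $R^2_{YX}$ of 0.383 means that 38.3% of the variation in Y (Wins) is explained by X (Runs).
[Figure: scatterplot for the baseball example. Vertical axis: Games Won.]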