
Lecture 4

This week's lab: Exam 1! Review the lectures, practice labs 1 to 4, and homework 1 to 5. Need help? See me during my office hours, or go to the open lab or GS 211. Bring your picture ID and a simple calculator (note: computations can be done in Excel). Do not be late.

Q1: What is the interpretation of the number 4.1?
A: There were 4.1 million visits to the ER by people 85 and older.

Q2: What percent of people aged 65-74 visited the ER between 2009 and 2010?
A: About 40%.

Q3: What is the total number of ER visits by senior citizens between 2009 and 2010?
A: 19.6 million.

Q4: What do you think was the main purpose of this chart? What would be the conclusion?
A: Although the 85+ population visits the ER most often in percentage terms, its actual number of visits is the smallest.

Q: What % of women aged 30-34 were pregnant in 2009?
A: 13.8%.

Q: Which age group experienced the biggest drop from 1990 to 2009?
A: 18-19 years.

Q: Which age group had the highest rate of pregnancy in 1990?
A: 20-24 years.

Q: Which age group had the highest rate of pregnancy in 2009?
A: 25-29 years.

Q: Which age group experienced the biggest jump from 1990 to 2009?
A: 35-39 years.

Q: What do you think was the main purpose of this chart? What would be the conclusion?
A: There is a reverse trend: as the years go by, younger women are pregnant less often while older women are pregnant more often.

Q: For which period was the CIA estimate the highest?
A: 1928-1940.

Q: For the period 60-65, what was the official Soviet income growth?
A: About 7-8%.

Q: For which period do we see the largest discrepancy between Khanin's and the CIA's estimates?
A: 1928-1940.

Q1. Which month has the highest number of tornado-related deaths?
A: April.

Q2. Which two months have the highest number of tornadoes?
A: May and June.

Q3. Which month has the lowest number of tornado-related deaths?
A: July.

Q4. The blue chart (tornado deaths) is shifted toward the left compared to the red chart. What could be the likely reason for this?
A: People tend to underestimate the danger and are unprepared at the beginning of tornado season.

Example

Compare the number of tornadoes in the months of January, March, May and July. Which of the other three months is most correlated with January? Produce a scatter plot with January on the X axis and the month most correlated with January on the Y axis. What are the equation of the regression line and the R^2? If you see 30 tornadoes in January, how many tornadoes would you expect to see in the month most correlated with January?

[Chart: "jan vs may" scatter plot, January tornadoes on the X axis and May tornadoes on the Y axis; trend line y = 1.8034x + 139.13, R² = 0.0841]

Number of tornadoes in May: Y = 1.8034 * 30 + 139.13 = 193.23. Rounding to the nearest integer, one would expect to see about 193 tornadoes in May if 30 tornadoes occurred in January.

A few more examples

Q1. Consider the chart and imagine that one randomly picks a month. What is the probability that this month has more than 4000 tornadoes in total?
A1. From the chart we can see that this happened for only three months: April, May and June. Thus the probability is 3/12 = 1/4.

Q2. Consider the chart and imagine that one randomly picks two different months. What is the probability that each of them has more than 5000 tornadoes in total?
A2. We have already seen that one can form 66 different pairs of months (just remember the correlation table). Only one pair satisfies the requirement: (May, June). Thus only one out of 66 pairs works, and the probability is 1/66.
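The counting in A1 and A2 can be checked with a short sketch. The two sets below encode only what the answers state (three months above 4000, and May and June as the only months above 5000); the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

MONTHS_IN_YEAR = 12
over_4000 = {"April", "May", "June"}   # months named in A1
over_5000 = {"May", "June"}            # the only qualifying pair named in A2

# Q1: one month picked at random.
p_one_month = Fraction(len(over_4000), MONTHS_IN_YEAR)

# Q2: two different months picked at random, order ignored.
total_pairs = comb(MONTHS_IN_YEAR, 2)        # number of unordered pairs of months
qualifying_pairs = comb(len(over_5000), 2)   # pairs drawn entirely from over_5000
p_both_months = Fraction(qualifying_pairs, total_pairs)

print(p_one_month)    # 1/4
print(total_pairs)    # 66
print(p_both_months)  # 1/66
```

Using `Fraction` keeps the answers as exact ratios rather than rounded decimals.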

A few more examples

Q3. Of the three statistical measures we have learned (Median, Average and Standard Deviation), which one is NOT sensitive to outliers?
A3. Median.

Q4. Which of the three measures (Median, Average and Standard Deviation) describe the center of the data?
A4. Median and Average.

Note: In the literature, Mean and Average are often used interchangeably.
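A small sketch makes A3 concrete. The numbers are made up for illustration and do not come from any lecture data set: adding a single extreme value barely moves the median but shifts the average and inflates the standard deviation.

```python
from statistics import mean, median, stdev

# Hypothetical data, invented for this illustration.
data = [10, 12, 13, 15, 16]
with_outlier = data + [200]   # one extreme value added

for label, values in (("original", data), ("with outlier", with_outlier)):
    print(f"{label:>12}: median={median(values):.1f} "
          f"mean={mean(values):.2f} stdev={stdev(values):.2f}")

median_shift = median(with_outlier) - median(data)
mean_shift = mean(with_outlier) - mean(data)
```

Here the median moves by 1 while the mean moves by more than 30, which is why the median is the measure to report when outliers are present.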

Multiple Regression

Week of July 11th: Exam 2! Do not miss Lab 5 and Lab 6. Exam 2 will cover material from lab 0 to lab 6 and, of course, all lectures. Review homework 2 to 8.

Warm up: The data in question are SAT.txt. So far we have learned how to work with individual charts based on two variables (an x variable and a y variable).

Q1. What is the interpretation of 0.5655 on the High School chart?
A: This is a tricky one. Students typically write: for each 1-point increase in High School GPA, one expects a 0.5655-point increase in College GPA. An alternative and better answer: given two students whose High School GPAs differ by 1 point, one could expect their College GPAs to differ by 0.5655 points. This second interpretation is useful for college administrators.

Q2. The largest R square is on the High School vs College GPA chart. How would you comment on this information?
A: It looks like High School GPA has the closest relation to College GPA.

Q4. What is the interpretation of the number 0.822 on the third chart?
A: The number 0.822 should not be interpreted. A person with a High School GPA of zero does not go to college.

Q5. What does each dot represent on the chart?
A: A student.

Breakdown of R^2:

  from 0 to 0.2       poor
  from 0.2 to 0.4     decent
  from 0.4 to 0.6     good
  from 0.6 to 0.85    very good
  from 0.85 to 1      excellent

So far so good. Clearly each of the three variables (high school GPA, SAT score and letters) has a positive relationship with the college GPA. In other words, if one would like to predict the performance of an incoming freshman, each of these three predictors would be relevant.

So how do we compute a student's College GPA based on the letters, HS GPA and SAT? For example:

If student A has quality of letter = 8, then his predicted College GPA = 0.1754(8) + 1.0702 = 2.4734.
If his High School GPA = 3.4, then his predicted College GPA = 0.5655(3.4) + 0.822 = 2.7447.
If his SAT score is 1050, then his predicted College GPA = 0.0018(1050) + 0.1519 = 2.0419.
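The three one-variable predictions can be evaluated side by side. This is a minimal sketch: the slopes and intercepts are the trend-line values quoted in the example, while the dictionary keys are illustrative names.

```python
# (slope, intercept) of each single-variable trend line quoted above.
models = {
    "letters": (0.1754, 1.0702),
    "hs_gpa": (0.5655, 0.822),
    "sat": (0.0018, 0.1519),
}

# Student A's credentials from the example.
student_a = {"letters": 8, "hs_gpa": 3.4, "sat": 1050}

predictions = {
    name: slope * student_a[name] + intercept
    for name, (slope, intercept) in models.items()
}

for name, value in predictions.items():
    print(f"{name:>8}: predicted College GPA = {value:.4f}")
# letters: 2.4734, hs_gpa: 2.7447, sat: 2.0419
```

Each line gives a different answer for the same student, which is what motivates combining the predictors into one model.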

Natural question: If one really wants to predict a student's performance (and many admission officers do), it would make sense to combine these three variables and predict the student's performance based on all three factors at once. But how to do so? For this we need multiple regression.

Multiple regression (using the same SAT data)

In Excel, click on Data, then on Data Analysis, then on Regression. In the dialog box, highlight all three columns for the Input X Range, including the row of labels. As far as Excel is concerned, highlighting three columns instead of just one is the only difference between simple regression and multiple regression.

After highlighting, $B$1:$D$101 should appear in the Input X Range box and $A$1:$A$101 should appear in the Input Y Range box. Do not forget to tick the Labels box, and then click OK.
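For readers curious about what the Regression tool computes, the following is a minimal sketch of ordinary least squares in plain Python. The function name `fit_least_squares` and the six data rows are hypothetical, made up for illustration; they are not taken from SAT.txt. The sketch returns only the coefficients, not the standard errors, t statistics or p-values that Excel also reports.

```python
def fit_least_squares(rows, y):
    """Return [b0, b1, ..., bk] minimising the sum of squared residuals."""
    X = [[1.0, *row] for row in rows]   # prepend a column of 1s for the intercept
    p = len(X[0])
    # Normal equations (X^T X) b = X^T y, stored as an augmented matrix.
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(X, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Hypothetical data generated from y = 1 + 2*x1 - 3*x2 + 0.5*x3 (no noise).
rows = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0), (2, 1, 3), (0, 2, 1)]
y = [1 + 2 * a - 3 * b + 0.5 * c for a, b, c in rows]

coefficients = fit_least_squares(rows, y)
print([round(b, 6) for b in coefficients])
```

Because the made-up data follow the equation exactly, the fit recovers the intercept 1 and the slopes 2, -3 and 0.5. With real, noisy data the coefficients are the values that make the squared prediction errors as small as possible.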

Excel then produces the regression table.

We cannot visualize d dimensions! One of the most important problems with multiple regression is visualization. Unlike the one-dimensional case, where we can plot the chart and fit the line, here we have 3-dimensional inputs and a 1-dimensional output, so the data cannot be drawn as a single scatter plot and we cannot see what is going on. Instead we must depend on the regression table.

Predictions: Imagine a student with a good high school GPA = 3.5, SAT = 1300, and a letter quality of 9. How do we use the regression table to predict his College GPA?

Multiple Regression Analysis

Multiple regression analysis is a powerful technique for predicting the unknown value of a variable from the known values of two or more other variables, called the predictors. More precisely, multiple regression analysis helps us predict the value of Y for given values of X_1, X_2, ..., X_k.

By multiple regression we mean models with just one dependent variable (Y) and two or more independent (explanatory) variables (the Xs). The variable whose value is to be predicted is known as the dependent variable, and the ones whose known values are used for prediction are known as independent (explanatory) variables.

The Multiple Regression Model

              Coefficient  Standard Error  t Stat    P-value   Lower 95%  Upper 95%  Lower 95.0%  Upper 95.0%
Intercept     b_0          0.345908        -0.37459  0.678905  -0.879605  0.48759    -0.87961     0.48759
Variable X1   b_1          0.115679        4.567089  0.009385  0.1456789  0.178938   0.145679     0.178938
Variable X2   b_2          0.00394         3.245097  0.657849  0.0000768  0.0013     7.68E-05     0.0013
...           ...          0.059608        0.45689   0.034579  -0.09786   0.123456   -0.09786     0.123456
...           ...          0.309457        5.345609  0.57689   0.0905949  0.58697    0.090595     0.58697
Variable Xk   b_k          0.009846        -3.40958  0.00346   0.056784   0.678591   0.056784     0.678591

In general, the multiple regression equation of Y on X_1, X_2, ..., X_k from the table above is given by:

$Y = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_k X_k$

Here b_0 is the intercept, and b_1, b_2, b_3, ..., b_k are analogous to the slope in the simple linear regression equation; they are also called regression coefficients. They can be interpreted the same way as a slope, with the other variables held fixed. Thus b_i = 2.5 would indicate that Y increases by 2.5 units when X_i increases by 1 unit and the other Xs stay the same.

The Multiple Regression Model

In general, the multiple regression equation of Y on X_1, X_2, ..., X_k is given by:

$Y = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_k X_k$

Using the regression table for the SAT data, predict an incoming student's College GPA if his High School GPA = 3.5, SAT = 1300, and Letter quality = 9.

From the table we can read that the predictive model is:

College GPA = Y intercept + (coefficient of GPA High) * HS GPA + (coefficient of SAT) * SAT + (coefficient of Letters) * Letters

College GPA = -0.15326 + 0.376351 * 3.5 + 0.001227 * 1300 + 0.022684 * 9 = 2.963

Thus the predicted College GPA is 2.963. Important: WE USED ALL THREE VARIABLES!

How Good Is the Regression?

Once a multiple regression equation has been constructed, one can check how good it is (in terms of predictive ability) by examining the coefficient of determination (R-square, R^2). R-square always lies between 0 and 1, and statistical software reports it whenever a regression is run. The closer R^2 is to 1, the better the model and its predictions. For the SAT data, the predictive model of College GPA based on High School GPA, SAT, and letters has R^2 = 0.3997, which is considered a decent model.

Breakdown of R^2:

  from 0 to 0.2       poor
  from 0.2 to 0.4     decent
  from 0.4 to 0.6     good
  from 0.6 to 0.85    very good
  from 0.85 to 1      excellent

Let us practice a bit. In general, the multiple regression equation of Y on X_1, X_2, ..., X_k is given by:

$Y = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_k X_k$

  High School GPA   SAT    Letters   Prediction (rounded to 2 decimal places)
  2.5               1100   8         2.32
  3.5               1100   8         2.70
  4.0               900    9         2.66
  2.5               1500   10        2.85
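The practice rows can be checked with a short sketch. The intercept and coefficients are the ones used in the worked SAT example (-0.15326, 0.376351, 0.001227, 0.022684); the function and variable names are illustrative.

```python
INTERCEPT = -0.15326
COEF_HS_GPA = 0.376351
COEF_SAT = 0.001227
COEF_LETTERS = 0.022684


def predict_college_gpa(hs_gpa, sat, letters):
    """Evaluate Y = b0 + b1*X1 + b2*X2 + b3*X3 for one student."""
    return (INTERCEPT
            + COEF_HS_GPA * hs_gpa
            + COEF_SAT * sat
            + COEF_LETTERS * letters)


# (High School GPA, SAT, Letters) for the four practice rows.
students = [(2.5, 1100, 8), (3.5, 1100, 8), (4.0, 900, 9), (2.5, 1500, 10)]

for hs_gpa, sat, letters in students:
    gpa = predict_college_gpa(hs_gpa, sat, letters)
    print(f"HS GPA {hs_gpa}, SAT {sat}, letters {letters}: {gpa:.2f}")
# 2.32, 2.70, 2.66, 2.85
```

The first two rows differ only in High School GPA (by 1 point), so their predictions differ by the GPA High coefficient, about 0.38.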

We start with a review. The data in question are SAT.txt. Each of the three variables (high school GPA, SAT score and letters) has a positive relationship with the college GPA. In other words, if one would like to predict the performance of an incoming freshman, each of these three predictors would be relevant. So how do we compute a student's College GPA based on the letters, HS GPA and SAT?


Practice

Use the Cars04-1 data. Your task is to use engine size (litres), cylinders, horsepower, and weight (pounds) to predict the retail price of a car. Create the regression table.

Using the table, predict the retail price of a car whose engine size is 4 litres and which has 6 cylinders, 400 horsepower, and a weight of 3200 pounds.

Retail price = -31602.488 - 6504.722 * engine size + 3547.161 * cylinders + 162.451 * horsepower + 8.508 * weight
             = -31602.488 - 6504.722 * 4 + 3547.161 * 6 + 162.451 * 400 + 8.508 * 3200
             = about $55,868

(The rounded coefficients shown here give $55,867.59; the slide reports $55,868.45, presumably from Excel's unrounded coefficients.)

Back to the SAT.txt data. After performing multiple regression with variable Y = College GPA we get the table below. Imagine a student with a good high school GPA = 3.5, SAT = 1300, and letter quality of 9. The table allows us to predict his College GPA.

Regression Statistics
  Multiple R          0.63225
  R Square            0.39974
  Adjusted R Square   0.38098
  Standard Error      0.58948
  Observations        100

ANOVA
               df   SS        MS        F         Significance F
  Regression   3    22.2144   7.40479   21.3098   1.2E-10
  Residual     96   33.3583   0.34748
  Total        99   55.5727

              Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%  Lower 95.0%  Upper 95.0%
  Intercept   -0.15326      0.32294         -0.47459  0.63616   -0.79429   0.48776    -0.79429     0.48776
  GPA High    0.37635       0.11426         3.29377   0.00139   0.14954    0.60316    0.14954      0.60316
  SAT         0.00123       0.0003          4.04636   0.00011   0.00063    0.00183    0.00063      0.00183
  Letters     0.02268       0.05098         0.44495   0.65736   -0.07851   0.12388    -0.07851     0.12388

From this table we can read that the predictive model is:

College GPA = -0.15326 + 0.37635 * GPA High + 0.00123 * SAT + 0.02268 * Letters

In our case this becomes:

College GPA = -0.15326 + 0.37635 * 3.5 + 0.00123 * 1300 + 0.02268 * 9 = 2.963

(The value 2.963 uses Excel's unrounded coefficients 0.376351, 0.001227 and 0.022684; the rounded values printed in the table give about 2.967.) Thus the predicted College GPA is 2.963.

Question: Clearly we cannot be 100% sure that an incoming student with these credentials will have a College GPA of exactly 2.96, so the prediction 2.96 is only an approximation. But how accurate is this approximation? The regression table comes to the rescue: the number we use is the Standard Error in the Regression Statistics block, 0.589. In other words, for these particular credentials we can expect the student's College GPA to be roughly 2.96 +/- 0.589. Another way to state this: the predicted GPA is in the interval [2.37, 3.55].
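Several numbers in the SAT regression output are tied together, and a short sketch can confirm this. It uses the standard definitions R^2 = SS_regression / SS_total, adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1), and Standard Error = sqrt(SS_residual / (n - k - 1)); these formulas are not stated in the lecture, and the variable names are illustrative.

```python
from math import sqrt

n, k = 100, 3                 # observations and number of predictors
ss_regression = 22.2144       # ANOVA table, Regression row
ss_residual = 33.3583         # ANOVA table, Residual row
ss_total = ss_regression + ss_residual

r_squared = ss_regression / ss_total
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
standard_error = sqrt(ss_residual / (n - k - 1))

# The lecture's rule: prediction plus or minus one Standard Error.
prediction = 2.963
interval = (prediction - standard_error, prediction + standard_error)

print(f"R Square          = {r_squared:.5f}")           # 0.39974
print(f"Adjusted R Square = {adjusted_r_squared:.5f}")  # 0.38098
print(f"Standard Error    = {standard_error:.5f}")      # 0.58948
print(f"interval          = [{interval[0]:.2f}, {interval[1]:.2f}]")  # [2.37, 3.55]
```

The band of one standard error on either side follows the lecture's rule of thumb. It is not a formal 95% prediction interval, which would be wider.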

A bit more practice

Use the SAT regression table above.

Q1. What would be the interpretation of the coefficient GPA High = 0.376?
A: For each 1-point increase in High School GPA, with SAT and letters held fixed, we expect a 0.376-point increase in College GPA.

Q2. What is the interpretation of the number -0.153?
A: The intercept has no interpretation here (it is impossible to have a student with zero High School GPA and zero SAT going to college).

A bit more practice

Use the SAT regression table above and imagine the following scenario: Student A's High School GPA is 1 point higher than Student B's. On the other hand, Student B's SAT score is higher by 200 points. Their letters are of the same strength.

Q3. Which of the two students will have a higher predicted College GPA?
A: Student A will gain 0.3763 points due to his High School GPA, and Student B will gain 200 * 0.0012 = 0.24 points due to his superior SAT. Overall, Student A's predicted GPA will be higher by 0.1363 points (since 0.3763 - 0.24 = 0.1363).

Q4. What are the predicted GPAs for students A and B?
A: It is impossible to state the predictions for students A and B, since we do not know their actual credentials.
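The reasoning in Q3 works because the intercept and any shared credentials cancel when two predictions are subtracted, so only the differences in inputs matter. A minimal sketch, using the rounded coefficients from the answer (0.3763 and 0.0012) and the Letters coefficient from the table; the function name is illustrative.

```python
COEF = {"hs_gpa": 0.3763, "sat": 0.0012, "letters": 0.02268}


def predicted_gap(d_hs_gpa, d_sat, d_letters):
    """Predicted College GPA of A minus B, given input differences (A minus B)."""
    return (COEF["hs_gpa"] * d_hs_gpa
            + COEF["sat"] * d_sat
            + COEF["letters"] * d_letters)


# A has +1 High School GPA, B has +200 SAT (so A minus B is -200), same letters.
gap = predicted_gap(d_hs_gpa=1.0, d_sat=-200, d_letters=0)
print(f"A minus B: {gap:.4f}")  # 0.1363
```

The gap can be computed without knowing either student's actual credentials, while the individual predictions asked for in Q4 cannot.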

Using the Sail boat data, we can make the following chart and regression table.

Regression Statistics
  Multiple R          0.92355
  R Square            0.85294
  Adjusted R Square   0.84477
  Standard Error      4.41382
  Observations        20

ANOVA
               df   SS       MS       F       Significance F
  Regression   1    2033.9   2033.9   104.4   6E-09
  Residual     18   350.67   19.482
  Total        19   2384.6

              Coefficients  Standard Error  t Stat   P-value  Lower 95%
  Intercept   -18.016       4.1006          -4.394   0.0004   -26.63
  Feet        1.01286       0.0991          10.218   6E-09    0.8046

Observe:

Chart: The line equation on the chart is Y = 1.0129X - 18.016, with R^2 = 0.8529. The chart offers visual information; we can actually see the dots (i.e. the sail boats), the trend line, and how well they fit. It does not extend beyond a 1-dimensional input X.

Table: The intercept is -18.016 and the coefficient next to Feet is 1.0129, so the equation of the line is Y = 1.0129X - 18.016, with R Square = 0.8529. There is no visualization, but the table contains many more numbers, some of which we have already used (and some of which we will use soon). It extends easily to d-dimensional input.

More practice

Given the Sail boat table and chart above, answer the following questions.

Q1. What is the predicted weight of a sail boat that is 30 feet long?
A: About 12,370 pounds. The model gives 1.01286 * 30 - 18.016 = 12.3698 in thousands of pounds, i.e. 12.3698 * 1000 = 12,369.8 pounds.

Q2. This prediction comes with a certain error estimate. What is it? In other words, what is the interval prediction for this weight?
A: The error is 4.41, so the interval is about [7960, 16780] pounds (remember, the units are thousands of pounds, and the Standard Error 4.41382 was truncated to 4.41):
[12.3698 - 4.41, 12.3698 + 4.41] = [7.9598, 16.7798] thousand pounds = [7959.8, 16779.8] pounds.
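The unit conversion in these two answers is easy to get wrong, so here is a minimal sketch of the same arithmetic. The constants come from the Sail boat table (with the Standard Error truncated to 4.41, as in the answer); the function name is illustrative.

```python
SLOPE = 1.01286        # thousand pounds per foot
INTERCEPT = -18.016    # thousand pounds
STANDARD_ERROR = 4.41  # thousand pounds, truncated from 4.41382


def predict_weight_pounds(feet):
    """Predicted weight in pounds; the model itself works in thousands of pounds."""
    return (SLOPE * feet + INTERCEPT) * 1000


center = predict_weight_pounds(30)
low = center - STANDARD_ERROR * 1000
high = center + STANDARD_ERROR * 1000

print(f"prediction: {center:.1f} pounds")             # 12369.8
print(f"interval:   [{low:.1f}, {high:.1f}] pounds")  # [7959.8, 16779.8]
```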

More practice

Given the Sail boat table and chart above, answer the following questions.

Q3. What is the interpretation of the slope 1.01?
A: For each additional foot of length, the boat's weight increases by about 1010 pounds.

Q4. What is the interpretation of the intercept -18.016?
A: The intercept has no real-life interpretation here (no sailboat is zero feet long).