DO NOW Take a seat! Chromebooks out (if charged). SILENCE YOUR PHONE and put it in the pocket that has your number on the bulletin board (back wall). NO EXCEPTIONS! If I see your phone, I will take it!!! No food or drinks (except water) are allowed in my room. Finish your food outside before you enter my classroom.
Multiple Regression Unit 7 - Day 5
DO NOW! 2 minutes: Read the article 2 minutes: Pair-share 1 minute: You share with a partner 1 minute: You listen to your partner
Speaking of Statistics Is there a direct relationship between the level of cleanliness in students' homes and success? What factors might have contributed to the results of this study? What information would you like to see to complement this article? Other thoughts?
Multiple Regression: What is it? A simple regression model is an equation created by using trends and variation from real data for a specific time period. In a simple regression equation we have a dependent variable (y) and an independent variable (x). The independent variable is our predictor, used to estimate future values of the dependent variable under certain conditions. We have explored 3 different models: linear, exponential, and quadratic. We used the residuals and R-squared to choose the best one. Focusing on linear models, we can also perform a multiple regression, where there are several independent variables and one dependent variable, and the equation is y = a + b_1 x_1 + b_2 x_2 + ... + b_n x_n
Does it make sense? Does a regression with two predictors even make sense? It does, and that's fortunate, because the world is too complex a place for simple linear regression alone to model it. Let's review a portion of one of our regression outputs from last class. The model is linear. 80.4% of the variance in pressure can be explained by the aging factor. y = 81.0 + 0.964x is the model. But what about the other 19.6%? Genetics, diet, stress? These could be our x_2, x_3, x_4 in our model, making it a multiple regression.
For example: If you know how to find the regression of %body fat on waist size, you can usually just add height to the list of predictors without having to think hard about how to do it. (For waist size alone, R^2 = 67.8%.) For simple regression we found the Least Squares solution, the one whose coefficients made the sum of the squared residuals as small as possible. For multiple regression, we'll do the same thing, but this time with more coefficients.
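To make the "same thing, but with more coefficients" idea concrete, here is a minimal Python sketch of a least squares fit with two predictors. The data are made up for illustration (they are not the body fat data), and the function name fit_two_predictors is hypothetical. It solves the least squares equations for exactly two predictors; software handles the general case.

```python
def fit_two_predictors(x1, x2, y):
    """Least squares fit of y = a + b1*x1 + b2*x2 (two predictors only)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Sums of squares and cross-products of deviations from the means.
    s11 = sum((u - m1) ** 2 for u in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    s1y = sum((u - m1) * (w - my) for u, w in zip(x1, y))
    s2y = sum((v - m2) * (w - my) for v, w in zip(x2, y))
    det = s11 * s22 - s12 ** 2  # zero when x1 and x2 are perfectly correlated
    if det == 0:
        raise ValueError("predictors are perfectly correlated")
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    a = my - b1 * m1 - b2 * m2
    return a, b1, b2

# Made-up data, built to follow y = 2 + 3*x1 - 1*x2 exactly.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
y = [3.0, 7.0, 7.0, 11.0, 11.0]

a, b1, b2 = fit_two_predictors(x1, x2, y)
print(round(a, 3), round(b1, 3), round(b2, 3))
```

Because the made-up y values follow the equation exactly, the fit should recover a close to 2, b1 close to 3, and b2 close to -1. Real data leave residuals, and the fit picks the coefficients that make the sum of their squares as small as possible.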
Remember: Equation, R^2, P-values. y = 3.10 + 1.77x_1 - 0.60x_2, or %body fat = 3.10 + 1.77(waist) - 0.60(height). R^2 gives the fraction of the variability of %body fat accounted for by the multiple regression model. (With waist alone predicting %body fat, the R^2 was 67.8%.) Waist size and height together account for about 71.3% of the variation in %body fat among men. We shouldn't be surprised that R^2 has gone up. It was the hope of accounting for some of that leftover variability that led us to try a second predictor.
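As a rough illustration of what "fraction of the variability accounted for" means, the Python sketch below computes R^2 from observed and predicted values. The numbers are made up, and the function name r_squared is hypothetical; for a least squares model with an intercept, this should match the R^2 that regression software reports.

```python
def r_squared(y, y_pred):
    """Fraction of the variability in y accounted for by the predictions."""
    mean_y = sum(y) / len(y)
    # Leftover variation: squared residuals (observed minus predicted).
    ss_res = sum((obs - pred) ** 2 for obs, pred in zip(y, y_pred))
    # Total variation: squared deviations of y from its own mean.
    ss_tot = sum((obs - mean_y) ** 2 for obs in y)
    return 1 - ss_res / ss_tot

# Made-up observed and predicted values (not from the body fat study).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

r2 = r_squared(observed, predicted)
print(round(r2, 3))  # 0.98
```

A second predictor that shrinks the residuals lowers the leftover variation, which is why R^2 rose from 67.8% to 71.3% when height was added.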
How do we interpret the coefficients? y = 3.10 + 1.77x_1 - 0.60x_2, or %body fat = 3.10 + 1.77(waist) - 0.60(height). The intercept a in this example can be interpreted as the value you would predict for %body fat if both waist and height are equal to zero. However, this is only a meaningful interpretation if it is reasonable that both x_1 and x_2 can be 0, and if the data set actually included values of x_1 and x_2 that were near 0. If either of these conditions fails, then the intercept a really has no meaningful interpretation. It just anchors the regression model in the right place.
How do we interpret the coefficients? y = 3.10 + 1.77x_1 - 0.60x_2, or %body fat = 3.10 + 1.77(waist) - 0.60(height). The first coefficient, b_1, represents the difference in the predicted value of y for each one-unit difference in x_1, if x_2 remains constant. The second coefficient, b_2, represents the difference in the predicted value of y for each one-unit difference in x_2, if x_1 remains constant. The regression equation indicates that each additional inch of waist size is associated with an increase of about 1.77 percentage points in %body fat among men of a particular height. Each additional inch of height is associated with a decrease of about 0.60 percentage points in %body fat among men with a particular waist size. Both predictors are statistically significant!
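One way to check this interpretation is to plug numbers into the equation and compare predictions. The Python sketch below uses the coefficients as printed on the slide; the function name predict_body_fat and the waist and height values are hypothetical, chosen only for illustration.

```python
def predict_body_fat(waist_in, height_in):
    # Coefficients copied from the slide: y = 3.10 + 1.77*x_1 - 0.60*x_2
    return 3.10 + 1.77 * waist_in - 0.60 * height_in

# Hypothetical measurements in inches.
# One extra inch of waist, height held constant at 70.
diff_waist = predict_body_fat(36, 70) - predict_body_fat(35, 70)

# One extra inch of height, waist held constant at 35.
diff_height = predict_body_fat(35, 71) - predict_body_fat(35, 70)

print(round(diff_waist, 2), round(diff_height, 2))
```

The two differences come out to about 1.77 and -0.60 whatever starting waist and height are chosen, which is exactly what "if the other variable remains constant" means.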
Your turn to interpret! 5 minutes. Calorie content of a breakfast cereal is linearly associated with its sugar content. Is that the whole story? Here's the output of a regression model that regresses calories per serving on protein (g), fat (g), fiber (g), carbohydrate (g), and sugars (g) content.
Can we run it in Desmos? Sure we can, but we won't have p-values. Follow these steps:
1. Enter data in Google Sheets or Excel. Name your variables.
2. Check for correlations between all possible pairs of variables. Ideally, your independent variables should have a low correlation with one another, but a high correlation with the dependent variable.
3. Check for statistical significance.
4. If everything looks OK, copy the table from Google Sheets and paste it into Desmos. Make sure your dependent variable in Desmos is y_1 and your independent variables are x_1, x_2, x_3, etc.
5. In a new box type: y_1 ~ a + b_1 x_1 + b_2 x_2 (or keep adding terms if there are more variables, following the same format).
6. Interpret!
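The correlation check in the steps above can also be sketched in Python. This is only an illustration under stated assumptions: the table is made up, the column names simply mirror the Desmos naming, and pearson_r is a hypothetical helper that computes the usual correlation coefficient r.

```python
from itertools import combinations
from math import sqrt

def pearson_r(x, y):
    """Correlation coefficient r between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Made-up table; x_1 and x_2 are the predictors, y_1 is the dependent variable.
data = {
    "x_1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "x_2": [2.0, 1.0, 4.0, 3.0, 6.0],
    "y_1": [3.0, 7.0, 7.0, 11.0, 11.0],
}

# Correlation for every possible pair of columns.
correlations = {
    (u, v): pearson_r(data[u], data[v]) for u, v in combinations(data, 2)
}
for (u, v), r in correlations.items():
    print(f"r({u}, {v}) = {r:.3f}")
```

When reading the output, look for a low r between x_1 and x_2 and a high r between each predictor and y_1. A pair of predictors with r close to 1 or -1 is a warning sign.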
Try it! A nursing instructor wishes to see whether a student's grade point average and age are related to the student's score on the state board nursing examination. She selects five students and obtains the following data.
Check your results!
Partner/Independent Work Work on your worksheet. If you do not finish today, please turn it in later. Use your Chromebook or personal computer to complete all the calculation and graphing steps. Open a Google Doc and add all your outputs and graphs to it. Save all your work in Google and Desmos. Make sure you keep everything in the same Google Doc.