Discussion #6, Water Quality and Mercury in Fish
- Lily Johnson
- 5 years ago
Solution: Discussion #6, Water Quality and Mercury in Fish

Summary

Approach

The purpose of the analysis was somewhat ambiguous: analysis to determine which of the explanatory variables appears to influence the response variable might be somewhat different from analysis to develop a predictive model, since the former focuses on conclusions for individual variables while the latter doesn't. In neither case, though, is any of the explanatory variables of special a priori interest, so I think the appropriate method of analysis is model selection rather than hypothesis testing.

Data splitting?

Because the number of possible predictor variables is small relative to the number of observations, the data could be split into model-building and validation subsets without violating the guidelines concerning the ratio of observations to variables, especially if the split was not equal (e.g. randomly choose observations for model selection, leaving the rest for validation). I did not do this, however, preferring to rely on PRESS for internal validation rather than using a small subset for external validation.

Data Manipulations and Model Diagnostics

Transformation (log or something similar) of all the explanatory variables except ph is useful to reduce the leverage of the observations with large values of these variables and to produce straighter relationships. Square-root transformation of the mercury variable makes the variability of the residuals more even, but the unevenness without this transformation is mild, so I think the transformation is acceptable but not necessary. The conclusions are not greatly affected by this transformation. Examination of residual plots shows no important problems with the full or reduced models, apart from those resolved by these transformations of the variables.

Results and Conclusion

Alkalinity (log transformed) clearly is the single most useful water-quality variable for predicting the mean mercury level of a lake's bass.
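The data-splitting option described above (a larger model-building subset, a smaller validation subset) can be sketched in a few lines. This is only an illustration of the mechanics; the observation counts here are hypothetical, not the ones from the lakes data set:

```python
import random

def split_indices(n, n_build, seed=0):
    """Randomly split observation indices 0..n-1 into a model-building
    subset of size n_build and a validation subset holding the rest."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    build = sorted(rng.sample(range(n), n_build))
    build_set = set(build)
    validate = [i for i in range(n) if i not in build_set]
    return build, validate

# Hypothetical: 50 observations, 35 for model selection, 15 held out.
build, validate = split_indices(50, 35)
```

An unequal split like this keeps the observations-per-variable ratio healthy in the model-building subset, which is the guideline the discussion refers to.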
The model with only log alkalinity is best by all criteria except Cp if mercury is not transformed, and best by SBC (= BIC) if mercury is square-root transformed. Using chlorophyll (also log transformed) in addition to alkalinity may be slightly better than only alkalinity: this two-variable model is best by Cp, AICc, and PRESS if mercury is transformed, and second-best by PRESS if mercury is not transformed.
Preliminary Data Exploration

Of the four explanatory variables, all but ph have skewed distributions (long upper tails); observations in the tails of these distributions could have high leverage. Log transformations eliminate this skew, and indeed log-alkalinity is somewhat skewed in the opposite direction.

[Histograms of alk, ph, cal, and chl, and of lncal and lnchl]

As the scatterplot matrix below shows, all the variables are fairly strongly associated, positively so among the explanatory variables and negatively between them and the response variable, mercury. Most of these bivariate relationships, however, are strongly curvilinear, and strongly dominated by observations in the right tails of the skewed distributions. There also is one aberrant observation (lake , observation #9, shown by the black square) with a high level of mercury despite high levels of alkalinity and calcium.

[Scatterplot matrix of mercury, alk, ph, cal, and chl]
Log transformations largely straighten out these relationships and reduce the likely leverage of the observations with high values of the predictor variables; by doing so, they also make lake  (observation 9) less unusual.

[Scatterplot matrix of mercury, ph, lncal, and lnchl]

Conclusion from data exploration

Because of concerns about both nonlinearity and leverage, I think it would be preferable to work with the transformed variables. In the following I show results using log transformations of the three variables (all but ph); similar results would be obtained if, for instance, alkalinity were square-root transformed.

Diagnostics for Maximum Model

The basic residual plots, as well as the added-variable (= leverage = partial-regression) plots, for the maximum model (including all four possible predictor variables, all but ph having been log transformed) are shown on the next page. They generally are acceptable. There does appear to be greater variability in the residuals at larger values of predicted mercury, and the distribution of the residuals is slightly skewed (long right tail). I don't feel either of these problems is severe enough to invalidate analysis using this model, but square-root transformation of the mercury variable does somewhat lessen both these concerns, as shown in the second set of residual and added-variable plots (two pages below).
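The effect of the log transformation on a long-upper-tailed variable can be checked numerically with a moment-based skewness statistic. This is a generic sketch on simulated data standing in for a skewed predictor such as alkalinity, not a computation on the actual lakes data:

```python
import math
import random

def skewness(xs):
    """Moment-based sample skewness g1 = m3 / m2**1.5."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

rng = random.Random(1)
# Hypothetical long-upper-tailed variable (lognormal).
raw = [rng.lognormvariate(2.0, 1.0) for _ in range(200)]
logged = [math.log(x) for x in raw]

print(round(skewness(raw), 2))     # strongly positive (right-skewed)
print(round(skewness(logged), 2))  # near zero: log removes the skew
```

Values near zero after transformation correspond to the roughly symmetric histograms of lncal and lnchl noted above.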
[Residual diagnostics for mercury vs. all four variables, all but ph log transformed: normal probability plot, residuals vs. fits, histogram, residuals vs. order, and partial-regression plots against each predictor with estimated least-squares slopes]
[Residual diagnostics for sqrt(mercury) vs. all four variables, all but ph log transformed: normal probability plot, residuals vs. fits, histogram, residuals vs. order, and partial-regression plots against each predictor with estimated least-squares slopes]

I think analysis using either mercury or square-root transformed mercury is acceptable, and will show results for both in the following.

A Note on AICc and SBC Values

There are several ways to calculate AIC, AICc, and SBC (aka BIC). One difference is whether to include the term n ln n. Because this term is identical for all models (for a given data set), including it or not has no effect on comparisons among models, but does cause the values reported by different programs to differ. A more consequential difference is that some versions include σ² in the count of parameters being estimated (giving a total count of p + 1), while others count only the βs (for a count of p). This affects the 2p or [ln n]p terms in the formulae: if σ² is counted, these terms become 2(p + 1)
and [ln n](p + 1). When p is small, the difference between these versions can be substantial, altering the comparisons among models of differing sizes. The text uses p, while JMP apparently uses p + 1. In R, AIC uses p + 1 while extractAIC uses p. In the following I show values computed using the formulae in the text and that I gave in lecture (i.e. including n ln n and using p rather than p + 1 as the number of parameters). I don't think for these data that different versions of the criteria will give different conclusions.

Untransformed Mercury

Model Selection

[Table: Cp, AICc, BIC, and PRESS for each candidate subset of the predictors log-alkalinity, ph, lncal, and lnchlor]

To facilitate comparison, these criteria are plotted against p in the following.

[Plots of Cp, AICc, BIC, and PRESS against p for untransformed mercury]

By AICc, BIC (= SBC), and PRESS, the model with only log-transformed alkalinity is best. The model with log-alkalinity and log-calcium is the smallest model to have Cp near p, and so would be selected by that criterion. This model also has AICc nearly as small as for the best model. Interestingly, though, by PRESS this model is worse than the other two-variable models combining either log-chlorophyll or ph with log-alkalinity.
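The parameter-counting ambiguity discussed in the note above can be made concrete with one common set of formulas. This is a sketch, not the exact variant used by any particular package; it writes each criterion as n·ln(SSE/n) plus a penalty, with a flag choosing whether σ² joins the parameter count:

```python
import math

def aic(sse, n, p, count_sigma=False):
    """AIC = n*ln(SSE/n) + 2k, with k = p (betas only) or p + 1 (sigma^2 too).
    Constant terms such as n*ln(n) are the same for every model on a fixed
    data set, so omitting them never changes model rankings."""
    k = p + 1 if count_sigma else p
    return n * math.log(sse / n) + 2 * k

def aicc(sse, n, p, count_sigma=False):
    """Small-sample corrected AIC: penalty 2k*n/(n - k - 1).
    Unlike AIC/BIC, the p vs. p + 1 choice here is not a constant shift,
    so it can reorder models of different sizes."""
    k = p + 1 if count_sigma else p
    return n * math.log(sse / n) + 2 * k * n / (n - k - 1)

def bic(sse, n, p, count_sigma=False):
    """BIC (= SBC): ln(n) per parameter instead of 2."""
    k = p + 1 if count_sigma else p
    return n * math.log(sse / n) + math.log(n) * k
```

For example, `aic(sse, n, p)` matches the text's p-only convention, while `aic(sse, n, p, count_sigma=True)` mimics the p + 1 convention the note attributes to JMP and to R's AIC.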
There is much in common among the best models. Log-alkalinity is in every one of them, and is the only variable which by itself constitutes a good model. Combining log-alkalinity with either or both of log-calcium and log-chlorophyll gives good models, though whether they are better or worse than the model with only log-alkalinity depends on the criterion, as does the relative performance of these three models.

Diagnostic evaluations

log-alkalinity only

The scatterplot of mercury vs. log-alkalinity shows a fairly linear relationship, with one point (lake ; blue diamond) somewhat to the left of the main cloud of points and thus having moderate leverage, and one point (lake , observation 9; black square) quite far above the trend near the right side, with fairly high alkalinity and fairly high mercury.

[Scatterplot of mercury vs. log(alkalinity)]

The plot of residuals vs. fits is quite straight and featureless, apart from one high outlier (lake ). The observation with unusually low alkalinity (lake ) accordingly has an unusually high predicted level of mercury, but it fits the trend well and so has a small residual and presumably little influence. Interestingly, the uneven variance of the residuals seen for the full model is not apparent for this reduced model. The distribution of the residuals is fairly skewed, but with n =  this is not a major problem.

[Residual plots for mercury vs. log-alkalinity: normal probability plot, residuals vs. fits, histogram, residuals vs. order]

Larger models

Plots for two other good models, with either log-calcium or log-chlorophyll added to log-alkalinity, are quite similar to those for the single-variable model above. When log-chlorophyll is included the distribution of residuals is closer to Normal, but there is a
somewhat stronger pattern of increasing variability with larger values of predicted mercury. Conversely, the model combining log-alkalinity with log-calcium has slightly more even variability but a less Normal distribution. In all models lake  (observation 9) is an outlier with a large positive residual, and lake  has the highest predicted level of mercury.

[Residual plots (normal probability plot, residuals vs. fits, histogram, residuals vs. order) for mercury vs. log-alkalinity + log-calcium and for mercury vs. log-alkalinity + log-chlorophyll]

Conclusion from diagnostics

I see no serious problems with any of these models. I also therefore see no reason to consider any of these models as more or less appropriate than any of the others, and thus no reason to prefer any of the larger models over the simple single-variable model with log-alkalinity.

Square-root Transformed Mercury

Model Selection

[Table: Cp, AICc, BIC, and PRESS for each candidate subset of the predictors, with square-root-transformed mercury as the response]

These criteria are plotted against p in the figure on the next page. The model with log-alkalinity and log-chlorophyll is best by Cp (it has the smallest Cp as well as being the smallest model with Cp near p), as well as by AICc and PRESS. The model with only log-alkalinity is best by BIC and second-best by AICc and PRESS.
[Plots of Cp, AICc, BIC, and PRESS against p for square-root-transformed mercury]

As was seen above for untransformed mercury, the model with log-alkalinity and log-calcium was the second-best-fitting two-variable model (by R² and thus by Cp, AICc, and BIC), but was somewhat worse by the PRESS criterion than the model with log-alkalinity and ph. There again is much in common among the good models: all include log-alkalinity, either alone or with one or both of log-calcium and log-chlorophyll.

Diagnostic evaluations

log-alkalinity + log-chlorophyll

Plots for this model show no substantial problems, except that yet again observation 9 (lake ) is a moderately high outlier. The distribution of residuals, while skewed, is less so than for the models above using untransformed mercury.

[Residual plots for square-root(mercury) vs. log-alkalinity + log-chlorophyll: normal probability plot, residuals vs. fits, histogram, residuals vs. order]
log-alkalinity only

The scatterplot of square-root-mercury vs. log-alkalinity is quite similar to that shown above for untransformed mercury, showing a fairly linear relationship with one point (lake ; blue diamond) somewhat to the left of the main cloud of points and one point (lake , observation 9; black square) quite far above the trend near the right side, with fairly high alkalinity and fairly high mercury.

[Scatterplot of square-root(mercury) vs. log(alkalinity)]

Residual plots for this model are quite similar to those just above for the model relating square-root-mercury to log-alkalinity and log-chlorophyll. There again is the one high outlier (observation 9 = lake ) but no other apparent problems.

[Residual plots for square-root(mercury) vs. log(alkalinity): normal probability plot, residuals vs. fits, histogram, residuals vs. order]

Conclusion from diagnostics

I again see no serious problems with either of these models, so no basis for choosing between them based on assumptions/diagnostics.

Overall Conclusion

Either log-alkalinity alone or log-alkalinity and log-chlorophyll together gives the best model for explaining/predicting mercury levels in the fish. Of the various models considered, I would choose the one using square-root-transformed mercury and both log-alkalinity and log-chlorophyll as predictors, since this is the best model for square-root-mercury by PRESS (my favorite criterion), and the models for square-root-mercury have somewhat larger R² than those for untransformed mercury.
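PRESS, the criterion leaned on throughout, is the sum of squared leave-one-out prediction errors, and for linear regression it can be computed from a single fit via the leverage identity e(i) = e_i / (1 - h_ii). A minimal sketch for the simple one-predictor case (generic data, not the lakes data set):

```python
def press_simple(x, y):
    """PRESS for simple linear regression y = b0 + b1*x, computed from
    one fit using the deleted-residual identity e_(i) = e_i / (1 - h_ii)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    press = 0.0
    for xi, yi in zip(x, y):
        e = yi - (b0 + b1 * xi)              # ordinary residual
        h = 1 / n + (xi - xbar) ** 2 / sxx   # leverage of this observation
        press += (e / (1 - h)) ** 2          # squared deleted residual
    return press
```

The same identity, using the diagonal of the full hat matrix, extends directly to the multiple-regression models compared above, which is why PRESS serves as an internal cross-validation without ever refitting the model n times.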
More informationReview of Statistics 101
Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods
More informationHOLLOMAN S AP STATISTICS BVD CHAPTER 08, PAGE 1 OF 11. Figure 1 - Variation in the Response Variable
Chapter 08: Linear Regression There are lots of ways to model the relationships between variables. It is important that you not think that what we do is the way. There are many paths to the summit We are
More informationChapter 9 Regression. 9.1 Simple linear regression Linear models Least squares Predictions and residuals.
9.1 Simple linear regression 9.1.1 Linear models Response and eplanatory variables Chapter 9 Regression With bivariate data, it is often useful to predict the value of one variable (the response variable,
More informationNonlinear Regression. Summary. Sample StatFolio: nonlinear reg.sgp
Nonlinear Regression Summary... 1 Analysis Summary... 4 Plot of Fitted Model... 6 Response Surface Plots... 7 Analysis Options... 10 Reports... 11 Correlation Matrix... 12 Observed versus Predicted...
More informationBusiness Statistics. Lecture 10: Correlation and Linear Regression
Business Statistics Lecture 10: Correlation and Linear Regression Scatterplot A scatterplot shows the relationship between two quantitative variables measured on the same individuals. It displays the Form
More informationMATH 1150 Chapter 2 Notation and Terminology
MATH 1150 Chapter 2 Notation and Terminology Categorical Data The following is a dataset for 30 randomly selected adults in the U.S., showing the values of two categorical variables: whether or not the
More information7. Do not estimate values for y using x-values outside the limits of the data given. This is called extrapolation and is not reliable.
AP Statistics 15 Inference for Regression I. Regression Review a. r à correlation coefficient or Pearson s coefficient: indicates strength and direction of the relationship between the explanatory variables
More informationMODELING. Simple Linear Regression. Want More Stats??? Crickets and Temperature. Crickets and Temperature 4/16/2015. Linear Model
STAT 250 Dr. Kari Lock Morgan Simple Linear Regression SECTION 2.6 Least squares line Interpreting coefficients Cautions Want More Stats??? If you have enjoyed learning how to analyze data, and want to
More informationRegression Model Building
Regression Model Building Setting: Possibly a large set of predictor variables (including interactions). Goal: Fit a parsimonious model that explains variation in Y with a small set of predictors Automated
More informationAnnouncements. Lecture 18: Simple Linear Regression. Poverty vs. HS graduate rate
Announcements Announcements Lecture : Simple Linear Regression Statistics 1 Mine Çetinkaya-Rundel March 29, 2 Midterm 2 - same regrade request policy: On a separate sheet write up your request, describing
More informationCHAPTER 5. Outlier Detection in Multivariate Data
CHAPTER 5 Outlier Detection in Multivariate Data 5.1 Introduction Multivariate outlier detection is the important task of statistical analysis of multivariate data. Many methods have been proposed for
More informationChapter 3. Measuring data
Chapter 3 Measuring data 1 Measuring data versus presenting data We present data to help us draw meaning from it But pictures of data are subjective They re also not susceptible to rigorous inference Measuring
More informationScatterplots and Correlation
Chapter 4 Scatterplots and Correlation 2/15/2019 Chapter 4 1 Explanatory Variable and Response Variable Correlation describes linear relationships between quantitative variables X is the quantitative explanatory
More informationRegression Analysis. Regression: Methodology for studying the relationship among two or more variables
Regression Analysis Regression: Methodology for studying the relationship among two or more variables Two major aims: Determine an appropriate model for the relationship between the variables Predict the
More informationAP Statistics Cumulative AP Exam Study Guide
AP Statistics Cumulative AP Eam Study Guide Chapters & 3 - Graphs Statistics the science of collecting, analyzing, and drawing conclusions from data. Descriptive methods of organizing and summarizing statistics
More informationTopic 23: Diagnostics and Remedies
Topic 23: Diagnostics and Remedies Outline Diagnostics residual checks ANOVA remedial measures Diagnostics Overview We will take the diagnostics and remedial measures that we learned for regression and
More informationSociology 6Z03 Review I
Sociology 6Z03 Review I John Fox McMaster University Fall 2016 John Fox (McMaster University) Sociology 6Z03 Review I Fall 2016 1 / 19 Outline: Review I Introduction Displaying Distributions Describing
More informationContents. Acknowledgments. xix
Table of Preface Acknowledgments page xv xix 1 Introduction 1 The Role of the Computer in Data Analysis 1 Statistics: Descriptive and Inferential 2 Variables and Constants 3 The Measurement of Variables
More informationEstimating σ 2. We can do simple prediction of Y and estimation of the mean of Y at any value of X.
Estimating σ 2 We can do simple prediction of Y and estimation of the mean of Y at any value of X. To perform inferences about our regression line, we must estimate σ 2, the variance of the error term.
More informationProbability Distributions
CONDENSED LESSON 13.1 Probability Distributions In this lesson, you Sketch the graph of the probability distribution for a continuous random variable Find probabilities by finding or approximating areas
More informationFinal Review. Yang Feng. Yang Feng (Columbia University) Final Review 1 / 58
Final Review Yang Feng http://www.stat.columbia.edu/~yangfeng Yang Feng (Columbia University) Final Review 1 / 58 Outline 1 Multiple Linear Regression (Estimation, Inference) 2 Special Topics for Multiple
More informationAP Statistics L I N E A R R E G R E S S I O N C H A P 7
AP Statistics 1 L I N E A R R E G R E S S I O N C H A P 7 The object [of statistics] is to discover methods of condensing information concerning large groups of allied facts into brief and compendious
More informationIntroduction to Statistical modeling: handout for Math 489/583
Introduction to Statistical modeling: handout for Math 489/583 Statistical modeling occurs when we are trying to model some data using statistical tools. From the start, we recognize that no model is perfect
More informationChapter 7 Summary Scatterplots, Association, and Correlation
Chapter 7 Summary Scatterplots, Association, and Correlation What have we learned? We examine scatterplots for direction, form, strength, and unusual features. Although not every relationship is linear,
More informationMultiple linear regression S6
Basic medical statistics for clinical and experimental research Multiple linear regression S6 Katarzyna Jóźwiak k.jozwiak@nki.nl November 15, 2017 1/42 Introduction Two main motivations for doing multiple
More informationBox-Cox Transformations
Box-Cox Transformations Revised: 10/10/2017 Summary... 1 Data Input... 3 Analysis Summary... 3 Analysis Options... 5 Plot of Fitted Model... 6 MSE Comparison Plot... 8 MSE Comparison Table... 9 Skewness
More informationCS 5014: Research Methods in Computer Science
Computer Science Clifford A. Shaffer Department of Computer Science Virginia Tech Blacksburg, Virginia Fall 2010 Copyright c 2010 by Clifford A. Shaffer Computer Science Fall 2010 1 / 207 Correlation and
More informationChapter 7. Scatterplots, Association, and Correlation. Copyright 2010 Pearson Education, Inc.
Chapter 7 Scatterplots, Association, and Correlation Copyright 2010 Pearson Education, Inc. Looking at Scatterplots Scatterplots may be the most common and most effective display for data. In a scatterplot,
More information