Chapter 9 Regression. 9.1 Simple linear regression Linear models Least squares Predictions and residuals.


9.1 Simple linear regression

Linear models

Response and explanatory variables
With bivariate data, it is often useful to predict the value of one variable (the response variable, Y) from the other (the explanatory variable, X). A curve or straight line that is drawn close to the crosses on a scatterplot can be used to predict the y-value corresponding to any x. Note that the response variable should always be drawn on the vertical axis.

Linear model
A linear model is an adequate description of many bivariate data sets:
ŷ = b0 + b1 x
The constant b0 is the intercept of the line and describes the y-value when x is zero. The constant b1 is the line's slope; it describes the change in y when x increases by one.

Predictions and residuals

Fitted values
To assess how well a particular linear model fits any one of our data points, (xi, yi), we might consider how well the model would predict the y-value of the point,
ŷi = b0 + b1 xi
These predictions are called fitted values.

Residuals
The difference between the i'th fitted value and its actual y-value is called its residual,
ei = yi − ŷi
The residuals describe the 'errors' that would have resulted from using the model to predict y from the x-values of our data points. Note that the residuals are the vertical distances of the crosses to the line.

Least squares

Aim of small residuals
The residuals from a linear model (vertical distances from the crosses to the line) indicate how closely the model's predictions match the actual responses in the data. Small residuals are good, so the parameters b0 and b1 should be set to make them as small as possible.

Least squares
The size of the residuals is summarised by the residual sum of squares,
Σ ei² = Σ (yi − ŷi)² = Σ (yi − b0 − b1 xi)²
'Good' values for b0 and b1 can be objectively chosen to be the values that minimise the residual sum of squares. This is the method of least squares and the values of b0 and b1 are called least squares estimates.
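As a concrete illustration of fitted values, residuals and the residual sum of squares, here is a short Python sketch. The data set and the trial values of b0 and b1 are invented for illustration; the chapter itself relies on spreadsheets or statistical software for such calculations.

```python
# Fitted values, residuals and residual sum of squares for a trial line.
# The data and the trial coefficients b0, b1 are invented for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = 0.0, 2.0                                   # trial intercept and slope

fitted = [b0 + b1 * x for x in xs]                  # y-hat_i = b0 + b1 * x_i
residuals = [y - f for y, f in zip(ys, fitted)]     # e_i = y_i - y-hat_i
rss = sum(e ** 2 for e in residuals)                # residual sum of squares

print(rss)
```

For a poorly chosen line the residual sum of squares would be larger; least squares chooses b0 and b1 to make it as small as possible.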

The diagram below represents the squared residuals as blue squares. The least squares estimates minimise the total blue area.

Formulae
The problem of minimising the residual sum of squares is not difficult mathematically, but you will rarely require or use the resulting formulae for b0 and b1 since spreadsheets, statistical programs and even scientific calculators will do the calculations for you. However, for completeness, the formulae are
b1 = Σ (x − x̄)(y − ȳ) / Σ (x − x̄)²
b0 = ȳ − b1 x̄

Normal linear model

Interest in generalising from data
In most bivariate data sets, we have no interest in the specific individuals from which the data are collected. The individuals are 'representative' of a larger population or process, and our main interest is in this underlying population.

Example
A newspaper compiled data from each of New Jersey's 21 counties about the number of people per bank branch in each county and its percentage of minority groups. Local residents might be interested in the specific counties, but most outsiders would want to generalise from the data to describe the relationship in a way that might describe other similar areas in the Eastern USA. How strong is the evidence that banks tend to have fewer branches in areas with large minority groups?

Model for data
In an experiment, several response measurements are often made at each distinct value of X. The diagram below shows one such data set using a histogram for the distribution of Y at each x-value. The response measurements at any x-value can be modelled as a random sample from a normal distribution. The collection of distributions of Y at different values of X is called a regression model.

Normal linear model for the response
The most commonly used regression model is a normal linear model. It involves:
Normality: At each value of X, Y has a normal distribution.
Constant variance: The standard deviation of Y is the same for all values of X.
Linearity: The mean of Y is linearly related to X.
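The least squares formulae above translate directly into code. A minimal sketch with invented data; points lying exactly on a straight line should return that line's intercept and slope.

```python
# Least squares estimates from the closed-form formulae:
#   b1 = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2),  b0 = ybar - b1 * xbar
# The data are invented for illustration.
def slope_intercept(xs, ys):
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

# Points lying exactly on y = 1 + 2x should be recovered exactly.
b0, b1 = slope_intercept([0, 1, 2, 3], [1, 3, 5, 7])
print(b0, b1)
```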
The last two properties of the normal linear model can be expressed as

σy = σ
µy = β0 + β1 x

Another way to describe the model
The normal linear model describes the distribution of Y for any value of X:
Y ~ normal (µy, σy)
where
µy = β0 + β1 x
σy = σ
An equivalent way to write the same model is
y = β0 + β1 x + ε
where ε is called the model error and has a distribution
ε ~ normal (0, σ)
The error, ε, for a data point is the vertical distance between the cross on a scatterplot and the regression line.

[Figure: a data point (x, y), the regression line µy = β0 + β1 x, and the error ε shown as the vertical distance between them; response Y on the vertical axis, explanatory X on the horizontal axis.]

Model parameters
A normal linear model, µy = β0 + β1 x, σy = σ, involves 3 parameters: β0, β1 and σ. The model's slope, β1, and intercept, β0, can be interpreted in a similar way to the slope and intercept of a least squares line.
Slope: Increase in the mean response per unit increase in X.
Intercept: Mean response when X = 0.

Examples of interpretation
Y = Sales of music CD ($), X = Money spent on advertising ($). β1: Increase in mean sales for each extra dollar spent on advertising. β0: Mean sales if there was no advertising.
Y = Exam mark, X = Hours of study by student before exam. β1: Increase in expected mark for each additional hour of study. β0: Expected mark if there is no study.
Y = Hospital stay (days), X = Age of patient. β1: Average extra days in hospital per extra year of age. β0: Average days in hospital at age 0. Not particularly meaningful here.

Band containing about 95% of values
Applying the rule of thumb to the errors, about 95% of them will be within 2 standard deviations of zero, i.e. between ±2σ. Since the errors are vertical distances of data points to the regression line, a band 2σ on each side of it should contain about 95% of the crosses on a scatterplot of the data.
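The ±2σ band can be checked by simulation. This sketch generates data from a normal linear model (the parameter values β0 = 5, β1 = 2, σ = 1 are invented) and counts the fraction of points within 2σ of the line.

```python
# Simulating the normal linear model y = beta0 + beta1*x + eps, eps ~ normal(0, sigma),
# and checking that roughly 95% of points fall within 2*sigma of the line.
# All parameter values are invented for illustration.
import random

random.seed(1)
beta0, beta1, sigma = 5.0, 2.0, 1.0

n = 10_000
xs = [random.uniform(0, 10) for _ in range(n)]
ys = [beta0 + beta1 * x + random.gauss(0, sigma) for x in xs]

# Fraction of points inside the band mu_y +/- 2*sigma.
inside = sum(abs(y - (beta0 + beta1 * x)) <= 2 * sigma for x, y in zip(xs, ys))
frac = inside / n
print(frac)   # close to 0.95
```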
[Figure: the regression line µy = β0 + β1 x with a band 2σ on each side of it; approximately 95% of crosses lie in this band.]

Exercises (only available online):
Pick the explanatory variable and response
Draw a straight line
Find the slope and intercept
Interpret the slope and intercept
Find a residual

9.2 Linear model assumptions

Assumptions in a normal linear model
The normal linear model is:
y = β0 + β1 x + ε,  ε ~ normal (0, σ)
The following four requirements are implicit in the model but may be violated, as illustrated by the examples.
Linearity: The response may change nonlinearly with x.
Constant standard deviation: The response may be more variable at some x than others.
Normal distribution for errors: The errors may have skew distributions.
Independent errors: When the observations are ordered in time, successive errors may be correlated.

Residual plots
Problems may be immediately apparent in a scatterplot of the raw data, but a residual plot often highlights them.

Probability plot of residuals
The normal linear model assumes that the model errors are normally distributed, ε ~ normal (0, σ). A histogram of the residuals can be examined for normality, but a better way is with a normal probability plot of the residuals. If the residuals are normally distributed, the crosses in the normal probability plot should lie close to a straight line.
Warning: If the assumptions of linearity and constant variance are violated, or if there are outliers, the probability plot of residuals will often be curved, irrespective of the error distribution. Only draw a probability plot if you are sure that the data are linear, have constant variance and have no outliers.

Outliers
In a scatterplot, a cross that is unusually far above or below the regression line is an outlier. It would correspond to a large error, ε.
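The coordinates of a normal probability plot are easy to compute without plotting software: sort the residuals and pair them with standard normal quantiles. A sketch using Python's standard library; the residual values are invented.

```python
# Normal probability plot coordinates: sorted residuals against the
# corresponding standard normal quantiles.  If the residuals are roughly
# normal, these pairs lie close to a straight line.  Residuals are invented.
from statistics import NormalDist

residuals = [0.3, -1.2, 0.8, -0.4, 1.5, -0.9, 0.1, 0.6, -0.2, -0.6]

n = len(residuals)
sorted_res = sorted(residuals)
# Quantile for the i-th ordered value, using the common (i + 0.5)/n positions.
theoretical = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]

for q, e in zip(theoretical, sorted_res):
    print(round(q, 2), e)
```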

[Figure: a scatterplot with a regression line and an outlier far below it; the outlier's error is its large vertical distance from the line.]

Large residuals pull very strongly on the line since they are squared in the least squares criterion. As a result, outliers will strongly pull the least squares line towards themselves, making their residuals smaller than you might otherwise expect.

Residual plot
Outliers are usually clearer if the residuals are plotted against X rather than the original response.

Standardised residuals (opt)
To help assess the residuals, we usually standardise them, dividing each by an estimate of its standard deviation:
standardised residual = e / s_e
The standardised residuals are each approximately normal (0, 1) if the normal linear model fits, so only about 5% will be outside the range ±2, and hardly any outside the range ±3. Standardised residuals greater than 3 or less than −3 are often taken to indicate possible outliers. Note however that in a large data set of 1,000 values, we would expect 50 values outside ±2 and 3 values outside ±3. Values a little outside ±3 can occur by chance.

Outliers and leverage (opt)
Problems with residuals as indicators of outliers: All data points pull the least squares line towards themselves; the line is positioned to minimise the sum of squares of the residuals, Σ ei². If an outlier corresponds to an x-value near its mean, it usually will have a large residual. However, if the outlier occurs at an extreme x-value, it has a stronger influence on the position of the least squares line than the other data points. Such points are called high leverage points and pull the least squares line strongly towards them. Outliers that are high leverage points may therefore result in residuals that do not stand out from the other residuals.

Exercises (only available online):
Pick the correct residual plot
Identify regression problems (opt)
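Standardising residuals and flagging values beyond ±3 can be sketched as follows. This is a simplification: each residual is divided by a single overall estimate s_e rather than a leverage-adjusted standard error, and the residual values are invented.

```python
# Standardised residuals: divide each residual by an estimate of its
# standard deviation, then flag values beyond +/-3 as possible outliers.
# Simplification: one overall s_e is used instead of leverage-adjusted
# standard errors; the residual values are invented for illustration.
residuals = [0.5 if i % 2 == 0 else -0.5 for i in range(29)] + [-6.0]

n = len(residuals)
s_e = (sum(e ** 2 for e in residuals) / (n - 2)) ** 0.5   # divisor n - 2

standardised = [e / s_e for e in residuals]
flagged = [i for i, z in enumerate(standardised) if abs(z) > 3]
print(flagged)   # the -6.0 residual stands out
```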

9.3 Inference for regression parameters

Estimating the slope and intercept
Least squares: In practical situations, we must estimate β0, β1 and σ from a data set that we believe satisfies the normal linear model. The best estimates of β0 and β1 are the slope and intercept of the least squares line, b0 and b1.

[Figure: the unknown regression line µy = β0 + β1 x with a data point's error ε, and the least squares line ŷ = b0 + b1 x with the same point's residual e.]

Since b0 and b1 are functions of a data set that we assume to be a random sample from the normal linear model, b0 and b1 are themselves random quantities and have distributions.

Simulated example
The diagram below represents a regression model with a grey band. A sample of 20 values has been generated from this model and the least squares line (shown in blue) has been fitted to the simulated data. The least squares line provides estimates of the slope and intercept but they are not exactly equal to the underlying model values. A different sample would give 20 different points and a different least squares line, so the least squares slope and intercept are random.

Estimating the error standard devn
Errors and residuals: The error, ε, for any data point is its vertical distance from the regression line. In practice, the slope and intercept of the regression line are unknown, so the errors are also unknown values, but the least squares residuals provide estimates.

The third unknown parameter of the normal linear model, σ, is the standard deviation of the errors, σ = st devn(ε). σ can be estimated from the least squares residuals, {ei},
σ̂ = √( Σ ei² / (n − 2) )
This is similar to the formula for the standard deviation of the residuals, but uses the divisor (n − 2) instead of (n − 1). It describes the size of a 'typical' residual.

9.3.3 Distn of least squares estimates
The least squares line varies from sample to sample; it is random. The least squares estimates b0 and b1 of the two linear model parameters β0 and β1 therefore also vary from sample to sample and have normal distributions that are centered on β0 and β1 respectively.

Standard error of least squares slope
When the least squares slope, b1, is used as an estimate of β1, it has standard error
σ_b1 = σ / √( Σ (x − x̄)² ) = σ / ( s_x √(n − 1) )
where σ is the standard deviation of the errors (i.e. the spread of points around the regression line), n is the number of data points, and s_x is the sample standard deviation of X.

Implications for data collection
The standard error of b1 is lowest when:
1. the response standard deviation, σ, is low
2. the sample size, n, is large
3. the spread of x-values is high
To get the most accurate estimate of the slope from experimental data:
Reduce σ: σ can be reduced by ensuring that the experimental units are as similar as possible.
Increase n: Collect as much data as possible.
Increase s_x: Choose to run the experiment with x-values that are widely spread.
However, don't just collect data at the ends of the 'acceptable' range of x-values, even though this maximises s_x; with no observations in between, you cannot tell whether the relationship is linear.

Testing whether slope is zero
Does the response depend on X? In a normal linear model, the response has a distribution whose mean, µy, depends linearly on the explanatory variable,
Y ~ normal (µy, σy)
If the slope parameter, β1, is zero, then the response has a normal distribution that does not depend on X,
Y ~ normal (β0, σ)
This can be tested formally with a hypothesis test for whether β1 is zero.

Hypothesis test
H0: β1 = 0
HA: β1 ≠ 0
The test is based on the 'statistical distance' of b1 from zero,
t = b1 / σ_b1
and this has a t distribution with (n − 2) degrees of freedom if there really is no relationship.
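The estimate of σ, the standard error of the slope and the t statistic can be computed together. A sketch with invented data; the resulting t value would be compared with a t distribution on n − 2 degrees of freedom.

```python
# Error s.d. estimate, standard error of the slope and the t statistic
# for testing H0: beta1 = 0.  The data are invented for illustration.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.2, 3.1, 3.4, 4.9, 5.1, 6.3, 6.8, 8.0]

n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
sxx = sum((x - xbar) ** 2 for x in xs)
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
b0 = ybar - b1 * xbar

residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
sigma_hat = (sum(e ** 2 for e in residuals) / (n - 2)) ** 0.5  # divisor n - 2
se_b1 = sigma_hat / sxx ** 0.5     # sigma-hat / sqrt(sum of (x - xbar)^2)

t = b1 / se_b1                     # compare with t distribution, n - 2 = 6 d.f.
print(round(b1, 3), round(se_b1, 4), round(t, 2))
```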

The usual testing framework applies:
Summary statistic: helps distinguish H0 and HA.
Test statistic: has a standard distribution with no unknown parameters under H0.
P-value: the probability of a test statistic more 'extreme' than the one recorded; the p-value is a sum of tail areas.

Using output from statistical software
Computer software will provide everything you need to perform the test in its regression output: the least squares estimates, the standard error of the slope, the test statistic and the p-value.

Strength of evidence and relationship
It is important to distinguish between the correlation coefficient, r, and the p-value for testing whether there is a relationship between X and Y.
Correlation coefficient, r: describes the strength of the relationship between X and Y.
The p-value for testing whether X and Y are related: describes the strength of evidence for whether X and Y are related at all.
It is important not to confuse these two values when interpreting the p-value for a test. A p-value close to zero does not imply that there must be a strong relationship; it just means that we are sure that there is some relationship, however weak. A large p-value does not imply that the relationship must be weak; the sample size might just be too small to be sure that the relationship exists.

This is partly explained by an alternative formula for the test statistic,
t = b1 / σ_b1 = r √(n − 2) / √(1 − r²)
The test statistic and the p-value therefore both depend on both r and the sample size, n. Increasing n and increasing r both result in a lower p-value.

[Figure: p-values compared for samples with n = 30 and r = 0.24, with n = 200 (more data), and with r = 0.63 (a stronger relationship).]

9.4 Predicting the response

Point estimates of the response
Our point estimate (best guess) for the response at a particular value of x is
ŷ = b0 + b1 x
Note that the least squares line should only be used for prediction when the linear model assumptions hold. In particular there should be:
1. No outliers or points with high leverage
2. No curvature
It is also dangerous to predict far outside the range of the x's we have used to fit the model (the training data) since we have no information about whether the relationship remains linear. This is called extrapolation.

Estimated response distn at X (opt)
A normal linear model provides a response distribution for all X. With estimates for all three model parameters, we can obtain the approximate response distribution at any x-value, even if we have no data at that x-value. The diagram below shows two theoretical distributions from the above model.

Variability of estimate at X (opt)
The predicted response at X is ŷ = b0 + b1 x and has a normal distribution with mean µy = β0 + β1 x. Its standard deviation depends on the value at which the prediction is being made. The further x is from its mean in the training data, x̄, the greater the variability in the prediction.

Simulation
The effect of the x-value on the variability of the predicted response can be shown using least squares lines fitted to simulated data. (The spread would be even greater for predicting at x = 10.)
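The equivalence of the two formulae for the test statistic can be verified numerically; the data below are invented.

```python
# Checking the identity t = b1/se(b1) = r * sqrt(n - 2) / sqrt(1 - r^2)
# numerically on an invented data set.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.5, 2.1, 2.7, 3.8, 4.1, 5.2]

n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
sxx = sum((x - xbar) ** 2 for x in xs)
syy = sum((y - ybar) ** 2 for y in ys)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))

b1 = sxy / sxx
b0 = ybar - b1 * xbar
rss = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
se_b1 = (rss / (n - 2) / sxx) ** 0.5

r = sxy / (sxx * syy) ** 0.5                      # correlation coefficient

t_direct = b1 / se_b1
t_from_r = r * (n - 2) ** 0.5 / (1 - r ** 2) ** 0.5
print(round(t_direct, 6), round(t_from_r, 6))     # the two agree
```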

9.4.4 Estimating the mean vs prediction (opt)

Estimating the mean response
In some situations, we are interested in estimating the mean response at some x-value,
µy = β0 + β1 x
The least squares estimate,
ŷ = b0 + b1 x
becomes increasingly accurate as the sample size increases (since b0 and b1 become more accurate estimates of β0 and β1).

Predicting a single item's response
To predict the response for a single new individual with a known x-value, the same prediction would be used,
ŷ = b0 + b1 x
However, no matter how accurately we estimate the mean response for such individuals, a single new individual's response will have a distribution with standard deviation σ around this mean, and we have no information to help us predict how far it will be from its mean. The prediction error cannot have a standard deviation that is less than σ.

Simulation
The error in predicting an individual's response is usually greater than the error in estimating the mean response. The diagram below contrasts estimation of the mean response and prediction of a new individual's response at x = 5.5. Least squares lines have been fitted to several simulated data sets, one of which is shown on the left. The two kinds of errors from the simulations are shown on the right, showing that the prediction errors are usually greater.

Confidence & prediction intervals (opt)
The same value, ŷ = b0 + b1 x, is used both to estimate the mean response at x and to predict a new individual's response at x, but the errors are different in the two situations; they tend to be larger for predicting a new value.
95% confidence interval for the mean response:
(b0 + b1 x) ± t_{n−2} × se(b0 + b1 x)
A formula for the standard error on the right exists, but you should rely on statistical software to find its value.
95% prediction interval for a new individual's response:
(b0 + b1 x) ± t_{n−2} × k
where k is greater than the corresponding standard error for the confidence interval.
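The two intervals can be sketched using the standard textbook formulae for the standard errors; the chapter defers these to software, so treat the formulae as an assumption here. The data, the value x0 and the t multiplier 2.776 (97.5th percentile of t with 4 degrees of freedom) are for illustration only.

```python
# 95% confidence interval for the mean response and 95% prediction interval
# for a new response at x0.  Standard textbook standard-error formulae are
# assumed; data, x0 and the t multiplier are invented for illustration.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.0, 2.9, 4.1, 4.8, 6.2, 6.9]

n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
sxx = sum((x - xbar) ** 2 for x in xs)
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
b0 = ybar - b1 * xbar
sigma_hat = (sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys)) / (n - 2)) ** 0.5

x0 = 3.5
t_mult = 2.776                     # 97.5th percentile of t with n - 2 = 4 d.f.
yhat = b0 + b1 * x0

se_mean = sigma_hat * (1 / n + (x0 - xbar) ** 2 / sxx) ** 0.5      # mean response
se_pred = sigma_hat * (1 + 1 / n + (x0 - xbar) ** 2 / sxx) ** 0.5  # new value

ci = (yhat - t_mult * se_mean, yhat + t_mult * se_mean)
pi = (yhat - t_mult * se_pred, yhat + t_mult * se_pred)
print(ci, pi)                      # the prediction interval is wider
```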
Statistical software should again be used to find its value.

Example
The diagram below shows 95% confidence intervals for the mean response at x and 95% prediction intervals for a new response at x, as bands, for a small data set with n = 7 values.

Extrapolation
These 95% confidence intervals and 95% prediction intervals are valid within the range of x-values about which we have collected data, but they should not be relied on for extrapolation. Both intervals assume that the normal linear model describes the process, but we have no information about linearity beyond the x-values that have been collected.

Exercise: Predict the response. Exercises are only available online.

9.5 Coefficient of determination

Sums of squares
Total variation: The total sum of squares reflects the total variability of the response. The overall variance of all response values is the total sum of squares divided by (n − 1).
Explained variation (signal): The explained sum of squares is the variation that is explained by the model.
Residual variation (noise): The residual sum of squares is the unexplained variation. Note that the pooled estimate of the error variance, σ², is the residual sum of squares divided by (n − 2).

Relationship between sums of squares
The following relationship requires some algebra to prove but is important:
Σ (yi − ȳ)² = Σ (ŷi − ȳ)² + Σ (yi − ŷi)²
i.e. total sum of squares = explained sum of squares + residual sum of squares.

Coefficient of determination
When the relationship is strong, the explained sum of squares is close to the total sum of squares (and the residual sum of squares is small). When the relationship is weak, the explained sum of squares is small relative to the total sum of squares.
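The sum-of-squares decomposition can be verified numerically on any least squares fit; the data below are invented.

```python
# Verifying the decomposition: total SS = explained SS + residual SS
# on an invented data set.
xs = [1, 2, 3, 4, 5]
ys = [1.8, 3.2, 3.9, 5.5, 6.1]

n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
sxx = sum((x - xbar) ** 2 for x in xs)
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
b0 = ybar - b1 * xbar
fitted = [b0 + b1 * x for x in xs]

total_ss = sum((y - ybar) ** 2 for y in ys)
explained_ss = sum((f - ybar) ** 2 for f in fitted)
residual_ss = sum((y - f) ** 2 for y, f in zip(ys, fitted))

r_squared = explained_ss / total_ss     # proportion of variation explained
print(round(total_ss, 4), round(explained_ss + residual_ss, 4))   # equal
print(round(r_squared, 3))
```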

A useful summary statistic is the proportion of the total variation that is explained, the coefficient of determination, R²:
R² = explained sum of squares / total sum of squares
A proportion (1 − R²) of the total variation remains unexplained by the model. Although it is derived with quite a different aim, R² equals the square of the correlation coefficient between X and Y.

9.6 Multiple regression

More than one explanatory variable
Response and explanatory variables: We are often interested in how a 'response' variable, Y, depends on other explanatory variables. If there is a single explanatory variable, X, we can predict Y from X with a simple linear model of the form
ŷ = b0 + b1 x
However, if other explanatory variables have been recorded from each individual, we should be able to use them to predict the response more accurately.

Multiple regression equation
Adding extra variables: A simple linear model for a single explanatory variable,
ŷ = b0 + b1 x
can be easily extended to describe the effect of a second explanatory variable, Z, with an extra linear term,
ŷ = b0 + b1 x + b2 z
and so on with more explanatory variables,
ŷ = b0 + b1 x + b2 z + b3 w + ...
This type of model is called a multiple regression model.

Coefficients: Despite our use of the same symbols (b0, b1, ...) for all three models above, their 'best' values are often different for the different models. An example will be given on the next page.

Interpreting coefficients
Marginal and conditional relationships: In a linear model that predicts a response from several explanatory variables, the least squares coefficient associated with any explanatory variable describes its effect on the response if all other variables are held constant. This is also called the variable's conditional effect on the response. This may be very different from the size, and even the sign, of the coefficient when a linear model is fitted with only that single explanatory variable. This simple linear model describes the marginal relationship between the response and that variable.

Example: In a model for predicting the percentage body fat of men, the best model (as determined by least squares) in a simple model with weight is
Predicted body fat = ... + 0.162 × Weight
For each 1 lb extra Weight, men have, on average, 0.162% more body fat. However, if we add Abdomen circumference to the model, the best values for the coefficients are
Predicted body fat = ... − 0.136 × Weight + ... × Abdomen
For each 1 lb extra Weight, men have, on average, 0.136% less body fat than others with the same Abdomen circumference.

Standard errors
General linear model: The general linear model is
y = β0 + β1 x1 + β2 x2 + ... + ε,  ε ~ normal (0, σ)
Parameter estimates and standard errors: The best estimates of β0, β1, ... are the least squares estimates, b0, b1, ... The best estimate of σ² is the residual sum of squares divided by its degrees of freedom,
σ̂² = residual sum of squares / (n − p)
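The contrast between marginal and conditional coefficients can be reproduced with a small multiple regression. This sketch solves the normal equations directly; the data are invented and constructed so that the marginal slope of y on x is positive while its conditional coefficient, holding z fixed, is negative, echoing the Weight/Abdomen example.

```python
# Marginal vs conditional coefficients via multiple regression.  The normal
# equations (X'X) b = X'y are solved with a small Gaussian elimination.
# All data are invented: y falls with x once z is held fixed, although x and
# z (and hence x and y) rise together.
def solve(a, rhs):
    """Solve the linear system a * x = rhs by Gaussian elimination."""
    n = len(a)
    m = [row[:] + [rhs[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def least_squares(rows, y):
    """Coefficients [b0, b1, ...] for y = b0 + b1*rows[i][0] + ... ."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

data = [(1, 2), (2, 3), (3, 5), (4, 6), (5, 8), (6, 9)]   # (x, z) pairs
ys = [2 * z - 0.5 * x for x, z in data]                   # true conditional effects

b_mult = least_squares(data, ys)                          # [b0, b_x, b_z]
b_simple = least_squares([(x,) for x, _ in data], ys)     # marginal model in x

print([round(v, 3) for v in b_mult])      # x coefficient is negative
print([round(v, 3) for v in b_simple])    # x coefficient is positive
```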

where n is the number of observations and p is the number of β-parameters (i.e. the number of explanatory variables plus 1). The least squares estimates, b0, b1, ... are random quantities and have distributions. The formulae for their standard errors are complex, but statistical software will report their values.

Example: The equation below gives the least squares equation for predicting the percentage body fat of men, based on other body measurements. The table below shows the standard errors of these coefficients and the estimate of the error standard deviation, σ.

Inference for general linear models
Hypothesis tests for single parameters: A test of whether a single parameter is zero asks whether the corresponding explanatory variable can be dropped from the full model. The test statistic is the 'statistical distance' of the least squares estimate, bi, from zero, and its p-value is found from the tail area of the t distribution with (n − p) degrees of freedom.

Interpretation of p-values: The p-values are interpreted in the usual way as the strength of evidence against the null hypothesis. However, each p-value assesses whether you can drop a single explanatory variable from the full model. After dropping one variable from the full model, the p-values for the other variables will change, and variables that previously seemed unimportant may no longer be so. If several explanatory variables have high p-values, this does not give evidence that you can simultaneously drop all of them from the model.

Example: The table below shows the p-values for testing whether the individual parameters are zero in the body fat model. Several p-values are higher than 0.1, giving evidence that each of these variables could be dropped individually from the full model, but this does not mean that we could drop all such variables simultaneously.

9.7 What will be assessed?
What you need to know: You will not be examined about everything in this chapter.
Some of the material has been included to explain why the chapter's methods are used, in the hope that it will help you to understand these methods better. What you need to learn for the exam is more limited. We now describe what we expect you to be able to do in the assignment and exam after studying the regression chapter.

A. Simple linear regression
Simple linear regression models are used to predict the value of a "response" from a single "explanatory" variable.
1. Identify the response and explanatory variable: When a scenario is described in words, identify the response and explanatory variable.
2. Describe the relationship: When shown a scatterplot of Y against X, describe the relationship between the

variables: Positive or negative? Linear or curved? Strong or weak?
3. Given Excel regression output, you need to:
(a) Regression model: Find the values of the slope and intercept in the Excel output. Write down the regression model using these values. Use the regression model to predict the response from a value of the explanatory variable.
(b) Slope and intercept: Explain what the values of the slope and intercept describe, in a way that a non-statistician might understand. Use the p-value associated with the explanatory variable to test whether the variables are related. Explain your conclusion from this hypothesis test to a non-statistician.
(c) Coefficient of determination, R²: Identify the coefficient of determination, R², from the Excel output. Interpret the R² value (in terms of the percentage of response variation explained by the model).
4. From a scatterplot of residuals against the explanatory variable: List the assumptions required for the model (linearity, constant variance, independence). Use the scatterplot to discuss whether the model assumptions hold.

B. Multiple regression
Multiple regression models extend the idea of simple linear regression with two or more explanatory variables. Given Excel output from a multiple regression model, you should be able to:
1. Regression model: Write down the regression model that best predicts the response from all explanatory variables. Use the regression model to predict the response from particular values of all explanatory variables.
2. Slope parameters: Explain what the values of the slope parameters describe, in a way that a non-statistician might understand.
3. Important explanatory variables: Interpret the p-values associated with the different explanatory variables.
Explain which explanatory variables are most important and which might be considered for dropping from the model.


More information

Probability Distributions

Probability Distributions CONDENSED LESSON 13.1 Probability Distributions In this lesson, you Sketch the graph of the probability distribution for a continuous random variable Find probabilities by finding or approximating areas

More information

The Model Building Process Part I: Checking Model Assumptions Best Practice (Version 1.1)

The Model Building Process Part I: Checking Model Assumptions Best Practice (Version 1.1) The Model Building Process Part I: Checking Model Assumptions Best Practice (Version 1.1) Authored by: Sarah Burke, PhD Version 1: 31 July 2017 Version 1.1: 24 October 2017 The goal of the STAT T&E COE

More information

Inference with Simple Regression

Inference with Simple Regression 1 Introduction Inference with Simple Regression Alan B. Gelder 06E:071, The University of Iowa 1 Moving to infinite means: In this course we have seen one-mean problems, twomean problems, and problems

More information

ACCUPLACER MATH 0310

ACCUPLACER MATH 0310 The University of Teas at El Paso Tutoring and Learning Center ACCUPLACER MATH 00 http://www.academics.utep.edu/tlc MATH 00 Page Linear Equations Linear Equations Eercises 5 Linear Equations Answer to

More information

Machine Learning Linear Regression. Prof. Matteo Matteucci

Machine Learning Linear Regression. Prof. Matteo Matteucci Machine Learning Linear Regression Prof. Matteo Matteucci Outline 2 o Simple Linear Regression Model Least Squares Fit Measures of Fit Inference in Regression o Multi Variate Regession Model Least Squares

More information

SMAM 319 Exam1 Name. a B.The equation of a line is 3x + y =6. The slope is a. -3 b.3 c.6 d.1/3 e.-1/3

SMAM 319 Exam1 Name. a B.The equation of a line is 3x + y =6. The slope is a. -3 b.3 c.6 d.1/3 e.-1/3 SMAM 319 Exam1 Name 1. Pick the best choice. (10 points-2 each) _c A. A data set consisting of fifteen observations has the five number summary 4 11 12 13 15.5. For this data set it is definitely true

More information

Graphing and Optimization

Graphing and Optimization BARNMC_33886.QXD //7 :7 Page 74 Graphing and Optimization CHAPTER - First Derivative and Graphs - Second Derivative and Graphs -3 L Hôpital s Rule -4 Curve-Sketching Techniques - Absolute Maima and Minima

More information

Chapter 8. Linear Regression. Copyright 2010 Pearson Education, Inc.

Chapter 8. Linear Regression. Copyright 2010 Pearson Education, Inc. Chapter 8 Linear Regression Copyright 2010 Pearson Education, Inc. Fat Versus Protein: An Example The following is a scatterplot of total fat versus protein for 30 items on the Burger King menu: Copyright

More information

Central Limit Theorem and the Law of Large Numbers Class 6, Jeremy Orloff and Jonathan Bloom

Central Limit Theorem and the Law of Large Numbers Class 6, Jeremy Orloff and Jonathan Bloom Central Limit Theorem and the Law of Large Numbers Class 6, 8.5 Jeremy Orloff and Jonathan Bloom Learning Goals. Understand the statement of the law of large numbers. 2. Understand the statement of the

More information

Harvard University. Rigorous Research in Engineering Education

Harvard University. Rigorous Research in Engineering Education Statistical Inference Kari Lock Harvard University Department of Statistics Rigorous Research in Engineering Education 12/3/09 Statistical Inference You have a sample and want to use the data collected

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression ST 430/514 Recall: A regression model describes how a dependent variable (or response) Y is affected, on average, by one or more independent variables (or factors, or covariates)

More information

explicit expression, recursive, composition of functions, arithmetic sequence, geometric sequence, domain, range

explicit expression, recursive, composition of functions, arithmetic sequence, geometric sequence, domain, range Jordan-Granite-Canyons Consortium Secondary Math 1: Unit B (7 8 Weeks) Unit : Linear and Eponential Relationships In earlier grades, students define, evaluate, and compare functions, and use them to model

More information

Ordinary Least Squares Regression Explained: Vartanian

Ordinary Least Squares Regression Explained: Vartanian Ordinary Least Squares Regression Explained: Vartanian When to Use Ordinary Least Squares Regression Analysis A. Variable types. When you have an interval/ratio scale dependent variable.. When your independent

More information

Density Temp vs Ratio. temp

Density Temp vs Ratio. temp Temp Ratio Density 0.00 0.02 0.04 0.06 0.08 0.10 0.12 Density 0.0 0.2 0.4 0.6 0.8 1.0 1. (a) 170 175 180 185 temp 1.0 1.5 2.0 2.5 3.0 ratio The histogram shows that the temperature measures have two peaks,

More information

Review of Statistics 101

Review of Statistics 101 Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods

More information

TEACHER NOTES MATH NSPIRED

TEACHER NOTES MATH NSPIRED Math Objectives Students will produce various graphs of Taylor polynomials. Students will discover how the accuracy of a Taylor polynomial is associated with the degree of the Taylor polynomial. Students

More information

LAB 3 INSTRUCTIONS SIMPLE LINEAR REGRESSION

LAB 3 INSTRUCTIONS SIMPLE LINEAR REGRESSION LAB 3 INSTRUCTIONS SIMPLE LINEAR REGRESSION In this lab you will first learn how to display the relationship between two quantitative variables with a scatterplot and also how to measure the strength of

More information

Important note: Transcripts are not substitutes for textbook assignments. 1

Important note: Transcripts are not substitutes for textbook assignments. 1 In this lesson we will cover correlation and regression, two really common statistical analyses for quantitative (or continuous) data. Specially we will review how to organize the data, the importance

More information

ONLINE PAGE PROOFS. Relationships between two numerical variables

ONLINE PAGE PROOFS. Relationships between two numerical variables 14 Relationships between two numerical variables 14.1 Kick off with CAS 14.2 Scatterplots and basic correlation 14.3 Further correlation coefficients 14.4 Making predictions 14.5 Review 14.1 Kick off with

More information

Inference for Regression Simple Linear Regression

Inference for Regression Simple Linear Regression Inference for Regression Simple Linear Regression IPS Chapter 10.1 2009 W.H. Freeman and Company Objectives (IPS Chapter 10.1) Simple linear regression p Statistical model for linear regression p Estimating

More information

28. SIMPLE LINEAR REGRESSION III

28. SIMPLE LINEAR REGRESSION III 28. SIMPLE LINEAR REGRESSION III Fitted Values and Residuals To each observed x i, there corresponds a y-value on the fitted line, y = βˆ + βˆ x. The are called fitted values. ŷ i They are the values of

More information

AMS 7 Correlation and Regression Lecture 8

AMS 7 Correlation and Regression Lecture 8 AMS 7 Correlation and Regression Lecture 8 Department of Applied Mathematics and Statistics, University of California, Santa Cruz Suumer 2014 1 / 18 Correlation pairs of continuous observations. Correlation

More information

Stat 101 Exam 1 Important Formulas and Concepts 1

Stat 101 Exam 1 Important Formulas and Concepts 1 1 Chapter 1 1.1 Definitions Stat 101 Exam 1 Important Formulas and Concepts 1 1. Data Any collection of numbers, characters, images, or other items that provide information about something. 2. Categorical/Qualitative

More information

Business Statistics. Lecture 9: Simple Regression

Business Statistics. Lecture 9: Simple Regression Business Statistics Lecture 9: Simple Regression 1 On to Model Building! Up to now, class was about descriptive and inferential statistics Numerical and graphical summaries of data Confidence intervals

More information

ALGEBRA I SEMESTER EXAMS PRACTICE MATERIALS SEMESTER 2 27? 1. (7.2) What is the value of (A) 1 9 (B) 1 3 (C) 9 (D) 3

ALGEBRA I SEMESTER EXAMS PRACTICE MATERIALS SEMESTER 2 27? 1. (7.2) What is the value of (A) 1 9 (B) 1 3 (C) 9 (D) 3 014-015 SEMESTER EXAMS SEMESTER 1. (7.) What is the value of 1 3 7? (A) 1 9 (B) 1 3 (C) 9 (D) 3. (7.3) The graph shows an eponential function. What is the equation of the function? (A) y 3 (B) y 3 (C)

More information

Secondary 1 Vocabulary Cards and Word Walls Revised: June 27, 2012

Secondary 1 Vocabulary Cards and Word Walls Revised: June 27, 2012 Secondary 1 Vocabulary Cards and Word Walls Revised: June 27, 2012 The vocabulary cards in this file match the Common Core, the math curriculum adopted by the Utah State Board of Education, August 2010.

More information

Chapter 7. Scatterplots, Association, and Correlation

Chapter 7. Scatterplots, Association, and Correlation Chapter 7 Scatterplots, Association, and Correlation Bin Zou (bzou@ualberta.ca) STAT 141 University of Alberta Winter 2015 1 / 29 Objective In this chapter, we study relationships! Instead, we investigate

More information

A Practitioner s Guide to Generalized Linear Models

A Practitioner s Guide to Generalized Linear Models A Practitioners Guide to Generalized Linear Models Background The classical linear models and most of the minimum bias procedures are special cases of generalized linear models (GLMs). GLMs are more technically

More information

TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics

TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics Exploring Data: Distributions Look for overall pattern (shape, center, spread) and deviations (outliers). Mean (use a calculator): x = x 1 + x

More information

Class 26: review for final exam 18.05, Spring 2014

Class 26: review for final exam 18.05, Spring 2014 Probability Class 26: review for final eam 8.05, Spring 204 Counting Sets Inclusion-eclusion principle Rule of product (multiplication rule) Permutation and combinations Basics Outcome, sample space, event

More information

Business Statistics. Lecture 10: Course Review

Business Statistics. Lecture 10: Course Review Business Statistics Lecture 10: Course Review 1 Descriptive Statistics for Continuous Data Numerical Summaries Location: mean, median Spread or variability: variance, standard deviation, range, percentiles,

More information

Simple Linear Regression for the Climate Data

Simple Linear Regression for the Climate Data Prediction Prediction Interval Temperature 0.2 0.0 0.2 0.4 0.6 0.8 320 340 360 380 CO 2 Simple Linear Regression for the Climate Data What do we do with the data? y i = Temperature of i th Year x i =CO

More information

Keller: Stats for Mgmt & Econ, 7th Ed July 17, 2006

Keller: Stats for Mgmt & Econ, 7th Ed July 17, 2006 Chapter 17 Simple Linear Regression and Correlation 17.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will

More information

AP STATISTICS Name: Period: Review Unit IV Scatterplots & Regressions

AP STATISTICS Name: Period: Review Unit IV Scatterplots & Regressions AP STATISTICS Name: Period: Review Unit IV Scatterplots & Regressions Know the definitions of the following words: bivariate data, regression analysis, scatter diagram, correlation coefficient, independent

More information

STA Module 5 Regression and Correlation. Learning Objectives. Learning Objectives (Cont.) Upon completing this module, you should be able to:

STA Module 5 Regression and Correlation. Learning Objectives. Learning Objectives (Cont.) Upon completing this module, you should be able to: STA 2023 Module 5 Regression and Correlation Learning Objectives Upon completing this module, you should be able to: 1. Define and apply the concepts related to linear equations with one independent variable.

More information

Unit 6 - Introduction to linear regression

Unit 6 - Introduction to linear regression Unit 6 - Introduction to linear regression Suggested reading: OpenIntro Statistics, Chapter 7 Suggested exercises: Part 1 - Relationship between two numerical variables: 7.7, 7.9, 7.11, 7.13, 7.15, 7.25,

More information

Statistical Methods for Data Mining

Statistical Methods for Data Mining Statistical Methods for Data Mining Kuangnan Fang Xiamen University Email: xmufkn@xmu.edu.cn Linear regression Linear regression is a simple approach to supervised learning. It assumes that the dependence

More information

Chapter 16. Simple Linear Regression and Correlation

Chapter 16. Simple Linear Regression and Correlation Chapter 16 Simple Linear Regression and Correlation 16.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will

More information

FAQ: Linear and Multiple Regression Analysis: Coefficients

FAQ: Linear and Multiple Regression Analysis: Coefficients Question 1: How do I calculate a least squares regression line? Answer 1: Regression analysis is a statistical tool that utilizes the relation between two or more quantitative variables so that one variable

More information

Confidence Intervals, Testing and ANOVA Summary

Confidence Intervals, Testing and ANOVA Summary Confidence Intervals, Testing and ANOVA Summary 1 One Sample Tests 1.1 One Sample z test: Mean (σ known) Let X 1,, X n a r.s. from N(µ, σ) or n > 30. Let The test statistic is H 0 : µ = µ 0. z = x µ 0

More information

Simple linear regression: linear relationship between two qunatitative variables. Linear Regression. The regression line

Simple linear regression: linear relationship between two qunatitative variables. Linear Regression. The regression line Linear Regression Simple linear regression: linear relationship etween two qunatitative variales The regression line Facts aout least-squares regression Residuals Influential oservations Cautions aout

More information

Tables Table A Table B Table C Table D Table E 675

Tables Table A Table B Table C Table D Table E 675 BMTables.indd Page 675 11/15/11 4:25:16 PM user-s163 Tables Table A Standard Normal Probabilities Table B Random Digits Table C t Distribution Critical Values Table D Chi-square Distribution Critical Values

More information

Vocabulary. Fitting a Line to Data. Lesson 2-2 Linear Models

Vocabulary. Fitting a Line to Data. Lesson 2-2 Linear Models Lesson 2-2 Linear Models BIG IDEA The sum of squared deviations is a statistic for determining which of two lines fi ts the data better. A linear function is a set of ordered pairs (, ) satisfing an equation

More information

appstats8.notebook October 11, 2016

appstats8.notebook October 11, 2016 Chapter 8 Linear Regression Objective: Students will construct and analyze a linear model for a given set of data. Fat Versus Protein: An Example pg 168 The following is a scatterplot of total fat versus

More information

Chapter 8. Linear Regression. The Linear Model. Fat Versus Protein: An Example. The Linear Model (cont.) Residuals

Chapter 8. Linear Regression. The Linear Model. Fat Versus Protein: An Example. The Linear Model (cont.) Residuals Chapter 8 Linear Regression Copyright 2007 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Slide 8-1 Copyright 2007 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Fat Versus

More information

Mathematical Notation Math Introduction to Applied Statistics

Mathematical Notation Math Introduction to Applied Statistics Mathematical Notation Math 113 - Introduction to Applied Statistics Name : Use Word or WordPerfect to recreate the following documents. Each article is worth 10 points and should be emailed to the instructor

More information

Polynomial Functions of Higher Degree

Polynomial Functions of Higher Degree SAMPLE CHAPTER. NOT FOR DISTRIBUTION. 4 Polynomial Functions of Higher Degree Polynomial functions of degree greater than 2 can be used to model data such as the annual temperature fluctuations in Daytona

More information

Chapter 16. Simple Linear Regression and dcorrelation

Chapter 16. Simple Linear Regression and dcorrelation Chapter 16 Simple Linear Regression and dcorrelation 16.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will

More information

INFERENCE FOR REGRESSION

INFERENCE FOR REGRESSION CHAPTER 3 INFERENCE FOR REGRESSION OVERVIEW In Chapter 5 of the textbook, we first encountered regression. The assumptions that describe the regression model we use in this chapter are the following. We

More information

Chapter 5 Confidence Intervals

Chapter 5 Confidence Intervals Chapter 5 Confidence Intervals Confidence Intervals about a Population Mean, σ, Known Abbas Motamedi Tennessee Tech University A point estimate: a single number, calculated from a set of data, that is

More information

Mathematics for Economics MA course

Mathematics for Economics MA course Mathematics for Economics MA course Simple Linear Regression Dr. Seetha Bandara Simple Regression Simple linear regression is a statistical method that allows us to summarize and study relationships between

More information

Statistics Boot Camp. Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018

Statistics Boot Camp. Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018 Statistics Boot Camp Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018 March 21, 2018 Outline of boot camp Summarizing and simplifying data Point and interval estimation Foundations of statistical

More information

Multiple Regression Methods

Multiple Regression Methods Chapter 1: Multiple Regression Methods Hildebrand, Ott and Gray Basic Statistical Ideas for Managers Second Edition 1 Learning Objectives for Ch. 1 The Multiple Linear Regression Model How to interpret

More information

Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals. Regression Output. Conditions for inference.

Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals. Regression Output. Conditions for inference. Understanding regression output from software Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals In 1966 Cyril Burt published a paper called The genetic determination of differences

More information

Definition of Statistics Statistics Branches of Statistics Descriptive statistics Inferential statistics

Definition of Statistics Statistics Branches of Statistics Descriptive statistics Inferential statistics What is Statistics? Definition of Statistics Statistics is the science of collecting, organizing, analyzing, and interpreting data in order to make a decision. Branches of Statistics The study of statistics

More information

TOPIC 9 SIMPLE REGRESSION & CORRELATION

TOPIC 9 SIMPLE REGRESSION & CORRELATION TOPIC 9 SIMPLE REGRESSION & CORRELATION Basic Linear Relationships Mathematical representation: Y = a + bx X is the independent variable [the variable whose value we can choose, or the input variable].

More information

SYSTEMS OF THREE EQUATIONS

SYSTEMS OF THREE EQUATIONS SYSTEMS OF THREE EQUATIONS 11.2.1 11.2.4 This section begins with students using technology to eplore graphing in three dimensions. By using strategies that they used for graphing in two dimensions, students

More information

Inference for Regression Inference about the Regression Model and Using the Regression Line

Inference for Regression Inference about the Regression Model and Using the Regression Line Inference for Regression Inference about the Regression Model and Using the Regression Line PBS Chapter 10.1 and 10.2 2009 W.H. Freeman and Company Objectives (PBS Chapter 10.1 and 10.2) Inference about

More information

Statistics and Quantitative Analysis U4320. Lecture 13: Explaining Variation Prof. Sharyn O Halloran

Statistics and Quantitative Analysis U4320. Lecture 13: Explaining Variation Prof. Sharyn O Halloran Statistics and Quantitative Analysis U4320 Lecture 13: Eplaining Variation Prof. Sharyn O Halloran I. Eplaining Variation: R 2 A. Breaking Down the Distances Let's go back to the basics of regression analysis.

More information

Equations and Inequalities

Equations and Inequalities Equations and Inequalities Figure 1 CHAPTER OUTLINE 1 The Rectangular Coordinate Systems and Graphs Linear Equations in One Variable Models and Applications Comple Numbers Quadratic Equations 6 Other Types

More information

Regression Models. Chapter 4. Introduction. Introduction. Introduction

Regression Models. Chapter 4. Introduction. Introduction. Introduction Chapter 4 Regression Models Quantitative Analysis for Management, Tenth Edition, by Render, Stair, and Hanna 008 Prentice-Hall, Inc. Introduction Regression analysis is a very valuable tool for a manager

More information

y = a + bx 12.1: Inference for Linear Regression Review: General Form of Linear Regression Equation Review: Interpreting Computer Regression Output

y = a + bx 12.1: Inference for Linear Regression Review: General Form of Linear Regression Equation Review: Interpreting Computer Regression Output 12.1: Inference for Linear Regression Review: General Form of Linear Regression Equation y = a + bx y = dependent variable a = intercept b = slope x = independent variable Section 12.1 Inference for Linear

More information

The sample mean and sample variance are given by: x sample standard deviation Excel: STDEV(values)

The sample mean and sample variance are given by: x sample standard deviation Excel: STDEV(values) Unless we have made a very large number of measurements, we don't have an accurate estimate of the mean or standard deviation of a data set. If we assume the values are normally distributed, we can estimate

More information

Chapter 12 Summarizing Bivariate Data Linear Regression and Correlation

Chapter 12 Summarizing Bivariate Data Linear Regression and Correlation Chapter 1 Summarizing Bivariate Data Linear Regression and Correlation This chapter introduces an important method for making inferences about a linear correlation (or relationship) between two variables,

More information

CHAPTER. Scatterplots

CHAPTER. Scatterplots CHAPTER 7 Two-Variable Data Analysis IN THIS CHAPTER Summary: In the previous chapter we used eploratory data analysis to help us understand what a one-variable data set was saying to us. In this chapter

More information

Which boxplot represents the same information as the histogram? Test Scores Test Scores

Which boxplot represents the same information as the histogram? Test Scores Test Scores Frequency of Test Scores ALGEBRA I 01 013 SEMESTER EXAMS SEMESTER 1. Mrs. Johnson created this histogram of her 3 rd period students test scores. 8 6 4 50 60 70 80 90 100 Test Scores Which boplot represents

More information

I used college textbooks because they were the only resource available to evaluate measurement uncertainty calculations.

I used college textbooks because they were the only resource available to evaluate measurement uncertainty calculations. Introduction to Statistics By Rick Hogan Estimating uncertainty in measurement requires a good understanding of Statistics and statistical analysis. While there are many free statistics resources online,

More information

Simple linear regression

Simple linear regression Simple linear regression Biometry 755 Spring 2008 Simple linear regression p. 1/40 Overview of regression analysis Evaluate relationship between one or more independent variables (X 1,...,X k ) and a single

More information

Sociology 6Z03 Review I

Sociology 6Z03 Review I Sociology 6Z03 Review I John Fox McMaster University Fall 2016 John Fox (McMaster University) Sociology 6Z03 Review I Fall 2016 1 / 19 Outline: Review I Introduction Displaying Distributions Describing

More information

EDEXCEL ANALYTICAL METHODS FOR ENGINEERS H1 UNIT 2 - NQF LEVEL 4 OUTCOME 4 - STATISTICS AND PROBABILITY TUTORIAL 3 LINEAR REGRESSION

EDEXCEL ANALYTICAL METHODS FOR ENGINEERS H1 UNIT 2 - NQF LEVEL 4 OUTCOME 4 - STATISTICS AND PROBABILITY TUTORIAL 3 LINEAR REGRESSION EDEXCEL AALYTICAL METHODS FOR EGIEERS H1 UIT - QF LEVEL 4 OUTCOME 4 - STATISTICS AD PROBABILITY TUTORIAL 3 LIEAR REGRESSIO Tabular and graphical form: data collection methods; histograms; bar charts; line

More information

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages: Glossary The ISI glossary of statistical terms provides definitions in a number of different languages: http://isi.cbs.nl/glossary/index.htm Adjusted r 2 Adjusted R squared measures the proportion of the

More information

Chapter 2: Looking at Data Relationships (Part 3)

Chapter 2: Looking at Data Relationships (Part 3) Chapter 2: Looking at Data Relationships (Part 3) Dr. Nahid Sultana Chapter 2: Looking at Data Relationships 2.1: Scatterplots 2.2: Correlation 2.3: Least-Squares Regression 2.5: Data Analysis for Two-Way

More information

Core Connections Algebra 2 Checkpoint Materials

Core Connections Algebra 2 Checkpoint Materials Core Connections Algebra 2 Note to Students (and their Teachers) Students master different skills at different speeds. No two students learn eactly the same way at the same time. At some point you will

More information

Test 1, / /130. MASSEY UNIVERSITY Institute of Information Sciences and Technology (Statistics)

Test 1, / /130. MASSEY UNIVERSITY Institute of Information Sciences and Technology (Statistics) MASSEY UNIVERSITY Institute of Information Sciences and Technology (Statistics) 161.120 INTRODUCTORY STATISTICS 161.130 BIOMETRICS Test 1, 2003 Duration: 1 hour Questions 1 and 2 are about the following

More information

Lectures on Simple Linear Regression Stat 431, Summer 2012

Lectures on Simple Linear Regression Stat 431, Summer 2012 Lectures on Simple Linear Regression Stat 43, Summer 0 Hyunseung Kang July 6-8, 0 Last Updated: July 8, 0 :59PM Introduction Previously, we have been investigating various properties of the population

More information

Fundamentals of Algebra, Geometry, and Trigonometry. (Self-Study Course)

Fundamentals of Algebra, Geometry, and Trigonometry. (Self-Study Course) Fundamentals of Algebra, Geometry, and Trigonometry (Self-Study Course) This training is offered eclusively through the Pennsylvania Department of Transportation, Business Leadership Office, Technical

More information

Performance of fourth-grade students on an agility test

Performance of fourth-grade students on an agility test Starter Ch. 5 2005 #1a CW Ch. 4: Regression L1 L2 87 88 84 86 83 73 81 67 78 83 65 80 50 78 78? 93? 86? Create a scatterplot Find the equation of the regression line Predict the scores Chapter 5: Understanding

More information

Chapter 7. Inference for Distributions. Introduction to the Practice of STATISTICS SEVENTH. Moore / McCabe / Craig. Lecture Presentation Slides

Chapter 7. Inference for Distributions. Introduction to the Practice of STATISTICS SEVENTH. Moore / McCabe / Craig. Lecture Presentation Slides Chapter 7 Inference for Distributions Introduction to the Practice of STATISTICS SEVENTH EDITION Moore / McCabe / Craig Lecture Presentation Slides Chapter 7 Inference for Distributions 7.1 Inference for

More information

3.2 Logarithmic Functions and Their Graphs

3.2 Logarithmic Functions and Their Graphs 96 Chapter 3 Eponential and Logarithmic Functions 3.2 Logarithmic Functions and Their Graphs Logarithmic Functions In Section.6, you studied the concept of an inverse function. There, you learned that

More information

regression analysis is a type of inferential statistics which tells us whether relationships between two or more variables exist

regression analysis is a type of inferential statistics which tells us whether relationships between two or more variables exist regression analysis is a type of inferential statistics which tells us whether relationships between two or more variables exist sales $ (y - dependent variable) advertising $ (x - independent variable)

More information

3.3.1 Linear functions yet again and dot product In 2D, a homogenous linear scalar function takes the general form:

3.3.1 Linear functions yet again and dot product In 2D, a homogenous linear scalar function takes the general form: 3.3 Gradient Vector and Jacobian Matri 3 3.3 Gradient Vector and Jacobian Matri Overview: Differentiable functions have a local linear approimation. Near a given point, local changes are determined by

More information
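As a worked illustration of the method of least squares described in this section, the estimates b0 and b1 that minimise the residual sum of squares have the well-known closed forms b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² and b0 = ȳ − b1 x̄. A minimal sketch in Python (the x and y values below are made-up illustrative data, not from the text):

```python
# Least squares fit of the simple linear model y = b0 + b1*x.
# Hypothetical data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Slope: b1 = sum((xi - x_bar)*(yi - y_bar)) / sum((xi - x_bar)^2)
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
# Intercept: b0 = y_bar - b1 * x_bar
b0 = y_bar - b1 * x_bar

# Fitted values and residuals e_i = y_i - (b0 + b1*x_i)
fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# The residual sum of squares that these estimates minimise
rss = sum(e ** 2 for e in residuals)
```

A useful check on any such fit: when the model includes an intercept, the residuals of the least squares line always sum to zero (up to rounding), which is one reason the *squared* residuals, rather than the raw residuals, must be used to summarise fit.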