Topic 1. Definitions


1. Scalar
A scalar is a number.

2. Vector
A vector is a column of numbers.

3. Linear combination
A linear combination is a scalar times a vector, plus a scalar times a vector, plus a scalar times a vector... etc.

4. Adding two vectors
To add two (column) vectors, they must be of the same length, i.e., have the same number of observations. Adding two vectors is accomplished by adding the contents of each row, one at a time, to form a new vector: A + B = C.

5. Multiplying a scalar by a vector
To multiply a scalar by a vector, one simply creates a new vector the same length as the old vector, where every new value is the value in the old vector multiplied by the scalar. Combining the two operations gives linear combinations such as 2A + 3B = D.
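The definitions above can be sketched in a few lines of plain Python. The vectors A and B here are made-up examples, not data from the handout.

```python
# A minimal sketch of the definitions above, using plain Python lists
# as (column) vectors. A and B are hypothetical example vectors.
A = [1, 2, 3]
B = [4, 5, 6]

# Adding two vectors of the same length: add row by row.
C = [a + b for a, b in zip(A, B)]          # A + B = C

# Multiplying a scalar by a vector: multiply every entry by the scalar.
twoA = [2 * a for a in A]

# A linear combination: scalar times a vector plus scalar times a vector.
D = [2 * a + 3 * b for a, b in zip(A, B)]  # 2A + 3B = D
```

Note that the linear combination D uses both operations at once: each row of D is 2 times the row of A plus 3 times the row of B.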

Topic 2. Linear Independence

1. Definition
A set of vectors is said to be linearly independent if no vector in the set can be expressed as a linear combination of the others in the set.

2. How to Determine Linear Independence
When faced with a set of vectors, it will sometimes be necessary to determine how many of the vectors are linearly independent. The steps below can be followed:
1) Determine if any vector in the set of N can be written as a linear combination of the rest. If not, there are N linearly independent vectors. (Stop.)
2) If any one vector can be expressed as a linear combination of the rest (any scalars, including 0, are permissible), then eliminate that vector.
3) Of the remaining N-1 vectors, determine if any one can be written as a linear combination of the rest. If not, then among the N vectors, N-1 are linearly independent. If yes, eliminate the vector and proceed with the set of N-2 vectors.
4) Continue the process until all vectors remaining in the set are linearly independent. If k vectors have been eliminated, there are (N-k) vectors that are linearly independent.

3. Examples
1) Vectors A, B, C
2) Vectors A, B, C, D, E
3) Vectors A, B, C, D, E
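The elimination procedure above counts how many vectors survive; numerically, that count is the rank of the matrix whose columns are the vectors. A sketch with NumPy (the example vectors are hypothetical, not the A..E from the examples):

```python
# Counting linearly independent vectors via matrix rank.
# A, B, C are hypothetical example vectors; C is deliberately a linear
# combination of A and B, so only 2 of the 3 are independent.
import numpy as np

A = np.array([1.0, 0.0, 2.0])
B = np.array([0.0, 1.0, 3.0])
C = A + 2 * B                      # C = 1*A + 2*B, a linear combination

vectors = np.column_stack([A, B, C])
# The matrix rank equals the (N - k) count the elimination steps reach.
n_independent = int(np.linalg.matrix_rank(vectors))
```

Here the procedure would eliminate C (it can be written as A + 2B), leaving 2 linearly independent vectors, which is exactly the rank.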

Topic 3. Simple Example Media Test

1. Simple Example Media, No Unit Vector: Problem

A) Full Model

The Data: Sales, Media

S = a1 M1 + a2 M2
Sales = a1 M1 + a2 M2 + Error

1) Fill in the 7 values for M1 and M2 above.
2) Solve for a1 and a2: a1 = ____  a2 = ____
3) Fill in the 7 values in the error vector above.
4) Solve for the error sum-of-squares for the Full Model. ESS_F = ____.
5) Write down:
Value of expected value for Sales for media 1: ____
Value of expected value for Sales for media 2: ____

Topic 3. Media Test (Continued)

B) Restricted Model

S = bU
Sales = bU + Error

1) Fill in the 7 values for U above.
2) Solve for b: b = ____
3) Fill in the 7 values in the error vector above.
4) Solve for the error sum-of-squares for the Restricted Model. ESS_R = ____.
5) Write down:
Value of expected value for sales for media 1: ____
Value of expected value for sales for media 2: ____

C) Solve for F

F = [(ESS_R - ESS_F) / (NL_F - NL_R)] / [ESS_F / (NOB - NL_F)]

Answer for F: ____

Topic 3. Media Test (Continued)

2. Simple Example Media, No Unit Vector: Answer

The Data: Sales, Media

A) Full Model

1) & 3) The model: Sales = a1 M1 + a2 M2 + Error
2) a1 = 37, a2 = 44
4) ESS_F = 16
5) Value of expected value for Sales for Media 1: 37
Value of expected value for Sales for Media 2: 44

Topic 3. Media Test (Continued)

B) Restricted Model

Linear restriction: a1 = a2 = b
Sales = a1 M1 + a2 M2 = b M1 + b M2 = b(M1 + M2) = bU

1) & 3) The model: Sales = bU + Error
2) b = 41
4) ESS_R = 100
5) Value of expected value for Sales for Media 1: 41
Value of expected value for Sales for Media 2: 41

C) Solve for F

F = [(ESS_R - ESS_F) / (NL_F - NL_R)] / [ESS_F / (NOB - NL_F)]
  = [(100 - 16) / (2 - 1)] / [16 / (7 - 2)]
  = 84 / 3.2
  = 26.25
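The full-versus-restricted arithmetic above can be replicated in a few lines of NumPy. The seven individual sales figures below are hypothetical stand-ins (the handout's raw data listing is not reproduced here); they are chosen so the group means come out to the printed answers a1 = 37 and a2 = 44.

```python
# A sketch of the media F test. Sales values are hypothetical, chosen to
# be consistent with the printed answers (a1 = 37, a2 = 44, b = 41).
import numpy as np

sales = np.array([35.0, 37.0, 39.0, 42.0, 44.0, 44.0, 46.0])
M1    = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0])  # media 1 rows
M2    = 1.0 - M1                                        # media 2 rows

# Full model: S = a1*M1 + a2*M2, fit by least squares.
X_full = np.column_stack([M1, M2])
coef, *_ = np.linalg.lstsq(X_full, sales, rcond=None)
ess_f = float(np.sum((sales - X_full @ coef) ** 2))

# Restricted model: S = b*U, so b is just the overall mean.
b = float(sales.mean())
ess_r = float(np.sum((sales - b) ** 2))

# F = [(ESS_R - ESS_F)/(NL_F - NL_R)] / [ESS_F/(NOB - NL_F)]
F = ((ess_r - ess_f) / (2 - 1)) / (ess_f / (7 - 2))
```

With these (assumed) data the fit reproduces the worked answers: a1 = 37, a2 = 44, b = 41, ESS_F = 16, ESS_R = 100, and F = 26.25.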

Topic 3. Media Test (Continued)

3. Simple Example Media, With Unit Vector: Problem

A) Full Model

S = a0 U + a3 M1
Sales = a0 U + a3 M1 + Error

1) Fill in the 7 values for U and M1 above.
2) Solve for a0 and a3: a0 = ____  a3 = ____
3) Fill in the 7 values in the error vector above.
4) Solve for the error sum-of-squares for this Full Model. ESS_F = ____.
5) Write down:
Value of expected value for sales, media 1: ____
Value of expected value for sales, media 2: ____

B) Restricted Model: Note: it is the same as the Restricted Model in the above example with no unit vector.

C) Solve for F. F = ____

Topic 3. Media Test (Continued)

4. Simple Example Media, With Unit Vector: Answer

A) Full Model

1) & 3) Sales = a0 U + a3 M1 + Error
2) a0 = 44, a3 = -7
4) ESS_F = 16
5) Value of expected value for Sales for Media 1: a0(1) + a3(1) = 44(1) - 7(1) = 37
Value of expected value for Sales for Media 2: a0(1) + a3(0) = a0 = 44

B) Restricted Model
Linear restriction: a3 = 0
S = a0 U (Same as in the example with no unit vector.)

C) Solve for F.
F = [(100 - 16) / (2 - 1)] / [16 / (7 - 2)] = 84 / 3.2 = 26.25
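The unit-vector form of the full model is just a reparameterization: it gives the same fit, and the same error sum-of-squares, as the no-unit-vector form. A sketch, using the same hypothetical sales figures as before (chosen to match the printed group means of 37 and 44):

```python
# A sketch showing the unit-vector parameterization S = a0*U + a3*M1
# reproduces a0 = 44 and a3 = -7, and the same ESS as the two-dummy form.
# The sales values are hypothetical but consistent with the answers above.
import numpy as np

sales = np.array([35.0, 37.0, 39.0, 42.0, 44.0, 44.0, 46.0])
U     = np.ones(7)                                      # unit vector
M1    = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0])  # media 1 rows

X = np.column_stack([U, M1])
(a0, a3), *_ = np.linalg.lstsq(X, sales, rcond=None)
ess_f = float(np.sum((sales - X @ np.array([a0, a3])) ** 2))
```

Here a0 is the media 2 mean (44) and a3 is the media 1 mean minus the media 2 mean (37 - 44 = -7), so the expected values, and the fit, are unchanged.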

Topic 3. Media Test (Continued)

5. Simple Example Media, With Unit Vector: SPSS

A) Data: the seven rows of sales and media values.

Topic 3. Media Test (Continued)

B) Regression Output

Variables Entered/Removed: MEDIA entered (Method: Enter). Dependent Variable: SALES.

Model Summary: R = .917; predictors: (Constant), MEDIA.

ANOVA (Dependent Variable: SALES; predictors: (Constant), MEDIA):
Regression: Sum of Squares 84 (this is ESS_R - ESS_F), df 1, Mean Square 84
Residual:   Sum of Squares 16 (this is ESS_F), df 5, Mean Square 3.2
Total:      Sum of Squares 100 (this is ESS_R), df 6
F = 26.25

Coefficients (Dependent Variable: SALES), unstandardized B:
(Constant): 44 (this is a0)
MEDIA: -7 (this is a3)

Topic 4. Simple Example Price, 3 Levels

(A) The Model

Now we illustrate the one independent variable test using a simple example with only seven observations. The dependent variable is sales and the independent variable is price with 3 levels: $5, $10, or $15. Here is the raw data.

Trade Area   Unit Sales   Price Charged
1            140          $5
2            136          $5
3            122          $10
4            124          $10
5            104          $15
6            108          $15
7            106          $15

Full Model

1. Full Model in theory
We construct the Full Model in which we allow a different estimate for sales at each level of price:

S = a1 P5 + a2 P10 + a3 P15 .. (1)

2. Full Model in SPSS
Since SPSS automatically adds the unit vector to our model, we must drop one of the three binary predictor vectors, either P5, P10 or P15. (We must drop one of the vectors because Unit = P5 + P10 + P15, and this would introduce a linear dependency into the model. Dropping the vector is not a problem, however, because model (2) below and model (1) above are equivalent models.)

S = a0 U + a1 P5 + a2 P10 .. (2)

In this model (2), which we call our full model, sales is the dependent variable (measured at the interval or ratio level), U is the unit vector of all ones, a0 is the weight on the unit vector,

Topic 4. Price, 3 Levels (Continued)

P5 is a binary predictor vector, a1 is the weight on P5, P10 is a binary predictor vector, and a2 is its weight.

What is in P5? P5 has ones and zeros. It has a one when the sales for the row came from a trade area where we charged $5 and a zero otherwise. What is in P10? P10 has ones and zeros. It has a one when the sales for the row came from a trade area where we charged $10 and a zero otherwise.

3. Converting raw observations into the Full Model
The above observations about P5 and P10 are illustrated in the following complete depiction of the model.

S = a0 U + a1 P5 + a2 P10

(The binary predictor vectors P5 and P10 are created using the recode function in SPSS and the data on whether $5 or $10 was charged.)

4. Calculating expected value (EV) of Sales using the Full Model
From our full model, what is the expected value of sales given that $5 (or $10) was charged? To answer that question we first have to say what is in each vector of the model under the condition that $5 was charged. U has a 1, P5 has a 1 and P10 has a 0.

EV(S: $5) = a0 U + a1 P5 + a2 P10 = a0(1) + a1(1) + a2(0) = a0 + a1

When $10 was charged, U still has a 1, but P5 has a 0 and P10 has a 1.

EV(S: $10) = a0 U + a1 P5 + a2 P10 = a0(1) + a1(0) + a2(1) = a0 + a2

Topic 4. Price, 3 Levels (Continued)

When $15 was charged, U still has a 1, but P5 has a 0, and P10 also has a 0.

EV(S: $15) = a0 U + a1 P5 + a2 P10 = a0(1) + a1(0) + a2(0) = a0

5. Parameter estimation in SPSS
The parameters a0, a1 and a2 are estimated in SPSS by using the Regression function under the Statistics menu, where sales is the dependent variable and P5 and P10 are the independent variables. (Don't worry about where they come from; this will be explained later.) In our example, a0 = 106, a1 = 32, a2 = 17. (Details as per the SPSS outputs shown later.) So our full model can be written as:

S = (106)U + (32)P5 + (17)P10

6. Restating the model with error term E1
Using the parameter estimates, we can restate the model with the error term E1 as follows.

S = (106)U + (32)P5 + (17)P10 + E1

How can we get the values of E1 in the above model? First, we need to get the expected values of Sales at the different price levels: $5, $10, and $15. To do so, we simply plug the parameter estimates into our solutions in 4 earlier.

EV(S: $5) = a0 + a1, so our estimate for sales at $5 is 106 + 32 = 138.
EV(S: $10) = a0 + a2, so our estimate at $10 is 106 + 17 = 123.
EV(S: $15) = a0, so our estimate at $15 is 106.

Then, by comparing the estimated value and the raw observation in each row, we can get the values of E1.

Topic 4. Price, 3 Levels (Continued)

7. Error sum-of-squares of full model (ESS_F)
The error sum-of-squares of our full model is simply the sum of the squared errors in E1:

ESS_F = (2)² + (-2)² + (-1)² + (1)² + (-2)² + (2)² + (0)² = 4 + 4 + 1 + 1 + 4 + 4 + 0 = 18

Restricted Model

1. The hypothesis in our test
The hypothesis we wish to test is whether our sample could have come from a population where there is no relationship between price and sales. Put another way: whether our sample could have come from a population where the sales at all three price levels were equal. In other words, if there is no relationship between price and sales, the expected value of sales would be the same at the different price levels. So we can state this hypothesis in null form:

EV(S: $5) = EV(S: $10) = EV(S: $15)

Substituting the appropriate parameters, it can be rewritten as:

a0 + a1 = a0 + a2 = a0

Note that the one and only condition under which the above equation is true is where a1 = a2 = 0.

2. Linear restriction
So the linear restriction we impose on the Full Model (2) is a1 = a2 = 0. This is what we are testing. Could our sample have come from a population where a1 = a2 = 0?

3. Restricted model
The linear restriction gives us our restricted model

S = a0' U .. (3)

(We write a0' because when SPSS runs the restricted model with just a weight on the unit vector, the value for a0' in such a model will almost always be different than the value for a0 in (2), the full model. The least squares estimate for a0' in (3) is 120. Note: for a model in which only the unit vector is present, the weight on the unit vector will simply be the average of the dependent variable.)

Topic 4. Price, 3 Levels (Continued)

Rewriting (3) with the error vector E2, we have:

S = 120 U + E2

Note that in this model we estimate sales to be the same (every time our estimate is 120), regardless of the price charged.

4. Error sum-of-squares of restricted model (ESS_R)
The error sum-of-squares of our restricted model is simply the sum of the squared errors in E2:

ESS_R = (20)² + (16)² + (2)² + (4)² + (-16)² + (-12)² + (-14)² = 400 + 256 + 4 + 16 + 256 + 144 + 196 = 1,272

F Statistic Calculation

1. Now we calculate our F statistic with the following numbers: there are 3 linearly independent predictor vectors in the full model (NL_F = 3); there is 1 linearly independent predictor vector in the restricted model (NL_R = 1); there are 7 observations in our example (NOB = 7). The error sum-of-squares of the full model is 18 (ESS_F = 18); the error sum-of-squares of the restricted model is 1,272 (ESS_R = 1,272).

F = [(1,272 - 18) / (3 - 1)] / [18 / (7 - 3)] = (1,254 / 2) / (18 / 4) = 627 / 4.5 ≈ 139.33
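The whole Topic 4 calculation can be replicated with NumPy, assuming the seven trade-area sales implied by the error vectors above (140 and 136 at $5; 122 and 124 at $10; 104, 108 and 106 at $15):

```python
# A sketch of the full and restricted fits for the price example.
import numpy as np

sales = np.array([140.0, 136.0, 122.0, 124.0, 104.0, 108.0, 106.0])
p5    = np.array([1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0])
p10   = np.array([0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0])
U     = np.ones(7)

# Full model (2): S = a0*U + a1*P5 + a2*P10
X = np.column_stack([U, p5, p10])
coef, *_ = np.linalg.lstsq(X, sales, rcond=None)   # a0, a1, a2
ess_f = float(np.sum((sales - X @ coef) ** 2))

# Restricted model (3): S = a0'*U, i.e. every estimate is the overall mean.
ess_r = float(np.sum((sales - sales.mean()) ** 2))

# F = [(ESS_R - ESS_F)/(NL_F - NL_R)] / [ESS_F/(NOB - NL_F)]
F = ((ess_r - ess_f) / (3 - 1)) / (ess_f / (7 - 3))
```

This reproduces a0 = 106, a1 = 32, a2 = 17, ESS_F = 18, ESS_R = 1,272, and F = 627/4.5 ≈ 139.33.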

Topic 4. Price, 3 Levels (Continued)

2. Interpretation
Using the SPSS output, we can get the probability that we would observe an F of 139.33 or larger in a sample taken from a population where the true F is zero; it is a very, very low number, less than .0005. Since these odds are so small, we conclude that our sample did not come from a population where the linear restriction is true (equivalent to saying the F in the population is not zero). So, if the linear restriction is not true in the population, this means that sales are different when the price is different, and we must think carefully about the price we charge. Perhaps we can use cost and margin data to figure out the optimal price.

(B) SPSS

Data

area  sales  price  p5  p10
1     140    5      1   0
2     136    5      1   0
3     122    10     0   1
4     124    10     0   1
5     104    15     0   0
6     108    15     0   0
7     106    15     0   0

Topic 4. Price, 3 Levels (Continued)

Output Interpretation

Regression

Variables Entered/Removed: P10, P5 entered (Method: Enter). Dependent Variable: SALES.

Model Summary: R = .993; predictors: (Constant), P10, P5.

ANOVA (Dependent Variable: SALES; predictors: (Constant), P10, P5):
Regression: Sum of Squares 1,254 (this is ESS_R - ESS_F), df 2, Mean Square 627
Residual:   Sum of Squares 18 (this is ESS_F), df 4, Mean Square 4.5
Total:      Sum of Squares 1,272 (this is ESS_R), df 6
F = 139.33, Sig. = .000

Coefficients (Dependent Variable: SALES), unstandardized B:
(Constant): 106 (this is a0)
P5: 32 (this is a1)
P10: 17 (this is a2)

Topic 5. More On The Concept of Linear Models And F Statistics

Test Example Data Base

Assume we are working with the data from the example file in which we test marketing a new product. In the test market, we systematically varied prices ($5 or $10), advertising (equivalent to $10,000 per market area or $20,000 per market area) and secret ingredient X (essentially, at 4 different levels). There were 96 different test market areas, each roughly equivalent in terms of size, income, and all other relevant characteristics, and we recorded the sales of the product in each market area after a suitable interval. In this data set, it is easy to identify the dependent variable (Sales) because everything else was part of the carefully controlled experiment. So what we want to do is to test for relationships between each of the controlled variables and Sales. Does price affect sales? Does advertising affect sales? Does the level of the secret ingredient affect sales? Assume for the moment that our data is really data from a population of interest (and not the sample that it is).

The Logic

What would be true if there were a relationship between the dependent variable Sales and the independent variable Price? We would observe that for different values of Price we obtained different values for Sales. If this were a product that was price sensitive, then we would expect Sales to be higher when Price was lower. Since we are dealing with 48 observations for each price level, we would expect the average Sales for the price of $5 to be higher than the average Sales for the price of $10. One way to test this would be to calculate the 2 averages and compare them. (Remember, we assumed this was our population of interest, so if the averages are different, we conclude there exists a relationship.) But simply comparing averages will not work for all of the hypotheses we wish to test. There are many fairly complex hypotheses we wish to test that require us to think differently than simply in terms of averages.

Linear Models

1. The Concept of a Linear Model
In Topic 1, we introduced the concept of a linear combination of a set of vectors. It is simply the sum of a weight times a vector, plus a weight times a vector, ...etc. Put most simply: a linear model is a linear combination of a set of predictor vectors.

Topic 5. Linear Models & F Test (Continued)

It is a model in the sense that it is intended to reproduce (or fit) the values for one variable (we call it the dependent variable) given the values on one or more other variables (we call them the independent variables). For example, we might create a linear model to predict Sales as a function of Price. Or Advertising. Or Price and Advertising. Or Price, Advertising and our secret Ingredient X.

2. Full and Restricted Models
To test our hypotheses, we need to create 2 models -- a full model and a restricted model -- and compare them in terms of their fit to a set of data. The restricted model is created by imposing a linear restriction on the weights in the full model. If the linear restriction is true, then the restricted model will fit the data almost as well as the full model. If the linear restriction is not true, then the restricted model will not fit the data as well as the full model.

3. Example Demonstration
Now we use an example to illustrate the full and restricted models. Suppose we wish to test for a relationship between Price and Sales.

Full model
We know that Price has 2 levels ($5 and $10). So we first create a full model in which we express Sales as a function of Price. Because Price has 2 levels, we form 2 predictor vectors: one to be associated with Sales values that resulted when Price was at $5, and the other to be associated with Sales values that resulted when Price was at $10. The predictor vectors will be binary, i.e., they will contain zeros or ones, and they can be thought of as membership vectors in the sense that they indicate whether a particular sales result is a "member of" the $5 price condition or the $10 price condition.

The full model looks like this:

S = a1(P5) + a2(P10)

Where:
S is the sales value;
P5 is the binary predictor vector which will contain i) a one if the observed sales value came from a test market area where $5 was charged, and ii) a zero otherwise;
P10 is the binary predictor vector which will contain i) a one if the observed sales value came from a test market area where $10 was charged, and ii) a zero otherwise;
a1 is the weight (to be estimated) for predictor vector P5; and
a2 is the weight (to be estimated) for predictor vector P10.

Topic 5. Linear Models & F Test (Continued)

If we submitted the above model and data to a software package, it would produce estimates for a1 and a2 equal to 134.83 and 122.3, respectively. (Incidentally, it turns out in this simple case that the estimate for a1 will be equal to the average sales when the price is $5 and the estimate for a2 will be equal to the average sales when the price is $10.)

Some definitions:
i) We call a1 the expected value for sales at a price of $5. We call 134.83 the value of the expected value for sales at a price of $5.
ii) We call a2 the expected value for sales at a price of $10. We call 122.3 the value of the expected value for sales at a price of $10.

Full model with an error vector
Because we almost never have a model which fits the data perfectly, we must add an error vector E1 to our model. So the full model with an error vector looks like this:

S = a1(P5) + a2(P10) + E1

Using the estimates of a1 and a2, as well as the observations from our data base, we can get the values of this error vector:

S = 134.83(P5) + 122.3(P10) + E1

To calculate the value for the error term in row 1 we would have:

102 = 134.83(0) + 122.3(1) + e1

For rows 2, 3, 4, ..., 95, and 96 we would have, in the same way:

120 = 134.83(0) + 122.3(1) + e2

137 = 134.83(1) + 122.3(0) + e3

and so on for rows 4 through 95, down to row 96, which again has a 0 in P5 and a 1 in P10.

Topic 5. Linear Models & F Test (Continued)

Focus, for a moment, on the error vector. The weights, a1 = 134.83 and a2 = 122.3, are chosen so as to minimize the sum of the squares of the error terms. There is no other set of values for a1 and a2 that would produce a lower error sum of squares. The error sum-of-squares is a measure of how well our model "fits" the data. In our full model, the error sum-of-squares ESS_F comes to a little over 8,000.

3. Restricted model
Remember that our model has allowed for one estimate for sales at a price of $5 (134.83) and another estimate for sales at a price of $10 (122.3). The fact that there is a difference between the averages suggests there is a relationship. But our way of testing this is to now create a restricted model which does not allow for differences in estimates for sales at price = $5 and price = $10, and compare the new error sum-of-squares to the old error sum-of-squares.

To create our restricted model, we need to impose a linear restriction on the weights in the full model that embodies our hypothesis. In this case our hypothesis (stated in the "null" form) is that there is no relationship between the price charged and the resulting sales. In terms of the expected values for the full model, our hypothesis is that the expected value for sales at price = $5 is equal to the expected value for sales at price = $10:

EV(S:P5) = EV(S:P10)

But in our full model, the expected value for sales at price = $5 is a1 and the expected value for sales at price = $10 is a2. So in terms of the weights, the hypothesis is represented by the linear restriction:

a1 = a2

Now, if we impose the linear restriction on the weights in the full model (let a1 = a2 = c), we get our restricted model (with error vector):

S = c(P5 + P10) + E2

Topic 5. Linear Models & F Test (Continued)

But P5 + P10 gives us a vector of all ones. We label such a vector the unit vector, U. So our restricted model is S = c(U) + E2. Our least-squares estimate for c is 128.57. (Incidentally, when the restricted model is just the unit vector, the weight will be the average of all of the values for the dependent variable; with 48 observations at each price, that average falls halfway between 134.83 and 122.3.) The error sum-of-squares for the restricted model, ESS_R, comes to over 12,000.

4. Analysis
By imposing the linear restriction, the ESS went from a little over 8,000 to over 12,000. Thus, the restricted model is not nearly as good a fit as the full model.

F Statistics

1. The concept
But we can't use ESS alone as our index of fit. Differences between ESS for a full model and a restricted model, although affected by differences in fit, can also be affected by differences in the number of parameters being estimated. For this reason we need to construct an index which takes all relevant factors into consideration and provides one single summary of the difference between the full model and the restricted model. We call our index the F statistic, and it is calculated using the following formula:

F = [(ESS_R - ESS_F) / (NL_F - NL_R)] / [ESS_F / (NOB - NL_F)]

Where:
ESS_R: the error sum-of-squares for the restricted model;
ESS_F: the error sum-of-squares for the full model;
NL_F: the number of linearly independent predictor vectors in the full model;
NL_R: the number of linearly independent predictor vectors in the restricted model;
NOB: the number of observations on which the two models are based.

Note that, all other things equal, the greater the difference between ESS_R and ESS_F, the greater will be the value for F. Also note that when ESS_R = ESS_F (ESS_R can never be less than ESS_F), F equals zero.
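The formula above is easy to wrap as a small helper function; a sketch:

```python
def f_statistic(ess_r, ess_f, nl_f, nl_r, nob):
    """F = [(ESS_R - ESS_F)/(NL_F - NL_R)] / [ESS_F/(NOB - NL_F)].

    ess_r, ess_f: error sums-of-squares of the restricted and full models;
    nl_f, nl_r: counts of linearly independent predictor vectors;
    nob: number of observations both models were fit to.
    """
    numerator = (ess_r - ess_f) / (nl_f - nl_r)
    denominator = ess_f / (nob - nl_f)
    return numerator / denominator
```

Plugging in the Topic 4 price example (ESS_R = 1,272; ESS_F = 18; NL_F = 3; NL_R = 1; NOB = 7) gives f_statistic(1272, 18, 3, 1, 7) ≈ 139.33, and whenever ESS_R equals ESS_F the function returns 0, as noted above.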

Topic 5. Linear Models & F Test (Continued)

2. Sampling error concern
Now suppose we reintroduce the fact that our data is really a sample. If no relationship exists between price and sales in the population, then ESS_R will equal ESS_F in the population. That is, the average sales for both price levels will be the same. Therefore, it won't matter whether we allow 2 estimates (as we do in the full model) or 1 estimate (as we do in the restricted model). If ESS_R = ESS_F in the population, then F = 0 in the population. Thus, when there is no relationship between Price and Sales in the population, the F will be zero. But because we are taking samples, it would be possible for us to obtain sample F values that were not zero, even though the true F for the population was zero. So we need to know the sampling distribution for the F statistic. The sampling distribution for F depends on degrees of freedom. But this time, instead of only one, there are 2: DF1 and DF2.

DF1 = NL_F - NL_R, the denominator in the numerator of the formula for F;
DF2 = NOB - NL_F, the denominator in the denominator of the formula for F.

Once we know DF1 and DF2, we can draw the sampling distribution for F.

3. Example Demonstration
In our example,

F = [(ESS_R - ESS_F) / (2 - 1)] / [ESS_F / (96 - 2)]

The probability that, with DF1 = 1 and DF2 = 94, we would get an F this large or larger in a sample taken from a population where the true F was 0, is .0001. Since this probability is so low, we can conclude that our linear restriction a1 = a2 is probably not true in the population from which this sample was taken. Thus, the average sales in the population where we charge $5 would not be the same as the average sales where we charge $10, so there must be a relationship between price and sales.

Topic 6. Steps For One Variable Test

Suggested Steps for Conducting a One Independent Variable Test

1. Pick two variables where you believe one variable is dependent on (i.e., is possibly caused by) the other. Label the two variables as dependent and independent, respectively. The dependent variable must be at least interval scaled. (An exception will be made in this class for the Fail3/Fail4 database, where the dependent variable is binary, 1 or 0.)

2. Now inspect the values for the dependent variable. If a plot of the values for the dependent variable reveals that a few values are clearly outliers -- that is, a few are very large or very small and clearly set apart from the rest of the observations -- then create a new working file in which the entire row for each of these outlier observations has been deleted.

3. With the observations that remain after step 2, now focus on the values for the independent variable. If the independent variable is nominal and/or takes on only a few discrete values, then proceed to step 4. But if the independent variable is continuous, then try to divide its values into roughly 4 to 7 groups with equal interval widths. To group your observations:
a) Decide on the number of groups you would like to have;
b) Ignoring the extreme values of the independent variable, calculate the interval width as (Max - Min)/(# of intervals desired).

4. For each different group on the independent variable, use the recode feature to create a binary predictor vector (a membership vector). Make sure to recode missing values on the independent variable for a row into missing values in the binary predictor vector for that row.

5. Make certain you have at least 5 observations per group. If you don't, you need to recode differently and go back to step 4. Checking for at least 5 observations per group can be accomplished by running frequencies or descriptives on the binary predictor vectors.

6. Use regression under the Statistics menu to run the model.

7. Pull the appropriate numbers from the output to complete the tables illustrated in the example one-variable test assignment.
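Steps 3 and 4 above (equal-width grouping of a continuous independent variable, then membership vectors that preserve missing values) can be sketched outside SPSS as well. The values below are hypothetical, and the sketch skips the refinement of ignoring extreme values before computing the width:

```python
# Equal-width grouping and binary (membership) predictor vectors.
# x holds hypothetical independent-variable values; None marks missing.
x = [3.1, 4.7, None, 8.2, 9.9, 6.0, 5.5, 7.3]

present = [v for v in x if v is not None]
n_groups = 4
lo = min(present)
width = (max(present) - lo) / n_groups   # (Max - Min) / (# of intervals)

def group_of(v):
    if v is None:
        return None                       # missing stays missing (step 4)
    g = int((v - lo) / width)
    return min(g, n_groups - 1)           # the maximum lands in the last group

# One binary predictor vector per group, with missing carried through.
bpvs = {g: [None if v is None else int(group_of(v) == g) for v in x]
        for g in range(n_groups)}
```

Step 5's check is then just counting the ones in each vector; any group with fewer than 5 observations signals that the recode should be redone with wider intervals.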

Topic 7. Two Independent Variable Test

(A) The Two Independent Variable Test With Binary Predictor Vectors

1. In this test, we select two independent variables and create binary predictor vectors for both. When you create the binary predictor vectors, make sure they contain either ones, zeros, or the system-missing value indicator. Create such vectors for all levels of both variables, not just the N-1 levels you have been creating.

2. Now create new binary predictor vectors by multiplying (Transform/Compute) every binary predictor vector on the first independent variable by every binary predictor vector on the second independent variable. If the first variable has N levels and the second has K levels, in this step you will be creating N times K vectors. For example, in the test data, two levels on price crossed with four levels on X gives 8 new binary predictor vectors. P5x1 would be a vector with ones when the sales came from a trade area where $5 was charged and xlevel was 1 gram; it would have zeros otherwise. P5x2 would have ones where $5 was charged and there were 1.5 grams of the secret ingredient. This would continue all the way up to P10x4, which is the last of the eight vectors and would have ones where $10 was charged and there were 3 grams of the secret ingredient. If you were testing a 2-level variable by a 2-level variable, you would be creating 4 binary predictor vectors.

3. Now you need to create the full model (Model 1). To run the model we must drop one of our 8 (xlevel by price example) binary predictor vectors because SPSS is going to add the unit vector. We run this model by submitting the N*K - 1 predictor vectors. We get this model's error sum-of-squares from the residual line of the output and the parameters from the output just like before.

4. SPSS automatically tests the linear restriction that all parameters (except the weight on the unit vector) are zero.

5. We now want to test to see if price mattered in our Full Model, and we perform that test by forcing the information on price out of our model and just running with the binary predictor vectors for xlevel. The results for this model (Model 2) are compared to the results of the Full Model in the form of an F test, using the F tables.

6. To test to see if xlevel mattered in our Full Model, we force the information on xlevel out of our model and see how much worse the model (Model 3) with just price fits, in the form of an F test.
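Step 2's crossing of the two sets of binary predictor vectors is just elementwise multiplication: a row gets a 1 only when it belongs to both conditions. A sketch with tiny hypothetical data (4 observations, a 2-level price variable crossed with a 2-level xlevel variable, giving 2 x 2 = 4 interaction vectors):

```python
# Crossing binary predictor vectors by elementwise multiplication.
# Hypothetical 4-row example: price in {5, 10}, xlevel in {1, 2}.
p5  = [1, 1, 0, 0]
p10 = [0, 0, 1, 1]
x1  = [1, 0, 1, 0]
x2  = [0, 1, 0, 1]

cross = {}
for pname, p in {"p5": p5, "p10": p10}.items():
    for xname, xv in {"x1": x1, "x2": x2}.items():
        # A 1 only where the row is a member of BOTH conditions.
        cross[pname + xname] = [a * b for a, b in zip(p, xv)]
```

With N = 2 price levels and K = 2 xlevels, cross ends up holding the full N times K set of membership vectors, exactly as step 2 prescribes.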

Topic 7. Two Independent Variable Test (Continued)

7. Note: If you have some rows missing data on one of your two independent variables but not the other, it will be necessary to create the binary predictor vectors for Models 2 and 3 by summing up the appropriate vectors from the full model. This is the only way that all 3 models will be run with exactly the same set of observations. For example, to create the vectors for running just price, you need to recreate p5 by summing p5x1+p5x2+p5x3+p5x4. x1 would be created by summing p5x1+p10x1, and so on.

(B) SPSS Outputs

Descriptives
Because the mean of a column of 1s and 0s is the proportion of 1s, the descriptives output can be used to easily calculate how many 1s are in each binary predictor vector. For example, 96 x .1250 = 12. Thus all 8 binary predictor vectors (BPVs) have exactly 12 observations. Given our knowledge of the data, this is what we would have expected.

Descriptive Statistics: N = 96 for each of P5X1, P5X2, P5X3, P5X4, P10X1, P10X2, P10X3, P10X4; each has Mean = .1250.

Topic 7. Two Independent Variable Test (Continued)

Model 1 Regression

Variables Entered/Removed: P10X3, P10X2, P10X1, P5X4, P5X3, P5X2, P5X1 entered (Method: Enter). Dependent Variable: SALES.

Model Summary: R = .89; predictors: (Constant), P10X3, P10X2, P10X1, P5X4, P5X3, P5X2, P5X1.

ANOVA (Dependent Variable: SALES): Regression (df 7), Residual (df 88), Total (df 95). The Residual sum-of-squares is the full model's ESS.

Coefficients (Dependent Variable: SALES): unstandardized B values for (Constant), P5X1, P5X2, P5X3, P5X4, P10X1, P10X2, P10X3. (P10X4 is the dropped vector.)

Topic 7. Two Independent Variable Test (Continued)

Model 2 Regression

Variables Entered/Removed: RXLEVEL3, RXLEVEL2, RXLEVEL1 entered (Method: Enter). Dependent Variable: SALES.

Model Summary: R = .639; predictors: (Constant), RXLEVEL3, RXLEVEL2, RXLEVEL1.

ANOVA (Dependent Variable: SALES): Regression (df 3), Residual (df 92), Total (df 95). The Residual sum-of-squares is Model 2's ESS.

Coefficients (Dependent Variable: SALES): unstandardized B values for (Constant), RXLEVEL1, RXLEVEL2, RXLEVEL3.

Topic 7. Two Independent Variable Test (Continued)

Model 3 Regression

Variables Entered/Removed: P5 entered (Method: Enter). Dependent Variable: SALES.

Model Summary: R = .407; predictors: (Constant), P5.

ANOVA (Dependent Variable: SALES): Regression (df 1), Residual (df 94), Total (df 95). The Residual sum-of-squares is Model 3's ESS.

Coefficients (Dependent Variable: SALES): unstandardized B values for (Constant) and P5.

Topic 8. F Tables

Steps In Testing Hypotheses Using The F Tables

1. Run the Full and Restricted Models, calculate the F statistic using the appropriate error sums-of-squares, and note the two degrees of freedom. Say our sample's calculated F was 7.2.

2. Pick the probability you wish to use for this test: .01, .025, .05, or .10, then use the 1% table, the 2.5% table, the 5% table, or the 10% table, respectively.

3. For your test's degrees of freedom, look up the F value.

4. What does the F value from the table indicate? Say we are working with the 5% table and the F we pull from the table is 3.0. This means that (only) 5% of the time would one get an F of 3.0 or larger from a sample taken from a population where the true F was 0. Put another way: we would say (only) 5% of the time would one get an F of 3.0 or larger from a sample taken from a population where the linear restriction on the parameters of the Full Model to get the Restricted Model was true. Put another way (in the case of the one independent variable test): (only) 5% of the time would one get an F of 3.0 or larger from a sample taken from a population where the average for the dependent variable was the same across all levels of the independent variable.

5. Since our calculated F of 7.2 is larger than the table F of 3.0, the odds in all three statements of step 4 above are less than 5% for our sample.

6. Thus, we can conclude that: the F for the population from which our sample came is probably not zero. Or: our sample probably did not come from a population where the linear restriction is true. Or (in the case of the one independent variable test): our sample probably did not come from a population where the average for the dependent variable was the same across all levels of the independent variable.

7. Thus, we conclude there is probably a relationship between the two variables in the population from which our sample came.

Topic 9: Test For Linearity

The Logic

In the test for linearity, we first specify a full model in which we create binary predictor vectors for each of several different levels (at least three) of an independent variable. We want to find out if constant increases in the independent variable result in constant increases in the dependent variable. For example, assume that when the value of the independent variable increases from 1 to 2 (an increase of 1 unit), the value of the dependent variable increases from 15 to 30 (an increase of 15 units). If it is also true that for any other one-unit increase in the independent variable the dependent variable increases by approximately 15 units, and for 1/2-unit increases in the independent variable the dependent variable increases by 7.5 units, then the relationship is probably linear. But to know whether this sample could have come from a population where the relationship is linear, we must do a statistical test.

Hypothesis for Test

In reality we don't believe that XLEVEL is linearly related to Sales. But the null hypothesis is that XLEVEL is linearly related to Sales. We will test this hypothesis by comparing, with an F statistic, the ESS for the full model (with binary vectors) to the ESS of a restricted model in which the relationship is forced to be linear.

Full Model

1. Full model with unit vector

The dependent variable is Sales; the independent variable is Xlevel, with 4 levels. The full model with unit vector is:

S = a0*U + a2*X2 + a3*X3 + a4*X4

where X2 contains a 1 if the sales figure came from an area where the level of ingredient "X" was 1.5, and a zero otherwise; and so on through X4. X2 has 24 1's and 72 0's. The same is true for X3 and X4. The unit vector, of course, has all 1's.
2. Expected value of Sales at each xlevel in full model

EV(S: X1) = a0(1) + a2(0) + a3(0) + a4(0) = a0
EV(S: X2) = a0(1) + a2(1) + a3(0) + a4(0) = a0 + a2
EV(S: X3) = a0(1) + a2(0) + a3(1) + a4(0) = a0 + a3
EV(S: X4) = a0(1) + a2(0) + a3(0) + a4(1) = a0 + a4
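As a sketch of how this full model behaves, the following Python fragment builds the binary membership vectors and fits the model by least squares. The sales figures are simulated stand-ins; only the design (24 observations at each of the four X levels, 96 in all) follows the text. With a unit vector plus a dummy for each level beyond the first, the fitted expected values are exactly the group means.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated stand-in for the course data: 96 sales figures, 24 at each
# of the four levels of ingredient "X" (1, 1.5, 2, 3 grams).
xlevel = np.repeat([1.0, 1.5, 2.0, 3.0], 24)
sales = 40 + 5 * xlevel + rng.normal(0, 2, size=96)

# Full model with unit vector: S = a0*U + a2*X2 + a3*X3 + a4*X4,
# where X2..X4 are binary membership vectors for levels 2..4.
U = np.ones_like(xlevel)
X2 = (xlevel == 1.5).astype(float)
X3 = (xlevel == 2.0).astype(float)
X4 = (xlevel == 3.0).astype(float)
design = np.column_stack([U, X2, X3, X4])

coef, *_ = np.linalg.lstsq(design, sales, rcond=None)
a0, a2, a3, a4 = coef

# The expected values derived above: a0, a0 + a2, a0 + a3, a0 + a4.
print(a0, a0 + a2, a0 + a3, a0 + a4)
```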

3. SPSS output, full model

[SPSS regression output for the full model: Variables Entered/Removed (predictors XL2, XL3, XL4; dependent variable SALES), Model Summary (R = .639, R Square = .408), ANOVA table giving the full model's ESS F as the Residual sum of squares, and the Coefficients table giving the estimates of a0 (Constant), a2 (XL2), a3 (XL3), and a4 (XL4).]

4. For this full model, the R-square is .408, the adjusted R-square is .389, and the standard error of the estimate is given in the Model Summary. Using the parameters estimated by SPSS, we can calculate the expected value of Sales at each level:

EV(S: X1) = a0
EV(S: X2) = a0 + a2
EV(S: X3) = a0 + a3
EV(S: X4) = a0 + a4

Restricted Model

1. Deriving the restriction. If the relationship is linear, each step up in Xlevel must raise the expected value of Sales in proportion to the size of the step (.5 unit from level 1 to 1.5, .5 unit from 1.5 to 2, and 1 unit from 2 to 3):

EV(S:X2) - EV(S:X1) = (a0 + a2) - (a0) = a2 = .5c
EV(S:X3) - EV(S:X2) = (a0 + a3) - (a0 + a2) = a3 - a2 = .5c
EV(S:X4) - EV(S:X3) = (a0 + a4) - (a0 + a3) = a4 - a3 = c

so that

a2 = .5c
a3 = a2 + .5c = .5c + .5c = c
a4 = a3 + c = c + c = 2c

2. Restricted model

S = a0'U + .5cX2 + cX3 + 2cX4 = a0'U + c(.5X2 + X3 + 2X4)

3. Expected value of Sales at each xlevel in restricted model

EV(S': X1) = a0'(1) + c[.5(0) + (0) + 2(0)] = a0'
EV(S': X2) = a0'(1) + c[.5(1) + (0) + 2(0)] = a0' + .5c
EV(S': X3) = a0'(1) + c[.5(0) + (1) + 2(0)] = a0' + c
EV(S': X4) = a0'(1) + c[.5(0) + (0) + 2(1)] = a0' + 2c
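The restricted model can be fit the same way. This sketch (again with made-up data) shows that the three binary vectors collapse into the single predictor .5X2 + X3 + 2X4, which is simply the distance of each Xlevel from the first level.

```python
import numpy as np

rng = np.random.default_rng(1)
xlevel = np.repeat([1.0, 1.5, 2.0, 3.0], 24)
sales = 40 + 5 * xlevel + rng.normal(0, 2, size=96)

U = np.ones_like(xlevel)
X2 = (xlevel == 1.5).astype(float)
X3 = (xlevel == 2.0).astype(float)
X4 = (xlevel == 3.0).astype(float)

# Linear restriction: a2 = .5c, a3 = c, a4 = 2c, so the three binary
# vectors collapse into one predictor, .5*X2 + X3 + 2*X4, which equals
# xlevel - 1 (the distance of each level from the first level).
linvec = 0.5 * X2 + X3 + 2.0 * X4

design_r = np.column_stack([U, linvec])
(a0_r, c), *_ = np.linalg.lstsq(design_r, sales, rcond=None)

# Expected values under the restriction fall on a straight line:
# a0', a0' + .5c, a0' + c, a0' + 2c.
print(a0_r, a0_r + 0.5 * c, a0_r + c, a0_r + 2 * c)
```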

4. SPSS output, restricted model

[SPSS regression output for the restricted model: Variables Entered/Removed (predictor LINVECT; dependent variable SALES), Model Summary (R = .346), ANOVA table giving the restricted model's ESS R as the Residual sum of squares, and the Coefficients table giving the estimates of a0' (Constant) and c (LINVECT).]

4. For this restricted model, the R-square is .120, the adjusted R-square is .110, and the standard error of estimate is given in the Model Summary. Using the parameters estimated by SPSS (the slope estimate for LINVECT is c = 5.462), we can calculate the expected value of Sales:

EV(S': X1) = a0'
EV(S': X2) = a0' + .5c = a0' + .5(5.462) = a0' + 2.731
EV(S': X3) = a0' + c = a0' + 5.462
EV(S': X4) = a0' + 2c = a0' + 2(5.462) = a0' + 10.924

Analysis

1. Expected value of Sales

[Figure: Test for Linearity. Expected value of Sales (in 000's) at each XLEVEL (1 gram, 1.5 grams, 2 grams, 3 grams), plotted for the Full Model and the Restricted Model.]

2. F-statistic calculation

With NLF = 4, NLR = 2, and NOB = 96, the degrees of freedom are NLF - NLR = 2 and NOB - NLF = 92, and

F = [(ESS R - ESS F) / (NLF - NLR)] / [ESS F / (NOB - NLF)]
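As a numeric sketch of this calculation (the ESS figures below are hypothetical; the degrees of freedom are those of the Xlevel example):

```python
# Degrees of freedom for the linearity test, from the two models above:
NLF, NLR, NOB = 4, 2, 96          # parameters in full and restricted models; observations
df1 = NLF - NLR                   # numerator df: 2
df2 = NOB - NLF                   # denominator df: 92

# Hypothetical error sums-of-squares, for illustration only.
ESS_R, ESS_F = 9500.0, 7800.0

F = ((ESS_R - ESS_F) / df1) / (ESS_F / df2)
print(df1, df2, round(F, 2))
```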

3. Conclusion

Reject the linear restriction. We would observe an F of 4.88 or larger (with degrees of freedom df1 = 2 and df2 = 92) from a sample taken from a population where the true F was zero only 1% of the time. Since the computed F above is much larger, we would observe it, or one larger, even less often. Therefore, the sample probably did not come from a population where the linear restriction would have been true. The relationship is not linear, so the peak at the third level of XLEVEL (about two units of the secret ingredient), after which sales go down, is probably not just a chance occurrence. There does appear to be an ideal level of "X".

Topic 10. Steps For Linearity Test

Suggested Steps For Conducting The Linearity Test

1. Pick two variables that you have already established are related to each other. As always in linear models, the dependent variable must be measured at the interval or ratio level. (We still include the exception where the dependent is two-level, reflected as 1 or 0, as in Fail3/Fail4.) In this test, however, the independent variable should be measured at the interval or ratio level as well. (If the independent variable is only ordinal but has 6 or more levels, i.e., 6 or more binary predictor vectors, then an exception can be made that allows use of this ordinal-level variable in the linearity test, but be sure to tell the reader.)

2. Now inspect the values for the dependent variable. If a plot of the values for the dependent variable reveals that a few values are clearly outliers (that is, a few are very large or very small and clearly set apart from the rest of the observations), then create a new working file in which the entire row for each of these outlier observations has been deleted.

3. With the observations that remain after step 2, now focus on the values for the independent variable. If there exist natural breaks or meaningful categories, then use these natural breaks or meaningful categories to decide on the separate levels for the independent variable. You are seeking 4 to 7 different groups. The absolute minimum number of different groups for running the linearity test is 3. If there are no natural breaks or logically meaningful categories, then: a) decide on the number of groups you would like to have; b) ignoring the extreme values of the independent variable, calculate the interval width as (Max - Min)/(# of intervals desired).

4. For each different group on the independent variable, use the recode feature to first create a recoded independent variable.
Then use the recode feature and the recoded independent variable to create a binary predictor vector (a membership vector) for each level on the independent variable. Make sure to recode missing values on the independent variable for a row into missing values in the binary predictor vector for that row.

8. Make certain you have at least 5 observations per group. If you don't, you need to recode differently and go back to step 4. Checking for at least 5 observations per group can be accomplished by running frequencies or descriptives on the binary predictor vectors.

9. Use regression under the statistics menu to run this model with binary predictor vectors.

Topic 10. Steps For Linearity Test (Continued)

10. Pull ESS (residual) as the ESS for this full model, along with R-square, adjusted R-square, and the parameter estimates from the output. Use the parameter estimates and the values of 1 or 0 in the predictor vectors to calculate the expected values for the full model for each level on the independent variable.

11. For each level on the independent variable, write down a number that serves to indicate a typical number on the independent variable for that level. This can be the midpoint of the numbers in the range, or the average of the values for the independent variable in that range; or, if the distribution of values in the range is skewed, you can make a rough estimate of where the median would be.

12. Use the indicator value (from step 11) for each range to create LINVEC. For the Xlevel example, remember that the indicator values were 1, 1.5, 2, and 3, and LINVEC had 0, .5, 1, and 2; essentially, LINVEC will have a 0 for the first level, and then the difference from the first level to each of the other levels. For levels 2 through 4 in the Xlevel example, LINVEC has 1.5 - 1 = .5; 2 - 1 = 1; 3 - 1 = 2.

13. Run the restricted model with LINVEC as the only independent variable. Of course SPSS adds the unit vector.

14. Pull ESS (residual) as the ESS for this restricted model, along with R-square, adjusted R-square, and the two parameter estimates from the output. Use the parameter estimates and the values in U and LINVEC for the various levels to calculate the expected values for this restricted model.

15. Calculate the F statistic. Select the critical probability you are using for this test and go to the table (one of 4: p = .01, p = .025, p = .05, or p = .10) for that critical probability. Use the 2 degrees of freedom for your calculated F to find the critical F in the table.

16. Compare your calculated F to the critical F in the table. If your calculated F is larger than the critical F, reject the linear restriction; if not, accept the linear restriction.
If you reject, you are concluding that the sample probably did not come from a population where the relationship is linear. If you accept, you are concluding that the sample probably did come from a population where the relationship is linear.
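The grouping and recoding steps above can be sketched outside SPSS. This Python fragment, with made-up data, cuts a continuous independent variable into equal-width intervals, builds a binary membership vector for each group, checks the minimum group size, and forms the LINVEC-style predictor from the group midpoints.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 100, size=200)          # a continuous independent variable

# No natural breaks, so cut into equal-width intervals.
n_groups = 4
edges = np.linspace(x.min(), x.max(), n_groups + 1)
group = np.clip(np.digitize(x, edges[1:-1]), 0, n_groups - 1)  # recoded IV: 0..3

# Binary (membership) predictor vectors, one per group.
binaries = np.column_stack([(group == g).astype(float) for g in range(n_groups)])

# Check the per-group counts (the "at least 5 observations" rule).
counts = binaries.sum(axis=0)
assert (counts >= 5).all(), "recode differently and regroup"

# Indicator values (group midpoints) and the LINVEC-style predictor:
# the distance of each group's indicator value from the first level's.
midpoints = (edges[:-1] + edges[1:]) / 2
linvec = midpoints[group] - midpoints[0]
print(counts, midpoints.round(1))
```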

Topic 11: Building Regression Models

Introduction

In previous assignments, the models have been constructed with binary predictor vectors to represent different levels of an independent variable. The independent variable could be any level of measurement: nominal, ordinal, interval, or ratio. If the variable was continuous, it had to be recoded based on ranges, which then constituted each of the levels on the independent variable, and these were reflected in binary predictor vectors. The models we will now build, however, can have both binary predictor vectors and the raw values of the independent variables. We may also create what are called pseudo-variables by squaring the independent variables or taking the product between pairs of independent variables.

Variables In The Model

The Dependent Variable

The dependent variable should be measured at the interval or ratio level. (Exceptions are sometimes made if the dependent is nominal but only has two levels of 0 and 1, or if the variable is ordinal but has many levels.) It is a good idea to examine a histogram (or frequency distribution) of the dependent variable and identify any outliers. Outliers are values extremely far removed (either much lower or much higher) from the bulk of roughly continuous values. In examining the distribution of the dependent variable, it is also sometimes useful to have the mean and standard deviation of the dependent variable and consider how many standard deviations away from the mean a particular value is. Essentially, if you have a few values on the dependent variable that are very far removed from the other values, then you probably need to delete these observations from the dataset before building your regression model.
The Independent Variables

The independent variables with which you build your regression model can contain either the original, raw values of the independent variables, or binary predictor vectors representing each of several different levels of an independent variable. If the independent variable is nominal, or if it is ordinal with only a few levels, then binary predictor vectors must be created to represent the different levels. If the independent variable is interval or ratio, then you can use either the raw values or binary predictor vectors representing the various levels.

Curvilinear Relationships

If you use the raw values on the independent variable, then you are essentially assuming that the best way to capture the relationship between the dependent variable and the independent variable is linear, or in the form of a straight line. If, however, you believe the nature of the relationship between the two variables is best represented by a curve, then you need to include both the raw values on the independent variable and a pseudo-variable that contains the square of the raw values of the independent variable.

Interaction Effects

If one has several independent variables in a regression model, one way to describe the modeling of the effect of the independent variables on the dependent is that the effects are additive. That is, we can capture the overall effect by simply adding up the various effects from each variable separately. Sometimes, however, the combined or joint effects of the independent variables cannot be captured in a simple additive form. In these cases one needs to construct new pseudo-variables that are the product of two independent variables.

Example Database

We will use the testdata (X level) database to illustrate the points above, as well as how to interpret the output and how to build a regression model.

Test Objectives

Recall that in testdata we had 96 observations. Unit sales is our dependent variable, and advertising, price, and X level are our independent variables. We wish to build and test regression models to accomplish two objectives: 1) we want to test whether, from the experiment-generated data, advertising, price, or X level have an impact on sales; and 2) we want to build a model that will allow us to predict sales for any particular combination of values for advertising, price, and X level.

Create New Independent Variables

Based on our earlier tests, we have good reason to believe that Sales may be related to X level in a curvilinear fashion.
So, before running the regression, we create a new variable called XLSQ, which is the square of X level. We also have reason to believe that to fully capture the effect of X level and price in terms of the way they affect sales, we need an interaction term. So, we create another new variable called XLPR, which is the product of price and X level.
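Creating these pseudo-variables can be sketched as follows. The variable names XLSQ and XLPR follow the text; the price data here are made up, and advertising is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(3)
xlevel = np.repeat([1.0, 1.5, 2.0, 3.0], 24)   # 96 observations, as in testdata
price = rng.uniform(4.0, 6.0, size=96)         # hypothetical price values

# Pseudo-variables for the regression model:
xlsq = xlevel ** 2        # squared term, allowing a curvilinear effect of X level
xlpr = xlevel * price     # interaction term between X level and price

# A design matrix combining raw values and pseudo-variables (plus the unit vector).
design = np.column_stack([np.ones(96), xlevel, xlsq, price, xlpr])
print(design.shape)
```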


More information

Keller: Stats for Mgmt & Econ, 7th Ed July 17, 2006

Keller: Stats for Mgmt & Econ, 7th Ed July 17, 2006 Chapter 17 Simple Linear Regression and Correlation 17.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will

More information

16.400/453J Human Factors Engineering. Design of Experiments II

16.400/453J Human Factors Engineering. Design of Experiments II J Human Factors Engineering Design of Experiments II Review Experiment Design and Descriptive Statistics Research question, independent and dependent variables, histograms, box plots, etc. Inferential

More information

Self-Assessment Weeks 8: Multiple Regression with Qualitative Predictors; Multiple Comparisons

Self-Assessment Weeks 8: Multiple Regression with Qualitative Predictors; Multiple Comparisons Self-Assessment Weeks 8: Multiple Regression with Qualitative Predictors; Multiple Comparisons 1. Suppose we wish to assess the impact of five treatments while blocking for study participant race (Black,

More information

Chapter 16. Simple Linear Regression and Correlation

Chapter 16. Simple Linear Regression and Correlation Chapter 16 Simple Linear Regression and Correlation 16.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will

More information

CHAPTER 17 CHI-SQUARE AND OTHER NONPARAMETRIC TESTS FROM: PAGANO, R. R. (2007)

CHAPTER 17 CHI-SQUARE AND OTHER NONPARAMETRIC TESTS FROM: PAGANO, R. R. (2007) FROM: PAGANO, R. R. (007) I. INTRODUCTION: DISTINCTION BETWEEN PARAMETRIC AND NON-PARAMETRIC TESTS Statistical inference tests are often classified as to whether they are parametric or nonparametric Parameter

More information

Multiple linear regression

Multiple linear regression Multiple linear regression Course MF 930: Introduction to statistics June 0 Tron Anders Moger Department of biostatistics, IMB University of Oslo Aims for this lecture: Continue where we left off. Repeat

More information

Survey on Population Mean

Survey on Population Mean MATH 203 Survey on Population Mean Dr. Neal, Spring 2009 The first part of this project is on the analysis of a population mean. You will obtain data on a specific measurement X by performing a random

More information

Sociology 593 Exam 1 February 17, 1995

Sociology 593 Exam 1 February 17, 1995 Sociology 593 Exam 1 February 17, 1995 I. True-False. (25 points) Indicate whether the following statements are true or false. If false, briefly explain why. 1. A researcher regressed Y on. When he plotted

More information

2/26/2017. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2

2/26/2017. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2 PSY 512: Advanced Statistics for Psychological and Behavioral Research 2 When and why do we use logistic regression? Binary Multinomial Theory behind logistic regression Assessing the model Assessing predictors

More information

12.12 MODEL BUILDING, AND THE EFFECTS OF MULTICOLLINEARITY (OPTIONAL)

12.12 MODEL BUILDING, AND THE EFFECTS OF MULTICOLLINEARITY (OPTIONAL) 12.12 Model Building, and the Effects of Multicollinearity (Optional) 1 Although Excel and MegaStat are emphasized in Business Statistics in Practice, Second Canadian Edition, some examples in the additional

More information

Advanced Regression Topics: Violation of Assumptions

Advanced Regression Topics: Violation of Assumptions Advanced Regression Topics: Violation of Assumptions Lecture 7 February 15, 2005 Applied Regression Analysis Lecture #7-2/15/2005 Slide 1 of 36 Today s Lecture Today s Lecture rapping Up Revisiting residuals.

More information

Business Statistics. Lecture 9: Simple Regression

Business Statistics. Lecture 9: Simple Regression Business Statistics Lecture 9: Simple Regression 1 On to Model Building! Up to now, class was about descriptive and inferential statistics Numerical and graphical summaries of data Confidence intervals

More information

16.3 One-Way ANOVA: The Procedure

16.3 One-Way ANOVA: The Procedure 16.3 One-Way ANOVA: The Procedure Tom Lewis Fall Term 2009 Tom Lewis () 16.3 One-Way ANOVA: The Procedure Fall Term 2009 1 / 10 Outline 1 The background 2 Computing formulas 3 The ANOVA Identity 4 Tom

More information

Regression With a Categorical Independent Variable

Regression With a Categorical Independent Variable Regression With a Independent Variable Lecture 10 November 5, 2008 ERSH 8320 Lecture #10-11/5/2008 Slide 1 of 54 Today s Lecture Today s Lecture Chapter 11: Regression with a single categorical independent

More information

Interactions and Centering in Regression: MRC09 Salaries for graduate faculty in psychology

Interactions and Centering in Regression: MRC09 Salaries for graduate faculty in psychology Psychology 308c Dale Berger Interactions and Centering in Regression: MRC09 Salaries for graduate faculty in psychology This example illustrates modeling an interaction with centering and transformations.

More information

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages: Glossary The ISI glossary of statistical terms provides definitions in a number of different languages: http://isi.cbs.nl/glossary/index.htm Adjusted r 2 Adjusted R squared measures the proportion of the

More information

Analysis of 2x2 Cross-Over Designs using T-Tests

Analysis of 2x2 Cross-Over Designs using T-Tests Chapter 234 Analysis of 2x2 Cross-Over Designs using T-Tests Introduction This procedure analyzes data from a two-treatment, two-period (2x2) cross-over design. The response is assumed to be a continuous

More information

- measures the center of our distribution. In the case of a sample, it s given by: y i. y = where n = sample size.

- measures the center of our distribution. In the case of a sample, it s given by: y i. y = where n = sample size. Descriptive Statistics: One of the most important things we can do is to describe our data. Some of this can be done graphically (you should be familiar with histograms, boxplots, scatter plots and so

More information

Multiple Regression Analysis

Multiple Regression Analysis Multiple Regression Analysis y = β 0 + β 1 x 1 + β 2 x 2 +... β k x k + u 2. Inference 0 Assumptions of the Classical Linear Model (CLM)! So far, we know: 1. The mean and variance of the OLS estimators

More information

Simple linear regression

Simple linear regression Simple linear regression Business Statistics 41000 Fall 2015 1 Topics 1. conditional distributions, squared error, means and variances 2. linear prediction 3. signal + noise and R 2 goodness of fit 4.

More information

Hypothesis testing I. - In particular, we are talking about statistical hypotheses. [get everyone s finger length!] n =

Hypothesis testing I. - In particular, we are talking about statistical hypotheses. [get everyone s finger length!] n = Hypothesis testing I I. What is hypothesis testing? [Note we re temporarily bouncing around in the book a lot! Things will settle down again in a week or so] - Exactly what it says. We develop a hypothesis,

More information

Linear Regression. In this lecture we will study a particular type of regression model: the linear regression model

Linear Regression. In this lecture we will study a particular type of regression model: the linear regression model 1 Linear Regression 2 Linear Regression In this lecture we will study a particular type of regression model: the linear regression model We will first consider the case of the model with one predictor

More information

A Re-Introduction to General Linear Models (GLM)

A Re-Introduction to General Linear Models (GLM) A Re-Introduction to General Linear Models (GLM) Today s Class: You do know the GLM Estimation (where the numbers in the output come from): From least squares to restricted maximum likelihood (REML) Reviewing

More information

Chapter 1 Review of Equations and Inequalities

Chapter 1 Review of Equations and Inequalities Chapter 1 Review of Equations and Inequalities Part I Review of Basic Equations Recall that an equation is an expression with an equal sign in the middle. Also recall that, if a question asks you to solve

More information

ANOVA Situation The F Statistic Multiple Comparisons. 1-Way ANOVA MATH 143. Department of Mathematics and Statistics Calvin College

ANOVA Situation The F Statistic Multiple Comparisons. 1-Way ANOVA MATH 143. Department of Mathematics and Statistics Calvin College 1-Way ANOVA MATH 143 Department of Mathematics and Statistics Calvin College An example ANOVA situation Example (Treating Blisters) Subjects: 25 patients with blisters Treatments: Treatment A, Treatment

More information

Statistics Primer. A Brief Overview of Basic Statistical and Probability Principles. Essential Statistics for Data Analysts Using Excel

Statistics Primer. A Brief Overview of Basic Statistical and Probability Principles. Essential Statistics for Data Analysts Using Excel Statistics Primer A Brief Overview of Basic Statistical and Probability Principles Liberty J. Munson, PhD 9/19/16 Essential Statistics for Data Analysts Using Excel Table of Contents What is a Variable?...

More information

Do students sleep the recommended 8 hours a night on average?

Do students sleep the recommended 8 hours a night on average? BIEB100. Professor Rifkin. Notes on Section 2.2, lecture of 27 January 2014. Do students sleep the recommended 8 hours a night on average? We first set up our null and alternative hypotheses: H0: μ= 8

More information

Applied Multivariate Statistical Modeling Prof. J. Maiti Department of Industrial Engineering and Management Indian Institute of Technology, Kharagpur

Applied Multivariate Statistical Modeling Prof. J. Maiti Department of Industrial Engineering and Management Indian Institute of Technology, Kharagpur Applied Multivariate Statistical Modeling Prof. J. Maiti Department of Industrial Engineering and Management Indian Institute of Technology, Kharagpur Lecture - 29 Multivariate Linear Regression- Model

More information

Analysis of Variance

Analysis of Variance Statistical Techniques II EXST7015 Analysis of Variance 15a_ANOVA_Introduction 1 Design The simplest model for Analysis of Variance (ANOVA) is the CRD, the Completely Randomized Design This model is also

More information

Uni- and Bivariate Power

Uni- and Bivariate Power Uni- and Bivariate Power Copyright 2002, 2014, J. Toby Mordkoff Note that the relationship between risk and power is unidirectional. Power depends on risk, but risk is completely independent of power.

More information

Contingency Tables. Safety equipment in use Fatal Non-fatal Total. None 1, , ,128 Seat belt , ,878

Contingency Tables. Safety equipment in use Fatal Non-fatal Total. None 1, , ,128 Seat belt , ,878 Contingency Tables I. Definition & Examples. A) Contingency tables are tables where we are looking at two (or more - but we won t cover three or more way tables, it s way too complicated) factors, each

More information

Is economic freedom related to economic growth?

Is economic freedom related to economic growth? Is economic freedom related to economic growth? It is an article of faith among supporters of capitalism: economic freedom leads to economic growth. The publication Economic Freedom of the World: 2003

More information

Lectures 5 & 6: Hypothesis Testing

Lectures 5 & 6: Hypothesis Testing Lectures 5 & 6: Hypothesis Testing in which you learn to apply the concept of statistical significance to OLS estimates, learn the concept of t values, how to use them in regression work and come across

More information

Daniel Boduszek University of Huddersfield

Daniel Boduszek University of Huddersfield Daniel Boduszek University of Huddersfield d.boduszek@hud.ac.uk Introduction to moderator effects Hierarchical Regression analysis with continuous moderator Hierarchical Regression analysis with categorical

More information

Finding Relationships Among Variables

Finding Relationships Among Variables Finding Relationships Among Variables BUS 230: Business and Economic Research and Communication 1 Goals Specific goals: Re-familiarize ourselves with basic statistics ideas: sampling distributions, hypothesis

More information

Retrieve and Open the Data

Retrieve and Open the Data Retrieve and Open the Data 1. To download the data, click on the link on the class website for the SPSS syntax file for lab 1. 2. Open the file that you downloaded. 3. In the SPSS Syntax Editor, click

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression 1 Correlation indicates the magnitude and direction of the linear relationship between two variables. Linear Regression: variable Y (criterion) is predicted by variable X (predictor)

More information

Trendlines Simple Linear Regression Multiple Linear Regression Systematic Model Building Practical Issues

Trendlines Simple Linear Regression Multiple Linear Regression Systematic Model Building Practical Issues Trendlines Simple Linear Regression Multiple Linear Regression Systematic Model Building Practical Issues Overfitting Categorical Variables Interaction Terms Non-linear Terms Linear Logarithmic y = a +

More information

appstats27.notebook April 06, 2017

appstats27.notebook April 06, 2017 Chapter 27 Objective Students will conduct inference on regression and analyze data to write a conclusion. Inferences for Regression An Example: Body Fat and Waist Size pg 634 Our chapter example revolves

More information

Stat 135, Fall 2006 A. Adhikari HOMEWORK 10 SOLUTIONS

Stat 135, Fall 2006 A. Adhikari HOMEWORK 10 SOLUTIONS Stat 135, Fall 2006 A. Adhikari HOMEWORK 10 SOLUTIONS 1a) The model is cw i = β 0 + β 1 el i + ɛ i, where cw i is the weight of the ith chick, el i the length of the egg from which it hatched, and ɛ i

More information

Bivariate Relationships Between Variables

Bivariate Relationships Between Variables Bivariate Relationships Between Variables BUS 735: Business Decision Making and Research 1 Goals Specific goals: Detect relationships between variables. Be able to prescribe appropriate statistical methods

More information

, (1) e i = ˆσ 1 h ii. c 2016, Jeffrey S. Simonoff 1

, (1) e i = ˆσ 1 h ii. c 2016, Jeffrey S. Simonoff 1 Regression diagnostics As is true of all statistical methodologies, linear regression analysis can be a very effective way to model data, as along as the assumptions being made are true. For the regression

More information

Chapter 16. Simple Linear Regression and dcorrelation

Chapter 16. Simple Linear Regression and dcorrelation Chapter 16 Simple Linear Regression and dcorrelation 16.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will

More information

Chapter Goals. To understand the methods for displaying and describing relationship among variables. Formulate Theories.

Chapter Goals. To understand the methods for displaying and describing relationship among variables. Formulate Theories. Chapter Goals To understand the methods for displaying and describing relationship among variables. Formulate Theories Interpret Results/Make Decisions Collect Data Summarize Results Chapter 7: Is There

More information