Chapter 14: Omitted Explanatory Variables, Multicollinearity, and Irrelevant Explanatory Variables


Chapter 14 Outline

Review
o Unbiased Estimation Procedures
    Estimates and Random Variables
    Mean of the Estimate's Probability Distribution
    Variance of the Estimate's Probability Distribution
o Correlated and Independent (Uncorrelated) Variables
    Scatter Diagrams
    Correlation Coefficient
Omitted Explanatory Variables
o A Puzzle: Baseball Attendance
o Goal of Multiple Regression Analysis
o Omitted Explanatory Variables and Bias
o Resolving the Baseball Attendance Puzzle
o Omitted Variable Summary
Multicollinearity
o Perfectly Correlated Explanatory Variables
o Highly Correlated Explanatory Variables
o Earmarks of Multicollinearity
Irrelevant Explanatory Variables

Chapter 14 Prep Questions

1. Review the goal of multiple regression analysis. In words, explain what multiple regression analysis attempts to do.
2. Recall that the presence of a random variable brings forth both bad news and good news.
a. What is the bad news?
b. What is the good news?
3. Consider an estimate's probability distribution. Review the importance of its mean and variance:
a. Why is the mean of the probability distribution important? Explain.
b. Why is the variance of the probability distribution important? Explain.
4. Suppose that two variables are positively correlated.
a. In words, what does this mean?
b. What type of graph do we use to illustrate their correlation? What does the graph look like?
c. What can we say about their correlation coefficient?

d. When two variables are perfectly positively correlated, what will their correlation coefficient equal?
5. Suppose that two variables are independent (uncorrelated).
a. In words, what does this mean?
b. What type of graph do we use to illustrate their correlation? What does the graph look like?
c. What can we say about their correlation coefficient?

Baseball Data: Panel data of baseball statistics for the 588 American League games played during the summer of 1996.

Attendance t    Paid attendance for game t
DateDay t    Day of game t
DateMonth t    Month of game t
DateYear t    Year of game t
DayOfWeek t    Day of the week for game t (Sunday=0, Monday=1, etc.)
DH t    Designated hitter for game t (1 if DH permitted; 0 otherwise)
HomeGamesBehind t    Games behind of the home team before game t
HomeIncome t    Per capita income in home team's city for game t
HomeLosses t    Season losses of the home team before game t
HomeNetWins t    Net wins (wins less losses) of the home team before game t
HomeSalary t    Player salaries of the home team for game t (millions of dollars)
HomeWins t    Season wins of the home team before game t
PriceTicket t    Average price of tickets sold for game t's home team (dollars)
VisitGamesBehind t    Games behind of the visiting team before game t
VisitLosses t    Season losses of the visiting team before game t
VisitNetWins t    Net wins (wins less losses) of the visiting team before game t
VisitSalary t    Player salaries of the visiting team for game t (millions of dollars)
VisitWins t    Season wins of the visiting team before game t

6. Focus on the baseball data.
a. Consider the following simple model: Attendance t = β Const + β Price PriceTicket t + e t

Attendance depends only on the ticket price.
1) What does the economist's downward sloping demand curve theory suggest about the sign of the PriceTicket coefficient, β Price?
2) Use the ordinary least squares (OLS) estimation procedure to estimate the model's parameters. Interpret the regression results. [Link to MIT-ALSummer-1996.wf1 goes here.]
b. Consider a second model: Attendance t = β Const + β Price PriceTicket t + β HomeSalary HomeSalary t + e t
Attendance depends not only on the ticket price, but also on the salary of the home team.
1) Devise a theory explaining the effect that home team salary should have on attendance. What does your theory suggest about the sign of the HomeSalary coefficient, β HomeSalary?
2) Use the ordinary least squares (OLS) estimation procedure to estimate both of the model's coefficients. Interpret the regression results.
c. What do you observe about the estimates for the PriceTicket coefficients in the two models?
7. Again, focus on the baseball data and consider the following two variables:
Attendance t    Paid attendance at game t
PriceTicket t    Average ticket price in terms of dollars for game t
You can access these data by clicking the following link: [Link to MIT-ALSummer-1996.wf1 goes here.]
Generate a new variable, PriceCents, to express the price in terms of cents rather than dollars: PriceCents = 100 PriceTicket
a. What is the correlation coefficient for PriceTicket and PriceCents?
b. Consider the following model: Attendance t = β Const + β PriceTicket PriceTicket t + β PriceCents PriceCents t + e t
Run the regression to estimate the parameters of this model. You will get an unusual result. Explain this result by considering what multiple regression analysis attempts to do.

8. The following are excerpts from an article appearing in the New York Times on September 1, 2008:

Doubts Grow Over Flu Vaccine in Elderly
By Brenda Goodman
The influenza vaccine, which has been strongly recommended for people over 65 for more than four decades, is losing its reputation as an effective way to ward off the virus in the elderly. A growing number of immunologists and epidemiologists say the vaccine probably does not work very well for people over 70… The latest blow was a study in The Lancet last month that called into question much of the statistical evidence for the vaccine's effectiveness. The study found that people who were healthy and conscientious about staying well were the most likely to get an annual flu shot. … [Others] are less likely to get to their doctor's office or a clinic to receive the vaccine. … Dr. David K. Shay of the Centers for Disease Control and Prevention, a co-author of a commentary that accompanied Dr. Jackson's study, agreed that these measures of health were not incorporated into early estimations of the vaccine's effectiveness and could well have skewed the findings.

a. Does being healthy and conscientious about staying well increase or decrease the chances of getting the flu?
b. According to the article, are those who are healthy and conscientious about staying well more or less likely to get a flu shot?
c. The article alleges that previous studies did not incorporate health and conscientiousness in judging the effectiveness of flu shots. If the allegation is true, have previous studies overestimated or underestimated the effectiveness of flu shots?
d. Suppose that you were the director of your community's health department. You are considering whether or not to subsidize flu vaccines for the elderly. Would you find the previous studies useful? That is, would a study that did not incorporate health and conscientiousness in judging the effectiveness of flu shots help you decide if your department should spend its limited budget to subsidize flu vaccines? Explain.

Review

Unbiased Estimation Procedures

Estimates and Random Variables
Estimates are random variables. Consequently, there is both good news and bad news before the data are collected and the parameters are estimated:
Bad news: We cannot determine the numerical value of the estimate with certainty (even if we knew the actual value).
Good news: On the other hand, we can often describe the probability distribution of the estimate, telling us how likely it is for the estimate to equal each of its possible numerical values.

Mean (Center) of the Estimate's Probability Distribution
An unbiased estimation procedure does not systematically underestimate or overestimate the actual value; the mean (center) of the estimate's probability distribution equals the actual value. Applying the relative frequency interpretation of probability: when the experiment is repeated many, many times, the average of the numerical values of the estimates equals the actual value.

Figure 14.1: Probability Distribution of an Estimate (Unbiased Estimation Procedure)
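The relative frequency interpretation can be checked with a small simulation. A minimal sketch, assuming a hypothetical unbiased procedure, the sample mean estimating a made-up population mean of 10 (none of these numbers come from the chapter):

```python
import random

# Repeat the "experiment" many, many times: draw a sample, compute the
# sample mean (our estimate), and record it. For an unbiased procedure,
# the average of the estimates should settle at the actual value.
random.seed(1)

ACTUAL_MEAN = 10.0   # assumed actual value, for illustration only
REPETITIONS = 20000
SAMPLE_SIZE = 25

estimates = []
for _ in range(REPETITIONS):
    sample = [random.gauss(ACTUAL_MEAN, 5.0) for _ in range(SAMPLE_SIZE)]
    estimates.append(sum(sample) / SAMPLE_SIZE)

average_estimate = sum(estimates) / REPETITIONS
print(round(average_estimate, 2))   # close to the actual value, 10.0
```

In any single repetition the estimate misses the actual value, but the average across repetitions does not systematically miss in either direction.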

If the distribution is symmetric, we can provide an interpretation that is perhaps even more intuitive. When the experiment is repeated many, many times, half the time the estimate is greater than the actual value and half the time it is less. Accordingly, applying the relative frequency interpretation of probability: in one repetition, the chances that the estimate will be greater than the actual value equal the chances that it will be less.

Variance (Spread) of the Estimate's Probability Distribution

Figure 14.2: Probability Distribution of an Estimate (Importance of Variance)

When the estimation procedure is unbiased, the variance (spread) of the distribution indicates the estimate's reliability, the likelihood that the numerical value of the estimate will be close to the actual value.

Correlated and Independent (Uncorrelated) Variables

Two variables are
correlated whenever the value of one variable does help us predict the value of the other;
independent (uncorrelated) whenever the value of one variable does not help us predict the value of the other.

Scatter Diagrams

Figure 14.3: Scatter Diagrams, Correlation, and Independence

The Dow Jones and Nasdaq growth rates are positively correlated: most of the scatter diagram points lie in the first and third quadrants. When the Dow Jones growth rate is high, the Nasdaq growth rate is usually high also. Similarly, when the Dow Jones growth rate is low, the Nasdaq growth rate is usually low also. Knowing one growth rate helps us predict the other. On the other hand, Amherst precipitation and the Nasdaq growth rate are independent (uncorrelated): the scatter diagram points are spread rather evenly across the graph. Knowing the Nasdaq growth rate does not help us predict Amherst precipitation, and vice versa.
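The patterns in Figure 14.3 can be checked numerically. A minimal sketch with made-up series standing in for the growth rates (the series, seed, and coefficients are assumptions, not the chapter's data):

```python
import math
import random

def correlation(x, y):
    """Sample correlation coefficient; always lies between -1 and +1."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

random.seed(0)
x = [random.gauss(0, 1) for _ in range(1000)]
noise = [random.gauss(0, 1) for _ in range(1000)]

pos = [xi + 0.5 * ni for xi, ni in zip(x, noise)]   # moves with x
neg = [-xi + 0.5 * ni for xi, ni in zip(x, noise)]  # moves against x

print(correlation(x, pos) > 0)    # positively correlated
print(correlation(x, neg) < 0)    # negatively correlated
```

Knowing x helps predict pos and neg; an unrelated series would produce a correlation coefficient near 0.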

Correlation Coefficient

The correlation coefficient indicates the degree to which two variables are correlated; it ranges from -1 to +1:
= 0: Independent (uncorrelated). Knowing the value of one variable does not help us predict the value of the other.
> 0: Positive correlation. Typically, when the value of one variable is high, the value of the other variable will be high.
< 0: Negative correlation. Typically, when the value of one variable is high, the value of the other variable will be low.

Omitted Explanatory Variables

We shall consider baseball attendance data to study the omitted variable phenomenon.

Project: Assess the determinants of baseball attendance.

Baseball Data: Panel data of baseball statistics for the 588 American League games played during the summer of 1996.

Attendance t    Paid attendance for game t
DateDay t    Day of game t
DateMonth t    Month of game t
DateYear t    Year of game t
DayOfWeek t    Day of the week for game t (Sunday=0, Monday=1, etc.)
DH t    Designated hitter for game t (1 if DH permitted; 0 otherwise)
HomeGamesBehind t    Games behind of the home team before game t
HomeIncome t    Per capita income in home team's city for game t
HomeLosses t    Season losses of the home team before game t
HomeNetWins t    Net wins (wins less losses) of the home team before game t
HomeSalary t    Player salaries of the home team for game t (millions of dollars)
HomeWins t    Season wins of the home team before game t
PriceTicket t    Average price of tickets sold for game t's home team (dollars)
VisitGamesBehind t    Games behind of the visiting team before game t
VisitLosses t    Season losses of the visiting team before game t

VisitNetWins t    Net wins (wins less losses) of the visiting team before game t
VisitSalary t    Player salaries of the visiting team for game t (millions of dollars)
VisitWins t    Season wins of the visiting team before game t

A Puzzle: Baseball Attendance

Let us begin our analysis by focusing on the price of tickets. Consider the following two models that attempt to explain game attendance:

Model 1: Attendance depends on ticket price only. The first model has a single explanatory variable, ticket price, PriceTicket:
Attendance t = β Const + β Price PriceTicket t + e t
Downward Sloping Demand Theory: This model is based on the economist's downward sloping demand theory. An increase in the price of a good decreases the quantity demanded. Higher ticket prices should reduce attendance; hence, the PriceTicket coefficient should be negative: β Price < 0
We shall use the ordinary least squares (OLS) estimation procedure to estimate the model's parameters:
[Link to MIT-ALSummer-1996.wf1 goes here.]

Ordinary Least Squares (OLS)
Dependent Variable: Attendance
Explanatory Variable(s):    Estimate    SE    t-Statistic    Prob
PriceTicket
Const
Number of Observations: 585
Estimated Equation: EstAttendance = 3,… + 1,897PriceTicket
Interpretation of Estimates: b PriceTicket = 1,897. We estimate that a $1.00 increase in the price of tickets increases attendance by 1,897 per game.
Table 14.1: Baseball Attendance Regression Results, Ticket Price Only

The estimated coefficient for the ticket price is positive, suggesting that higher prices lead to an increase in quantity demanded. This contradicts the downward sloping demand theory, does it not?

Model 2: Attendance depends on ticket price and salary of home team. In the second model, we include not only the price of tickets, PriceTicket, as an explanatory variable, but also the salary of the home team, HomeSalary:
Attendance t = β Const + β Price PriceTicket t + β HomeSalary HomeSalary t + e t
We can justify the salary explanatory variable on the grounds that fans like to watch good players. We shall call this the star theory. Presumably, a high salary team has better players, more stars, on its roster and accordingly will draw more fans.
Star Theory: Teams with higher salaries will have better players, which will increase attendance. The HomeSalary coefficient should be positive: β HomeSalary > 0
Now, use the ordinary least squares (OLS) estimation procedure to estimate the parameters.

Ordinary Least Squares (OLS)
Dependent Variable: Attendance
Explanatory Variable(s):    Estimate    SE    t-Statistic    Prob
PriceTicket
HomeSalary
Const
Number of Observations: 585
Estimated Equation: EstAttendance = 9,… - 591PriceTicket + 783HomeSalary
Interpretation of Estimates: b PriceTicket = -591. We estimate that a $1.00 increase in the price of tickets decreases attendance by 591 per game. b HomeSalary = 783. We estimate that a $1 million increase in the home team salary increases attendance by 783 per game.
Table 14.2: Baseball Attendance Regression Results, Ticket Price and Home Team Salary

These coefficient estimates lend support to our theories. The two models produce very different results concerning the effect of the ticket price on attendance. More specifically, the coefficient estimate for ticket price changes drastically from 1,897 to -591 when we add home team salary as an explanatory variable. This is a disquieting puzzle. We shall resolve this puzzle by reviewing the goal of multiple regression analysis and then explaining when omitting an explanatory variable will prevent us from achieving the goal.

Goal of Multiple Regression Analysis

Multiple regression analysis attempts to sort out the individual effect of each explanatory variable. The estimate of an explanatory variable's coefficient allows us to assess the effect that an individual explanatory variable itself has on the dependent variable. An explanatory variable's coefficient estimate estimates the change in the dependent variable resulting from a change in that particular explanatory variable while all other explanatory variables remain constant. In Model 1 we estimate that a $1.00 increase in the ticket price increases attendance by nearly 2,000 per game, whereas in Model 2 we estimate that a $1.00 increase decreases attendance by about 600 per game. The two models suggest that the individual effect of the ticket price is very different. The omitted variable phenomenon allows us to resolve this puzzle.

Omitted Explanatory Variables and Bias

Claim: Omitting an explanatory variable from a regression will bias the estimation procedure whenever two conditions are met. Bias results if the omitted explanatory variable
influences the dependent variable;
is correlated with an included explanatory variable.
When these two conditions are met, the coefficient estimate of the included explanatory variable is a composite of two effects:
the influence that the included explanatory variable itself has on the dependent variable (direct effect);
the influence that the omitted explanatory variable has on the dependent variable, because the included explanatory variable also acts as a proxy for the omitted explanatory variable (proxy effect).
Since the goal of multiple regression analysis is to sort out the individual effect of each explanatory variable, we want to capture only the direct effect.
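The direct effect and the proxy effect can be written compactly. For a model with two explanatory variables, y t = β Const + β x1 x1 t + β x2 x2 t + e t, the standard large-sample omitted variable decomposition (a textbook identity, not derived in this chapter) describes what the ordinary least squares (OLS) slope estimate on x1 t converges to when x2 t is omitted:

```latex
\operatorname{E}[b_{x1}]
\;\approx\;
\underbrace{\beta_{x1}}_{\text{direct effect}}
\;+\;
\underbrace{\beta_{x2}\,
\frac{\operatorname{Cov}(x_{1t},\,x_{2t})}{\operatorname{Var}(x_{1t})}}_{\text{proxy effect}}
```

The proxy term vanishes when β x2 equals 0 or when the covariance equals 0, which is exactly the case analysis carried out below.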

Econometrics Lab 14.1: Omitted Variable Proxy Effect

We can now use the Econometrics Lab to justify our claims concerning omitted explanatory variables. The following regression model, which includes two explanatory variables, is used:
Model: y t = β Const + β x1 x1 t + β x2 x2 t + e t
[Link to MIT-Lab 14.1 goes here.]

Figure 14.4: Omitted Variable Simulation

The simulation provides us with two options: we can include either both explanatory variables in the regression, Both Xs, or just one, Only X1. By default the Only X1 option is selected; consequently, the second explanatory variable is omitted. That is, x1 t is the included explanatory variable and x2 t is the omitted explanatory variable. For simplicity, assume that x1's coefficient, β x1, is positive. We shall consider three cases to illustrate when bias does and does not result:

Case 1: The coefficient of the omitted explanatory variable is positive and the two explanatory variables are independent (uncorrelated).
Case 2: The coefficient of the omitted explanatory variable equals zero and the two explanatory variables are positively correlated.
Case 3: The coefficient of the omitted explanatory variable is positive and the two explanatory variables are positively correlated.
We shall now show that only in the last case does bias result, because only in the last case is the proxy effect present.

Case 1: The coefficient of the omitted explanatory variable is positive and the two explanatory variables are independent (uncorrelated).
Will bias result in this case? Since the two explanatory variables are independent (uncorrelated), an increase in the included explanatory variable, x1 t, typically will not affect the omitted explanatory variable, x2 t. Consequently, the included explanatory variable, x1 t, will not act as a proxy for the omitted explanatory variable, x2 t. Bias should not result.

Included variable x1 t up → (Independence) → Typically, omitted variable x2 t unaffected
β x1 > 0: y t up (Direct Effect)        β x2 > 0: y t unaffected (No Proxy Effect)

We shall use our lab to confirm this logic. By default, the actual coefficient for the included explanatory variable, x1 t, equals 2 and the actual coefficient for the omitted explanatory variable, x2 t, is nonzero; it equals 5. Their correlation coefficient, Corr X1&X2, equals .00; hence, the two explanatory variables are independent (uncorrelated). Be certain that the Pause checkbox is cleared. Click Start and, after many, many repetitions, click Stop. Table 14.3 reports that the average value of the coefficient estimates for the included explanatory variable equals its actual value: both equal 2.0. The ordinary least squares (OLS) estimation procedure is unbiased.

Actual    Actual    Corr    Mean (Average)        Percent of Coef1 Estimates
Coef 1    Coef 2    Coef    of Coef1 Estimates    Below Actual Value    Above Actual Value
Table 14.3: Omitted Variables Simulation Results

The ordinary least squares (OLS) estimation procedure captures the individual influence that the included explanatory variable itself has on the dependent variable. This is precisely the effect that we wish to capture. The ordinary least squares (OLS) estimation procedure is unbiased; it is doing what we want it to do.

Case 2: The coefficient of the omitted explanatory variable equals zero and the two explanatory variables are positively correlated.
In the second case, the two explanatory variables are positively correlated; when the included explanatory variable, x1 t, increases, the omitted explanatory variable, x2 t, will typically increase also. But the actual coefficient of the omitted explanatory variable, β x2, equals 0; hence, the dependent variable, y t, is unaffected by the increase in x2 t. There is no proxy effect because the omitted variable, x2 t, does not affect the dependent variable; hence, bias should not result.

Included variable x1 t up → (Positive Correlation) → Typically, omitted variable x2 t up
β x1 > 0: y t up (Direct Effect)        β x2 = 0: y t unaffected (No Proxy Effect)

To confirm our logic with the simulation, be certain that the actual coefficient for the omitted explanatory variable equals 0 and the correlation coefficient equals .30. Click Start and then, after many, many repetitions, click Stop. Table 14.4 reports that the average value of the coefficient estimates for the included explanatory variable equals its actual value: both equal 2.0. The ordinary least squares (OLS) estimation procedure is unbiased.

Actual    Actual    Corr    Mean (Average)        Percent of Coef1 Estimates
Coef 1    Coef 2    Coef    of Coef1 Estimates    Below Actual Value    Above Actual Value
Table 14.4: Omitted Variables Simulation Results

Again, the ordinary least squares (OLS) estimation procedure captures the influence that the included explanatory variable itself has on the dependent variable. Again, there is no proxy effect and all is well.

Case 3: The coefficient of the omitted explanatory variable is positive and the two explanatory variables are positively correlated.
As with Case 2, the two explanatory variables are positively correlated; when the included explanatory variable, x1 t, increases, the omitted explanatory variable, x2 t, will typically increase also. But now the actual coefficient of the omitted explanatory variable, β x2, is no longer 0; it is positive. Hence, an increase in the omitted explanatory variable, x2 t, increases the dependent variable. In addition to having a direct effect on the dependent variable, the included explanatory variable, x1 t, also acts as a proxy for the omitted explanatory variable, x2 t. There is a proxy effect.

Included variable x1 t up → (Positive Correlation) → Typically, omitted variable x2 t up
β x1 > 0: y t up (Direct Effect)        β x2 > 0: y t up (Proxy Effect)

In the simulation, the actual coefficient of the omitted explanatory variable, β x2, once again equals 5. The two explanatory variables are still positively correlated; the correlation coefficient equals .30. Click Start and then, after many, many repetitions, click Stop. Table 14.5 reports that the average value of the coefficient estimates for the included explanatory variable, 3.5, exceeds its actual value, 2.0. The ordinary least squares (OLS) estimation procedure is biased upward.

Actual    Actual    Corr    Mean (Average)        Percent of Coef1 Estimates
Coef 1    Coef 2    Coef    of Coef1 Estimates    Below Actual Value    Above Actual Value
Table 14.5: Omitted Variables Simulation Results

Now we have a problem. The ordinary least squares (OLS) estimation procedure overstates the influence of the included explanatory variable, the effect that the included explanatory variable itself has on the dependent variable.
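For readers without access to the interactive lab, the three cases can be reproduced with a short Monte Carlo sketch. The actual coefficients (2 and 5) and the correlation coefficients (.00 and .30) come from the text; the sample size, error variance, and number of repetitions are assumptions:

```python
import numpy as np

rng = np.random.default_rng(14)
REPS, N, BETA_X1 = 2000, 50, 2.0

def mean_b1(beta_x2, rho):
    """Average slope estimate on x1 across repetitions when x2 is omitted."""
    slopes = []
    for _ in range(REPS):
        x1 = rng.normal(0, 1, N)
        # Construct x2 so that Corr(x1, x2) = rho, with unit variance.
        x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(0, 1, N)
        y = 10 + BETA_X1 * x1 + beta_x2 * x2 + rng.normal(0, 1, N)
        X = np.column_stack([np.ones(N), x1])            # x2 omitted!
        slopes.append(np.linalg.lstsq(X, y, rcond=None)[0][1])
    return np.mean(slopes)

print(round(mean_b1(5.0, 0.00), 1))  # Case 1: near 2.0, unbiased
print(round(mean_b1(0.0, 0.30), 1))  # Case 2: near 2.0, unbiased
print(round(mean_b1(5.0, 0.30), 1))  # Case 3: near 3.5, biased upward
```

Under these settings (unit variances), the average slope settles near β x1 + β x2 × Corr, which is 2.0, 2.0, and 3.5 for the three cases, matching the pattern the text reports for Tables 14.3 through 14.5.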

Let us now take a brief aside. Case 3 provides us with the opportunity to illustrate what bias does and does not mean.

Figure 14.5: Probability Distribution of an Estimate (Upward Bias)

What bias does mean: Bias means that the estimation procedure systematically overestimates or underestimates the actual value. In this case, upward bias is present: after many, many repetitions, the average of the estimates is greater than the actual value.
What bias does not mean: Bias does not mean that the value of the estimate in a single repetition must be less than the actual value in the case of downward bias, or greater than the actual value in the case of upward bias. Focus on the last simulation. The ordinary least squares (OLS) estimation procedure is biased upward as a consequence of the proxy effect. Despite the upward bias, however, the estimate of the included explanatory variable is less than the actual value in 12.5 percent of the repetitions. Upward bias does not guarantee that in any one repetition the estimate will be greater than the actual value; it just means that it will be greater on average. If the probability distribution is symmetric, the chances of the estimate being greater than the actual value exceed the chances of its being less.

Now, we return to our three omitted variable cases by summarizing them:

Case    Does the omitted variable          Is the omitted variable                Estimation procedure for
        influence the dependent variable?  correlated with an included variable?  the included variable is
1       Yes                                No                                     Unbiased
2       No                                 Yes                                    Unbiased
3       Yes                                Yes                                    Biased
Table 14.6: Omitted Variables Simulation Summary

Econometrics Lab 14.2: Avoiding Omitted Variable Bias

Question: Is the estimation procedure biased or unbiased when both explanatory variables are included in the regression?
[Link to MIT-Lab 14.2 goes here.]
To address this question, Both Xs is now selected. This means that both explanatory variables, x1 t and x2 t, will be included in the regression. Both explanatory variables affect the dependent variable, and they are correlated. As we saw in Case 3, if one of the explanatory variables is omitted, bias will result. To see what occurs when both explanatory variables are included, click Start and, after many, many repetitions, click Stop. When both variables are included, the ordinary least squares (OLS) estimation procedure is unbiased:

Actual    Actual    Correlation    Mean of Coef 1
Coef 1    Coef 2    Parameter      Estimates
Table 14.7: Omitted Variables Simulation Results, No Omitted Variables

Conclusion: To avoid omitted variable bias, all relevant explanatory variables should be included in a regression.
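Econometrics Lab 14.2 can be sketched the same way: the data-generating process matches Case 3 (actual coefficients 2 and 5 and correlation coefficient .30, from the text; the other settings are assumptions), but now both explanatory variables are included:

```python
import numpy as np

rng = np.random.default_rng(142)
REPS, N = 2000, 50
BETA_X1, BETA_X2, RHO = 2.0, 5.0, 0.30

b1_estimates = []
for _ in range(REPS):
    x1 = rng.normal(0, 1, N)
    x2 = RHO * x1 + np.sqrt(1 - RHO**2) * rng.normal(0, 1, N)
    y = 10 + BETA_X1 * x1 + BETA_X2 * x2 + rng.normal(0, 1, N)
    X = np.column_stack([np.ones(N), x1, x2])    # both Xs included
    b1_estimates.append(np.linalg.lstsq(X, y, rcond=None)[0][1])

print(round(np.mean(b1_estimates), 1))   # near 2.0: no proxy effect, no bias
```

With x2 in the regression there is nothing for x1 to proxy for, so the average of the Coef 1 estimates returns to the actual value.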

Resolving the Baseball Attendance Puzzle

We begin by reviewing the baseball attendance models:

Model 1: Attendance depends on ticket price only.
Attendance t = β Const + β Price PriceTicket t + e t
Estimated Equation: EstAttendance = 3,… + 1,897PriceTicket
Interpretation: We estimate that a $1.00 increase in the price of tickets increases attendance by 1,897 per game.

Model 2: Attendance depends on ticket price and salary of home team.
Attendance t = β Const + β Price PriceTicket t + β HomeSalary HomeSalary t + e t
Estimated Equation: EstAttendance = 9,… - 591PriceTicket + 783HomeSalary
Interpretation: We estimate that
a $1.00 increase in the price of tickets decreases attendance by 591 per game;
a $1 million increase in the home team salary increases attendance by 783 per game.

The ticket price coefficient estimate is affected dramatically by the presence of home team salary; the Model 1 estimate is much higher: 1,897 versus -591. Why? We shall now argue that when ticket price is included in the regression and home team salary is omitted, as in Model 1, there is reason to believe that the estimation procedure for the ticket price coefficient will be biased. We just learned that omitted variable bias results when the following two conditions are met; that is, when an omitted explanatory variable
influences the dependent variable and
is correlated with an included explanatory variable.
Now focus on Model 1:
Attendance t = β Const + β Price PriceTicket t + e t
Model 1 omits home team salary, HomeSalary t. Are the two omitted variable bias conditions met? It certainly appears reasonable to believe that the omitted explanatory variable, HomeSalary t, affects the dependent variable, Attendance t. The club owner who is paying the high salaries certainly believes so; the owner hopes that by hiring better players, more fans will attend the games. Consequently, it appears that the first condition required for omitted variable bias is met. What about the second condition? We can check whether the omitted and included variables are correlated by using statistical software to calculate the correlation matrix:

Correlation Matrix
                PriceTicket    HomeSalary
PriceTicket     1.00           .78
HomeSalary      .78            1.00
Table 14.8: Ticket Price and Home Team Salary Correlation Matrix

The correlation coefficient between PriceTicket t and HomeSalary t is .78; the variables are positively correlated. The second condition required for omitted variable bias is also met. We have reason to suspect bias in Model 1. When the included variable, PriceTicket t, increases, the omitted variable, HomeSalary t, typically increases also. An increase in the omitted variable, HomeSalary t, increases the dependent variable, Attendance t:

Included variable PriceTicket t up → (Positive Correlation) → Typically, omitted variable HomeSalary t up
β Price < 0: Attendance t down (Direct Effect)        β HomeSalary > 0: Attendance t up (Proxy Effect)

In addition to having a direct effect on the dependent variable, the included explanatory variable, PriceTicket t, also acts as a proxy for the omitted explanatory variable, HomeSalary t. There is a proxy effect, and upward bias results. This provides us with an explanation of why the ticket price coefficient estimate in Model 1 is greater than the estimate in Model 2.
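The resolution can be mimicked with simulated data. Everything below is made up for illustration (the MIT workfile is not reproduced here): salary both raises attendance and is positively correlated with ticket price, so omitting salary drags the estimated price coefficient upward, here all the way past zero:

```python
import numpy as np

rng = np.random.default_rng(1996)
n = 5000

# Hypothetical data-generating process: high-payroll teams charge higher
# prices, and salary itself boosts attendance (the star theory).
salary = rng.uniform(10, 60, n)                     # millions of dollars
price = 5 + 0.15 * salary + rng.normal(0, 1, n)     # dollars, correlated with salary
attendance = 5000 - 600 * price + 800 * salary + rng.normal(0, 2000, n)

def ols(y, *xs):
    """OLS coefficients (constant first) via least squares."""
    X = np.column_stack([np.ones(len(y))] + list(xs))
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_price_only = ols(attendance, price)[1]          # Model 1: salary omitted
b_price_both = ols(attendance, price, salary)[1]  # Model 2: salary included

print(b_price_only > 0)   # True: direct effect plus proxy effect
print(b_price_both < 0)   # True: direct price effect isolated
```

The price-only slope bundles the true negative price effect with the positive salary effect that price proxies for; including salary separates the two, just as Model 2 does in the text.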

Omitted Variable Summary

Omitting an explanatory variable from a regression biases the estimation procedure whenever two conditions are met. Bias results if the omitted explanatory variable
influences the dependent variable;
is correlated with an included explanatory variable.
When these two conditions are met, the coefficient estimate of the included explanatory variable is a composite of two effects; it reflects
the influence that the included explanatory variable itself has on the dependent variable (direct effect);
the influence that the omitted explanatory variable has on the dependent variable, because the included explanatory variable also acts as a proxy for the omitted explanatory variable (proxy effect).
The bad news is that the proxy effect leads to bias. The good news is that we can eliminate the proxy effect and its accompanying bias by including the omitted explanatory variable. But now we shall learn that if two explanatory variables are highly correlated, a different problem can emerge.

Multicollinearity

The phenomenon of multicollinearity occurs when two explanatory variables are highly correlated. Recall that multiple regression analysis attempts to sort out the influence of each individual explanatory variable. But what happens when we include two explanatory variables in a single regression that are perfectly correlated? Let us see.

Perfectly Correlated Explanatory Variables

In our baseball attendance workfile, ticket prices, PriceTicket t, are reported in terms of dollars. Generate a new variable, PriceCents t, reporting ticket prices in terms of cents rather than dollars:
PriceCents t = 100 PriceTicket t
Note that the variables PriceTicket t and PriceCents t are perfectly correlated. If we know one, we can predict the value of the other with complete accuracy. Just to confirm this, use statistical software to calculate the correlation matrix:

Correlation Matrix
                PriceTicket    PriceCents
PriceTicket     1.00           1.00
PriceCents      1.00           1.00
Table 14.9: EViews Dollar and Cent Ticket Price Correlation Matrix

The correlation coefficient of PriceTicket t and PriceCents t equals 1.00. The variables are indeed perfectly correlated. Now, run a regression with Attendance as the dependent variable and both PriceTicket and PriceCents as explanatory variables:
Dependent variable: Attendance
Explanatory variables: PriceTicket and PriceCents
Your statistical software will report a diagnostic. Different software packages provide different messages, but basically the software is telling us that it cannot run the regression. Why does this occur? The reason is that the two variables are perfectly correlated. Knowing the value of one allows us to predict the value of the other with complete accuracy. Both explanatory variables contain precisely the same information. Multiple regression analysis attempts to sort out the influence of each individual explanatory variable; but if both variables contain precisely the same information, it is impossible to do this. How can we possibly separate out each variable's individual effect when the two variables contain identical information? We are asking statistical software to do the impossible.

Explanatory variables perfectly correlated
→ Knowing the value of one explanatory variable allows us to predict perfectly the value of the other
→ Both variables contain precisely the same information
→ Impossible to separate out the individual effect of each variable

Next, we consider a case in which the explanatory variables are highly, although not perfectly, correlated.
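A sketch of why the software balks, using made-up prices: with PriceCents = 100 × PriceTicket, the design matrix is rank deficient, so the normal equations have no unique solution:

```python
import numpy as np

rng = np.random.default_rng(0)
price_ticket = rng.uniform(5.0, 25.0, 100)   # dollars (made up for illustration)
price_cents = 100.0 * price_ticket           # cents: identical information

# The two variables are perfectly correlated...
corr = np.corrcoef(price_ticket, price_cents)[0, 1]
print(round(corr, 6))                        # 1.0

# ...so the design matrix [Const, PriceTicket, PriceCents] has only two
# linearly independent columns. X'X is singular, which is exactly what
# triggers the software's diagnostic.
X = np.column_stack([np.ones(100), price_ticket, price_cents])
rank = np.linalg.matrix_rank(X)
print(rank)                                  # 2, not 3
```

Any attempt to attribute the price effect between the two columns is arbitrary: for any candidate pair of coefficients (b1, b2), the pair (b1 + 100c, b2 - c) fits equally well, so no individual effect can be separated out.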

Highly Correlated Explanatory Variables

To investigate the problems created by highly correlated explanatory variables we shall use our baseball data to investigate a model that includes four explanatory variables:

Attendance_t = β_Const + β_Price PriceTicket_t + β_HomeSalary HomeSalary_t + β_HomeNW HomeNetWins_t + β_HomeGB HomeGamesBehind_t + e_t

Attendance_t: Paid attendance for game t
PriceTicket_t: Average price of tickets sold for game t's home team (dollars)
HomeSalary_t: Player salaries of the home team for game t (millions of dollars)
HomeNetWins_t: The difference between the number of wins and losses of the home team before game t
HomeGamesBehind_t: Games behind of the home team before game t

The variable HomeNetWins_t equals the difference between the number of wins and losses of the home team. It attempts to capture the quality of the team. HomeNetWins_t will be positive and large for a high quality team, a team that wins many more games than it loses. On the other hand, HomeNetWins_t will be a negative number for a low quality team. Since baseball fans enjoy watching high quality teams, we would expect high quality teams to be rewarded with greater attendance.

The variable HomeGamesBehind_t captures the home team's standing in its divisional race. For those who are not baseball fans, note that all teams that win their division automatically qualify for the baseball playoffs. Ultimately, the two teams that win the American and National League playoffs meet in the World Series. Since it is the goal of every team to win the World Series, each team strives to win its division. Games behind indicates how close a team is to winning its division. To explain how games behind are calculated, consider the final standings of the American League Eastern Division in 2009:

Team                 Wins   Losses   Home Net Wins   Games Behind
New York Yankees      103       59              44              0
Boston Red Sox         95       67              28              8
Tampa Bay Rays         84       78               6             19
Toronto Blue Jays      75       87             -12             28
Baltimore Orioles      64       98             -34             39

Table 14.10: 2009 Final Season Standings AL East

The Yankees had the best record; the games behind value for the Yankees equals 0. The Red Sox won eight fewer games than the Yankees; hence, the Red Sox were 8 games behind. The Rays won 19 fewer games than the Yankees; hence the Rays were 19 games behind. Similarly, the Blue Jays were 28 games behind and the Orioles 39 games behind.¹ More generally, a team's games behind equals the average of its wins deficit and its losses surplus relative to the division leader; in final standings, where every team has played the same number of games, this is simply the difference in wins.

During the season, if a team's games behind becomes larger, it becomes less likely that the team will win its division, less likely that the team will qualify for the playoffs, and less likely that the team will eventually win the World Series. Consequently, if a team's games behind becomes larger, we would expect home team fans to become discouraged, resulting in lower attendance. We use the terms team quality and division race to summarize our theories regarding home net wins and home team games behind:

Team Quality Theory: More net wins increase attendance. β_HomeNW > 0.
Division Race Theory: More games behind decreases attendance. β_HomeGB < 0.

We would expect HomeNetWins_t and HomeGamesBehind_t to be negatively correlated. As HomeNetWins_t decreases, a team moves farther from the top of its division and consequently HomeGamesBehind_t increases. We would expect the correlation coefficient for HomeNetWins_t and HomeGamesBehind_t to be negative. Let us check by computing their correlation matrix:

[Link to MIT-ALSummer-1996.wf1 goes here.]

Correlation Matrix
                    HomeNetWins   HomeGamesBehind
HomeNetWins               1.000             -.962
HomeGamesBehind           -.962             1.000

Table 14.11: HomeNetWins and HomeGamesBehind Correlation Matrix

Table 14.11 reports that the correlation coefficient for HomeGamesBehind_t and HomeNetWins_t equals −.962. Recall that the correlation coefficient must lie between −1 and +1. When two variables are perfectly negatively correlated their correlation coefficient equals −1. While HomeGamesBehind_t and HomeNetWins_t are not perfectly negatively correlated, they come close; they are highly negatively correlated.
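The games-behind arithmetic, and the reason the two variables track each other so closely, can be sketched in a few lines using the Table 14.10 records:

```python
import numpy as np

# Games behind relative to the division leader, using the standard formula:
# GB = ((leader wins - team wins) + (team losses - leader losses)) / 2.
records = {                       # 2009 AL East final records (wins, losses)
    "Yankees":   (103, 59),
    "Red Sox":   (95, 67),
    "Rays":      (84, 78),
    "Blue Jays": (75, 87),
    "Orioles":   (64, 98),
}
lw, ll = records["Yankees"]       # the division leader's record
games_behind = {t: ((lw - w) + (l - ll)) / 2 for t, (w, l) in records.items()}
print(games_behind)               # 0, 8, 19, 28, 39; matching Table 14.10

# In final standings, every team has played the same number of games, so
# games behind is an exact linear function of net wins and the correlation
# is exactly -1. In-season standings, where teams have played different
# numbers of games, are only approximately collinear, which is one reason
# the chapter's in-season data yields -.962 rather than -1.
net_wins = np.array([w - l for w, l in records.values()])
gb = np.array(list(games_behind.values()))
print(np.corrcoef(net_wins, gb)[0, 1])
```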

We use the ordinary least squares (OLS) estimation procedure to estimate the model's parameters:

Ordinary Least Squares (OLS)
Dependent Variable: Attendance
Explanatory Variable(s):    Estimate    SE    t-statistic    Prob
PriceTicket
HomeSalary
HomeNetWins
HomeGamesBehind
Const
Number of Observations: 585

Estimated Equation: EstAttendance = 11,… − 437PriceTicket + 668HomeSalary + 61HomeNetWins − 84HomeGamesBehind

Interpretation of Estimates:
b_PriceTicket = −437. We estimate that a $1.00 increase in the price of tickets decreases attendance by 437 per game.
b_HomeSalary = 668. We estimate that a $1 million increase in the home team salary increases attendance by 668 per game.
b_HomeNetWins = 61. We estimate that 1 additional home net win increases attendance by 61 per game.
b_HomeGamesBehind = −84. We estimate that 1 additional game behind decreases attendance by 84 per game.

Table 14.12: Attendance Regression Results

The sign of each estimate supports the theories. Focus on the two new variables included in the model: HomeNetWins_t and HomeGamesBehind_t. Construct the null and alternative hypotheses.

Team Quality Theory
H0: β_HomeNW = 0   Team quality has no effect on attendance
H1: β_HomeNW > 0   Team quality increases attendance

Division Race Theory
H0: β_HomeGB = 0   Games behind has no effect on attendance
H1: β_HomeGB < 0   Games behind decreases attendance

While the signs of the coefficient estimates are encouraging, some of the results are disappointing:

The coefficient estimate for HomeNetWins_t is positive, supporting our theory, but what about the Prob[Results IF H0 True]? What is the probability that the estimate from one regression would equal 61 or more, if H0 were true (that is, if the actual coefficient, β_HomeNW, equals 0, if home team quality has no effect on attendance)? Using the tails probability:

Prob[Results IF H0 True] = .4778/2 ≈ .24

We cannot reject the null hypothesis at the traditional significance levels of 1, 5, or 10 percent, suggesting that it is quite possible for the null hypothesis to be true, quite possible that home team quality has no effect on attendance.

Similarly, the coefficient estimate for HomeGamesBehind_t is negative, supporting our theory, but what about the Prob[Results IF H0 True]? What is the probability that the estimate from one regression would equal −84 or less, if H0 were true (that is, if the actual coefficient, β_HomeGB, equals 0, if games behind has no effect on attendance)? Using the tails probability:

Prob[Results IF H0 True] = .6138/2 ≈ .31

Again, we cannot reject the null hypothesis at the traditional significance levels of 1, 5, or 10 percent, suggesting that it is quite possible for the null hypothesis to be true, quite possible that games behind has no effect on attendance.

Should we abandon our theories as a consequence of these regression results? Let us perform a Wald test to assess the proposition that both coefficients equal 0:

H0: β_HomeNW = 0 and β_HomeGB = 0       Neither team quality nor games behind has an effect on attendance
H1: β_HomeNW ≠ 0 and/or β_HomeGB ≠ 0    Either team quality and/or games behind has an effect on attendance

Wald Test
                      Degrees of Freedom
              Value     Num      Dem      Prob
F-statistic               2      580     .0067

Table 14.13: EViews Wald Test Results

Prob[Results IF H0 True]: What is the probability that the F-statistic would be as large as the value reported or larger, if H0 were true (that is, if both β_HomeNW and β_HomeGB equal 0, if both team quality and games behind have no effect on attendance)?

Prob[Results IF H0 True] = .0067

We can reject the null hypothesis at a 1 percent significance level; it is unlikely that both team quality and games behind have no effect on attendance. There appears to be a paradox when we compare the t-tests and the Wald test:

t-tests: We cannot reject the null hypothesis that team quality has no effect on attendance, and we cannot reject the null hypothesis that games behind has no effect on attendance. Individually, neither team quality nor games behind appears to influence attendance.

Wald test: We can reject the null hypothesis that both team quality and games behind have no effect on attendance. Team quality and/or games behind do appear to influence attendance.

Individually, neither team quality nor games behind appears to influence attendance significantly; but taken together, by asking whether team quality and/or games behind influence attendance, we conclude that they do.
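The mechanics behind the Wald (F) test can be reproduced on simulated data; the sketch below (the data-generating numbers are invented stand-ins, not the baseball workfile) compares the restricted and unrestricted sums of squared residuals:

```python
import numpy as np
from scipy import stats

# Invented stand-in data: two highly correlated regressors, both of which
# truly influence y, mimicking HomeNetWins and HomeGamesBehind.
rng = np.random.default_rng(1)
n = 585
net_wins = rng.normal(0.0, 10.0, n)
games_behind = -0.4 * net_wins + rng.normal(0.0, 1.0, n)
y = 20000 + 100 * net_wins - 150 * games_behind + rng.normal(0.0, 5000.0, n)

def ssr(y, X):
    """Sum of squared residuals from an OLS fit of y on X."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    return e @ e

ones = np.ones(n)
X_u = np.column_stack([ones, net_wins, games_behind])  # unrestricted model
X_r = ones[:, None]                                    # H0: both slopes are 0

q, k = 2, X_u.shape[1]                                 # restrictions; parameters
F = ((ssr(y, X_r) - ssr(y, X_u)) / q) / (ssr(y, X_u) / (n - k))
p = stats.f.sf(F, q, n - k)                            # Prob[Results IF H0 True]
print(F, p)
```

Even when each slope's individual t-test is weak because of the collinearity, the joint F-statistic can reject the null decisively, matching the pattern in Tables 14.12 and 14.13.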

Next, let us run two regressions, each of which includes only one of the two troublesome explanatory variables:

Ordinary Least Squares (OLS)
Dependent Variable: Attendance
Explanatory Variable(s):    Estimate    SE    t-statistic    Prob
PriceTicket
HomeSalary
HomeNetWins
Const
Number of Observations: 585

Estimated Equation: EstAttendance = 11,… − 449PriceTicket + 672HomeSalary + 100HomeNetWins

Interpretation of Estimates:
b_PriceTicket = −449. We estimate that a $1.00 increase in the price of tickets decreases attendance by 449 per game.
b_HomeSalary = 672. We estimate that a $1 million increase in the home team salary increases attendance by 672 per game.
b_HomeNetWins = 100. We estimate that 1 additional home net win increases attendance by 100 per game.

Table 14.14: EViews Attendance Regression Results (HomeGamesBehind Omitted)

Ordinary Least Squares (OLS)
Dependent Variable: Attendance
Explanatory Variable(s):    Estimate    SE    t-statistic    Prob
PriceTicket
HomeSalary
HomeGamesBehind
Const
Number of Observations: 585

Estimated Equation: EstAttendance = 12,… − 433PriceTicket + 671HomeSalary − 194HomeGamesBehind

Interpretation of Estimates:
b_PriceTicket = −433. We estimate that a $1.00 increase in the price of tickets decreases attendance by 433 per game.
b_HomeSalary = 671. We estimate that a $1 million increase in the home team salary increases attendance by 671 per game.
b_HomeGamesBehind = −194. We estimate that 1 additional game behind decreases attendance by 194 per game.

Table 14.15: EViews Attendance Regression Results (HomeNetWins Omitted)

When only one of the two troublesome explanatory variables is included, its coefficient is significant.

Earmarks of Multicollinearity

We are observing what we shall call the earmarks of multicollinearity:

Explanatory variables are highly correlated.
Regression with both explanatory variables:
o t-tests do not allow us to reject the null hypothesis that the coefficient of each individual variable equals 0; when considering each explanatory variable individually, we cannot reject the hypothesis that each individually has no influence.
o a Wald test allows us to reject the null hypothesis that the coefficients of both explanatory variables equal 0; when considering both explanatory variables together, we can reject the hypothesis that they have no influence.
Regressions with only one explanatory variable appear to produce good results.

How can we explain this? Recall that multiple regression analysis attempts to sort out the influence of each individual explanatory variable. When two explanatory variables are perfectly correlated, it is impossible for the ordinary least squares (OLS) estimation procedure to separate out the individual influences of each variable. Consequently, if two variables are highly correlated, as team quality and games behind are, it may be very difficult for the ordinary least squares (OLS) estimation procedure to separate out the individual influence of each explanatory variable. This difficulty evidences itself in the variances of the coefficient estimates' probability distributions. When two highly correlated variables are included in the same regression, the variance of each estimate's probability distribution is large. This explains our t-test results.

Explanatory variables perfectly correlated:
→ Knowing the value of one variable allows us to predict the other perfectly
→ Both variables contain the same information
→ Impossible to separate out their individual effects

Explanatory variables highly correlated:
→ Knowing the value of one variable allows us to predict the other very accurately
→ In some sense, both variables contain nearly the same information
→ Difficult to separate out their individual effects
→ Large variance of each coefficient estimate's probability distribution

We use a simulation to justify our explanation.

Econometrics Lab 14.3: Multicollinearity

Figure 14.6: Multicollinearity Simulation

Our model includes two explanatory variables, x1_t and x2_t:

Model: y_t = β_Const + β_x1 x1_t + β_x2 x2_t + e_t

[Link to MIT-Lab 14.3 goes here.]

By default, the actual value of the coefficient for the first explanatory variable equals 2 and the actual value for the second equals 5. Note that the Both Xs option is selected; both explanatory variables are included in the regression. Initially, the correlation coefficient is specified as .00; that is, initially the explanatory variables are independent. Be certain that the Pause checkbox is cleared and click Start. After many, many repetitions click Stop. Next, repeat this process for a correlation coefficient of .30, a correlation coefficient of .60, and a correlation coefficient of .90.
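The lab's repetitions can be sketched directly; here is a minimal Monte Carlo in numpy (the sample size, error variance, and repetition count are invented) with the actual coefficients set to 2 and 5 as in the lab:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 30, 2000                      # invented sample size and repetitions

def simulate(rho):
    """Mean and variance of the Coef 1 estimates across many repetitions."""
    b1 = np.empty(reps)
    for r in range(reps):
        x1 = rng.normal(0.0, 1.0, n)
        # Build x2 with the requested correlation with x1.
        x2 = rho * x1 + np.sqrt(1.0 - rho**2) * rng.normal(0.0, 1.0, n)
        y = 10.0 + 2.0 * x1 + 5.0 * x2 + rng.normal(0.0, 2.0, n)
        X = np.column_stack([np.ones(n), x1, x2])
        b1[r] = np.linalg.lstsq(X, y, rcond=None)[0][1]
    return b1.mean(), b1.var()

# The mean stays near the actual value 2 (unbiased) at every correlation,
# while the variance of the estimates grows as rho rises.
for rho in (0.0, 0.3, 0.6, 0.9):
    print(rho, simulate(rho))
```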

Correlation Parameter    Actual Coef 1    Mean of Coef 1 Estimates    Variance of Coef 1 Estimates

Table 14.16: Multicollinearity Simulation Results

The simulation reveals both good news and bad news:

Good news: The ordinary least squares (OLS) estimation procedure is unbiased. The mean of the estimate's probability distribution equals the actual value. The estimation procedure does not systematically underestimate or overestimate the actual value.

Bad news: As the two explanatory variables become more correlated, the variance of the coefficient estimate's probability distribution increases. Consequently, the estimate from one repetition becomes less reliable.

The simulation illustrates the phenomenon of multicollinearity.

Irrelevant Explanatory Variables

An irrelevant explanatory variable is a variable that does not influence the dependent variable. Including an irrelevant explanatory variable can be viewed as adding noise, an additional element of uncertainty, into the mix. An irrelevant explanatory variable adds a new random influence to the model. If our logic is correct, irrelevant explanatory variables should lead to both good news and bad news:

Good news: Random influences do not cause the ordinary least squares (OLS) estimation procedure to be biased. Consequently, the inclusion of an irrelevant explanatory variable does not lead to bias.

Bad news: The additional uncertainty added by the new random influence means that the coefficient estimate is less reliable; the variance of the coefficient estimate's probability distribution rises when an irrelevant explanatory variable is present.

We shall use our Econometrics Lab to justify our intuition.

Econometrics Lab 14.4: Irrelevant Explanatory Variables

Figure 14.7: Irrelevant Explanatory Variable Simulation

[Link to MIT-Lab 14.4 goes here.]

Once again we use a two explanatory variable model:

Model: y_t = β_Const + β_x1 x1_t + β_x2 x2_t + e_t

By default, the first explanatory variable, x1_t, is the relevant explanatory variable; the default value of its coefficient is 2. The second explanatory variable, x2_t, is the irrelevant one. An irrelevant explanatory variable has no effect on the dependent variable; consequently, the actual value of its coefficient, β_x2, equals 0. Initially, the Only X1 option is selected, indicating that only the relevant explanatory variable, x1_t, is included in the regression; the irrelevant explanatory variable, x2_t, is not included. Click Start and then, after many, many repetitions, click Stop. Since the irrelevant explanatory variable is not included in the regression, correlation between the two explanatory variables should have no impact on the results. Confirm this by changing the correlation coefficient from .00 to .30 in the Corr X1&X2 list. Click Start and then, after many, many repetitions,

click Stop. Similarly, show that the results are unaffected when the correlation coefficient is .60 and .90. Subsequently, investigate what happens when the irrelevant explanatory variable is included by selecting the Both Xs option; the irrelevant explanatory variable, x2_t, will now be included in the regression. Be certain that the correlation coefficient for the relevant and irrelevant explanatory variables initially equals .00. Click Start and then, after many, many repetitions, click Stop. Investigate how correlation between the two explanatory variables affects the results when the irrelevant explanatory variable is included by selecting correlation coefficient values of .30, .60, and .90. For each case, click Start and then, after many, many repetitions, click Stop. Table 14.17 reports the results of the lab.

                            Only Variable 1 Included        Variables 1 and 2 Included
Corr Coef       Actual      Mean of        Variance of      Mean of        Variance of
for Variables   Coef 1      Coef 1         Coef 1           Coef 1         Coef 1
1 and 2                     Estimates      Estimates        Estimates      Estimates

Table 14.17: Irrelevant Explanatory Variable Simulation Results

The results reported in Table 14.17 are not surprising; they support our intuition:

Only Relevant Variable (Variable 1) Included:
o The mean of the coefficient estimates for the relevant explanatory variable, x1_t, equals 2, the actual value; consequently, the ordinary least squares (OLS) estimation procedure for the coefficient estimate is unbiased.
o Naturally, the variance of the coefficient estimates is not affected by correlation between the relevant and irrelevant explanatory variables because the irrelevant explanatory variable is not included in the regression.

Both Relevant and Irrelevant Variables (Variables 1 and 2) Included:
o The mean of the coefficient estimates for the relevant explanatory variable, x1_t, still equals 2, the actual value; consequently, the ordinary least squares (OLS) estimation procedure for the coefficient estimate is unbiased.
o As the correlation between the two explanatory variables increases, however, the variance of the coefficient estimates rises; the estimates become less reliable when the irrelevant explanatory variable is included.
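The lab's comparison can likewise be sketched in numpy (the sample size, error variance, and repetition count are invented); x2 is the irrelevant variable, its actual coefficient being 0:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 30, 2000                      # invented sample size and repetitions

def simulate(rho, include_x2):
    """Mean and variance of the Coef 1 estimates across many repetitions."""
    b1 = np.empty(reps)
    for r in range(reps):
        x1 = rng.normal(0.0, 1.0, n)
        x2 = rho * x1 + np.sqrt(1.0 - rho**2) * rng.normal(0.0, 1.0, n)
        y = 10.0 + 2.0 * x1 + rng.normal(0.0, 2.0, n)   # beta_x2 = 0
        cols = [np.ones(n), x1] + ([x2] if include_x2 else [])
        b1[r] = np.linalg.lstsq(np.column_stack(cols), y, rcond=None)[0][1]
    return b1.mean(), b1.var()

# Excluding the irrelevant x2: unbiased, and rho does not matter.
# Including it: still unbiased, but the variance rises as rho grows.
for include in (False, True):
    for rho in (0.0, 0.9):
        print(include, rho, simulate(rho, include))
```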


28. SIMPLE LINEAR REGRESSION III 28. SIMPLE LINEAR REGRESSION III Fitted Values and Residuals To each observed x i, there corresponds a y-value on the fitted line, y = βˆ + βˆ x. The are called fitted values. ŷ i They are the values of

More information

Chapter 3 Multiple Regression Complete Example

Chapter 3 Multiple Regression Complete Example Department of Quantitative Methods & Information Systems ECON 504 Chapter 3 Multiple Regression Complete Example Spring 2013 Dr. Mohammad Zainal Review Goals After completing this lecture, you should be

More information

Mgmt 469. Causality and Identification

Mgmt 469. Causality and Identification Mgmt 469 Causality and Identification As you have learned by now, a key issue in empirical research is identifying the direction of causality in the relationship between two variables. This problem often

More information

CHAPTER 6: SPECIFICATION VARIABLES

CHAPTER 6: SPECIFICATION VARIABLES Recall, we had the following six assumptions required for the Gauss-Markov Theorem: 1. The regression model is linear, correctly specified, and has an additive error term. 2. The error term has a zero

More information

Chapter 16. Simple Linear Regression and dcorrelation

Chapter 16. Simple Linear Regression and dcorrelation Chapter 16 Simple Linear Regression and dcorrelation 16.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will

More information

2 Prediction and Analysis of Variance

2 Prediction and Analysis of Variance 2 Prediction and Analysis of Variance Reading: Chapters and 2 of Kennedy A Guide to Econometrics Achen, Christopher H. Interpreting and Using Regression (London: Sage, 982). Chapter 4 of Andy Field, Discovering

More information

Lecture 5: Omitted Variables, Dummy Variables and Multicollinearity

Lecture 5: Omitted Variables, Dummy Variables and Multicollinearity Lecture 5: Omitted Variables, Dummy Variables and Multicollinearity R.G. Pierse 1 Omitted Variables Suppose that the true model is Y i β 1 + β X i + β 3 X 3i + u i, i 1,, n (1.1) where β 3 0 but that the

More information

LI EAR REGRESSIO A D CORRELATIO

LI EAR REGRESSIO A D CORRELATIO CHAPTER 6 LI EAR REGRESSIO A D CORRELATIO Page Contents 6.1 Introduction 10 6. Curve Fitting 10 6.3 Fitting a Simple Linear Regression Line 103 6.4 Linear Correlation Analysis 107 6.5 Spearman s Rank Correlation

More information

Hint: The following equation converts Celsius to Fahrenheit: F = C where C = degrees Celsius F = degrees Fahrenheit

Hint: The following equation converts Celsius to Fahrenheit: F = C where C = degrees Celsius F = degrees Fahrenheit Amherst College Department of Economics Economics 360 Fall 2014 Exam 1: Solutions 1. (10 points) The following table in reports the summary statistics for high and low temperatures in Key West, FL from

More information

2. Linear regression with multiple regressors

2. Linear regression with multiple regressors 2. Linear regression with multiple regressors Aim of this section: Introduction of the multiple regression model OLS estimation in multiple regression Measures-of-fit in multiple regression Assumptions

More information

Simple Linear Regression

Simple Linear Regression CHAPTER 13 Simple Linear Regression CHAPTER OUTLINE 13.1 Simple Linear Regression Analysis 13.2 Using Excel s built-in Regression tool 13.3 Linear Correlation 13.4 Hypothesis Tests about the Linear Correlation

More information

Regression Analysis. A statistical procedure used to find relations among a set of variables.

Regression Analysis. A statistical procedure used to find relations among a set of variables. Regression Analysis A statistical procedure used to find relations among a set of variables. Understanding relations Mapping data enables us to examine (describe) where things occur (e.g., areas where

More information

Econometrics -- Final Exam (Sample)

Econometrics -- Final Exam (Sample) Econometrics -- Final Exam (Sample) 1) The sample regression line estimated by OLS A) has an intercept that is equal to zero. B) is the same as the population regression line. C) cannot have negative and

More information

1 Motivation for Instrumental Variable (IV) Regression

1 Motivation for Instrumental Variable (IV) Regression ECON 370: IV & 2SLS 1 Instrumental Variables Estimation and Two Stage Least Squares Econometric Methods, ECON 370 Let s get back to the thiking in terms of cross sectional (or pooled cross sectional) data

More information

y response variable x 1, x 2,, x k -- a set of explanatory variables

y response variable x 1, x 2,, x k -- a set of explanatory variables 11. Multiple Regression and Correlation y response variable x 1, x 2,, x k -- a set of explanatory variables In this chapter, all variables are assumed to be quantitative. Chapters 12-14 show how to incorporate

More information

Handout 12. Endogeneity & Simultaneous Equation Models

Handout 12. Endogeneity & Simultaneous Equation Models Handout 12. Endogeneity & Simultaneous Equation Models In which you learn about another potential source of endogeneity caused by the simultaneous determination of economic variables, and learn how to

More information

Outline. Lesson 3: Linear Functions. Objectives:

Outline. Lesson 3: Linear Functions. Objectives: Lesson 3: Linear Functions Objectives: Outline I can determine the dependent and independent variables in a linear function. I can read and interpret characteristics of linear functions including x- and

More information

In order to carry out a study on employees wages, a company collects information from its 500 employees 1 as follows:

In order to carry out a study on employees wages, a company collects information from its 500 employees 1 as follows: INTRODUCTORY ECONOMETRICS Dpt of Econometrics & Statistics (EA3) University of the Basque Country UPV/EHU OCW Self Evaluation answers Time: 21/2 hours SURNAME: NAME: ID#: Specific competences to be evaluated

More information

ECON 497: Lecture Notes 10 Page 1 of 1

ECON 497: Lecture Notes 10 Page 1 of 1 ECON 497: Lecture Notes 10 Page 1 of 1 Metropolitan State University ECON 497: Research and Forecasting Lecture Notes 10 Heteroskedasticity Studenmund Chapter 10 We'll start with a quote from Studenmund:

More information

Chapter 14 Multiple Regression Analysis

Chapter 14 Multiple Regression Analysis Chapter 14 Multiple Regression Analysis 1. a. Multiple regression equation b. the Y-intercept c. $374,748 found by Y ˆ = 64,1 +.394(796,) + 9.6(694) 11,6(6.) (LO 1) 2. a. Multiple regression equation b.

More information

AP Final Review II Exploring Data (20% 30%)

AP Final Review II Exploring Data (20% 30%) AP Final Review II Exploring Data (20% 30%) Quantitative vs Categorical Variables Quantitative variables are numerical values for which arithmetic operations such as means make sense. It is usually a measure

More information

LECTURE 10. Introduction to Econometrics. Multicollinearity & Heteroskedasticity

LECTURE 10. Introduction to Econometrics. Multicollinearity & Heteroskedasticity LECTURE 10 Introduction to Econometrics Multicollinearity & Heteroskedasticity November 22, 2016 1 / 23 ON PREVIOUS LECTURES We discussed the specification of a regression equation Specification consists

More information

Keller: Stats for Mgmt & Econ, 7th Ed July 17, 2006

Keller: Stats for Mgmt & Econ, 7th Ed July 17, 2006 Chapter 17 Simple Linear Regression and Correlation 17.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will

More information

Regression Analysis. BUS 735: Business Decision Making and Research. Learn how to detect relationships between ordinal and categorical variables.

Regression Analysis. BUS 735: Business Decision Making and Research. Learn how to detect relationships between ordinal and categorical variables. Regression Analysis BUS 735: Business Decision Making and Research 1 Goals of this section Specific goals Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate

More information

Regression Models. Chapter 4. Introduction. Introduction. Introduction

Regression Models. Chapter 4. Introduction. Introduction. Introduction Chapter 4 Regression Models Quantitative Analysis for Management, Tenth Edition, by Render, Stair, and Hanna 008 Prentice-Hall, Inc. Introduction Regression analysis is a very valuable tool for a manager

More information

Lecture 4 Scatterplots, Association, and Correlation

Lecture 4 Scatterplots, Association, and Correlation Lecture 4 Scatterplots, Association, and Correlation Previously, we looked at Single variables on their own One or more categorical variable In this lecture: We shall look at two quantitative variables.

More information

ECO220Y Simple Regression: Testing the Slope

ECO220Y Simple Regression: Testing the Slope ECO220Y Simple Regression: Testing the Slope Readings: Chapter 18 (Sections 18.3-18.5) Winter 2012 Lecture 19 (Winter 2012) Simple Regression Lecture 19 1 / 32 Simple Regression Model y i = β 0 + β 1 x

More information

Chapter 10 Nonlinear Models

Chapter 10 Nonlinear Models Chapter 10 Nonlinear Models Nonlinear models can be classified into two categories. In the first category are models that are nonlinear in the variables, but still linear in terms of the unknown parameters.

More information

Multiple Regression Theory 2006 Samuel L. Baker

Multiple Regression Theory 2006 Samuel L. Baker MULTIPLE REGRESSION THEORY 1 Multiple Regression Theory 2006 Samuel L. Baker Multiple regression is regression with two or more independent variables on the right-hand side of the equation. Use multiple

More information

Unless provided with information to the contrary, assume for each question below that the Classical Linear Model assumptions hold.

Unless provided with information to the contrary, assume for each question below that the Classical Linear Model assumptions hold. Economics 345: Applied Econometrics Section A01 University of Victoria Midterm Examination #2 Version 1 SOLUTIONS Spring 2015 Instructor: Martin Farnham Unless provided with information to the contrary,

More information

Do not copy, post, or distribute

Do not copy, post, or distribute 14 CORRELATION ANALYSIS AND LINEAR REGRESSION Assessing the Covariability of Two Quantitative Properties 14.0 LEARNING OBJECTIVES In this chapter, we discuss two related techniques for assessing a possible

More information

download instant at

download instant at Answers to Odd-Numbered Exercises Chapter One: An Overview of Regression Analysis 1-3. (a) Positive, (b) negative, (c) positive, (d) negative, (e) ambiguous, (f) negative. 1-5. (a) The coefficients in

More information

REED TUTORIALS (Pty) LTD ECS3706 EXAM PACK

REED TUTORIALS (Pty) LTD ECS3706 EXAM PACK REED TUTORIALS (Pty) LTD ECS3706 EXAM PACK 1 ECONOMETRICS STUDY PACK MAY/JUNE 2016 Question 1 (a) (i) Describing economic reality (ii) Testing hypothesis about economic theory (iii) Forecasting future

More information

405 ECONOMETRICS Chapter # 11: MULTICOLLINEARITY: WHAT HAPPENS IF THE REGRESSORS ARE CORRELATED? Domodar N. Gujarati

405 ECONOMETRICS Chapter # 11: MULTICOLLINEARITY: WHAT HAPPENS IF THE REGRESSORS ARE CORRELATED? Domodar N. Gujarati 405 ECONOMETRICS Chapter # 11: MULTICOLLINEARITY: WHAT HAPPENS IF THE REGRESSORS ARE CORRELATED? Domodar N. Gujarati Prof. M. El-Sakka Dept of Economics Kuwait University In this chapter we take a critical

More information

MORE ON SIMPLE REGRESSION: OVERVIEW

MORE ON SIMPLE REGRESSION: OVERVIEW FI=NOT0106 NOTICE. Unless otherwise indicated, all materials on this page and linked pages at the blue.temple.edu address and at the astro.temple.edu address are the sole property of Ralph B. Taylor and

More information

13.7 ANOTHER TEST FOR TREND: KENDALL S TAU

13.7 ANOTHER TEST FOR TREND: KENDALL S TAU 13.7 ANOTHER TEST FOR TREND: KENDALL S TAU In 1969 the U.S. government instituted a draft lottery for choosing young men to be drafted into the military. Numbers from 1 to 366 were randomly assigned to

More information

CHAPTER 5 FUNCTIONAL FORMS OF REGRESSION MODELS

CHAPTER 5 FUNCTIONAL FORMS OF REGRESSION MODELS CHAPTER 5 FUNCTIONAL FORMS OF REGRESSION MODELS QUESTIONS 5.1. (a) In a log-log model the dependent and all explanatory variables are in the logarithmic form. (b) In the log-lin model the dependent variable

More information

1 Correlation between an independent variable and the error

1 Correlation between an independent variable and the error Chapter 7 outline, Econometrics Instrumental variables and model estimation 1 Correlation between an independent variable and the error Recall that one of the assumptions that we make when proving the

More information

11.5 Regression Linear Relationships

11.5 Regression Linear Relationships Contents 11.5 Regression............................. 835 11.5.1 Linear Relationships................... 835 11.5.2 The Least Squares Regression Line........... 837 11.5.3 Using the Regression Line................

More information

Answer all questions from part I. Answer two question from part II.a, and one question from part II.b.

Answer all questions from part I. Answer two question from part II.a, and one question from part II.b. B203: Quantitative Methods Answer all questions from part I. Answer two question from part II.a, and one question from part II.b. Part I: Compulsory Questions. Answer all questions. Each question carries

More information

Objectives for Linear Activity. Calculate average rate of change/slope Interpret intercepts and slope of linear function Linear regression

Objectives for Linear Activity. Calculate average rate of change/slope Interpret intercepts and slope of linear function Linear regression Objectives for Linear Activity Calculate average rate of change/slope Interpret intercepts and slope of linear function Linear regression 1 Average Rate of Change & Slope On a graph, average rate of change

More information

Lecture #8 & #9 Multiple regression

Lecture #8 & #9 Multiple regression Lecture #8 & #9 Multiple regression Starting point: Y = f(x 1, X 2,, X k, u) Outcome variable of interest (movie ticket price) a function of several variables. Observables and unobservables. One or more

More information

Sociology 593 Exam 1 Answer Key February 17, 1995

Sociology 593 Exam 1 Answer Key February 17, 1995 Sociology 593 Exam 1 Answer Key February 17, 1995 I. True-False. (5 points) Indicate whether the following statements are true or false. If false, briefly explain why. 1. A researcher regressed Y on. When

More information

Lecture 4 Scatterplots, Association, and Correlation

Lecture 4 Scatterplots, Association, and Correlation Lecture 4 Scatterplots, Association, and Correlation Previously, we looked at Single variables on their own One or more categorical variables In this lecture: We shall look at two quantitative variables.

More information

Chapter 1. An Overview of Regression Analysis. Econometrics and Quantitative Analysis. What is Econometrics? (cont.) What is Econometrics?

Chapter 1. An Overview of Regression Analysis. Econometrics and Quantitative Analysis. What is Econometrics? (cont.) What is Econometrics? Econometrics and Quantitative Analysis Using Econometrics: A Practical Guide A.H. Studenmund 6th Edition. Addison Wesley Longman Chapter 1 An Overview of Regression Analysis Instructor: Dr. Samir Safi

More information

ECON Introductory Econometrics. Lecture 16: Instrumental variables

ECON Introductory Econometrics. Lecture 16: Instrumental variables ECON4150 - Introductory Econometrics Lecture 16: Instrumental variables Monique de Haan (moniqued@econ.uio.no) Stock and Watson Chapter 12 Lecture outline 2 OLS assumptions and when they are violated Instrumental

More information

appstats27.notebook April 06, 2017

appstats27.notebook April 06, 2017 Chapter 27 Objective Students will conduct inference on regression and analyze data to write a conclusion. Inferences for Regression An Example: Body Fat and Waist Size pg 634 Our chapter example revolves

More information

LECTURE 15: SIMPLE LINEAR REGRESSION I

LECTURE 15: SIMPLE LINEAR REGRESSION I David Youngberg BSAD 20 Montgomery College LECTURE 5: SIMPLE LINEAR REGRESSION I I. From Correlation to Regression a. Recall last class when we discussed two basic types of correlation (positive and negative).

More information

Chapter 7. Testing Linear Restrictions on Regression Coefficients

Chapter 7. Testing Linear Restrictions on Regression Coefficients Chapter 7 Testing Linear Restrictions on Regression Coefficients 1.F-tests versus t-tests In the previous chapter we discussed several applications of the t-distribution to testing hypotheses in the linear

More information

statistical sense, from the distributions of the xs. The model may now be generalized to the case of k regressors:

statistical sense, from the distributions of the xs. The model may now be generalized to the case of k regressors: Wooldridge, Introductory Econometrics, d ed. Chapter 3: Multiple regression analysis: Estimation In multiple regression analysis, we extend the simple (two-variable) regression model to consider the possibility

More information

Trendlines Simple Linear Regression Multiple Linear Regression Systematic Model Building Practical Issues

Trendlines Simple Linear Regression Multiple Linear Regression Systematic Model Building Practical Issues Trendlines Simple Linear Regression Multiple Linear Regression Systematic Model Building Practical Issues Overfitting Categorical Variables Interaction Terms Non-linear Terms Linear Logarithmic y = a +

More information

Swarthmore Honors Exam 2012: Statistics

Swarthmore Honors Exam 2012: Statistics Swarthmore Honors Exam 2012: Statistics 1 Swarthmore Honors Exam 2012: Statistics John W. Emerson, Yale University NAME: Instructions: This is a closed-book three-hour exam having six questions. You may

More information

FAQ: Linear and Multiple Regression Analysis: Coefficients

FAQ: Linear and Multiple Regression Analysis: Coefficients Question 1: How do I calculate a least squares regression line? Answer 1: Regression analysis is a statistical tool that utilizes the relation between two or more quantitative variables so that one variable

More information