Chapter 15: Other Regression Statistics and Pitfalls
Chapter 15 Outline

Two-Tailed Confidence Intervals
  o Confidence Interval Approach: Which Theories Are Consistent with the Data?
  o A Confidence Interval Example: Television Growth Rates
  o Calculating Confidence Intervals with Statistical Software
Coefficient of Determination, R-Squared (R²)
Pitfalls
  o Explanatory Variable Has the Same Value for All Observations
  o One Explanatory Variable Is a Linear Combination of Other Explanatory Variables
  o Dependent Variable Is a Linear Combination of Explanatory Variables
  o Outlier Observations
  o Dummy Variable Trap

Chapter 15 Prep Questions

1. A friend believes that the internet is displacing the television as a source of news and entertainment. The friend theorizes that after accounting for other factors, television usage is falling by 1 percent annually:

−1.0 Percent Growth Rate Theory: After accounting for all other factors, the annual growth rate of television users is negative 1.0 percent.

Recall the model we used previously to explain television use:

LogUsersTV_t = β_Const + β_Year Year_t + β_CapHum CapitalHuman_t + β_CapPhy CapitalPhysical_t + β_GDP GdpPC_t + β_Auth Auth_t + e_t

and the data we used:

Internet and Television Data: Panel data of Internet, television, economic, and political statistics for 208 countries from 1995 to 2002. [Link to MIT-InternetFlat wf1 goes here.]

LogUsersInternet_t   Logarithm of Internet users per 1,000 people for observation t
LogUsersTV_t         Logarithm of television users per 1,000 people for observation t
Year_t               Year for observation t
CapitalHuman_t       Literacy rate for observation t (percent of population 15 and over)
CapitalPhysical_t    Telephone mainlines per 10,000 people for observation t
GdpPC_t              Per capita real GDP in nation t (1,000s of international dollars)
Auth_t               The Freedom House measure of political authoritarianism for observation t, normalized to a 0 to 10 scale: 0 represents the most democratic rating and 10 the most authoritarian. During the period, Canada and the U.S. had a 0 rating; Iraq and the Democratic People's Republic of Korea (North Korea) rated 10.

Now, assess your friend's theory.
a. Use the ordinary least squares (OLS) estimation procedure to estimate the model's parameters. [Link to MIT-InternetFlat wf1 goes here.]
b. Formulate the appropriate null and alternative hypotheses. Is a one-tailed or a two-tailed test appropriate?
c. Use the Econometrics Lab to calculate the Prob[Results IF H0 True]. [Link to MIT-TTest 0.1 goes here.]

2. A regression's coefficient of determination, called the R-squared, is referred to as the goodness of fit. It equals the portion of the dependent variable's squared deviations from its mean that is explained by the parameter estimates:

R² = Explained Squared Deviations from the Mean / Actual Squared Deviations from the Mean = Σ_{t=1}^{T} (Esty_t − ȳ)² / Σ_{t=1}^{T} (y_t − ȳ)²

Calculate the R-squared for Professor Lord's first quiz by filling in the following blanks (Esty_t equals b_Const + b_x x_t):

Student | x_t | y_t | Actual deviation from mean: y_t − ȳ | Actual squared deviation: (y_t − ȳ)² | Esty_t | Explained deviation from mean: Esty_t − ȳ | Explained squared deviation: (Esty_t − ȳ)²
Σ_{t=1}^{T} y_t = ____     Σ_{t=1}^{T} (y_t − ȳ)² = ____     Σ_{t=1}^{T} (Esty_t − ȳ)² = ____

ȳ = ____     R² = ____

3. Students frequently experience difficulties when analyzing data. To illustrate some of these, we first review the goal of multiple regression analysis:

Goal of Multiple Regression Analysis: Multiple regression analysis attempts to sort out the individual effect of each explanatory variable. An explanatory variable's coefficient estimate allows us to estimate the change in the dependent variable resulting from a change in that particular explanatory variable while all other explanatory variables remain constant.

Reconsider our baseball data:

Baseball Data: Panel data of baseball statistics for the 588 American League games played during the summer of 1996.

Attendance_t    Paid attendance for game t
DH_t            Designated hitter for game t (1 if DH permitted; 0 otherwise)
HomeSalary_t    Player salaries of the home team for game t (millions of dollars)
PriceTicket_t   Average price of tickets sold for game t's home team (dollars)
VisitSalary_t   Player salaries of the visiting team for game t (millions of dollars)

Now, consider several pitfalls that students often encounter:

a. Explanatory variable has the same value for all observations. Run the following regression: [Link to MIT-ALSummer-1996.wf1 goes here.]
Dependent variable: Attendance
Explanatory variables: PriceTicket, HomeSalary, and DH
1) What happens?
2) What is the value of DH_t for each of the observations?
3) Why is it impossible to determine the effect of an explanatory variable if the explanatory variable has the same value for each observation? Explain.

b. One explanatory variable is a linear combination of other explanatory variables. Generate a new variable, the ticket price in terms of cents:
PriceCents_t = 100 × PriceTicket_t
Run the following regression: [Link to MIT-ALSummer-1996.wf1 goes here.]
Dependent variable: Attendance
Explanatory variables: PriceTicket, PriceCents, and HomeSalary
1) What happens?
2) Is it possible to sort out the effect of two explanatory variables when they contain redundant information?

c. One explanatory variable is a linear combination of other explanatory variables (another example). Generate a new variable, the total salaries of the two teams playing:
TotalSalary_t = HomeSalary_t + VisitSalary_t
Run the following regression: [Link to MIT-ALSummer-1996.wf1 goes here.]
Dependent variable: Attendance
Explanatory variables: PriceTicket, HomeSalary, VisitSalary, and TotalSalary
1) What happens?
2) Is it possible to sort out the effect of explanatory variables when they are linear combinations of each other and therefore contain redundant information?

d. Dependent variable is a linear combination of explanatory variables. Run the following regression: [Link to MIT-ALSummer-1996.wf1 goes here.]
Dependent variable: TotalSalary
Explanatory variables: HomeSalary and VisitSalary
What happens?

e. Outlier observations. First, run the following regression: [Link to MIT-ALSummer-1996.wf1 goes here.]
Dependent variable: Attendance
Explanatory variables: PriceTicket and HomeSalary
1) What is the coefficient estimate for the ticket price?
2) Look at the first observation. What is the value of HomeSalary for the first observation?
Now, access a second workfile in which a single value was entered incorrectly. [Link to MIT-ALSummerOutlier-1996.wf1 goes here.]
3) Look at the first observation. What is the value of HomeSalary for the first observation? Was the value entered correctly?
Run the following regression:
Dependent variable: Attendance
Explanatory variables: PriceTicket and HomeSalary
4) Compare the coefficient estimates in the two regressions.

4. Return to our faculty salary data.

Faculty Salary Data: Artificially constructed cross section salary data and characteristics for 200 faculty members.

Salary_t       Salary of faculty member t (dollars)
Experience_t   Teaching experience for faculty member t (years)
Articles_t     Number of articles published by faculty member t
SexM1_t        1 if faculty member t is male; 0 if female

As we did in Chapter 13, generate the dummy variable SexF1, which equals 1 for a woman and 0 for a man. Run the following three regressions specifying Salary as the dependent variable: [Link to MIT-FacultySalaries.wf1 goes here.]
a. Explanatory variables: SexF1 and Experience
b. Explanatory variables: SexM1 and Experience
c. Explanatory variables: SexF1, SexM1, and Experience, but without a constant

Getting Started in EViews: To estimate the third model (part c) using EViews, you must fool EViews into running the appropriate regression:
In the Workfile window: highlight Salary and then, while depressing <Ctrl>, highlight SexF1, SexM1, and Experience.
In the Workfile window: double click on a highlighted variable.
Click Open Equation.
In the Equation Specification window, delete c so that the window looks like this: salary sexf1 sexm1 experience.
Click OK.

For each regression, what is the equation that estimates the salary for
1) men?
2) women?

Last, run one more regression specifying Salary as the dependent variable:
d. Explanatory variables: SexF1, SexM1, and Experience, but with a constant. What happens?

5. Consider a system of 2 linear equations and 3 unknowns. Can you solve for all three unknowns?

Two-Tailed Confidence Intervals: Which Theories Are Consistent with the Data?

Our approach thus far has been to present a theory first and then use data to assess the theory:
First, we presented a theory.
Second, we analyzed the data to determine whether or not the data were consistent with the theory.
In other words, we started with a theory and then decided whether or not the data were consistent with it.

The confidence interval approach reverses this process. Confidence intervals indicate the range of theories that are consistent with the data:
First, we analyze the data.
Second, we consider various theories and determine which theories are consistent with the data and which are not.
In other words, the confidence interval approach starts with the data and then decides which theories are compatible.

Hypothesis testing plays a key role in both approaches. Consequently, we must choose a significance level; a confidence interval's size determines the significance level. We use significance levels to distinguish between a small probability and a large probability. The significance level associated with a confidence interval equals 100 percent less the size of the two-tailed confidence interval. Three commonly used confidence intervals are 90, 95, and 99 percent:
For a 90 percent confidence interval, the significance level is 10 percent.
For a 95 percent confidence interval, the significance level is 5 percent.
For a 99 percent confidence interval, the significance level is 1 percent.

A theory is consistent with the data if we cannot reject the null hypothesis at the confidence interval's significance level.
A Confidence Interval Example: Television Growth Rates

No doubt this sounds confusing, so let us work through an example using our international television data:

Project: Which growth theories are consistent with the international television data?

Internet and Television Data: Panel data of Internet, television, economic, and political statistics for 208 countries from 1995 to 2002. [Link to MIT-InternetFlat wf1 goes here.]

LogUsersInternet_t   Logarithm of Internet users per 1,000 people for observation t
LogUsersTV_t         Logarithm of television users per 1,000 people for observation t
Year_t               Year for observation t
CapitalHuman_t       Literacy rate for observation t (percent of population 15 and over)
CapitalPhysical_t    Telephone mainlines per 10,000 people for observation t
GdpPC_t              Per capita real GDP in nation t (1,000s of international dollars)
Auth_t               The Freedom House measure of political authoritarianism for observation t, normalized to a 0 to 10 scale: 0 represents the most democratic rating and 10 the most authoritarian. During the period, Canada and the U.S. had a 0 rating; Iraq and the Democratic People's Republic of Korea (North Korea) rated 10.
We begin by specifying the size of the confidence interval. It is most common to specify a 95 percent confidence interval, which means that we are choosing a significance level of 5 percent. The following two steps formalize the procedure to decide whether a theory lies within the two-tailed 95 percent confidence interval:

Step 1: Analyze the data. Use the ordinary least squares (OLS) estimation procedure to estimate the model's parameters.
Step 2: Consider a specific theory. Is the theory consistent with the data? Does the theory lie within the confidence interval?
  o Step 2a: Based on the theory, construct the null and alternative hypotheses. The null hypothesis reflects the theory.
  o Step 2b: Compute Prob[Results IF H0 True].
  o Step 2c: Do we reject the null hypothesis?
    Yes: Reject the theory. The data are not consistent with the theory. The theory does not lie within the confidence interval.
    No: The data are consistent with the theory. The theory does lie within the confidence interval.

Since our example uses a 95 percent confidence interval and hence a 5 percent significance level:

Prob[Results IF H0 True] < .05: Reject H0. The theory is not consistent with the data; it does not lie within the 95 percent confidence interval.
Prob[Results IF H0 True] > .05: Do not reject H0. The theory is consistent with the data; it does lie within the 95 percent confidence interval.

We shall illustrate the steps by focusing on four growth rate theories postulating what the growth rate of television use equals after accounting for other relevant factors:
0.0 Percent Growth Rate Theory
−1.0 Percent Growth Rate Theory
4.0 Percent Growth Rate Theory
6.0 Percent Growth Rate Theory
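The Step 2c decision rule can be sketched in a few lines. This is a rough illustration, not the Econometrics Lab's own code: it takes as given the Year coefficient estimate (.023) and standard error (.0159) reported in the regression results below, along with an approximate two-tailed 5 percent critical value of 1.96 for 736 degrees of freedom, and asks whether each theory's null value can be rejected.

```python
# Sketch of the confidence-interval decision rule: a theory (a null value
# for the coefficient) lies inside the 95 percent confidence interval
# exactly when we cannot reject H0 at the 5 percent significance level.
# The estimate and SE come from the television regression in the text;
# the critical value 1.96 is a large-sample approximation.

ESTIMATE = 0.023   # OLS estimate of the Year coefficient
SE = 0.0159        # its standard error
T_CRIT = 1.96      # two-tailed 5% critical value (DF = 736, approximate)

def in_confidence_interval(null_value: float) -> bool:
    """Return True when the theory is consistent with the data."""
    t_stat = abs(ESTIMATE - null_value) / SE
    return t_stat <= T_CRIT   # cannot reject H0 -> inside the interval

for theory in (0.000, -0.010, 0.040, 0.060):
    verdict = "inside" if in_confidence_interval(theory) else "outside"
    print(f"growth rate theory {theory:+.3f}: {verdict} the 95% interval")
```

Running the loop reproduces the chapter's verdicts: .000 and .040 lie inside the interval, while −.010 and .060 lie outside.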
0.0 Percent Growth Rate Theory

Since television is a mature technology, we begin with a theory postulating that time will have no impact on television use after accounting for other factors; that is, after accounting for other factors, the growth rate of television use will equal 0.0 percent. We shall now apply our two steps to determine whether the 0.0 percent growth rate theory lies within the 95 percent confidence interval:

Step 1: Analyze the data. Use the ordinary least squares (OLS) estimation procedure to estimate the model's parameters. We shall apply the same model to explain television use that we used previously:

Model: LogUsersTV_t = β_Const + β_Year Year_t + β_CapHum CapitalHuman_t + β_CapPhy CapitalPhysical_t + β_GDP GdpPC_t + β_Auth Auth_t + e_t

We already estimated the parameters of this model in Chapter 13:

Ordinary Least Squares (OLS)
Dependent Variable: LogUsersTV
Explanatory Variable(s):   Estimate   SE      t-Statistic   Prob
Year                       .023       .0159
CapitalHuman
CapitalPhysical
GdpPC
Auth
Const
Number of Observations     742

Estimated Equation: EstLogUsersTV = b_Const + .023Year + b_CapHum CapitalHuman + b_CapPhy CapitalPhysical + .058GdpPC + .064Auth

Table 15.1: Television Regression Results

Step 2: 0.0 Percent Growth Rate Theory. Focus on the effect of time. Is the 0.0 percent growth theory consistent with the data? Does the theory lie within the confidence interval?

0.0 Percent Growth Rate Theory: After accounting for all other explanatory variables, time has no effect on television use; that is, after accounting for all other explanatory variables, the annual growth rate of television use equals 0.0 percent. Accordingly, the actual coefficient of Year, β_Year, equals .000.
o Step 2a: Based on the theory, construct the null and alternative hypotheses.
  H0: β_Year = .000
  H1: β_Year ≠ .000
o Step 2b: Compute Prob[Results IF H0 True].
  Prob[Results IF H0 True] = Probability that the coefficient estimate would be at least .023 from .000, if H0 were true (that is, if the actual coefficient, β_Year, equals .000).

  OLS estimation procedure unbiased, if H0 true: Mean[b_Year] = β_Year = 0
  Standard error: SE[b_Year] = .0159
  Number of observations less number of parameters: DF = 742 − 6 = 736

We can use the Econometrics Lab to calculate the probability of obtaining the results if the null hypothesis is true. Remember that we are conducting a two-tailed test.

Econometrics Lab 15.1: Calculate Prob[Results IF H0 True]

First, calculate the right hand tail probability. [Link to MIT-Lab 15.1a goes here.]
Question: What is the probability that the estimate lies at or above .023?
Answer: .074.

Figure 15.1: Probability Distribution of Coefficient Estimate, 0.0 Percent Growth Rate Theory (Student t-distribution: Mean = .000, SE = .0159, DF = 736)
Second, calculate the left hand tail probability. [Link to MIT-Lab 15.1b goes here.]
Question: What is the probability that the estimate lies at or below −.023?
Answer: .074.

The Prob[Results IF H0 True] equals the sum of the left and right tail probabilities:
Prob[Results IF H0 True] = .074 + .074 = .148

o Step 2c: Do we reject the null hypothesis? No, we do not reject the null hypothesis at a 5 percent significance level; Prob[Results IF H0 True] equals .148, which is greater than .05. The theory is consistent with the data; hence, .000 does lie within the 95 percent confidence interval.

Let us now apply the procedure to three other theories:

−1.0 Percent Growth Rate Theory: After accounting for all other factors, the annual growth rate of television users is −1.0 percent; that is, β_Year equals −.010.
4.0 Percent Growth Rate Theory: After accounting for all other factors, the annual growth rate of television users is 4.0 percent; that is, β_Year equals .040.
6.0 Percent Growth Rate Theory: After accounting for all other factors, the annual growth rate of television users is 6.0 percent; that is, β_Year equals .060.
We shall not provide justification for any of these theories. The confidence interval approach does not worry about justifying the theory; the approach is pragmatic, simply asking whether or not the data support the theory.

−1.0 Percent Growth Rate Theory

Step 1: Analyze the data. Use the ordinary least squares (OLS) estimation procedure to estimate the model's parameters. We have already done this.
Step 2: −1.0 Percent Growth Rate Theory. Is the theory consistent with the data? Does the theory lie within the confidence interval?
o Step 2a: Based on the theory, construct the null and alternative hypotheses.
  H0: β_Year = −.010
  H1: β_Year ≠ −.010
o Step 2b: Compute Prob[Results IF H0 True]. To compute Prob[Results IF H0 True] we first pose a question:
  Question: How far is the coefficient estimate, .023, from the value of the coefficient specified by the null hypothesis, −.010?
  Answer: .033.
Accordingly,
  Prob[Results IF H0 True] = Probability that the coefficient estimate would be at least .033 from −.010, if H0 were true (that is, if the actual coefficient, β_Year, equals −.010).

  OLS estimation procedure unbiased, if H0 true: Mean[b_Year] = β_Year = −.010
  Standard error: SE[b_Year] = .0159
  Number of observations less number of parameters: DF = 742 − 6 = 736

We can use the Econometrics Lab to calculate the probability of obtaining the results if the null hypothesis is true. Once again, remember that we are conducting a two-tailed test:
Econometrics Lab 15.2: Calculate Prob[Results IF H0 True]

First, calculate the right hand tail probability. [Link to MIT-Lab 15.2a goes here.]

Figure 15.2: Probability Distribution of Coefficient Estimate, −1.0 Percent Growth Rate Theory (Student t-distribution: Mean = −.010, SE = .0159, DF = 736)

Question: What is the probability that the estimate lies .033 or more above −.010, at or above .023?
Answer: .019.

Second, calculate the left hand tail probability. [Link to MIT-Lab 15.2b goes here.]
Question: What is the probability that the estimate lies .033 or more below −.010, at or below −.043?
Answer: .019.

The Prob[Results IF H0 True] equals the sum of the two tail probabilities:
Prob[Results IF H0 True] = .019 + .019 = .038

o Step 2c: Do we reject the null hypothesis? Yes, we do reject the null hypothesis at a 5 percent significance level; Prob[Results IF H0 True] equals .038, which is less than .05.
The theory is not consistent with the data; hence, −.010 does not lie within the 95 percent confidence interval.

4.0 Percent Growth Rate Theory
Following the same procedure for the 4.0 percent growth rate theory:
Prob[Results IF H0 True] ≈ .285
We do not reject the null hypothesis at a 5 percent significance level. The theory is consistent with the data; hence, .040 does lie within the 95 percent confidence interval.

6.0 Percent Growth Rate Theory
Again, following the same procedure for the 6.0 percent growth rate theory:
Prob[Results IF H0 True] ≈ .020
We do reject the null hypothesis at a 5 percent significance level. The theory is not consistent with the data; hence, .060 does not lie within the 95 percent confidence interval.
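The tail probabilities for all four theories can be checked outside the Econometrics Lab. The sketch below is an approximation, not the Lab's code: with 736 degrees of freedom the Student t-distribution is nearly standard normal, so the normal CDF (available through `math.erf`) reproduces the reported two-tailed probabilities to three decimal places.

```python
import math

# Two-tailed Prob[Results IF H0 True] for each growth rate theory,
# using the normal approximation to the t-distribution (DF = 736).
# The estimate and standard error come from the television regression.

ESTIMATE, SE = 0.023, 0.0159

def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def two_tailed_prob(null_value: float) -> float:
    """Probability of an estimate at least this far from the null value."""
    t_stat = abs(ESTIMATE - null_value) / SE
    return 2.0 * (1.0 - normal_cdf(t_stat))  # right tail + left tail

for theory in (0.000, -0.010, 0.040, 0.060):
    print(f"H0: beta_Year = {theory:+.3f}  ->  Prob = {two_tailed_prob(theory):.3f}")
```

The loop recovers .148, .038, .285, and .020, matching the values in the text.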
Now, let us summarize the four theories:

Figure 15.3: Probability Distribution of Coefficient Estimate, Comparison of Growth Rate Theories
Growth Rate Theory | Null and Alternative Hypotheses        | Prob[Results IF H0 True] | Within Confidence Interval
−1%                | H0: β_Year = −.010  H1: β_Year ≠ −.010 | .038                     | No
0%                 | H0: β_Year = .000   H1: β_Year ≠ .000  | .148                     | Yes
4%                 | H0: β_Year = .040   H1: β_Year ≠ .040  | .285                     | Yes
6%                 | H0: β_Year = .060   H1: β_Year ≠ .060  | .020                     | No

Table 15.2: Growth Rate Theories and the 95 Percent Confidence Interval

Now, we shall make two observations and pose two questions:
The 0.0 percent growth rate theory lies within the confidence interval, but the −1.0 percent theory does not. Question: What is the lowest growth rate theory that is consistent with the data; that is, what is the lower bound of the confidence interval, β_LB?
The 4.0 percent growth rate theory lies within the confidence interval, but the 6.0 percent theory does not. Question: What is the highest growth rate theory that is consistent with the data; that is, what is the upper bound of the confidence interval, β_UB?
Figure 15.4: Lower and Upper Confidence Interval Bounds (Prob[Results IF H0 True] plotted against the growth rate theory: do not reject H0 for theories between β_LB and β_UB; reject H0 for theories below β_LB or above β_UB; significance level = 5% = .05)

Figure 15.5 answers these questions visually by illustrating the lower and upper bounds. The Prob[Results IF H0 True] equals .05 for both the lower and upper bound growth theories because our calculations are based on a 95 percent confidence interval:
The lower bound growth theory postulates a growth rate that is less than that estimated. Hence, the coefficient estimate, .023, marks the right tail border of the lower bound.
The upper bound growth theory postulates a growth rate that is greater than that estimated. Hence, the coefficient estimate, .023, marks the left tail border of the upper bound.
Figure 15.5: Probability Distribution of Coefficient Estimate, Lower and Upper Confidence Intervals

Econometrics Lab 15.3: Calculating the 95 Percent Confidence Interval

We can use the Econometrics Lab to calculate the lower and upper bounds:

Calculating the Lower Bound, β_LB: For the lower bound, the right tail probability equals .025. [Link to MIT-Lab 15.3a goes here.] The appropriate information is already entered for us:
Standard Error: .0159
Value: .023
Degrees of Freedom: 736
Area to Right: .025
Click Calculate. The reported Mean is the lower bound.
Mean: −.008
β_LB = −.008
Calculating the Upper Bound, β_UB: For the upper bound, the left tail probability equals .025. Accordingly, the right tail probability will equal .975. [Link to MIT-Lab 15.3b goes here.] The appropriate information is already entered for us:
Standard Error: .0159
Value: .023
Degrees of Freedom: 736
Area to Right: .975
Click Calculate. The reported Mean is the upper bound.
Mean: .054
β_UB = .054

−.008 and .054 mark the bounds of the two-tailed 95 percent confidence interval:
For any growth rate theory between −.8 percent and 5.4 percent: Prob[Results IF H0 True] > .05. Do not reject H0 at the 5 percent significance level.
For any growth rate theory below −.8 percent or above 5.4 percent: Prob[Results IF H0 True] < .05. Reject H0 at the 5 percent significance level.
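The two bounds follow directly from the formula: bound = estimate ± critical value × SE. A minimal sketch, assuming the two-tailed 5 percent critical t value for 736 degrees of freedom is about 1.963:

```python
# Reproducing the lower and upper bounds of the 95 percent confidence
# interval from the estimate and its standard error. T_CRIT = 1.963 is
# the approximate two-tailed 5% t value for 736 degrees of freedom.

ESTIMATE, SE, T_CRIT = 0.023, 0.0159, 1.963

lower = ESTIMATE - T_CRIT * SE   # lowest theory we cannot reject
upper = ESTIMATE + T_CRIT * SE   # highest theory we cannot reject
print(f"95% confidence interval: [{lower:.3f}, {upper:.3f}]")
```

To three decimal places this yields [−.008, .054], the same bounds the Econometrics Lab reports.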
Calculating Confidence Intervals with Statistical Software

Fortunately, statistical software provides us with an easy and convenient way to compute confidence intervals. The software does all the work for us.

Getting Started in EViews: After running the appropriate regression:
In the Equation window: Click View, Coefficient Diagnostics, and Confidence Intervals.
In the Confidence Intervals window: Enter the confidence levels you wish to compute. (By default the values .90, .95, and .99 are entered.)
Click OK.

95 Percent Interval Estimates
Dependent Variable: LogUsersTV
Explanatory Variable(s):   Estimate   Lower   Upper
Year                       .023       −.008   .054
CapitalHuman
CapitalPhysical
GdpPC
Auth
Const
Number of Observations     742

Table 15.3: 95 Percent Confidence Interval Calculations

Table 15.3 reports that the lower and upper bounds of the 95 percent confidence interval for the Year coefficient are −.008 and .054. These are the same values that we calculated using the Econometrics Lab.
Coefficient of Determination (Goodness of Fit), R-Squared (R²)

All statistical packages report the coefficient of determination, the R-squared, in their regression printouts. The R-squared seeks to capture the goodness of fit. It equals the portion of the dependent variable's squared deviations from its mean that is explained by the parameter estimates:

R² = Explained Squared Deviations from the Mean / Actual Squared Deviations from the Mean = Σ_{t=1}^{T} (Esty_t − ȳ)² / Σ_{t=1}^{T} (y_t − ȳ)²

To explain how the coefficient of determination is calculated, we shall revisit Professor Lord's first quiz:

Student   Minutes Studied (x)   Quiz Score (y)
1         5                     66
2         15                    87
3         25                    90

Table 15.4: First Quiz Data

Recall the theory, the model, and our analysis:

Theory: An increase in the number of minutes studied results in an increased quiz score.
Model: y_t = β_Const + β_x x_t + e_t
  x_t = Minutes studied by student t
  y_t = Quiz score earned by student t
Theory: β_x > 0

We used the ordinary least squares (OLS) estimation procedure to estimate the model's parameters: [Link to MIT-Quiz1.wf1 goes here.]
Ordinary Least Squares (OLS)
Dependent Variable: y
Explanatory Variable(s):   Estimate   SE   t-Statistic   Prob
x                          1.2                           .2601
Const                      63
Number of Observations     3
R-squared                  .84

Estimated Equation: Esty = 63 + 1.2x
Interpretation of Estimates:
b_Const = 63: Students receive 63 points for showing up.
b_x = 1.2: Students receive 1.2 additional points for each additional minute studied.
Critical Result: The coefficient estimate equals 1.2. The positive sign of the coefficient estimate suggests that additional studying increases quiz scores. This evidence lends support to our theory.

Table 15.5: First Quiz Regression Results

Next, we formulated the null and alternative hypotheses to determine how much confidence we should have in the theory:
H0: β_x = 0   Studying has no impact on a student's quiz score
H1: β_x > 0   Additional studying increases a student's quiz score

We then calculated Prob[Results IF H0 True], the probability of results like those we obtained (or even stronger) if studying in fact had no impact on quiz scores. The tails probability reported in the regression printout allows us to calculate this easily. Since a one-tailed test is appropriate, we divide the tails probability by 2:

Prob[Results IF H0 True] = .2601 / 2 ≈ .13

We cannot reject the null hypothesis that studying has no impact, even at the 10 percent significance level.
The regression printout reports that the R-squared equals about .84; this means that 84 percent of the dependent variable's squared deviations from its mean are explained by the parameter estimates. Table 15.6 shows the calculations required to compute the R-squared:

Student | x_t | y_t | y_t − ȳ | (y_t − ȳ)² | Esty_t | Esty_t − ȳ | (Esty_t − ȳ)²
1       | 5   | 66  | −15     | 225        | 69     | −12        | 144
2       | 15  | 87  | 6       | 36         | 81     | 0          | 0
3       | 25  | 90  | 9       | 81         | 93     | 12         | 144

Σ_{t=1}^{T} y_t = 243     Σ_{t=1}^{T} (y_t − ȳ)² = 342     Σ_{t=1}^{T} (Esty_t − ȳ)² = 288
ȳ = 243/3 = 81     R² = 288/342 ≈ .84

Table 15.6: R-Squared Calculations for First Quiz

The R-squared equals Σ(Esty_t − ȳ)² divided by Σ(y_t − ȳ)²:

R² = Explained Squared Deviations from the Mean / Actual Squared Deviations from the Mean = 288/342 ≈ .84

84 percent of the y_t's squared deviations are explained by the estimated constant and coefficient. Our calculation of the R-squared agrees with the regression printout.
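The arithmetic in Table 15.6 can be reproduced from first principles. The sketch below assumes the three students studied 5, 15, and 25 minutes and scored 66, 87, and 90 (raw values consistent with the reported sums, the mean score of 81, and the estimated equation Esty = 63 + 1.2x); the computation itself works the same way for any data.

```python
# R-squared from first principles for Professor Lord's first quiz.

x = [5, 15, 25]    # minutes studied (assumed, consistent with the text)
y = [66, 87, 90]   # quiz scores (assumed, consistent with the text)

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# OLS slope and intercept
b_x = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
      sum((xi - x_bar) ** 2 for xi in x)
b_const = y_bar - b_x * x_bar

est_y = [b_const + b_x * xi for xi in x]
explained = sum((e - y_bar) ** 2 for e in est_y)   # squared deviations of Esty
actual = sum((yi - y_bar) ** 2 for yi in y)        # squared deviations of y
r_squared = explained / actual

print(b_const, b_x, round(r_squared, 2))   # 63.0 1.2 0.84
```

The explained sum (288) over the actual sum (342) gives the printout's .84.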
While the R-squared is always calculated and reported by statistical software, it is not useful in assessing theories. We shall justify this claim by considering a second quiz that Professor Lord administered. Each student studied the same number of minutes and earned the same score on the second quiz as he/she did on the first:

Student   Minutes Studied (x)   Quiz Score (y)
1         5                     66
2         15                    87
3         25                    90

Table 15.7: Second Quiz Data

Before we run another regression that includes the data from both quizzes, let us apply our intuition:
Begin by focusing on only the first quiz. Taken in isolation, the first quiz suggests that studying improves quiz scores. We cannot be very confident of this, however, since we cannot reject the null hypothesis even at a 10 percent significance level.
Next, consider only the second quiz. Since the data from the second quiz are identical to the data from the first quiz, the regression results would be identical. Hence, taken in isolation, the second quiz also suggests that studying improves quiz scores.
Each quiz in isolation suggests that studying improves quiz scores. Now, consider both quizzes together. The two quizzes taken together reinforce each other; this should make us more confident in concluding that studying improves quiz scores, should it not?

If our intuition is correct, how should the Prob[Results IF H0 True] be affected when we consider both quizzes together? Since we are more confident in concluding that studying improves quiz scores, the probability should be less. Let us run a regression using data from both the first and second quizzes to determine whether or not this is true: [Link to MIT-Quiz1&2.wf1 goes here.]
Ordinary Least Squares (OLS)
Dependent Variable: y
Explanatory Variable(s):   Estimate   SE   t-Statistic   Prob
x                          1.2                           .0099
Const                      63
Number of Observations     6
R-squared                  .84

Table 15.8: First and Second Quiz Regression Results

Using data from both quizzes:

Prob[Results IF H0 True] = .0099 / 2 ≈ .005

As a consequence of the second quiz, the probability has fallen from .13 to .005; clearly, our confidence in the theory rises. We can now reject the null hypothesis that studying has no impact at the traditional significance levels of 1, 5, and 10 percent. Our calculations confirm our intuition.

Next, consider the R-squared for the last regression that includes both quizzes. The regression printout reports that the R-squared has not changed; the R-squared is still .84. Table 15.9 explains why:

Quiz/Student | x_t | y_t | y_t − ȳ | (y_t − ȳ)² | Esty_t | Esty_t − ȳ | (Esty_t − ȳ)²
1/1          | 5   | 66  | −15     | 225        | 69     | −12        | 144
1/2          | 15  | 87  | 6       | 36         | 81     | 0          | 0
1/3          | 25  | 90  | 9       | 81         | 93     | 12         | 144
2/1          | 5   | 66  | −15     | 225        | 69     | −12        | 144
2/2          | 15  | 87  | 6       | 36         | 81     | 0          | 0
2/3          | 25  | 90  | 9       | 81         | 93     | 12         | 144

Σ_{t=1}^{T} y_t = 486     Σ_{t=1}^{T} (y_t − ȳ)² = 684     Σ_{t=1}^{T} (Esty_t − ȳ)² = 576
ȳ = 486/6 = 81     R² = 576/684 ≈ .84

Table 15.9: R-Squared Calculations for First and Second Quizzes
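The pattern in Table 15.9 can be verified directly: duplicating every observation doubles both the explained and the actual sums of squared deviations, so their ratio, the R-squared, cannot change. The quiz scores in the sketch below are assumed values consistent with the sums reported in the text.

```python
# Duplicating a data set leaves the R-squared unchanged: every squared
# deviation appears twice, so both sums double and their ratio is the same.

def r_squared(x, y):
    """R-squared of a simple OLS regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b_x = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
          sum((xi - x_bar) ** 2 for xi in x)
    b_const = y_bar - b_x * x_bar
    explained = sum((b_const + b_x * xi - y_bar) ** 2 for xi in x)
    actual = sum((yi - y_bar) ** 2 for yi in y)
    return explained / actual

x1, y1 = [5, 15, 25], [66, 87, 90]   # first quiz (assumed values)
x2, y2 = x1 + x1, y1 + y1            # both quizzes together

print(round(r_squared(x1, y1), 4), round(r_squared(x2, y2), 4))
```

Both calls return the same value, about .8421, even though the hypothesis test's probability falls sharply with the doubled sample.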
R² = Explained Squared Deviations from the Mean / Actual Squared Deviations from the Mean = 576/684 ≈ .84

The R-squared still equals .84. Both the actual and explained squared deviations have doubled; consequently, their ratio, the R-squared, remains unchanged. Clearly, the R-squared does not help us assess our theory: we are now more confident in the theory, but the value of the R-squared has not changed. The bottom line is that if we are interested in assessing our theories, we should focus on hypothesis testing, not on the R-squared.

Pitfalls

Econometrics students using statistical software frequently encounter frustrating pitfalls. We shall now discuss several of these pitfalls and describe the warning signs that accompany them. We begin by reviewing the goal of multiple regression analysis:

Goal of Multiple Regression Analysis: Multiple regression analysis attempts to sort out the individual effect of each explanatory variable. An explanatory variable's coefficient estimate allows us to estimate the change in the dependent variable resulting from a change in that particular explanatory variable while all other explanatory variables remain constant.

We shall consider five common pitfalls that often befall students:
Explanatory variable has the same value for all observations.
One explanatory variable is a linear combination of other explanatory variables.
Dependent variable is a linear combination of explanatory variables.
Outlier observations.
Dummy variable trap.
We shall illustrate the first four pitfalls by revisiting our baseball attendance data, which report on every American League game played during the summer of the 1996 season.

Project: Assess the determinants of baseball attendance.

Baseball Data: Panel data of baseball statistics for the 588 American League games played during the summer of 1996.

Attendance_t    Paid attendance for game t
DH_t            Designated hitter for game t (1 if DH permitted; 0 otherwise)
HomeSalary_t    Player salaries of the home team for game t (millions of dollars)
PriceTicket_t   Average price of tickets sold for game t's home team (dollars)
VisitSalary_t   Player salaries of the visiting team for game t (millions of dollars)

[Link to MIT-ALSummer-1996.wf1 goes here.]

We begin with a model that we have studied before, in which attendance, Attendance, depends on two explanatory variables, the ticket price, PriceTicket, and the home team salary, HomeSalary:

Attendance_t = β_Const + β_Price PriceTicket_t + β_HomeSalary HomeSalary_t + e_t

Recall the regression results from Chapter 14:

Ordinary Least Squares (OLS)
Dependent Variable: Attendance
Explanatory Variable(s):   Estimate   SE   t-Statistic   Prob
PriceTicket                −591
HomeSalary                 783
Const                      9,246
Number of Observations     585

Estimated Equation: EstAttendance = 9,246 − 591PriceTicket + 783HomeSalary
Interpretation:
b_PriceTicket = −591. We estimate that a $1.00 increase in the price of tickets decreases attendance by 591 per game.
b_HomeSalary = 783. We estimate that a $1 million increase in the home team salary increases attendance by 783 per game.

Table 15.10: Baseball Attendance Regression Results
Explanatory Variable Has the Same Value for All Observations

One common pitfall is to include an explanatory variable in a regression that has the same value for every observation. To illustrate this, consider the variable DH:

DH t  Designated hitter for game t (1 if DH permitted; 0 otherwise)

Our baseball data include only American League games in 1996. Since interleague play did not begin until 1997 and all American League games allowed designated hitters, the variable DH t equals 1 for each observation. Let us try to use the ticket price, PriceTicket, home team salary, HomeSalary, and the designated hitter dummy variable, DH, to explain attendance, Attendance:

[Link to MIT-ALSummer-1996.wf1 goes here.]

The statistical software issues a diagnostic. While the verbiage differs from software package to software package, the message is the same: the software cannot perform the calculations that we requested. That is, the statistical software is telling us that it is being asked to do the impossible.

What is the intuition behind this? To determine how a dependent variable is affected by an explanatory variable, we must observe how the dependent variable changes when the explanatory variable changes. The intuition is straightforward: if the dependent variable tends to rise when the explanatory variable rises, the explanatory variable affects the dependent variable positively, suggesting a positive coefficient. On the other hand, if the dependent variable tends to fall when the explanatory variable rises, the explanatory variable affects the dependent variable negatively, suggesting a negative coefficient. Evidence of how the dependent variable changes when the explanatory variable changes is essential. In our baseball example, however, there is no variation in the designated hitter explanatory variable; DH t equals 1 for each observation. We have no way to assess the effect that the designated hitter has on attendance.
We are asking our statistical software to do the impossible. While we have attendance information when the designated hitter was used, we have no attendance information when the designated hitter was not used. How then can we expect the software to assess the impact of the designated hitter on attendance?
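The impossibility has a mechanical face: with an intercept included, a regressor that never varies duplicates the intercept's column of 1's, so the X'X matrix that OLS must invert is singular. A minimal sketch with hypothetical observations:

```python
# A constant-valued regressor duplicates the intercept column, so the
# X'X matrix OLS must invert is singular. Hypothetical design matrix:
X = [[1.0, 1.0],   # columns: [intercept, DH]; DH = 1 for every game
     [1.0, 1.0],
     [1.0, 1.0]]

# Form X'X (2 x 2) and check its determinant
xtx = [[sum(row[i] * row[j] for row in X) for j in range(2)]
       for i in range(2)]
det = xtx[0][0] * xtx[1][1] - xtx[0][1] * xtx[1][0]
print(det)  # 0.0 -> singular, so the software reports a diagnostic
```

A zero determinant means X'X has no inverse; this is precisely the condition that triggers the "near singular matrix" style of diagnostic.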
One Explanatory Variable Is a Linear Combination of Other Explanatory Variables

We have already seen one example of this when we discussed multicollinearity in the previous chapter. We included both the ticket price in terms of dollars and the ticket price in terms of cents as explanatory variables. The ticket price in terms of cents is a linear combination of the ticket price in terms of dollars:

PriceCents = 100 × PriceTicket

Let us try to use the ticket price, PriceTicket, home team salary, HomeSalary, and the ticket price in terms of cents, PriceCents, to explain attendance, Attendance:

[Link to MIT-ALSummer-1996.wf1 goes here.]

When both measures of the price are included in the regression, our statistical software issues a diagnostic indicating that it is being asked to do the impossible. Statistical software cannot separate out the individual influence of the two explanatory variables, PriceTicket and PriceCents, because they contain precisely the same information; the two explanatory variables are redundant. We are asking the software to do the impossible.

In fact, any linear combination of explanatory variables produces this problem. To illustrate this, we consider two regressions. The first specifies three explanatory variables: ticket price, home team salary, and visiting team salary.

[Link to MIT-ALSummer-1996.wf1 goes here.]

Ordinary Least Squares (OLS)
Dependent Variable: Attendance
Explanatory Variable(s): Estimate SE t-Statistic Prob
PriceTicket
HomeSalary
VisitSalary
Const
Number of Observations 585

Estimated Equation: EstAttendance = 3,59 − 587PriceTicket + 791HomeSalary + 163VisitSalary

Table 15.11: Baseball Attendance

Now, generate a new variable, TotalSalary:

TotalSalary = HomeSalary + VisitSalary

TotalSalary is a linear combination of HomeSalary and VisitSalary. Let us try to use the ticket price, PriceTicket, home team salary, HomeSalary, and visiting
team salary, VisitSalary, and total salary, TotalSalary, to explain attendance, Attendance:

[Link to MIT-ALSummer-1996.wf1 goes here.]

Our statistical software will issue a diagnostic indicating that it is being asked to do the impossible. The information contained in TotalSalary is already included in HomeSalary and VisitSalary. Statistical software cannot separate out the individual influence of the three explanatory variables because they contain redundant information. We are asking the software to do the impossible.

Dependent Variable Is a Linear Combination of Explanatory Variables

Suppose that the dependent variable is a linear combination of the explanatory variables. The following regression illustrates this scenario. TotalSalary is by definition the sum of HomeSalary and VisitSalary. Total salary, TotalSalary, is the dependent variable; home team salary, HomeSalary, and visiting team salary, VisitSalary, are the explanatory variables:

[Link to MIT-ALSummer-1996.wf1 goes here.]

Ordinary Least Squares (OLS)
Dependent Variable: TotalSalary
Explanatory Variable(s): Estimate SE t-Statistic Prob
HomeSalary
VisitSalary
Const
Number of Observations 588

Estimated Equation: EstTotalSalary = 1.000HomeSalary + 1.000VisitSalary

Table 15.12: Total Salary Regression

The estimates of the constant and coefficients reveal the definition of TotalSalary:

TotalSalary = HomeSalary + VisitSalary

Furthermore, the standard errors are very small, approximately 0. In fact, they are precisely equal to 0, but they are not reported as 0's as a consequence of how digital computers process numbers. We can think of these very small standard errors as telling us that we are dealing with an identity here, something that is true by definition.
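The identity is easy to reproduce. Here is a minimal no-intercept OLS sketch with hypothetical salary figures (not the actual data): regressing TotalSalary on HomeSalary and VisitSalary via the normal equations returns coefficients of exactly 1 on each regressor.

```python
# Sketch (hypothetical salaries): when the dependent variable is an exact
# linear combination of the regressors, OLS recovers the identity.
home = [1.0, 2.0, 3.0]
visit = [4.0, 1.0, 2.0]
total = [h + v for h, v in zip(home, visit)]  # TotalSalary identity

# Normal equations for a no-intercept regression of total on home, visit
shh = sum(h * h for h in home)
shv = sum(h * v for h, v in zip(home, visit))
svv = sum(v * v for v in visit)
sht = sum(h * t for h, t in zip(home, total))
svt = sum(v * t for v, t in zip(visit, total))

# Solve the 2 x 2 system by Cramer's rule
det = shh * svv - shv * shv
b_home = (sht * svv - shv * svt) / det
b_visit = (shh * svt - shv * sht) / det
print(b_home, b_visit)  # 1.0 1.0
```

With coefficients of exactly 1, the fitted values equal the actual values, so every residual is 0, which is why the reported standard errors are 0 up to the computer's rounding.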
Outlier Observations

We should be aware of the possibility of outliers because the ordinary least squares (OLS) estimation procedure is very sensitive to them. An outlier can occur for many reasons: one observation could have a unique characteristic, or one observation could include a mundane typo. To illustrate the effect that an outlier may have, once again consider the games played in the summer of the 1996 American League season.

[Link to MIT-ALSummer-1996.wf1 goes here.]

The first observation reports the game played in Milwaukee on June 1, 1996: the Cleveland Indians visited the Milwaukee Brewers. The salary for the Brewers totaled 20.3 million dollars in 1996:

Observation  Month  Day  Home Team   Visiting Team  Home Team Salary
1                        Milwaukee   Cleveland      20.3
2                        Oakland     New York
3                        Seattle     Boston
4                        Toronto     Kansas City
5                        Texas       Minnesota

Review the following regression:

Ordinary Least Squares (OLS)
Dependent Variable: Attendance
Explanatory Variable(s): Estimate SE t-Statistic Prob
PriceTicket
HomeSalary
Const
Number of Observations 585

Estimated Equation: EstAttendance = 9,46 − 591PriceTicket + 783HomeSalary

Table 15.13: Baseball Attendance Regression with Correct Data

Now, suppose that a mistake was made in entering Milwaukee's player salary for the first observation; suppose that the decimal point was misplaced, so that 203 was entered instead of 20.3. All the other values were entered correctly. You can access the data including this outlier:

[Link to MIT-ALSummerOutlier-1996.wf1 goes here.]
Observation  Month  Day  Home Team   Visiting Team  Home Team Salary
1                        Milwaukee   Cleveland      203
2                        Oakland     New York
3                        Seattle     Boston
4                        Toronto     Kansas City
5                        Texas       Minnesota

Ordinary Least Squares (OLS)
Dependent Variable: Attendance
Explanatory Variable(s): Estimate SE t-Statistic Prob
PriceTicket
HomeSalary
Const
Number of Observations 585

Estimated Equation: EstAttendance = … − 1,896PriceTicket + .088HomeSalary

Table 15.14: Baseball Attendance Regression with an Outlier

Even though only a single value has been altered, the estimates of both coefficients change dramatically. The estimate of the ticket price coefficient changes from about −591 to −1,896, and the estimate of the home salary coefficient changes from 783 to .088. This illustrates how sensitive the ordinary least squares (OLS) estimation procedure can be to an outlier. Consequently, we must take care to enter data properly and to check to be certain that we have generated any new variables correctly.
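This sensitivity is easy to reproduce on made-up numbers. The sketch below fits a simple regression slope twice: once with clean data and once after a single misplaced decimal point.

```python
# Sketch (made-up data): a single misplaced decimal point can swing
# the OLS slope dramatically.
def slope(x, y):
    """Simple regression slope: sum of cross deviations over
    sum of squared x deviations."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_correct = [2.0, 4.0, 6.0, 8.0, 10.0]
y_outlier = [2.0, 4.0, 6.0, 8.0, 100.0]   # 10.0 mistyped as 100.0

print(slope(x, y_correct))  # 2.0
print(slope(x, y_outlier))  # 20.0
```

One mistyped value moves the estimated slope from 2 to 20; plotting the data first is a cheap way to catch such entry errors before running a regression.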
Dummy Variable Trap

To illustrate the dummy variable trap, we shall revisit our faculty salary data:

Project: Assess the possibility of discrimination in academia.

Faculty Salary Data: Artificially constructed cross section salary data and characteristics for 200 faculty members.

Salary t      Salary of faculty member t (dollars)
Experience t  Years of teaching experience for faculty member t
Articles t    Number of articles published by faculty member t
SexM1 t       1 if faculty member t is male; 0 if female

We shall investigate models that include only dummy variables and years of teaching experience. More specifically, we shall consider four cases:

Model  Dependent Variable  Explanatory Variables          Constant
1      Salary              SexF1 and Experience           Yes
2      Salary              SexM1 and Experience           Yes
3      Salary              SexF1, SexM1, and Experience   No
4      Salary              SexF1, SexM1, and Experience   Yes

We begin by generating the variable SexF1 as we did in Chapter 13:

SexF1 = 1 − SexM1

Now, we shall estimate the parameters of the four models. First, Model 1.

Model 1: Salary t = β Const + β SexF1 SexF1 t + β E Experience t + e t

[Link to MIT-FacultySalaries.wf1 goes here.]

Ordinary Least Squares (OLS)
Dependent Variable: Salary
Explanatory Variable(s): Estimate SE t-Statistic Prob
SexF1
Experience
Const
Number of Observations 200

Estimated Equation: EstSalary = 42,238 − 2,240SexF1 + 2,447Experience

Table 15.15: Faculty Salary Regression
Now, calculate the estimated salary equation for men and women.

For men, SexF1 = 0:

EstSalary = 42,238 − 2,240SexF1 + 2,447Experience
EstSalary Men = 42,238 − 2,240 × 0 + 2,447Experience = 42,238 + 2,447Experience

The intercept for men equals $42,238; the slope equals 2,447.

For women, SexF1 = 1:

EstSalary = 42,238 − 2,240SexF1 + 2,447Experience
EstSalary Women = 42,238 − 2,240 + 2,447Experience = 39,998 + 2,447Experience

It is easy to plot the estimated salary equations for men and women.

[Figure 15.6: Estimated Salary Equations for Men and Women. Two parallel lines, each with slope 2,447; the men's line has intercept 42,238 and the women's line has intercept 39,998.]

Both plotted lines have the same slope, 2,447. The intercepts differ, however: the intercept for men is 42,238, while the intercept for women is 39,998.
Model 2: Salary t = β Const + β SexM1 SexM1 t + β E Experience t + e t

EstSalary = b Const + b SexM1 SexM1 + b E Experience

Let us attempt to calculate the second model's estimated constant and the estimated male sex dummy coefficient, b Const and b SexM1, using the intercepts from Model 1.

For men: SexM1 = 1
EstSalary Men = b Const + b SexM1 + b E Experience
Intercept Men = b Const + b SexM1
42,238 = b Const + b SexM1

For women: SexM1 = 0
EstSalary Women = b Const + b E Experience
Intercept Women = b Const
39,998 = b Const

We now have two equations:

42,238 = b Const + b SexM1
39,998 = b Const

and two unknowns, b Const and b SexM1. It is easy to solve for the unknowns. The second equation tells us that b Const equals 39,998:

b Const = 39,998

Next, focus on the first equation:

42,238 = b Const + b SexM1

Substituting for b Const:

42,238 = 39,998 + b SexM1

Solving for b SexM1:

b SexM1 = 42,238 − 39,998 = 2,240

Using the estimates from Model 1, we compute that Model 2's estimate for the constant should be 39,998 and its estimate for the male sex dummy coefficient should be 2,240. Let us now run the regression:

Ordinary Least Squares (OLS)
Dependent Variable: Salary
Explanatory Variable(s): Estimate SE t-Statistic Prob
SexM1
Experience
Const
Number of Observations 200

Estimated Equation: EstSalary = 39,998 + 2,240SexM1 + 2,447Experience

Table 15.16: Faculty Salary Regression

The regression confirms our calculations.
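The two-equation solution above is ordinary arithmetic. Here is a minimal sketch with hypothetical intercepts (not the textbook's estimates) showing how Model 2's constant and male dummy coefficient follow from the two group intercepts:

```python
# Sketch with hypothetical group intercepts: recover the constant and
# the male-dummy coefficient from the men's and women's intercepts.
intercept_men = 45000.0    # hypothetical value
intercept_women = 40000.0  # hypothetical value

b_const = intercept_women           # women: SexM1 = 0
b_sexm1 = intercept_men - b_const   # men:  SexM1 = 1
print(b_const, b_sexm1)  # 40000.0 5000.0
```

The dummy coefficient is simply the gap between the two intercepts, and the constant is the intercept of the group whose dummy equals 0.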
Model 3: Salary t = β SexF1 SexF1 t + β SexM1 SexM1 t + β E Experience t + e t

EstSalary = b SexF1 SexF1 + b SexM1 SexM1 + b E Experience

Again, let us attempt to calculate the third model's estimated female sex dummy coefficient and its male sex dummy coefficient, b SexF1 and b SexM1, using the intercepts from Model 1.

For men: SexF1 = 0 and SexM1 = 1
EstSalary Men = b SexM1 + b E Experience
Intercept Men = b SexM1
42,238 = b SexM1

For women: SexF1 = 1 and SexM1 = 0
EstSalary Women = b SexF1 + b E Experience
Intercept Women = b SexF1
39,998 = b SexF1

We now have two equations:

42,238 = b SexM1
39,998 = b SexF1

and two unknowns, b SexF1 and b SexM1. Using the estimates from Model 1, we compute that Model 3's estimate for the male sex dummy coefficient should be 42,238 and its estimate for the female sex dummy coefficient should be 39,998. Let us now run the regression:

Getting Started in EViews

To estimate the third model (part c) using EViews, you must fool EViews into running the appropriate regression:
In the Workfile window: highlight Salary and then, while depressing <Ctrl>, highlight SexF1, SexM1, and Experience.
In the Workfile window: double click on a highlighted variable.
Click Open Equation.
In the Equation Specification window, delete c so that the window looks like this: salary sexf1 sexm1 experience.
Click OK.
Ordinary Least Squares (OLS)
Dependent Variable: Salary
Explanatory Variable(s): Estimate SE t-Statistic Prob
SexF1
SexM1
Experience
Number of Observations 200

Estimated Equation: EstSalary = 39,998SexF1 + 42,238SexM1 + 2,447Experience

Table 15.17: Faculty Salary Regression

Again, the regression results confirm our calculations.

Model 4: Salary t = β Const + β SexF1 SexF1 t + β SexM1 SexM1 t + β E Experience t + e t

EstSalary = b Const + b SexF1 SexF1 + b SexM1 SexM1 + b E Experience

Question: Can we calculate the fourth model's b Const, b SexF1, and b SexM1 using Model 1's intercepts?

For men: SexF1 = 0 and SexM1 = 1
EstSalary Men = b Const + b SexM1 + b E Experience
Intercept Men = b Const + b SexM1
42,238 = b Const + b SexM1

For women: SexF1 = 1 and SexM1 = 0
EstSalary Women = b Const + b SexF1 + b E Experience
Intercept Women = b Const + b SexF1
39,998 = b Const + b SexF1

We now have two equations:

42,238 = b Const + b SexM1
39,998 = b Const + b SexF1

and three unknowns, b Const, b SexF1, and b SexM1. We have more unknowns than equations; we cannot solve for the three unknowns. It is impossible. This is called a dummy variable trap:

Dummy Variable Trap: A model in which there are more parameters representing the intercepts than there are intercepts. Here there are three parameters, b Const, b SexF1, and b SexM1, estimating the two intercepts.
Now, let us try to run the regression:

[Link to MIT-FacultySalaries.wf1 goes here.]

Our statistical software will issue a diagnostic telling us that it is being asked to do the impossible. In some sense, the software is being asked to solve for three unknowns with only two equations.
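The trap also has a mechanical face, just like the earlier pitfalls: for every observation the constant column equals SexF1 plus SexM1, so the regressor columns are linearly dependent. A small sketch with a hypothetical sample:

```python
# Sketch (hypothetical sample): the dummy variable trap. With a constant,
# SexF1, and SexM1 all included, the constant column equals
# SexF1 + SexM1 for every observation.
sexm1 = [1, 0, 1, 0]
sexf1 = [1 - m for m in sexm1]   # SexF1 = 1 - SexM1
const = [1, 1, 1, 1]

dependent = all(c == f + m for c, f, m in zip(const, sexf1, sexm1))
print(dependent)  # True -> the columns are linearly dependent
```

Linearly dependent regressor columns make X'X singular, which is exactly why the software issues its diagnostic; dropping either the constant (Model 3) or one of the dummies (Models 1 and 2) removes the dependence.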
More informationCHAPTER 6: SPECIFICATION VARIABLES
Recall, we had the following six assumptions required for the Gauss-Markov Theorem: 1. The regression model is linear, correctly specified, and has an additive error term. 2. The error term has a zero
More informationRegression Analysis and Forecasting Prof. Shalabh Department of Mathematics and Statistics Indian Institute of Technology-Kanpur
Regression Analysis and Forecasting Prof. Shalabh Department of Mathematics and Statistics Indian Institute of Technology-Kanpur Lecture 10 Software Implementation in Simple Linear Regression Model using
More informationChapter 4. Regression Models. Learning Objectives
Chapter 4 Regression Models To accompany Quantitative Analysis for Management, Eleventh Edition, by Render, Stair, and Hanna Power Point slides created by Brian Peterson Learning Objectives After completing
More informationApplied Quantitative Methods II
Applied Quantitative Methods II Lecture 4: OLS and Statistics revision Klára Kaĺıšková Klára Kaĺıšková AQM II - Lecture 4 VŠE, SS 2016/17 1 / 68 Outline 1 Econometric analysis Properties of an estimator
More informationChapter 14 Student Lecture Notes 14-1
Chapter 14 Student Lecture Notes 14-1 Business Statistics: A Decision-Making Approach 6 th Edition Chapter 14 Multiple Regression Analysis and Model Building Chap 14-1 Chapter Goals After completing this
More informationCh 7: Dummy (binary, indicator) variables
Ch 7: Dummy (binary, indicator) variables :Examples Dummy variable are used to indicate the presence or absence of a characteristic. For example, define female i 1 if obs i is female 0 otherwise or male
More informationMultiple Regression Analysis
Chapter 4 Multiple Regression Analysis The simple linear regression covered in Chapter 2 can be generalized to include more than one variable. Multiple regression analysis is an extension of the simple
More information9. Linear Regression and Correlation
9. Linear Regression and Correlation Data: y a quantitative response variable x a quantitative explanatory variable (Chap. 8: Recall that both variables were categorical) For example, y = annual income,
More informationChapter 9. Dummy (Binary) Variables. 9.1 Introduction The multiple regression model (9.1.1) Assumption MR1 is
Chapter 9 Dummy (Binary) Variables 9.1 Introduction The multiple regression model y = β+β x +β x + +β x + e (9.1.1) t 1 2 t2 3 t3 K tk t Assumption MR1 is 1. yt =β 1+β 2xt2 + L+β KxtK + et, t = 1, K, T
More informationPOL 681 Lecture Notes: Statistical Interactions
POL 681 Lecture Notes: Statistical Interactions 1 Preliminaries To this point, the linear models we have considered have all been interpreted in terms of additive relationships. That is, the relationship
More informationSampling, Frequency Distributions, and Graphs (12.1)
1 Sampling, Frequency Distributions, and Graphs (1.1) Design: Plan how to obtain the data. What are typical Statistical Methods? Collect the data, which is then subjected to statistical analysis, which
More informationRegression Analysis IV... More MLR and Model Building
Regression Analysis IV... More MLR and Model Building This session finishes up presenting the formal methods of inference based on the MLR model and then begins discussion of "model building" (use of regression
More informationChapter 23: Inferences About Means
Chapter 3: Inferences About Means Sample of Means: number of observations in one sample the population mean (theoretical mean) sample mean (observed mean) is the theoretical standard deviation of the population
More informationPractice Questions for Exam 1
Practice Questions for Exam 1 1. A used car lot evaluates their cars on a number of features as they arrive in the lot in order to determine their worth. Among the features looked at are miles per gallon
More informationCh 13 & 14 - Regression Analysis
Ch 3 & 4 - Regression Analysis Simple Regression Model I. Multiple Choice:. A simple regression is a regression model that contains a. only one independent variable b. only one dependent variable c. more
More informationPsych 230. Psychological Measurement and Statistics
Psych 230 Psychological Measurement and Statistics Pedro Wolf December 9, 2009 This Time. Non-Parametric statistics Chi-Square test One-way Two-way Statistical Testing 1. Decide which test to use 2. State
More informationDo not copy, post, or distribute. Independent-Samples t Test and Mann- C h a p t e r 13
C h a p t e r 13 Independent-Samples t Test and Mann- Whitney U Test 13.1 Introduction and Objectives This chapter continues the theme of hypothesis testing as an inferential statistical procedure. In
More informationBlack White Total Observed Expected χ 2 = (f observed f expected ) 2 f expected (83 126) 2 ( )2 126
Psychology 60 Fall 2013 Practice Final Actual Exam: This Wednesday. Good luck! Name: To view the solutions, check the link at the end of the document. This practice final should supplement your studying;
More informationOrdinary Least Squares Regression Explained: Vartanian
Ordinary Least Squares Regression Eplained: Vartanian When to Use Ordinary Least Squares Regression Analysis A. Variable types. When you have an interval/ratio scale dependent variable.. When your independent
More informationECONOMETRIC MODEL WITH QUALITATIVE VARIABLES
ECONOMETRIC MODEL WITH QUALITATIVE VARIABLES How to quantify qualitative variables to quantitative variables? Why do we need to do this? Econometric model needs quantitative variables to estimate its parameters
More informationECO375 Tutorial 4 Wooldridge: Chapter 6 and 7
ECO375 Tutorial 4 Wooldridge: Chapter 6 and 7 Matt Tudball University of Toronto St. George October 6, 2017 Matt Tudball (University of Toronto) ECO375H1 October 6, 2017 1 / 36 ECO375 Tutorial 4 Welcome
More information11.5 Regression Linear Relationships
Contents 11.5 Regression............................. 835 11.5.1 Linear Relationships................... 835 11.5.2 The Least Squares Regression Line........... 837 11.5.3 Using the Regression Line................
More informationCh. 16: Correlation and Regression
Ch. 1: Correlation and Regression With the shift to correlational analyses, we change the very nature of the question we are asking of our data. Heretofore, we were asking if a difference was likely to
More informationInferential statistics
Inferential statistics Inference involves making a Generalization about a larger group of individuals on the basis of a subset or sample. Ahmed-Refat-ZU Null and alternative hypotheses In hypotheses testing,
More informationMidterm 2 - Solutions
Ecn 102 - Analysis of Economic Data University of California - Davis February 24, 2010 Instructor: John Parman Midterm 2 - Solutions You have until 10:20am to complete this exam. Please remember to put
More informationOutline. Lesson 3: Linear Functions. Objectives:
Lesson 3: Linear Functions Objectives: Outline I can determine the dependent and independent variables in a linear function. I can read and interpret characteristics of linear functions including x- and
More informationtheir contents. If the sample mean is 15.2 oz. and the sample standard deviation is 0.50 oz., find the 95% confidence interval of the true mean.
Math 1342 Exam 3-Review Chapters 7-9 HCCS **************************************************************************************** Name Date **********************************************************************************************
More informationAnnouncements. J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 February 8, / 45
Announcements Solutions to Problem Set 3 are posted Problem Set 4 is posted, It will be graded and is due a week from Friday You already know everything you need to work on Problem Set 4 Professor Miller
More informationModule 7 Practice problem and Homework answers
Module 7 Practice problem and Homework answers Practice problem, page 1 Is the research hypothesis one-tailed or two-tailed? Answer: one tailed In the set up for the problem, we predicted a specific outcome
More informationAnswer all questions from part I. Answer two question from part II.a, and one question from part II.b.
B203: Quantitative Methods Answer all questions from part I. Answer two question from part II.a, and one question from part II.b. Part I: Compulsory Questions. Answer all questions. Each question carries
More informationTrendlines Simple Linear Regression Multiple Linear Regression Systematic Model Building Practical Issues
Trendlines Simple Linear Regression Multiple Linear Regression Systematic Model Building Practical Issues Overfitting Categorical Variables Interaction Terms Non-linear Terms Linear Logarithmic y = a +
More informationBinary Logistic Regression
The coefficients of the multiple regression model are estimated using sample data with k independent variables Estimated (or predicted) value of Y Estimated intercept Estimated slope coefficients Ŷ = b
More information