Linear Regression Problems

Size: px
Start display at page:

Download "Linear Regression Problems"

Transcription

1 Linear Regression Problems Q.1. A multiple regression model is fit relating a response Y to 4 predictors: X1, X, X3, X4. We fit models (each based on a sample of n=0 cases): i) E[ Y] X X 0 1X1 X 3X 3 4 X 4 ii) E[ Y] For model i), the error sum of squares is 50. for model it is 300. Test H0: at the =0.05 significance level. Q.. A multiple regression model is fit relating a response Y to 6 predictors: X 1, X, X 3, X 4, X 5, X 6. We fit models (each based on a sample of n=34 cases): i) E[ Y] X X X 0 1X1 X 3X 3 4 X 4 5 X 5 6 X 6 ii) E[ Y] For model i), the error sum of squares is For model it is Test H 0: at the =0.05 significance level. Q.3. A regression model is fit, relating breaking strength of a fiber (Y) to the amount of an additive applied to it (X 1), the amount of time it is heated (X ), and the temperature (X 3) at which it is heated. The fitted equation and coefficient of multiple determination are given below (n=4): ^ Y X X 0.010X 3 R 0.75 Give the predicted breaking strength of a fiber with X 1=10 units of additive, heated for X =15 minutes, at X 3=300 degrees Test H 0: at the =0.05 significance level. Q.4. A linear regression model is fit, relating breaking strength of steel bars to thickness, length, and material type with 3 nominal levels (no interaction terms or polynomial terms are included in the model). The model is fit based on 50 experimental units and includes an intercept term. DF(Regression) DF(Error) DF(Total) Q.5. For any regression model, Adjusted-R will be larger than R. Q.6. If we continue adding new predictors to a regression model, the error sum of squares will never increase, but the error mean square may increase. Q.7. It is not possible to get a negative F-statistic when (properly) conducting a Complete versus Reduced F- test, when we are testing to determine whether one set of predictors is not associated with Y, after controlling for another set of predictors.

2 Q.8. A researcher states that her regression model explains 75% of the variation in her dependent variable. This means that SSE/TSS = 0.5, where SSE is the Error sum of squares and TSS is the Total sum of squares Q.9. A study was conducted to determine whether company size (X1=#Employees) and presence/absence of an active safety program (X=1 if yes, 0 if no) were related with the lost work hours by employees in a 1-year period (Y). The sample consisted of 40 firms, 0 used the safety program, 0 did not. The model fit is: Y = X1 + X + p.9.a. Complete the following tables ( and Coefficients) for the multiple regression: Source df SS MS F(obs) F(0.05) Regression Error #N/A #N/A Total #N/A #N/A #N/A Estimates Predictor Coefficien SE(Coef) t(obs) t(.05) Intercept #N/A #N/A X X p.9.b. Controlling for company size, firms with the safety program are estimated on average to have more / less lost work hours in 1 year than firms without the safety program. p.9.c. What proportion of variation in lost work hours is explained by the variables X1 and X p.9.d. Give the predicted number of lost work hours in 1 year for a firm with 10,000 workers and the safety program. p.9.e. Based on Backward elimination with Significance Level to Stay (SLS) of 0.05, would you drop either X1 or X from the model?

3 Q.10. A study considered failure times of tools (Y, in minutes) for n=4 tools. Variables used to predict Y were: cutting Speed (X1 feet/minute), feed rate (X), and Depth (X3). The following models were fit (TSS=3618): Model 1: E(Y) = X1 + X + X3 SSE1 = 874 Model : E(Y) = X1 + X + X3 + X1*X + X1*X3 + X*X3 + X1*X*X3 SSE = 483 p.10.a. Treating Model as the Complete Model with all potential predictors, compute Cp for Model 1 p.10.b. Test whether all of the interaction terms can be removed from the model, controlling for all main effects. That is, test H0: at significance level: p.10.b.i. Test Statistic: p.10.b.ii. Reject H0 if the test statistic falls in the range: p.10.b.iii. Based on this test, are we justified in dropping all interaction terms from the model? Yes / No Q.11. A regression model is fit with p=6 predictors (and an intercept), based on n=0 experimental units. How large does R for the model need to be to reject H0: at significance level? Q.1. A regression model is fit relating Peak Power Load (Y, in megawatts) to daily high temperature (X, in Degrees F) for a sample of n=10 days. The analyst believes that Peak Power Load will increase with temperature, and that the RATE of change will increase as temperature increases. The model to be fit is (along with estimates and standard errors): Model 1: E(Y) = X + X Model : E(Y) = W + W (where W is centered value of X, that is: W = X-91.5) Coeff StdErr t Stat P-value Coeff StdErr t Stat P-value Intercept Intercept X W X^ W^ p.1.a. Test H0: vs HA: based on Model 1 with = 0.05: p.1.a.i. Test Statistic: p.15.a.ii. Reject H0 if test stat falls in range: p.1.b. Obtain a 95% Confidence Interval for 1 for Model 1. Does it contain 0? p.1.c. Obtain a 95% Confidence Interval for 1 for Model. Does it contain 0?

4 p.1.d. The correlation between X and X for Model 1 is and between W and W for Model is Obtain the Variance Inflation Factors for Models 1 and (Note that since there are only predictors for each model, there will only be one VIF for each model). Model 1: VIFX = VIFX = Model : VIFW = VIFW = p.1.e. Obtain predicted Peak Power Load when X=90 for Model 1, and when W = = -1.5 for Model : Model 1: Model : Q.13. A simple linear regression was fit relating number of species of arctic flora observed (Y) and July mean temperature (X, in Celsius). The results of the regression model, based on n=19 temperature stations is given below. df SS MS F ignificance F Regression Residual Total Coefficientsandard Erro t Stat P-value Lower 95%Upper 95% Intercept JulyTemp SS_XX X-bar p.13.a. What proportion of the variation in number of species is explained by mean July temperature? p.13.b. Compute a 95% Confidence Interval for the population mean number of species, with mean July temperature of 6 degrees. p.13.c. Compute a 95% Prediction Interval for the number of species, at a single station with mean July temperature of 6 degrees. Q.14. An experiment was conducted, relating the penetration depth of missiles (Y) to its impact factor (X). The results from the regression, and the residual versus fitted plot are given below (n=5). df SS Regression Residual Total Coefficientsandard Erro -0. Intercept impact Residuals vs Fitted Values Residuals p.14.a. Test H 0: 1 = 0 (Penetration depth is not associated with impact factor) based on the t-test. Test Statistic: Rejection Region:

5 p.14.b. Test H 0: 1 = 0 (Penetration depth is not associated with impact factor) based on the F-test. Test Statistic: Rejection Region: p.14.c. The residual plot appears to display non-constant error variance. A regression of the squared residuals on the impact factors (X) is fit, and the is given below. Conduct the Breusch-Pagan test to test whether the errors are related to X. Do you reject the null hypothesis of constant variance? Yes or No df SS Regression Residual Total Test Statistic: Rejection Region: Q.15. An experiment was conducted to measure air permeability of fabric (Y) as a function of the following factors: warp density (X 1), weft density (X ), and Mass per unit area (X 3). There were n=30 observations, and 4 models are fit: E Y X X X X X X X X X X X X SSE E Y X X X X X X X X X SSE E Y X X X SSE E Y X X X X X SSE p.15.a. Use the first two models to test H 0:. Test Statistic: Rejection Region: p.15.b. Use the 3 rd and 4 th models to test whether the weft-mass interaction is significant, controlling for all main effects. Test Statistic: Rejection Region: Q.16. A regression model was fit, relating the share of big 3 television network prime-time market share (Y, %) to household penetration of cable/satellite dish providers (X = MVPD) for the years (n=5). The regression results and residual versus time plot are given below. df SS MS F Regression Residual Total Coefficientsandard Erro t Stat P-value Intercept mvpd Residuals Residuals p.16.a. Compute the correlation between big 3 market share and MVPD. p.16.b. The residual plot appears to display serial autocorrelation over time. Conduct the Durbin-Watson test, with null hypothesis that residuals are not autocorrelated. 5 t e e d n p d n p , 5, , 5, t t1 L U Test Statistic: Reject H 0? Yes or No

6 p.16.c. Data were transformed to conduct estimated generalized least squares (EGLS), to account for the auto-correlation. The parameter estimates and standard errors are given below. Obtain 95% confidence intervals for 1, based on Ordinary Least Squares (OLS) and EGLS. Note that the error degrees of freedom are 3 for OLS and for EGLS (estimated the autocorrelation coefficient). beta-egls SE(b-egls) OLS 95% CI: EGLS 95% CI: Q.17. Regression analyses were fit, relating various chemical levels to age for stranded bottlenose dolphins in South Carolina and Florida. This plot gives the quadratic fit, relating mercury/selenium molar ratio (Y) to age (X) for the Florida dolphins. Complete the following parts. Note: The data were NOT centered. The model fit was: E(Y) = 0 + 1X + X n = Predicted value when age = 15 Test H 0: 1 = = 0 Test Statistic: Rejection Region: Q.18. A study was conducted to determine which factors were associated with percent release (Y) of hydroxypropyl methylcellulose (HPMC) tablets. The factors were: X1 = Carr s compressibility index, X = angle of repose, X3 = solubility, X4 =molecular weight, X5 = compression force X6 = apparent viscosity of 4% (w/v) HPMC. The sample size was n=18, and the authors reported the fit of the following models.

7 p.18.a. Complete the table in terms of AIC and SBC (BIC). Predictors p' SSE AIC SBC X1,X,X3,X4,X5,X X1,X,X3,X4,X X1,X,X3,X X1,X3,X4,X X1,X3,X X,X3,X X3,X4,X p.18.b. Which model is best based on AIC: BIC: p.18.c. R for the complete model was Compute the total (corrected) sum of squares (TSS): Q.19. A study was conducted to relate construction plant maintenance cost (pounds) to 3 categorical predictors (industry: coal/slate, machine type: front shovel/backacter, and attitude to used oil analysis: regular use/not regular use) and one numeric predictor (machine weight, in tons). Due to the distributions of costs and machine weight (skewed), both are modeled with (natural) logs. Thus, the variables are (based on a sample of n=33 construction plants): Y = ln(costs) X 1 = 1 if coal industry, 0 if slate X = 1 if machine type = front shovel, 0 if backacter X 3 = 1 if attitude to used oil analysis = regular use, 0 if not X 4 = ln(machine Weight) They fit Models: Model 1: E{Y} = X 1 + X + X 3 + X 4 Model : E{Y} = X 1 + X + X 3 + X 4 + X 1X 4 + X X 4 + X 3X 4 Model1 Model df SS MS F ignificance F df SS MS F ignificance Regression Regression Residual Residual Total Total Coefficientsandard Erro t Stat P-value Coefficientsandard Erro t Stat P-value Intercept Intercept X X X X X X X X X1*X X*X X3*X

8 p.19.a. Based on Model 1, give the fitted value for a firm that is a coal industry, has machine type=backacter, is a regular user of used oil, and ln(machine Weight) = 4.0. Note that the units of the fitted value is ln(costs). p.19.b. Test whether any of the categorical predictors interact with machine weight. H 0:. Test Statistic: Rejection Region: Q.0. Consider the following models, relating Stature (Y) to foot dimensions (RFL = Right Foot Length, RFB = Right Foot Breadth) and Age among the Rajbanshi of North Bengal. We observe the following statistics, based on several regressions among n = 175 adult males. Complete the following table. Note: The total sum of squares is TSS = , and for C p, use s = MSE(RFL,RFB,Age) = All models contain an intercept ( 0). Predictors p' SSE R-square Cp AIC BIC(SBC) RFL RFB Age RFL,RFB RFL,RFB,Age p.0.a. What is the best model based on C p? p.0.b. What is the best model based on AIC? p.0.c. What is the best model based on BIC (SBC)? Q.1. A linear regression was (inappropriately) run, relating success in throwing a frisbee through a hula-hoop (Y=1, if good, 0 if not) to children s eye-color (X = 1 if dark, 0 if light). Note that this should be conducted as a chi-square test, or logistic regression (it was a very old paper). The authors report that r =.030 (well, that s what it should be) from a sample of n = 136 children. Based on the regression : E(Y) = X, test H 0: 0 vs H A: 0 based on the F- test. Q.. A simple linear regression model and Analysis of Variance (Completely Randomized Design) model were fit, relating stretch percentage of viscose rayon to specimen length (there were 4 lengths: 5, 10, 15, 0 inches). The results from each model are given below: Regression Model df SS MS F Regression Residual Total Coefficientsandard Erro t Stat P-value Intercept X Variable Completely Randomized Design (1-Way )

9 df SS MS F Treatments (Lengths) Error Total Below are the fitted values (regression) and sample means (1-way ), and sample sizes: Length Yhat(Reg) Ybar(Grp) n(grp) Conduct the Goodness-of-Fit F-test, where H 0 states that the relationship between stretch percentage and specimen length is linear. Hint: All numbers (sums of squares and degrees of freedom can be obtained (implicitly) from the tables). Test Statistic: Rejection Region: Q.3. A study was conducted, relating Total Medical Waste (Y, in kg/day) to hospital type (Government: X 1=1, Education and Non- Education: X =1, University: X 3=1, Private is the reference type), Hospital Capacity (X 4 = # of beds), and Occupancy Rate (X 5 = % of Beds in Use). Consider the following two models: Model 1: E Y X X X X X Model : E Y X X SUMMARY OUTPUT Model 1 SUMMARY OUTPUT Model df SS MS F ignificance F df SS MS F ignificance Regression Regression Residual Residual Total Total Coefficientsandard Erro t Stat P-value Coefficientsandard Erro t Stat P-value Intercept Intercept govtype beds eduntype occ_rate univtype beds occ_rate p.3.a. Test whether there is a Hospital type effect (controlling for beds and occupancy rate): H 0: Test Statistic: Rejection Region: Reject H 0? Yes or No p.3.b. The first hospital in the sample is University type, has 15 beds, and an occupancy rate of 47%. Based on Model 1, compute its fitted (predicted) value. Its observed value was 30. Compute its residual. Fitted Value: Residual: p.3.c. What proportion (or percentage) of the Variation in Total Medical waste is explained by number of beds and occupancy rate? Answer:

10 Q.4. A study related Personal Best Shot Put distance (Y, in meters) to best preseason power clean lift (X, in kilograms). The following models were fit, based on a sample of n = 4 male collegiate shot putters: Model 1: E Y X SSE R.686 Y X X ^ 0 1 Model : E Y X X SSE R.79 Y X, X X X p.4.a. Use Model to test H 0: (Y is not related to X) ^ Test Statistic: Rejection Region: Reject H 0? Yes or No p.4.b. Use Models 1 and to test H 0: (Y is linearly related to X) Test Statistic: Rejection Region: Reject H 0? Yes or No p.4.c. Obtain the predicted value from each model for a man with best power clean of 175 kg. Model 1: Model : Q.5. A regression model is fit relating weight (Y, in lbs) to height (X, in inches) among n=139 WNBA players (treating this as a random sample of all female athletes from sports such as basketball and volleyball). The results from the regression, and the residual versus fitted plot are given below. df SS Regression Residual Total e Coefficientsandard Erro Intercept Height p.5.a. Test H 0: 1 = 0 (Weight is not associated with Height) based on the t-test. Test Statistic: Rejection Region: Reject H 0? Yes or No p.5.b. Test H 0: 1 = 0 (Weight is not associated with Height) based on the F-test. Test Statistic: Rejection Region: Reject H 0? Yes or No p.5.c. The residual plot appears to display non-constant error variance. A regression of the squared residuals on height (X) is fit, and the is given below. Conduct the Breusch-Pagan test to test whether the errors are related to X. Do you reject the null hypothesis of constant variance? Yes or No df SS Regression Residual Total Test Statistic: Rejection Region: Reject H 0? Yes or No

11 Q.6. A data set consisted of n = 3 observations on the variables Y, X1, X, X3, and X4. Error Sum of Squares = SSE for each of all possible models. For each model, the variables that are in the model are also shown. Use this information to answer the questions following the table. The Total Sum of Squares = SSTO = #Variables SSE Cp AIC SBC=BIC Vars in Model X #N/A #N/A #N/A X #N/A #N/A #N/A X #N/A #N/A #N/A X4 55 X,X4 84 #N/A #N/A #N/A X,X3 90 #N/A #N/A #N/A X1,X3 95 #N/A #N/A #N/A X1,X 40 #N/A #N/A #N/A X1,X4 445 #N/A #N/A #N/A X3,X X1,X,X #N/A #N/A #N/A X1,X,X #N/A #N/A #N/A X,X3,X #N/A #N/A #N/A X1,X3,X X1,X,X3,X4 p.6.a. Complete the table by computing C p, AIC, and SBC=BIC for the best models with 1,,3, and 4 independent variables. p.6.b. Give the best model (in terms of which independent variables to be included), based on each criteria. C p: AIC: SBC=BIC: Q.7. A linear regression model was fit, relating the electric potential (Y) to the current density (X) for a series of n=18 consecutively observed pairs of X and Y. Model: Y X ~ N 0, t 0 1 t t t t 1 t t p.7.a. The estimated regression coefficients and standard errors for ordinary least squares are given below. Obtain a 95% Confidence Interval for the slope coefficient for current density ( 1). Coefficients: Estimate Std. Error t value Pr(> t ) (Intercept) <e-16 *** x <e-16 *** Lower Bound: Upper Bound: p.7.b. The Durbin-Watson test is used to test H 0: = 0. The test statistic and P-value are given below. Do you conclude that the errors are serially correlated (not independent)? Yes or No lag Autocorrelation D-W Statistic p-value p.7.c. The estimated regression coefficients and standard errors for estimated generalized least squares are given below. Obtain a 95% Confidence Interval for the slope coefficient for current density ( 1) Coefficients: Value Std.Error t-value p-value (Intercept) x Lower Bound: Upper Bound:

12 Q.8. A study considered failure times of tools (Y, in minutes) for n=4 tools. Variables used to predict Y were: cutting Speed (X1 feet/minute), feed rate (X), and Depth (X3). The following models were fit (SSTO=3618): Model 1: E(Y) = X1 + X + X3 SSE1 = 874 Model : E(Y) = X1 + X + X3 + X1*X + X1*X3 + X*X3 + X1*X*X3 SSE = 483 Compute Cp, AIC, and SBC=BIC for each model. Based on each criteria, which model is selected? Criteria Model1 Model Best Model Cp AIC SBC=BIC Q.9. A study compared n = 43 island nations with respect to various demographic and transportation measures. The authors fit a linear regression relating Vehicles/road length (Y, cars/km) to GDP (X, $1000s/capita). The regression output and residual versus predicted plot are given below. df SS Regression Residual Total e Coefficients Standard Error Intercept GDP_K p.9.a. Test H 0: 1 = 0 (Car density is not related to GDP) based on the t-test. Test Statistic: Rejection Region: p.9.b. Test H 0: 1 = 0 (Car density is not related to GDP) based on the F-test. Test Statistic: Rejection Region: p.9.c. The residual plot appears to potentially display non-constant error variance. A regression of the squared residuals on GDP (X) is fit, and the is given below. Conduct the Breusch-Pagan test to test whether the errors are related to X. Do you reject the null hypothesis of constant variance? Yes or No df SS Regression Residual Total Test Statistic: Rejection Region:

13 Q.30. A regression model was fit, relating Weight (Y, in kg) to Height (X 1, in m) and Position for English Premier League Football Players. Position has 4 levels (Forward, Midfielder, Defender, and Goalkeeper). Thus, 3 dummy variables were generated: X = 1 if Forward, 0 otherwise; X 3 = 1 if Midfielder, 0 otherwise; X 4 = 1 if Defender, 0 otherwise. Goalkeepers were the reference position. The following regression models were fit, based on data for n = 441 league players. Model 1: E Y X X X X X X X X X X Model : E Y X X X X Model 3: E Y X TSS 4847 SSR 100 SSR 1184 SSR p.30.a. We wish to test whether the slopes relating weight to height is the same among the positions, allowing the intercepts to differ among positions. Conduct this test for an interaction between height and position. H 0: H A: Test Statistic: Rejection Region: p.30.b. Assuming the interaction is not significant, test whether there is a position effect, after controlling for height. H 0: H A: Test Statistic: Rejection Region: Q.31. A regression model was fit, relating points scored in 014 WNBA games by Skylar Diggins (Y) to whether the game was a Home game (X 1 = 1 if Home, 0 if Away) and the number of minutes she played (X ) over a season of n = 34 games. The regression results are given below for the model: Y = 0 + 1X 1 + X + df SS MS F F(0.05) R^ Regression Residual #N/A #N/A #N/A Total #N/A #N/A #N/A #N/A p.31.a. Complete the table. Do you conclude that Skylar s average point total is associated with the game being at home, and/or the number of minutes she played? Yes No p.31.b. Conduct the Durbin-Watson test, with null hypothesis that residuals are autocorrelated. 5 t e e d n p d n p , 34, , 34, 1.58 t t1 L U Test Statistic: Reject H 0? Yes or No

14 p.31.c. Data were transformed to conduct estimated generalized least squares (EGLS), to account for potential autocorrelation. The parameter estimates and standard errors are given below. Obtain 95% confidence intervals for, based on Ordinary Least Squares (OLS) and EGLS. Note that the error degrees of freedom are 34-3=31 for OLS and 30 for EGLS (estimated the autocorrelation coefficient). OLS OLS EGLS EGLS Parameter Estimate StdError Estimate StdError Intercept Home Minutes OLS 95% CI: EGLS 95% CI: Q.3. A model was fit, relating optimal solar panel tilt angle (Y, in degrees) to a city s latitude (X, in degrees) for a sample of n = 35 cities in the Northern Hemisphere. Consider the following 3 models (the X values have NOT been centered): Model 1: E Y X SSE Y X ^ 0 1 Model : E Y X X SSE Y X X ^ 1 3 Model 3: E Y X X SSE Y 1.445X X Note that Model 3 does not have an intercept (this is the model the authors fit). p.3.a. Give the predicted optimal solar tilt for a location at a latitude of 30 degrees for each model. ^ Model 1: Model : Model 3: P.3.b. Use Models (Full Model) and Model 3 (Reduced Model) to test H 0: 0 = 0 (Given Lat and Lat are in model). Test Statistic: Rejection Region: P.3.c. Use Models (Full Model) and Model 1 (Reduced Model) to test H 0: = 0 (Given an intercept is in model). Test Statistic: Rejection Region: Q.33. A multiple regression model was fit, relating consumer s satisfaction of culinary (food) event (Y) to the following predictor variables/scales: food tasting (X 1), food/beverage prices (X ), ease of coming/going to event (X 3), and convenient parking (X 4). The regression model was fit, based on n=77 respondents, with R = The regression coefficients are given below, for the model: E(Y) = X 1 X X 3 X 4 Parameter Estimate StdError t Constant X X X X p.33.a. Test H 0: Customer satisfaction is not associated with any of the predictors ( ).

15 p.33.a.i. Test Statistic: p.33.a.ii. Reject H 0 if the test statistic falls in the range p.33.b. Test H 0: Controlling for all other predictors, customer satisfaction is not associated with food/bev prices ( 0). p.33.b.i. Test Statistic: p.33.b.ii. Reject H 0 if the test statistic falls in the range Q.34. A linear regression model is fit, relating Y to 3 independent variables. The researcher is interested in determining whether any interactions are significant among the 3 independent variables. She fits the following models (n = 30): Model 1: E Y X X X R Model : E Y X X X X X X X X X R p.34.a. Compute the test statistic for testing H 0: No interactions among X 1, X, and X 3 ( ) p.34.b. Reject H 0 if the test statistic falls in the range Q.35. A regression model was fit, relating estimated cost of de-commissioning oil platforms (Y, in millions of $) to predictors: Total number of piles/legs (X 1) and water depth (X, in 100s of feet). The model was fit, based on n = 17 oil platforms. Consider the models: Model 1: E Y X X X X X X Model : E Y X X Model 1 Model df SS df SS Regression 7397 Regression 7163 Residual 551 Residual 785 Total Total Coefficientsandard Error Coefficientsandard Erro Intercept Intercept totpiles totpiles wtrdph wtrdph tp wtrdph tp*wd p.35.a. Test whether the linear (main effects) model is appropriate. H 0: 11 = = 1 = 0 p.35.b. For each model, obtain the predicted value and residual for rig 17 (Y = 78.5, X 1 = 44, X = 13) Model 1: Predicted = Residual = Model : Predicted = Residual =

16 Q.36. A study related subsidence rate (Y) to water table depth (X 1) for 3 crops: pasture (X = 0, X 3 = 0), truck crop (X = 1, X 3 = 0), and sugarcane (X = 0, X 3 = 1). Note the total sum of squares is TSS = , and n = 4. p.36.a. Give a model that allows separate intercepts for each crop type, with a common slope for water table depth among crop types. Sketch the graph for this model. For this model, SSE = Give the error degrees of freedom. p.36.b. Give a model that allows separate intercepts for among crop types, with separate slopes for water table depth among crop types. Sketch the graph for this model. For this model, SSE = Give the error degrees of freedom. p.36.c. Test the null hypothesis that the (simpler) model in p..a. is appropriate. That is, the extra parameters in the second model are not significantly different from 0. p.36.d. For the model in p..a., compute R Q.37. A study investigated meteorological effects on condition of wheat yield in Ohio (Y), based on a series of n = 4 years. The predictors were: Average October/November temperature (X 1), September precipitation (X ), October/November precipitation (X 3), and percent September sunshine (X 4). The best 1-,-,3-, and 4-variable models (minimum SSE) are given below. Model p' SSE Cp AIC BIC X X,X X1,X,X X1,X,X3,X p.37.a. Complete the table. Use MSE of the full (X 1,X,X 3,X 4) models as the estimate of when computing C p p.37.b. Give the best model based on each criteria. C p AIC BIC p.37.c. The following output gives the regression coefficients for the X 1,X,X 3 model. Give the fitted value and residual for the first year (Y = 9, X 1 = 46, X = 1.6, X 3 = 6.3). Coefficients Intercept tempon.x rains.x.38 rainon.x3.96 Fitted Value Residual p.37.d. For the model in p.3.c., we obtain: 4 t e e d n p d n p 306 4, , t t1 L U Test H 0: Errors are not autocorrelated versus H A: Errors are autocorrelated Circle the Best Answer Reject H 0 Accept H 0 Test is inconclusive

17 Q.38. A simple linear regression model was fit relating Weight (Y, in pounds) to Height (X, in inches) for a random sample of n=5 National Hockey League players. The total sum of squares, TSS = 937, and R = 0.6. p.38.a. Complete the following table. Source df SS MS F F(0.05) Regression Residual #N/A #N/A Total #N/A #N/A #N/A p.38.b. Complete the following table and use it to conduct the F-test for Lack-of-Fit (there are c = 9 distinct heights). H : X j 1,..., c H : X 0 j 0 1 j A j 0 1 j c ^ c j j j LF j 1 j PE j1 j1 SSLF n Y Y df c SSPE n S df n c Height n Y-bar Y-hat n*(yb-yh)(n-1)s^ Sum 5 #N/A #N/A Test Statistic: Reject H 0 if Test Stat falls in Range: Q.39. A simple linear regression model is fit, relating Orlando June Total Precipitation (Y) to Mean Temperature (X) over an n = 45 year period. The following table gives the results. df SS Regression Residual Total Coefficientsandard Erro Intercept meantemp p.39.a. Use the t-test to test H 0: 1 = 0 vs H A: 1 0 at = 0.10 significance level Test Statistic: Reject H 0 if Test Stat falls in range:

18 A plot of the residuals displays a possible case of unequal variances. The regression of the squared residuals on X is given below: 10.0 e e df SS Regression Residual Total p.39.b. Conduct the Breusch-Pagan test to test H 0: Equal Variances H A: Variance is related to X. (Use = 0.05): Test Statistic: Reject H 0 if Test Statistic falls in the Range: Q.40. A multiple regression model is fit with 3 predictors and an intercept, based on a sample of n=5 observations. How large must R /(1-R ) be to reject H 0: 1 =? Q.41. It is possible to fail to reject H 01: 1 = 0 and H 0: = 0 based on t-tests in multiple linear regression model with p > predictors, but still reject H 01: 1 = = 0, controlling for the remaining p- predictors. True or False Q.4. A multiple linear regression model is fit relating a dependent variable to a set of 3 numeric predictor variables, based on a sample on n=0 experimental units. How large does R need to be so that the null hypothesis H 0: will be rejected? Q.43. An experiment is conducted with 3 numeric predictors and categorical predictors, one with 3 levels, the other with levels. There are no interaction or polynomial terms in the model, and the sample size is n = 30. Give the degrees of freedom for Regression and Error. Df Reg = Df Error = Q.44. It is possible for a dataset to reject H 0: p = 0 based on the F-test, but fail to reject H 0: i = 0 for i=1,,p based on the individual t-tests. True or False

19 Q.45. A linear regression model is fit, relating salary (Y) to experience (X 1), gender (X =1 if female, 0 if male) and an experience/gender interaction term to employees in a large law firm. The fitted equation is ^ Y X 1000X 100X X 1 1. Give the predicted salaries for the following groups of individuals. Males with 0 experience Females with 0 experience Males with 10 experience Females with 10 experience Q.46. A regression model was fit, relating blood alcohol elimination rate measurements (Y, in grams/litre/hour) to Gender (X 1=1 if female, 0 if male), breath alcohol elimination measurements (X in mg/l/h) and a gender/breath interaction term. The sample was 59 adult Austrians. Y X X X X p.46.a. Complete the following Analysis of Variance Table and test H 0. 0 : 1 3 df SS MS F F(.05) Regression Residual #N/A #N/A Total #N/A #N/A #N/A p.46.b. Is the P-value for the test Larger or Smaller than 0.05? p.46.c. What proportion of the variation in Blood alcohol elimination rate measurements is explained by the model? p.46.d. The regression coefficient estimates are given below. Test H : 0 : 0 for each coefficient. 0 i H A i Coefficients Standard Error t_obs t(.05) Reject H0? Intercept #N/A #N/A #N/A female breath f*breath

20 Q.47. Monthly mean temperatures for Boston (Y, in Fahrenheit) for the years are fit using a linear regression model to Year (X 1=Year-190) and 11 monthly dummy variables (X = 1 if January, 0 otherwise,, X 1 = 1 if November, 0 otherwise, Note that December is the reference month). The table and regression coefficient estimates are given below. df SS MS F ignificance F Regression Residual Total Coefficientstandard Erro t Stat P-value Intercept year month month month month month month month month month month month p.47.a. Give the predicted temperatures for December 190, June (Month 6) 190, December 010, and June 010. December June p.47.b. Compute a 95% Confidence Interval for the change in annual mean temperature, controlling for month. Lower Bound: Upper Bound: 1140 p.47.c. Compute the Durbin-Watson statistic. e DW = t t et p.47.d. What proportion of the variation in temperature is explained by the model?

21 Q.48. A response surface model is fit, relating potato chip moistness (Y) to 3 factors: drying time (X 1), frying temperature (X ), and frying time (X 3). There were n = 0 experimental runs (observations). The following 3 models were fit: Model 1: E Y X X X SSR 475. SSE Model : E Y X X X X X X X X X SSR SSE Model 3: E Y X X X X X X X X X X X X SSR SSE 1.4 p.48.a. Test whether any of the -way interaction effects are significantly different from 0, controlling for main effects. H 0: Test Statistic: Rejection Region: P-value: > or < 0.05 p.48.b. Test whether any of the quadratic effects are significantly different from 0, controlling for main effects and - factor interactions. H 0: Test Statistic: Rejection Region: P-value: > or < 0.05 Q.49. A regression model was fit, relating Price (Y, in $1000s) to acceleration rate (X 1) and Miles per gallon (X ) for a sample of n = 5 models of hybrid compact cars. The fitted equation and summary model statistics are given below. ^ Y X 0.0X SSR 957 SSE p.49.a. Test whether price is related to either acceleration rate and/or Miles per gallon. H 0:. Test Statistic: Rejection Region: P-value: > or < 0.05 p.49.b. A plot of the residuals versus predicted values suggests a possible non-constant error variance. A regression of the squared residuals on X 1 and X yields SS(Reg * ) = Test: H : Equal Variance Among Errors i H : Unequal Variance Among Errors h X X 0 i A i 1 i1 i Residuals versus Predicted Test Statistic: Rejection Region: P-value: > or < 0.05

22 Q.50. A regression model is fit, relating energy consumption (Y) to 3 predictors: area (X 1), age (X ), and effective number of guest rooms (X 3 = rooms*occupancy rate) for a sample of n = 15 hotel rooms. p.50.a. Complete the following table of C p, AIC, and BIC for all possible regressions involving X 1, X, and X 3. Model p* SSE C_p AIC BIC X X X X1,X X1,X X,X X1,X,X p.50.b. Which model is selected based on each criteria? C p: AIC: BIC: p.50.c. To check for issues of multicollinearity, a regression relating each predictor on the other predictors is fit. The largest R of the 3 regressions is when X 1 is regressed on X and X 3. That R 1 value is Compute the Variance Inflation Factor VIF for X 1, where VIF1 1/ 1 Ri. Does it exceed 10? Q.51. A study of legitimacy theory was conducted after the Exxon Valdez oil spill in the late 1980s among a sample of n = 1 firms in Alaska. The response variable was the change in the number of environmental disclosures (Y, ) and the predictor variables were the firms log revenues for 1989 (X 1) and an indicator variable of whether the firm was E Y X X part owner of the Aleyska pipeline (X = 1 if Yes, 0 if No) p.51.a. Complete the following Analysis of Variance Table and test H 0:. Source df SS MS F F(.05) Regression 13.7 Error #N/A #N/A Total 7.95 #N/A #N/A #N/A p.51.b. Is the P-value for the test Larger or Smaller than 0.05? p.51.c. Give the residual standard error for the model (estimate of ).

23 p.51.d. The theory states that the regression coefficients for both firm size and part ownership of the pipeline should be positive. The regression coefficient estimates are given below. Test H 0: i = 0 vs H A: i > 0 for each coefficient. Coeffs Std. Err. t_obs t(.05) Reject H0? Intercept #N/A #N/A #N/A X X Q.5. Monthly mean temperatures for Houston (Y, in Fahrenheit) for the years are fit using a linear regression model to Year (X 1=Year-1947) and 11 monthly dummy variables (X = 1 if January, 0 otherwise,, X 1 = 1 if November, 0 otherwise, Note that December is the reference month). The table and regression coefficient estimates are given below. df SS MS F Regression Residual Total Coefficientsandard Erro t Stat P-value Intercept Year E-09 Month E-06 Month Month E-48 Month E-139 Month E-6 Month E-87 Month E-307 Month Month E-66 Month E-165 Month E-41 p.5.a. Give the predicted temperatures for December 1947, June (Month 6) 1947, December 007, and June 007. December June p.5.b. Compute a 95% Confidence Interval for the change in annual mean temperature, controlling for month. Lower Bound: Upper Bound: p.5.c. Compute the Durbin-Watson statistic. e DW = 816 t t et

24 Q.53. A response surface model is fit, relating lipstick color characteristic b* (yellow/blue) (Y) to 3 factors: liposoluble dye amount (X 1), titanium dioxide (X ), and pigment (X 3). There were n = 17 experimental runs (observations). The following 3 models were fit: Model 1: E Y X X X SSR SSE Model : E Y X X X X X X X X X SSR SSE Model 3: E Y X X X X X X X X X X X X SS R3 SSE p.53.a. Test whether any of the -way interaction effects or quadratic terms are significantly different from 0, controlling for main effects. H 0: Test Statistic: Rejection Region: P-value: > or < 0.05 p.53.b. What proportion of variation in Y is explained by each model? Model 1 Model Model 3 Q.54. A regression model was fit, relating Total Points (Y = 3*Wins + Draws) to Total Salary (X, in millions of pounds) for the n = 0 English Premier Football League 1998/1999 season. The fitted equation and summary model statistics are given below. ^ Y X SSR 050 SSE 1674 p.54.a. Test whether Total Points is related to Total Salary. H 0:. Test Statistic: Rejection Region: P-value: > or < 0.05 p.54.b. A plot of the squared residuals versus predicted values suggests a possible non-constant error variance. A regression of the squared residuals on X yields SS(Reg * ) = Test: H : Equal Variance Among Errors i H : Unequal Variance Among Errors h X 0 i A i 1 i e^ versus Y-hat Test Statistic: Rejection Region: P-value: > or < 0.05

25 Q.55. A regression model for language diversity was fit, relating log of the number of languages spoken (Y) to 3 predictors: log of area (X 1), log of population (X ), and mean growing season (X 3, in months) for n = 54 countries. p.55.a. Complete the following table of C p, AIC, and BIC for all possible regressions involving X 1, X, and X 3. Model p* SSE C_p AIC BIC X X X X1,X X1,X X,X X1,X,X p.55.b. Which model is selected based on each criteria? C p: AIC: BIC: p.55.c. To check for issues of multicollinearity, a regression relating each predictor on the other predictors is fit. The largest R of the 3 regressions is when X 1 is regressed on X and X 3. That R 1 value is Compute the Variance Inflation Factor VIF for X 1, where VIF1 1/ 1 Ri. Does it exceed 10? Q.56. Backward Elimination and Forward Selection based on using the model AIC will always choose the same model. True / False Q.57. In multiple regression it is possible to have a significant F-test for the full model, while none of the individual t-tests for the partial coefficients are significant. True / False Q.58. A researcher states that her regression model explains 75% of the variation in her dependent variable. This means that SSE/TSS = 0.5, where SSE is the Error sum of squares and TSS is the Total sum of squares True / False

STA 6167 Exam 1 Spring 2016 PRINT Name

STA 6167 Exam 1 Spring 2016 PRINT Name STA 6167 Exam 1 Spring 2016 PRINT Name Unless stated otherwise, for all significance tests, use = 0.05 significance level. Q.1. A regression model was fit, relating estimated cost of de-commissioning oil

More information

STA 4210 Practise set 2b

STA 4210 Practise set 2b STA 410 Practise set b For all significance tests, use = 0.05 significance level. S.1. A linear regression model is fit, relating fish catch (Y, in tons) to the number of vessels (X 1 ) and fishing pressure

More information

STA 4210 Practise set 2a

STA 4210 Practise set 2a STA 410 Practise set a For all significance tests, use = 0.05 significance level. S.1. A multiple linear regression model is fit, relating household weekly food expenditures (Y, in $100s) to weekly income

More information

Answer all questions from part I. Answer two question from part II.a, and one question from part II.b.

Answer all questions from part I. Answer two question from part II.a, and one question from part II.b. B203: Quantitative Methods Answer all questions from part I. Answer two question from part II.a, and one question from part II.b. Part I: Compulsory Questions. Answer all questions. Each question carries

More information

holding all other predictors constant

holding all other predictors constant Multiple Regression Numeric Response variable (y) p Numeric predictor variables (p < n) Model: Y = b 0 + b 1 x 1 + + b p x p + e Partial Regression Coefficients: b i effect (on the mean response) of increasing

More information

STAT 212 Business Statistics II 1

STAT 212 Business Statistics II 1 STAT 1 Business Statistics II 1 KING FAHD UNIVERSITY OF PETROLEUM & MINERALS DEPARTMENT OF MATHEMATICAL SCIENCES DHAHRAN, SAUDI ARABIA STAT 1: BUSINESS STATISTICS II Semester 091 Final Exam Thursday Feb

More information

Ch 13 & 14 - Regression Analysis

Ch 13 & 14 - Regression Analysis Ch 3 & 4 - Regression Analysis Simple Regression Model I. Multiple Choice:. A simple regression is a regression model that contains a. only one independent variable b. only one dependent variable c. more

More information

Sem. 1 Review Ch. 1-3

Sem. 1 Review Ch. 1-3 AP Stats Sem. 1 Review Ch. 1-3 Name 1. You measure the age, marital status and earned income of an SRS of 1463 women. The number and type of variables you have measured is a. 1463; all quantitative. b.

More information

Chapter 3 Multiple Regression Complete Example

Chapter 3 Multiple Regression Complete Example Department of Quantitative Methods & Information Systems ECON 504 Chapter 3 Multiple Regression Complete Example Spring 2013 Dr. Mohammad Zainal Review Goals After completing this lecture, you should be

More information

Chapter 14 Multiple Regression Analysis

Chapter 14 Multiple Regression Analysis Chapter 14 Multiple Regression Analysis 1. a. Multiple regression equation b. the Y-intercept c. $374,748 found by Y ˆ = 64,1 +.394(796,) + 9.6(694) 11,6(6.) (LO 1) 2. a. Multiple regression equation b.

More information

their contents. If the sample mean is 15.2 oz. and the sample standard deviation is 0.50 oz., find the 95% confidence interval of the true mean.

their contents. If the sample mean is 15.2 oz. and the sample standard deviation is 0.50 oz., find the 95% confidence interval of the true mean. Math 1342 Exam 3-Review Chapters 7-9 HCCS **************************************************************************************** Name Date **********************************************************************************************

More information

Predictive Analytics : QM901.1x Prof U Dinesh Kumar, IIMB. All Rights Reserved, Indian Institute of Management Bangalore

Predictive Analytics : QM901.1x Prof U Dinesh Kumar, IIMB. All Rights Reserved, Indian Institute of Management Bangalore What is Multiple Linear Regression Several independent variables may influence the change in response variable we are trying to study. When several independent variables are included in the equation, the

More information

Regression of Inflation on Percent M3 Change

Regression of Inflation on Percent M3 Change ECON 497 Final Exam Page of ECON 497: Economic Research and Forecasting Name: Spring 2006 Bellas Final Exam Return this exam to me by midnight on Thursday, April 27. It may be e-mailed to me. It may be

More information

Marketing Research Session 10 Hypothesis Testing with Simple Random samples (Chapter 12)

Marketing Research Session 10 Hypothesis Testing with Simple Random samples (Chapter 12) Marketing Research Session 10 Hypothesis Testing with Simple Random samples (Chapter 12) Remember: Z.05 = 1.645, Z.01 = 2.33 We will only cover one-sided hypothesis testing (cases 12.3, 12.4.2, 12.5.2,

More information

General Linear Model (Chapter 4)

General Linear Model (Chapter 4) General Linear Model (Chapter 4) Outcome variable is considered continuous Simple linear regression Scatterplots OLS is BLUE under basic assumptions MSE estimates residual variance testing regression coefficients

More information

LI EAR REGRESSIO A D CORRELATIO

LI EAR REGRESSIO A D CORRELATIO CHAPTER 6 LI EAR REGRESSIO A D CORRELATIO Page Contents 6.1 Introduction 10 6. Curve Fitting 10 6.3 Fitting a Simple Linear Regression Line 103 6.4 Linear Correlation Analysis 107 6.5 Spearman s Rank Correlation

More information

Stat 500 Midterm 2 12 November 2009 page 0 of 11

Stat 500 Midterm 2 12 November 2009 page 0 of 11 Stat 500 Midterm 2 12 November 2009 page 0 of 11 Please put your name on the back of your answer book. Do NOT put it on the front. Thanks. Do not start until I tell you to. The exam is closed book, closed

More information

x3,..., Multiple Regression β q α, β 1, β 2, β 3,..., β q in the model can all be estimated by least square estimators

x3,..., Multiple Regression β q α, β 1, β 2, β 3,..., β q in the model can all be estimated by least square estimators Multiple Regression Relating a response (dependent, input) y to a set of explanatory (independent, output, predictor) variables x, x 2, x 3,, x q. A technique for modeling the relationship between variables.

More information

SMAM 314 Practice Final Examination Winter 2003

SMAM 314 Practice Final Examination Winter 2003 SMAM 314 Practice Final Examination Winter 2003 You may use your textbook, one page of notes and a calculator. Please hand in the notes with your exam. 1. Mark the following statements True T or False

More information

CHAPTER EIGHT Linear Regression

CHAPTER EIGHT Linear Regression 7 CHAPTER EIGHT Linear Regression 8. Scatter Diagram Example 8. A chemical engineer is investigating the effect of process operating temperature ( x ) on product yield ( y ). The study results in the following

More information

STA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3

STA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3 STA 303 H1S / 1002 HS Winter 2011 Test March 7, 2011 LAST NAME: FIRST NAME: STUDENT NUMBER: ENROLLED IN: (circle one) STA 303 STA 1002 INSTRUCTIONS: Time: 90 minutes Aids allowed: calculator. Some formulae

More information

ECON 4230 Intermediate Econometric Theory Exam

ECON 4230 Intermediate Econometric Theory Exam ECON 4230 Intermediate Econometric Theory Exam Multiple Choice (20 pts). Circle the best answer. 1. The Classical assumption of mean zero errors is satisfied if the regression model a) is linear in the

More information

Multiple linear regression S6

Multiple linear regression S6 Basic medical statistics for clinical and experimental research Multiple linear regression S6 Katarzyna Jóźwiak k.jozwiak@nki.nl November 15, 2017 1/42 Introduction Two main motivations for doing multiple

More information

ECON 497 Midterm Spring

ECON 497 Midterm Spring ECON 497 Midterm Spring 2009 1 ECON 497: Economic Research and Forecasting Name: Spring 2009 Bellas Midterm You have three hours and twenty minutes to complete this exam. Answer all questions and explain

More information

Lecture 10 Multiple Linear Regression

Lecture 10 Multiple Linear Regression Lecture 10 Multiple Linear Regression STAT 512 Spring 2011 Background Reading KNNL: 6.1-6.5 10-1 Topic Overview Multiple Linear Regression Model 10-2 Data for Multiple Regression Y i is the response variable

More information

Study Guide AP Statistics

Study Guide AP Statistics Study Guide AP Statistics Name: Part 1: Multiple Choice. Circle the letter corresponding to the best answer. 1. Other things being equal, larger automobile engines are less fuel-efficient. You are planning

More information

Test 3A AP Statistics Name:

Test 3A AP Statistics Name: Test 3A AP Statistics Name: Part 1: Multiple Choice. Circle the letter corresponding to the best answer. 1. Other things being equal, larger automobile engines consume more fuel. You are planning an experiment

More information

Eco and Bus Forecasting Fall 2016 EXERCISE 2

Eco and Bus Forecasting Fall 2016 EXERCISE 2 ECO 5375-701 Prof. Tom Fomby Eco and Bus Forecasting Fall 016 EXERCISE Purpose: To learn how to use the DTDS model to test for the presence or absence of seasonality in time series data and to estimate

More information

Inference with Simple Regression

Inference with Simple Regression 1 Introduction Inference with Simple Regression Alan B. Gelder 06E:071, The University of Iowa 1 Moving to infinite means: In this course we have seen one-mean problems, twomean problems, and problems

More information

SECTION I Number of Questions 42 Percent of Total Grade 50

SECTION I Number of Questions 42 Percent of Total Grade 50 AP Stats Chap 7-9 Practice Test Name Pd SECTION I Number of Questions 42 Percent of Total Grade 50 Directions: Solve each of the following problems, using the available space (or extra paper) for scratchwork.

More information

Midterm 2 - Solutions

Midterm 2 - Solutions Ecn 102 - Analysis of Economic Data University of California - Davis February 24, 2010 Instructor: John Parman Midterm 2 - Solutions You have until 10:20am to complete this exam. Please remember to put

More information

Use of Dummy (Indicator) Variables in Applied Econometrics

Use of Dummy (Indicator) Variables in Applied Econometrics Chapter 5 Use of Dummy (Indicator) Variables in Applied Econometrics Section 5.1 Introduction Use of Dummy (Indicator) Variables Model specifications in applied econometrics often necessitate the use of

More information

STAT 212: BUSINESS STATISTICS II Third Exam Tuesday Dec 12, 6:00 PM

STAT 212: BUSINESS STATISTICS II Third Exam Tuesday Dec 12, 6:00 PM STAT212_E3 KING FAHD UNIVERSITY OF PETROLEUM & MINERALS DEPARTMENT OF MATHEMATICS & STATISTICS Term 171 Page 1 of 9 STAT 212: BUSINESS STATISTICS II Third Exam Tuesday Dec 12, 2017 @ 6:00 PM Name: ID #:

More information

Mrs. Poyner/Mr. Page Chapter 3 page 1

Mrs. Poyner/Mr. Page Chapter 3 page 1 Name: Date: Period: Chapter 2: Take Home TEST Bivariate Data Part 1: Multiple Choice. (2.5 points each) Hand write the letter corresponding to the best answer in space provided on page 6. 1. In a statistics

More information

y response variable x 1, x 2,, x k -- a set of explanatory variables

y response variable x 1, x 2,, x k -- a set of explanatory variables 11. Multiple Regression and Correlation y response variable x 1, x 2,, x k -- a set of explanatory variables In this chapter, all variables are assumed to be quantitative. Chapters 12-14 show how to incorporate

More information

Final Exam - Solutions

Final Exam - Solutions Ecn 102 - Analysis of Economic Data University of California - Davis March 19, 2010 Instructor: John Parman Final Exam - Solutions You have until 5:30pm to complete this exam. Please remember to put your

More information

MGEC11H3Y L01 Introduction to Regression Analysis Term Test Friday July 5, PM Instructor: Victor Yu

MGEC11H3Y L01 Introduction to Regression Analysis Term Test Friday July 5, PM Instructor: Victor Yu Last Name (Print): Solution First Name (Print): Student Number: MGECHY L Introduction to Regression Analysis Term Test Friday July, PM Instructor: Victor Yu Aids allowed: Time allowed: Calculator and one

More information

Tribhuvan University Institute of Science and Technology 2065

Tribhuvan University Institute of Science and Technology 2065 1CSc. Stat. 108-2065 Tribhuvan University Institute of Science and Technology 2065 Bachelor Level/First Year/ First Semester/ Science Full Marks: 60 Computer Science and Information Technology (Stat. 108)

More information

STAT 350 Final (new Material) Review Problems Key Spring 2016

STAT 350 Final (new Material) Review Problems Key Spring 2016 1. The editor of a statistics textbook would like to plan for the next edition. A key variable is the number of pages that will be in the final version. Text files are prepared by the authors using LaTeX,

More information

Econ 3790: Business and Economics Statistics. Instructor: Yogesh Uppal

Econ 3790: Business and Economics Statistics. Instructor: Yogesh Uppal Econ 3790: Business and Economics Statistics Instructor: Yogesh Uppal yuppal@ysu.edu Sampling Distribution of b 1 Expected value of b 1 : Variance of b 1 : E(b 1 ) = 1 Var(b 1 ) = σ 2 /SS x Estimate of

More information

Final Review. Yang Feng. Yang Feng (Columbia University) Final Review 1 / 58

Final Review. Yang Feng.   Yang Feng (Columbia University) Final Review 1 / 58 Final Review Yang Feng http://www.stat.columbia.edu/~yangfeng Yang Feng (Columbia University) Final Review 1 / 58 Outline 1 Multiple Linear Regression (Estimation, Inference) 2 Special Topics for Multiple

More information

Regression Models. Chapter 4. Introduction. Introduction. Introduction

Regression Models. Chapter 4. Introduction. Introduction. Introduction Chapter 4 Regression Models Quantitative Analysis for Management, Tenth Edition, by Render, Stair, and Hanna 008 Prentice-Hall, Inc. Introduction Regression analysis is a very valuable tool for a manager

More information

Answer Key. 9.1 Scatter Plots and Linear Correlation. Chapter 9 Regression and Correlation. CK-12 Advanced Probability and Statistics Concepts 1

Answer Key. 9.1 Scatter Plots and Linear Correlation. Chapter 9 Regression and Correlation. CK-12 Advanced Probability and Statistics Concepts 1 9.1 Scatter Plots and Linear Correlation Answers 1. A high school psychologist wants to conduct a survey to answer the question: Is there a relationship between a student s athletic ability and his/her

More information

Correlation & Simple Regression

Correlation & Simple Regression Chapter 11 Correlation & Simple Regression The previous chapter dealt with inference for two categorical variables. In this chapter, we would like to examine the relationship between two quantitative variables.

More information

Midterm 2 - Solutions

Midterm 2 - Solutions Ecn 102 - Analysis of Economic Data University of California - Davis February 23, 2010 Instructor: John Parman Midterm 2 - Solutions You have until 10:20am to complete this exam. Please remember to put

More information

Making sense of Econometrics: Basics

Making sense of Econometrics: Basics Making sense of Econometrics: Basics Lecture 4: Qualitative influences and Heteroskedasticity Egypt Scholars Economic Society November 1, 2014 Assignment & feedback enter classroom at http://b.socrative.com/login/student/

More information

STA 6207 Practice Problems Nonlinear Regression

STA 6207 Practice Problems Nonlinear Regression STA 6207 Practice Problems Nonlinear Regression Q.1. A study was conducted to measure the effects of pea density (X 1, in plants/m 2 ) and volunteer barley density (X 2, in plants/m 2 ) on pea seed yield

More information

Multiple regression: Model building. Topics. Correlation Matrix. CQMS 202 Business Statistics II Prepared by Moez Hababou

Multiple regression: Model building. Topics. Correlation Matrix. CQMS 202 Business Statistics II Prepared by Moez Hababou Multiple regression: Model building CQMS 202 Business Statistics II Prepared by Moez Hababou Topics Forward versus backward model building approach Using the correlation matrix Testing for multicolinearity

More information

Multiple Regression: Chapter 13. July 24, 2015

Multiple Regression: Chapter 13. July 24, 2015 Multiple Regression: Chapter 13 July 24, 2015 Multiple Regression (MR) Response Variable: Y - only one response variable (quantitative) Several Predictor Variables: X 1, X 2, X 3,..., X p (p = # predictors)

More information

Chapter 10. Correlation and Regression. McGraw-Hill, Bluman, 7th ed., Chapter 10 1

Chapter 10. Correlation and Regression. McGraw-Hill, Bluman, 7th ed., Chapter 10 1 Chapter 10 Correlation and Regression McGraw-Hill, Bluman, 7th ed., Chapter 10 1 Chapter 10 Overview Introduction 10-1 Scatter Plots and Correlation 10- Regression 10-3 Coefficient of Determination and

More information

bivariate correlation bivariate regression multiple regression

bivariate correlation bivariate regression multiple regression bivariate correlation bivariate regression multiple regression Today Bivariate Correlation Pearson product-moment correlation (r) assesses nature and strength of the linear relationship between two continuous

More information

NATCOR Regression Modelling for Time Series

NATCOR Regression Modelling for Time Series Universität Hamburg Institut für Wirtschaftsinformatik Prof. Dr. D.B. Preßmar Professor Robert Fildes NATCOR Regression Modelling for Time Series The material presented has been developed with the substantial

More information

5. Let W follow a normal distribution with mean of μ and the variance of 1. Then, the pdf of W is

5. Let W follow a normal distribution with mean of μ and the variance of 1. Then, the pdf of W is Practice Final Exam Last Name:, First Name:. Please write LEGIBLY. Answer all questions on this exam in the space provided (you may use the back of any page if you need more space). Show all work but do

More information

The simple linear regression model discussed in Chapter 13 was written as

The simple linear regression model discussed in Chapter 13 was written as 1519T_c14 03/27/2006 07:28 AM Page 614 Chapter Jose Luis Pelaez Inc/Blend Images/Getty Images, Inc./Getty Images, Inc. 14 Multiple Regression 14.1 Multiple Regression Analysis 14.2 Assumptions of the Multiple

More information

SMAM 314 Exam 3 Name. 1. Mark the following statements true or false. (6 points 2 each)

SMAM 314 Exam 3 Name. 1. Mark the following statements true or false. (6 points 2 each) SMAM 314 Exam 3 Name 1. Mark the following statements true or false. (6 points 2 each) F A. A t test on independent samples is appropriate when the results of an algebra test are being compared for the

More information

2. Regression Review

2. Regression Review 2. Regression Review 2.1 The Regression Model The general form of the regression model y t = f(x t, β) + ε t where x t = (x t1,, x tp ), β = (β 1,..., β m ). ε t is a random variable, Eε t = 0, Var(ε t

More information

Chapter 3: Examining Relationships

Chapter 3: Examining Relationships Chapter 3 Review Chapter 3: Examining Relationships 1. A study is conducted to determine if one can predict the yield of a crop based on the amount of yearly rainfall. The response variable in this study

More information

Econometrics Homework 1

Econometrics Homework 1 Econometrics Homework Due Date: March, 24. by This problem set includes questions for Lecture -4 covered before midterm exam. Question Let z be a random column vector of size 3 : z = @ (a) Write out z

More information

University of California at Berkeley Fall Introductory Applied Econometrics Final examination. Scores add up to 125 points

University of California at Berkeley Fall Introductory Applied Econometrics Final examination. Scores add up to 125 points EEP 118 / IAS 118 Elisabeth Sadoulet and Kelly Jones University of California at Berkeley Fall 2008 Introductory Applied Econometrics Final examination Scores add up to 125 points Your name: SID: 1 1.

More information

5, 0. Math 112 Fall 2017 Midterm 1 Review Problems Page Which one of the following points lies on the graph of the function f ( x) (A) (C) (B)

5, 0. Math 112 Fall 2017 Midterm 1 Review Problems Page Which one of the following points lies on the graph of the function f ( x) (A) (C) (B) Math Fall 7 Midterm Review Problems Page. Which one of the following points lies on the graph of the function f ( ) 5?, 5, (C) 5,,. Determine the domain of (C),,,, (E),, g. 5. Determine the domain of h

More information

Simple Linear Regression: One Qualitative IV

Simple Linear Regression: One Qualitative IV Simple Linear Regression: One Qualitative IV 1. Purpose As noted before regression is used both to explain and predict variation in DVs, and adding to the equation categorical variables extends regression

More information

MATH 1710 College Algebra Final Exam Review

MATH 1710 College Algebra Final Exam Review MATH 1710 College Algebra Final Exam Review MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. Solve the problem. 1) There were 480 people at a play.

More information

SMAM 314 Exam 42 Name

SMAM 314 Exam 42 Name SMAM 314 Exam 42 Name Mark the following statements True (T) or False (F) (10 points) 1. F A. The line that best fits points whose X and Y values are negatively correlated should have a positive slope.

More information

Ecn Analysis of Economic Data University of California - Davis February 23, 2010 Instructor: John Parman. Midterm 2. Name: ID Number: Section:

Ecn Analysis of Economic Data University of California - Davis February 23, 2010 Instructor: John Parman. Midterm 2. Name: ID Number: Section: Ecn 102 - Analysis of Economic Data University of California - Davis February 23, 2010 Instructor: John Parman Midterm 2 You have until 10:20am to complete this exam. Please remember to put your name,

More information

2) For a normal distribution, the skewness and kurtosis measures are as follows: A) 1.96 and 4 B) 1 and 2 C) 0 and 3 D) 0 and 0

2) For a normal distribution, the skewness and kurtosis measures are as follows: A) 1.96 and 4 B) 1 and 2 C) 0 and 3 D) 0 and 0 Introduction to Econometrics Midterm April 26, 2011 Name Student ID MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. (5,000 credit for each correct

More information

Table of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X z).

Table of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X z). Table of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X z). For example P(X 1.04) =.8508. For z < 0 subtract the value from

More information

STAT Chapter 11: Regression

STAT Chapter 11: Regression STAT 515 -- Chapter 11: Regression Mostly we have studied the behavior of a single random variable. Often, however, we gather data on two random variables. We wish to determine: Is there a relationship

More information

1-2 Analyzing Graphs of Functions and Relations

1-2 Analyzing Graphs of Functions and Relations Use the graph of each function to estimate the indicated function values. Then confirm the estimate algebraically. Round to the nearest hundredth, if necessary. 2. 6. a. h( 1) b. h(1.5) c. h(2) a. g( 2)

More information

Multiple Regression Methods

Multiple Regression Methods Chapter 1: Multiple Regression Methods Hildebrand, Ott and Gray Basic Statistical Ideas for Managers Second Edition 1 Learning Objectives for Ch. 1 The Multiple Linear Regression Model How to interpret

More information

STA 302 H1F / 1001 HF Fall 2007 Test 1 October 24, 2007

STA 302 H1F / 1001 HF Fall 2007 Test 1 October 24, 2007 STA 302 H1F / 1001 HF Fall 2007 Test 1 October 24, 2007 LAST NAME: SOLUTIONS FIRST NAME: STUDENT NUMBER: ENROLLED IN: (circle one) STA 302 STA 1001 INSTRUCTIONS: Time: 90 minutes Aids allowed: calculator.

More information

Salt Lake Community College MATH 1040 Final Exam Fall Semester 2011 Form E

Salt Lake Community College MATH 1040 Final Exam Fall Semester 2011 Form E Salt Lake Community College MATH 1040 Final Exam Fall Semester 011 Form E Name Instructor Time Limit: 10 minutes Any hand-held calculator may be used. Computers, cell phones, or other communication devices

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html 1 / 42 Passenger car mileage Consider the carmpg dataset taken from

More information

Answer Key: Problem Set 6

Answer Key: Problem Set 6 : Problem Set 6 1. Consider a linear model to explain monthly beer consumption: beer = + inc + price + educ + female + u 0 1 3 4 E ( u inc, price, educ, female ) = 0 ( u inc price educ female) σ inc var,,,

More information

[ z = 1.48 ; accept H 0 ]

[ z = 1.48 ; accept H 0 ] CH 13 TESTING OF HYPOTHESIS EXAMPLES Example 13.1 Indicate the type of errors committed in the following cases: (i) H 0 : µ = 500; H 1 : µ 500. H 0 is rejected while H 0 is true (ii) H 0 : µ = 500; H 1

More information

Possibly useful formulas for this exam: b1 = Corr(X,Y) SDY / SDX. confidence interval: Estimate ± (Critical Value) (Standard Error of Estimate)

Possibly useful formulas for this exam: b1 = Corr(X,Y) SDY / SDX. confidence interval: Estimate ± (Critical Value) (Standard Error of Estimate) Statistics 5100 Exam 2 (Practice) Directions: Be sure to answer every question, and do not spend too much time on any part of any question. Be concise with all your responses. Partial SAS output and statistical

More information

FinQuiz Notes

FinQuiz Notes Reading 10 Multiple Regression and Issues in Regression Analysis 2. MULTIPLE LINEAR REGRESSION Multiple linear regression is a method used to model the linear relationship between a dependent variable

More information

Linear regression. Linear regression is a simple approach to supervised learning. It assumes that the dependence of Y on X 1,X 2,...X p is linear.

Linear regression. Linear regression is a simple approach to supervised learning. It assumes that the dependence of Y on X 1,X 2,...X p is linear. Linear regression Linear regression is a simple approach to supervised learning. It assumes that the dependence of Y on X 1,X 2,...X p is linear. 1/48 Linear regression Linear regression is a simple approach

More information

TREY COX SCOTT ADAMSON COLLEGE ALGEBRA ACTIVITY GUIDE TO ACCOMPANY COLLEGE ALGEBRA: A MAKE IT REAL APPROACH

TREY COX SCOTT ADAMSON COLLEGE ALGEBRA ACTIVITY GUIDE TO ACCOMPANY COLLEGE ALGEBRA: A MAKE IT REAL APPROACH TREY COX SCOTT ADAMSON COLLEGE ALGEBRA ACTIVITY GUIDE TO ACCOMPANY COLLEGE ALGEBRA: A MAKE IT REAL APPROACH College Algebra Activity Guide For College Algebra: A Make It Real Approach Trey Cox Chandler-Gilbert

More information

Regression Analysis II

Regression Analysis II Regression Analysis II Measures of Goodness of fit Two measures of Goodness of fit Measure of the absolute fit of the sample points to the sample regression line Standard error of the estimate An index

More information

Slide 1. Slide 2. Slide 3. Pick a Brick. Daphne. 400 pts 200 pts 300 pts 500 pts 100 pts. 300 pts. 300 pts 400 pts 100 pts 400 pts.

Slide 1. Slide 2. Slide 3. Pick a Brick. Daphne. 400 pts 200 pts 300 pts 500 pts 100 pts. 300 pts. 300 pts 400 pts 100 pts 400 pts. Slide 1 Slide 2 Daphne Phillip Kathy Slide 3 Pick a Brick 100 pts 200 pts 500 pts 300 pts 400 pts 200 pts 300 pts 500 pts 100 pts 300 pts 400 pts 100 pts 400 pts 100 pts 200 pts 500 pts 100 pts 400 pts

More information

Multiple Choice Circle the letter corresponding to the best answer for each of the problems below (4 pts each)

Multiple Choice Circle the letter corresponding to the best answer for each of the problems below (4 pts each) Math 221 Hypothetical Exam 1, Wi2008, (Chapter 1-5 in Moore, 4th) April 3, 2063 S. K. Hyde, S. Barton, P. Hurst, K. Yan Name: Show all your work to receive credit. All answers must be justified to get

More information

Mathematics for Economics MA course

Mathematics for Economics MA course Mathematics for Economics MA course Simple Linear Regression Dr. Seetha Bandara Simple Regression Simple linear regression is a statistical method that allows us to summarize and study relationships between

More information

Sociology 593 Exam 2 Answer Key March 28, 2002

Sociology 593 Exam 2 Answer Key March 28, 2002 Sociology 59 Exam Answer Key March 8, 00 I. True-False. (0 points) Indicate whether the following statements are true or false. If false, briefly explain why.. A variable is called CATHOLIC. This probably

More information

Unit 11: Multiple Linear Regression

Unit 11: Multiple Linear Regression Unit 11: Multiple Linear Regression Statistics 571: Statistical Methods Ramón V. León 7/13/2004 Unit 11 - Stat 571 - Ramón V. León 1 Main Application of Multiple Regression Isolating the effect of a variable

More information

THE UNIVERSITY OF HONG KONG School of Economics & Finance Answer Keys to st Semester Examination

THE UNIVERSITY OF HONG KONG School of Economics & Finance Answer Keys to st Semester Examination THE UNIVERSITY OF HONG KONG School of Economics & Finance Answer Keys to 2003-2004 1st Semester Examination Economics: ECON1003 Analysis of Economic Data Dr K F Wong 1. (6 points) State and explain briefly

More information

Chapter 16. Simple Linear Regression and Correlation

Chapter 16. Simple Linear Regression and Correlation Chapter 16 Simple Linear Regression and Correlation 16.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will

More information

No other aids are allowed. For example you are not allowed to have any other textbook or past exams.

No other aids are allowed. For example you are not allowed to have any other textbook or past exams. UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Sample Exam Note: This is one of our past exams, In fact the only past exam with R. Before that we were using SAS. In

More information

Self-Assessment Weeks 8: Multiple Regression with Qualitative Predictors; Multiple Comparisons

Self-Assessment Weeks 8: Multiple Regression with Qualitative Predictors; Multiple Comparisons Self-Assessment Weeks 8: Multiple Regression with Qualitative Predictors; Multiple Comparisons 1. Suppose we wish to assess the impact of five treatments while blocking for study participant race (Black,

More information

Part Possible Score Base 5 5 MC Total 50

Part Possible Score Base 5 5 MC Total 50 Stat 220 Final Exam December 16, 2004 Schafer NAME: ANDREW ID: Read This First: You have three hours to work on the exam. The other questions require you to work out answers to the questions; be sure to

More information

a. Yes, it is consistent. a. Positive c. Near Zero

a. Yes, it is consistent. a. Positive c. Near Zero Chapter 4 Test B Multiple Choice Section 4.1 (Visualizing Variability with a Scatterplot) 1. [Objective: Analyze a scatter plot and recognize trends] Doctors believe that smoking cigarettes lowers lung

More information

Eco 391, J. Sandford, spring 2013 April 5, Midterm 3 4/5/2013

Eco 391, J. Sandford, spring 2013 April 5, Midterm 3 4/5/2013 Midterm 3 4/5/2013 Instructions: You may use a calculator, and one sheet of notes. You will never be penalized for showing work, but if what is asked for can be computed directly, points awarded will depend

More information

Simple linear regression

Simple linear regression Simple linear regression Business Statistics 41000 Fall 2015 1 Topics 1. conditional distributions, squared error, means and variances 2. linear prediction 3. signal + noise and R 2 goodness of fit 4.

More information

Algebra 1. Functions and Modeling Day 2

Algebra 1. Functions and Modeling Day 2 Algebra 1 Functions and Modeling Day 2 MAFS.912. F-BF.2.3 Which statement BEST describes the graph of f x 6? A. The graph of f(x) is shifted up 6 units. B. The graph of f(x) is shifted left 6 units. C.

More information

ALGEBRA I SEMESTER EXAMS PRACTICE MATERIALS SEMESTER (1.1) Examine the dotplots below from three sets of data Set A

ALGEBRA I SEMESTER EXAMS PRACTICE MATERIALS SEMESTER (1.1) Examine the dotplots below from three sets of data Set A 1. (1.1) Examine the dotplots below from three sets of data. 0 2 4 6 8 10 Set A 0 2 4 6 8 10 Set 0 2 4 6 8 10 Set C The mean of each set is 5. The standard deviations of the sets are 1.3, 2.0, and 2.9.

More information

Model Building Chap 5 p251

Model Building Chap 5 p251 Model Building Chap 5 p251 Models with one qualitative variable, 5.7 p277 Example 4 Colours : Blue, Green, Lemon Yellow and white Row Blue Green Lemon Insects trapped 1 0 0 1 45 2 0 0 1 59 3 0 0 1 48 4

More information

Statistics for Managers using Microsoft Excel 6 th Edition

Statistics for Managers using Microsoft Excel 6 th Edition Statistics for Managers using Microsoft Excel 6 th Edition Chapter 13 Simple Linear Regression 13-1 Learning Objectives In this chapter, you learn: How to use regression analysis to predict the value of

More information

(ii) Scan your answer sheets INTO ONE FILE only, and submit it in the drop-box.

(ii) Scan your answer sheets INTO ONE FILE only, and submit it in the drop-box. FINAL EXAM ** Two different ways to submit your answer sheet (i) Use MS-Word and place it in a drop-box. (ii) Scan your answer sheets INTO ONE FILE only, and submit it in the drop-box. Deadline: December

More information

STAT 7030: Categorical Data Analysis

STAT 7030: Categorical Data Analysis STAT 7030: Categorical Data Analysis 5. Logistic Regression Peng Zeng Department of Mathematics and Statistics Auburn University Fall 2012 Peng Zeng (Auburn University) STAT 7030 Lecture Notes Fall 2012

More information

Lecture 8. Using the CLR Model. Relation between patent applications and R&D spending. Variables

Lecture 8. Using the CLR Model. Relation between patent applications and R&D spending. Variables Lecture 8. Using the CLR Model Relation between patent applications and R&D spending Variables PATENTS = No. of patents (in 000) filed RDEP = Expenditure on research&development (in billions of 99 $) The

More information

ST430 Exam 2 Solutions

ST430 Exam 2 Solutions ST430 Exam 2 Solutions Date: November 9, 2015 Name: Guideline: You may use one-page (front and back of a standard A4 paper) of notes. No laptop or textbook are permitted but you may use a calculator. Giving

More information