STAT5044: Regression and ANOVA, Fall 2011 Final Exam on Dec 14. Your Name:

Please make sure to specify all of your notations in each problem. GOOD LUCK!

Problem# 1. Consider the following model,

$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{1i}^2 + \beta_3 x_{2i} + \sigma\varepsilon_i, \quad i = 1,\dots,n,$

where $E(\varepsilon_i) = 0$, $Var(\varepsilon_i) = 1$, and $Cov(\varepsilon_i, \varepsilon_j) = 0$ for $i \neq j$. The parameters $(\beta_0, \beta_1, \beta_2, \beta_3, \sigma^2)$ are unknown.

(a) Under the assumption that the $\varepsilon_i$ are iid and normally distributed, we want to test $H_0: \beta_0 + \beta_1 \beta_2 + \beta_3 = 0$ vs $H_a$: not $H_0$. What is your test statistic and its distribution? Justify why your test statistic follows this distribution and provide your decision rule. Be sure to define all terms.

(b) We found that there is significant evidence to reject $H_0$ in (a). We then fit the simple linear model between $y$ and $x_1$. We found that the p-value obtained from the Shapiro-Wilk test was still smaller than the significance level 0.05. However, by taking the log transformation of $y$, we obtained a larger p-value from the Shapiro-Wilk test. Hence we consider the following simple linear regression of $\log(y)$ on $x_1$:

$\log(y_i) = \gamma_0 + \gamma_1 x_{1i} + \sigma\varepsilon_i.$

From this regression model, we observe that the p-value from the Breusch-Pagan test is smaller than the significance level. The scatter plot of the absolute residuals against $x_1$ shows that there is a strong linear relationship between them. Under this situation, we want to obtain a prediction interval for $y_{new}$ at a given new $x_{1,new}$. Explain in a step by step manner how to obtain the prediction interval of $y_{new}$. Be sure to define all terms.
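As an illustration of the setting in part (b), the following is a minimal sketch of one possible approach, not necessarily the intended answer: fit ordinary least squares to $\log(y)$, regress the absolute residuals on $x_1$ to estimate a standard deviation function, refit by weighted least squares with weights equal to the inverse squared fitted standard deviations, build the prediction interval on the log scale, and exponentiate its endpoints. The function names, the synthetic data, and the choice to pass the t critical value as an argument are all assumptions made for this example.

```python
import math

def wls_fit(x, y, w):
    """Weighted least squares fit of y = g0 + g1 * x with weights w."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    g1 = sxy / sxx
    g0 = ybar - g1 * xbar
    return g0, g1, sw, xbar, sxx

def prediction_interval(x, y, x_new, t_crit):
    """Prediction interval for y_new at x_new (hypothetical helper).

    t_crit is the t critical value with n - 2 degrees of freedom,
    supplied by the caller (for example from a t table).
    """
    n = len(x)
    logy = [math.log(v) for v in y]
    ones = [1.0] * n
    # Step 1: ordinary least squares on the log scale.
    g0, g1, *_ = wls_fit(x, logy, ones)
    abs_res = [abs(ly - (g0 + g1 * xi)) for xi, ly in zip(x, logy)]
    # Step 2: regress |residuals| on x to estimate the sd function.
    a0, a1, *_ = wls_fit(x, abs_res, ones)
    s = [a0 + a1 * xi for xi in x]
    s_new = a0 + a1 * x_new
    if min(s) <= 0 or s_new <= 0:
        raise ValueError("fitted standard deviations must be positive")
    # Step 3: weighted least squares with weights 1 / s_i^2.
    w = [1.0 / si ** 2 for si in s]
    g0, g1, sw, xbar, sxx = wls_fit(x, logy, w)
    mse = sum(wi * (ly - g0 - g1 * xi) ** 2
              for wi, xi, ly in zip(w, x, logy)) / (n - 2)
    # Step 4: prediction variance = new-observation part + fitted-mean part.
    var_pred = mse * (s_new ** 2 + 1.0 / sw + (x_new - xbar) ** 2 / sxx)
    center = g0 + g1 * x_new
    half = t_crit * math.sqrt(var_pred)
    # Step 5: back-transform the endpoints to the original scale of y.
    return math.exp(center - half), math.exp(center + half)

# Synthetic data (hypothetical): noise spread grows linearly with x.
x = list(range(1, 11))
y = [math.exp(0.5 + 0.2 * xi + ((-1) ** i) * (0.1 + 0.05 * xi))
     for i, xi in enumerate(x)]
# 2.306 is the 0.975 quantile of the t distribution with 8 degrees of freedom.
print(prediction_interval(x, y, x_new=5.0, t_crit=2.306))
```

Because the weights depend on an estimated standard deviation function, the interval is only approximate; iterating steps 2 and 3 until the coefficients stabilize is a common refinement.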

Problem# 2. An experiment analyzes imperfection rates for two processes used to fabricate silicon wafers for computer chips. For treatment A applied to 10 wafers, the numbers of imperfections are 8, 7, 6, 6, 3, 4, 4, 7, 2, 3, 4. Treatment B applied to 10 other wafers has 9, 9, 8, 14, 8, 13, 11, 5, 7, 6 imperfections. An experimenter wants to know whether the imperfection rates for the two treatments are the same or not.

(a) Construct a generalized linear model (GLM) for this experiment.

(b) Based on your GLM in (a), obtain the likelihood.

(c) Explain in a step by step manner how to estimate the parameters. Interpret your parameters.

(d) What are the distributions of your parameter estimators? Explain in a step by step manner how to calculate confidence intervals for your parameters.

(e) Explain in a step by step manner how to use the confidence interval to decide whether the imperfection rates for the two treatments are the same or not.
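As an illustration, here is a minimal sketch of one model that fits this setting: a Poisson log-linear model $\log(\mu_i) = \beta_0 + \beta_1 x_i$ with $x_i = 1$ for treatment B and $x_i = 0$ for treatment A. This model choice is an assumption, not the only admissible answer. With a single binary predictor the maximum likelihood estimates have a closed form, and a large-sample Wald interval for $\beta_1$ uses the standard error $\sqrt{1/\sum y_A + 1/\sum y_B}$. The function name is hypothetical, and the counts are passed in as lists so the code does not depend on how many wafers each list contains.

```python
import math

def two_group_poisson(counts_a, counts_b, z=1.96):
    """Closed-form MLE and Wald interval for log(mu) = b0 + b1 * x.

    x = 0 for group A and x = 1 for group B, so exp(b0) is the mean
    count under A and exp(b1) is the rate ratio B / A.
    """
    mean_a = sum(counts_a) / len(counts_a)
    mean_b = sum(counts_b) / len(counts_b)
    b0 = math.log(mean_a)
    b1 = math.log(mean_b / mean_a)
    # Large-sample standard error of b1 from the inverse Fisher information.
    se_b1 = math.sqrt(1.0 / sum(counts_a) + 1.0 / sum(counts_b))
    ci_b1 = (b1 - z * se_b1, b1 + z * se_b1)
    return b0, b1, se_b1, ci_b1

# Counts as printed in the problem statement.
treatment_a = [8, 7, 6, 6, 3, 4, 4, 7, 2, 3, 4]
treatment_b = [9, 9, 8, 14, 8, 13, 11, 5, 7, 6]
b0, b1, se_b1, ci_b1 = two_group_poisson(treatment_a, treatment_b)
print("b0 =", b0, "b1 =", b1, "95% Wald CI for b1 =", ci_b1)
# If the interval for b1 excludes 0, the two rates differ at the 5% level.
```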

Problem# 3. For each problem, identify what analysis or test you can perform. If necessary, give the model, null hypothesis, test statistic, and decision rule. Explain in detail how you estimate the parameters in your models. (Please make sure to specify all of your notations and indices.)

(i) The data refer to 10 army corps, each observed for 20 years. In 109 corps-years of exposure, there were no deaths, in 65 corps-years there was one death, and so on. We would like to test whether probabilities of occurrences in these five categories follow a Poisson distribution.

Number of Deaths    Number of Corps-Years
0                   109
1                   65
2                   22
3                   3
4                   1
5                   0

(ii) A sample of 100 women suffer from dysmenorrhea. A new analgesic is claimed to provide greater relief than a standard one. After using each analgesic in a crossover experiment, 40 reported greater relief with the standard analgesic and 60 reported greater relief with the new one.

(iii) A study of homicides in a given year for a sample of cities might model the homicide rate, defined for a city as its number of homicides that year divided by its population size. We want to know how the rate depends on the city's unemployment rate, its residents' median income, and the percentage of residents having completed high school.

(iv) The dataset from Dalal, Fowlkes, and Hoadley contains data on O-rings on 23 U.S. space shuttle missions prior to the Challenger disaster of January 28, 1986. For each of the previous missions, the temperature at take-off and the pressure of a prelaunch test were recorded, along with the number of O-rings that failed out of six. Use these data to try to understand the probability of failure as a function of temperature, and of temperature and pressure. We want to estimate the probability of failure of an O-ring when the temperature was 31 F, the launch temperature on January 28, 1986. We also want to predict the probability of failure of an O-ring when the temperature is 31 F, the launch temperature on January 20, 1988.
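For part (i), one common approach is a Pearson chi-square goodness-of-fit test with the Poisson mean estimated by maximum likelihood. The sketch below is an illustration under stated assumptions rather than the required answer: the function name is hypothetical, and pooling all counts of 3 or more into one cell (so that expected counts are not too small) is a choice made for this example.

```python
import math

deaths = [0, 1, 2, 3, 4, 5]
corps_years = [109, 65, 22, 3, 1, 0]

def poisson_gof(values, freqs, pool_from=3):
    """Pearson chi-square statistic for a Poisson fit.

    Cells are 0, 1, ..., pool_from - 1 and a pooled ">= pool_from" cell.
    Returns (estimated mean, chi-square statistic, degrees of freedom).
    """
    n = sum(freqs)
    lam = sum(v * f for v, f in zip(values, freqs)) / n  # MLE of the mean
    observed = [0] * (pool_from + 1)
    for v, f in zip(values, freqs):
        observed[min(v, pool_from)] += f
    probs = [math.exp(-lam) * lam ** k / math.factorial(k)
             for k in range(pool_from)]
    probs.append(1.0 - sum(probs))  # probability of the pooled tail cell
    expected = [n * p for p in probs]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1 - 1  # cells - 1 - one estimated parameter
    return lam, stat, df

lam, stat, df = poisson_gof(deaths, corps_years)
# Reject the Poisson hypothesis at the 5% level if stat exceeds the
# chi-square critical value with df degrees of freedom (5.991 for df = 2).
print("lambda_hat =", lam, "chi-square =", stat, "df =", df)
```

Estimating the mean from the ungrouped counts and then using $k - 2$ degrees of freedom for the pooled cells is the usual textbook approximation.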

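Some formulas

Large-sample distribution of the log of the relative risk ($r$):

$\log(\hat{r}) \sim N\left[\log\left(\frac{\pi_1}{\pi_2}\right),\ \frac{1-\pi_1}{\pi_1 n_1} + \frac{1-\pi_2}{\pi_2 n_2}\right]$

Large-sample distribution of the log of the odds ratio ($\theta$):

$\log(\hat{\theta}) \sim N\left[\log(\theta),\ \frac{1}{n\pi_{11}} + \frac{1}{n\pi_{12}} + \frac{1}{n\pi_{21}} + \frac{1}{n\pi_{22}}\right]$

where $n$ is the total sample size and $\pi_{ij}$ is the proportion in cell $(i,j)$ of the contingency table.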
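As a small illustration of how these large-sample results are typically used, the sketch below plugs sample proportions into the variance formulas and forms Wald intervals on the log scale. The function names and the example counts are hypothetical; with plug-in estimates, $n\hat{\pi}_{ij}$ is simply the observed cell count $n_{ij}$.

```python
import math

def log_rr_ci(y1, n1, y2, n2, z=1.96):
    """Wald interval for the log relative risk from two binomial samples."""
    p1, p2 = y1 / n1, y2 / n2
    log_rr = math.log(p1 / p2)
    se = math.sqrt((1 - p1) / (p1 * n1) + (1 - p2) / (p2 * n2))
    return log_rr - z * se, log_rr + z * se

def log_or_ci(n11, n12, n21, n22, z=1.96):
    """Wald interval for the log odds ratio from 2x2 cell counts."""
    log_or = math.log((n11 * n22) / (n12 * n21))
    se = math.sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
    return log_or - z * se, log_or + z * se

# Hypothetical counts, for illustration only.
lo, hi = log_or_ci(10, 20, 30, 40)
# Exponentiate the endpoints to get an interval for the odds ratio itself.
print("odds ratio interval:", (math.exp(lo), math.exp(hi)))
print("log relative risk interval:", log_rr_ci(10, 30, 30, 70))
```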