ISQS 5347 Final Exam Spring 2017

Open book, but no loose leaf notes and no electronic devices. Points (out of 200) are in parentheses. Put all answers on the paper provided to you.

1. Recall the commute time example in class. Let Y1, …, Yn be the commute times of next semester's students (Fall 2017).

A. (5) Write down a good model for how these data will appear.

Solution: Y1, …, Yn ~iid p(y)

B. (10) Explain why your model of A. is good, using the definition in the book of a good model, but also state ways in which it might not be so good.

Solution: This model is good because it produces data that look like what we will actually observe; namely, variable data. The model predicts that one student will give a number different from another, which is what we will actually observe next semester. The independence assumption could be questioned, though. If there are married couples who take the class, and if they come to class together, obviously their times cannot be considered independent of one another.

2. (10) Refer to the commute time example of problem 1. Use the definition of independence given in the book to explain why a student's commute time (Y) and that student's distance in miles from the university (X) are not independent.

Solution: X and Y are independent if p(y | X = x1) = p(y | X = x2), for all x1 ≠ x2. Here, let x1 = 0.5 miles and x2 = 10 miles. The conditional distribution of commute times for people living 10 miles from campus will clearly be shifted to the right of the distribution of commute times for people living 0.5 miles from campus. So p(y | X = 0.5) ≠ p(y | X = 10), implying that X and Y are not independent.

3. (10) Suppose the distribution of a random variable Y is continuous and right-skewed (i.e., the skewness is greater than zero). Draw and annotate (i.e., write on) a graph of the distribution of Y that shows why there is a 90% chance that Y will be between the 0.05 quantile and the 0.95 quantile.
Solution: You should draw on the graph indicating that there is 0.05 area to the left of the red line, and indicating that there is 0.05 area to the right of the blue line. Thus there is 0.90 area between the red and blue lines, because the total area under the curve is 1.0.
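The area argument in problem 3 can be checked numerically. The sketch below uses a gamma distribution as a stand-in for a right-skewed p(y); the shape and rate values are assumptions chosen only for illustration, and the red and blue lines mirror the ones mentioned in the solution.

```r
# Illustrative right-skewed distribution: gamma with shape 2 and rate 0.5
# (assumed values, not part of the exam)
shape <- 2
rate  <- 0.5

q05 <- qgamma(0.05, shape = shape, rate = rate)  # 0.05 quantile
q95 <- qgamma(0.95, shape = shape, rate = rate)  # 0.95 quantile

# Draw the density and mark the two quantiles
curve(dgamma(x, shape = shape, rate = rate), from = 0, to = 15,
      xlab = "y", ylab = "p(y)")
abline(v = q05, col = "red")   # 0.05 area lies to the left of this line
abline(v = q95, col = "blue")  # 0.05 area lies to the right of this line

# Area under the density between the two quantiles
area_between <- integrate(dgamma, lower = q05, upper = q95,
                          shape = shape, rate = rate)$value
area_between
```

The computed area should be 0.90 up to numerical integration error, whatever right-skewed continuous distribution is substituted.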
4. (10) Suppose θ̂1 and θ̂2 are unbiased estimators of θ. Show that θ̂ = 0.2 θ̂1 + 0.8 θ̂2 is also an unbiased estimator of θ. Justify every step carefully; in other words, provide the reasons.

Solution:

E(θ̂) = E(0.2 θ̂1 + 0.8 θ̂2)   (by substitution)
     = E(0.2 θ̂1) + E(0.8 θ̂2)   (by the additivity property of expectation)
     = 0.2 E(θ̂1) + 0.8 E(θ̂2)   (by the linearity property of expectation)
     = 0.2θ + 0.8θ   (because both θ̂1 and θ̂2 are unbiased estimators of θ, as given in the problem statement)
     = θ   (by algebra)

Hence E(θ̂) = θ, implying by definition that θ̂ is an unbiased estimator of θ.

5. Consider the following R commands and the output.

> Y = rnorm(7, 81, 12)
> mean(Y)
[1] 85.22049
> sd(Y)
[1] 7.782028

Fill in the blanks: (five points each)

Solution:
A. The data Y1, …, Yn are produced independently by a normal distribution.
B. The sample size is: n = 7
C. The process mean is: µ = 81
D. The process standard deviation is: σ = 12
E. The sample average is: ȳ = 85.22049
F. The sample standard deviation is: σ̂ = 7.782028
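The algebra in problem 4 can also be checked by simulation. The sketch below is only an illustration under assumed choices: the data are normal with the process mean and standard deviation from problem 5, θ̂1 is the sample mean, and θ̂2 is the sample median (both unbiased for the mean of a normal distribution). None of these choices come from problem 4 itself.

```r
set.seed(5347)

theta <- 81      # true parameter (assumed; borrowed from problem 5)
nsim  <- 20000   # number of simulated samples

# For each simulated sample, compute the weighted estimator 0.2*theta1 + 0.8*theta2
est <- replicate(nsim, {
  y <- rnorm(7, mean = theta, sd = 12)
  theta1 <- mean(y)     # first unbiased estimator (assumed choice)
  theta2 <- median(y)   # second unbiased estimator (assumed choice)
  0.2 * theta1 + 0.8 * theta2
})

# The average of the estimates approximates E(theta-hat)
mean(est)
```

The average of the simulated estimates should land close to 81, in line with E(θ̂) = θ.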
6. Suppose you will sample Y1, …, Yn, independent and identically distributed from a distribution p(y) whose mean is µ. You decide to estimate µ using Y1, the first observation that you sample, ignoring all other data.

A. (5) Use the definition of unbiasedness given in the book to show that Y1 is an unbiased estimator of µ.

Solution: Since it is stated that Y1 is from a distribution whose mean is µ, it follows that E(Y1) = µ. Hence Y1 is an unbiased estimator of µ, by definition.

B. (5) Use the definition of consistency given in the book to show that Y1 is an inconsistent estimator of µ.

Solution: The value of Y1 does not change, regardless of the sample size n. And Y1 is random, and so does not become closer to µ as n gets larger. To be consistent, Y1 would have to get closer to µ as n increases, by definition.

7. (10) Think for a second: When did your child, or your friend's or relative's child, or your niece, nephew, etc., speak their first word? (Don't answer that question, just think about it.) Now, consider the parameter θ = average age (in years) where children speak their first word. Obviously, you do not know this precise number, but you should have some prior ideas about it based on your own direct observations. Draw a graph of your prior distribution, p(θ), labeling axes carefully, and explain why you drew your prior the way that you did.

Solution: Somewhere around 1 year, maybe a little younger? Not quite sure. Here is my guess:
You might argue that some people are developmentally challenged and therefore take longer to learn their first words. But this misses the point that the prior is for the mean age at first word, not for an individual's age.

8. (5) Suppose a posterior distribution is p(θ | data) = c f(θ), for 0 < θ < 1. What is c?

Solution: c is the reciprocal of the area under the curve f(θ) between 0 and 1.
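To make the answer to problem 8 concrete, the sketch below picks a hypothetical kernel f(θ) = θ²(1 − θ)³ and computes c numerically. The kernel is an assumption for illustration only; the exam leaves f unspecified.

```r
# Hypothetical unnormalized posterior kernel on 0 < theta < 1 (assumed)
f <- function(theta) theta^2 * (1 - theta)^3

# Area under f between 0 and 1
area <- integrate(f, lower = 0, upper = 1)$value

# c is the reciprocal of that area
c_const <- 1 / area
c_const

# Check: c * f(theta) integrates to 1, so it is a valid density
total <- integrate(function(theta) c_const * f(theta),
                   lower = 0, upper = 1)$value
total
```

For this particular kernel the area is the beta function B(3, 4) = 1/60, so c works out to 60.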
9. Suppose that life expectancy (Y, in years from inception) for a restaurant has probability density function p(y).

A. (5) Draw a graph of how p(y) might look, labelling axes. (Hint: It's not a normal distribution.)

Solution: Something like this:
B. (5) In terms of your graph in A., what is the median life expectancy? Annotate (draw on) your graph to illustrate.

Solution: You should locate a point around 4 (your number might vary, depending on your graph) on the horizontal axis and annotate the graph showing that 50% of the area under the curve is to the left of 4 and 50% of the area is to the right.

C. (5) In terms of your graph of A., what is the mean life expectancy? Annotate (draw on) your graph to illustrate.

Solution: You should locate a point around 5 (your number might vary, depending on your graph) on the horizontal axis, and note that the whole graph balances atop that point. In other words, it will tip over to the right for a value less than 5 and it will tip over to the left for a value more than 5.

10. (10) I made a big deal about the population definition of the probability model p(y). Suppose the population is "All TTU students" and Y is commute time. Give the population probability model p(y) in list form. As part of your answer, assume that there are N = 35,000 students at TTU.

Solution:

y          p(y)
y1         1/35,000
y2         1/35,000
…          …
y35,000    1/35,000

Here, the individual yi values are the commute times for each TTU student.

Note: The repeats on the yi need to be collated and the probabilities summed. For example, if 5,000 students claimed 15 minutes, then the probability for the outcome 15 in the y column will be 5,000/35,000.
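The collation step in the note to problem 10 can be sketched in R. The commute times below are simulated placeholders, not real TTU data; the gamma shape and rate and the rounding to 5-minute values are assumptions made only so that repeats occur.

```r
set.seed(10)

N <- 35000  # population size given in the problem

# Simulated stand-in for the N commute times (minutes), rounded to the nearest 5
y_all <- 5 * round(rgamma(N, shape = 2, rate = 0.1) / 5)

# Collate repeated values and sum their 1/N probabilities
counts <- table(y_all)
pop_model <- data.frame(
  y = as.numeric(names(counts)),  # distinct commute times
  p = as.vector(counts) / N       # count / N for each distinct time
)

head(pop_model)
sum(pop_model$p)  # the probabilities in list form add to 1
```

Each row of pop_model is one distinct outcome with probability (number of students reporting it)/35,000, which is the collated list form described in the note.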
11. (10) A firm's ROA is determined every year, giving data y1, y2, …, yn, where y1 is the ROA in the first year of the study, and yn is the ROA in the last year of the study. You can check the assumption of independence by constructing and analyzing a certain scatterplot. Describe the scatterplot, and explain how you will check for independence using it. Draw a scatterplot or scatterplots as part of your answer.

Solution: Graph the scatterplot of (y_{i-1}, y_i); that is, put the lag of the observation on the horizontal axis and the observation itself on the vertical axis. If the observations are independent, then there should be no apparent trend; if there is a clear trend, then there is clear evidence of dependence. Two examples:
Lag scatterplot showing dependence.
Lag scatterplot consistent with independence.

12. (15) In the HW where you compared the Spring 2016 commute times with the Spring 2017 commute times, there were n1 = 14 students in Spring 2016 and n2 = 19 students in Spring 2017. The two-sample t statistic was T = 1.3157 and the p-value was 0.1979. Describe, step-by-step, a simulation study that will give you the precise p-value 0.1979, if the simulation were performed infinitely many times. Don't write R code or any other code, just write the instructions so it is perfectly clear how to perform the simulation using any software.

Solution: First, simulate 14 observations for class 1 and 19 observations for class 2, both from the same normal distribution, e.g., from the N(20, 10²) distribution. Second, calculate the T-statistic from the simulated data and note whether it is greater than 1.3157. Third, repeat steps 1 and 2 many times. Fourth, the p-value 0.1979 is 2*(the proportion of simulated data sets giving a T-statistic greater than 1.3157), at least with an infinite number of simulations.

The next problems are all 5 points each. Choose one answer only for each multiple choice problem.

13. {_____ is to real data} as {simulation model is to simulated data}
A. average
B. standard deviation
C. probability
D. nature

14. The histogram of a data set shows that it is strongly skewed. If the mean of the data is 10 and the standard deviation is 2, then
A. 75% of the data are between 6 and 14.
B. 95% of the data are between 6 and 14.
C. At least 75% of the data are between 6 and 14.
D. At least 95% of the data are between 6 and 14.

15. Why is it better to simulate 100,000 values from the posterior distribution p(θ | data) than it is to simulate 1,000 values?
A. Because a larger sample size is needed for the Central Limit Theorem to take hold
B. Because you will get a more accurate estimate of p(θ | data)
C. Because Bayesian analysis is only valid when sample size > 1,000
D. Because your credible interval will have a higher confidence level

16. If Ȳ is calculated from an iid sample from a distribution whose mean is µ, then E{(Ȳ)²}
A. = µ²
B. > µ²
C. < µ²
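The four steps in the solution to problem 12 can be sketched in R as follows. This is a minimal illustration under stated assumptions: the statistic is taken to be the pooled (equal-variance) two-sample t statistic, the helper name two_sample_t is hypothetical, and a finite number of simulations (nsim) stands in for "infinitely many", so the result only approximates 0.1979.

```r
set.seed(2017)

n1 <- 14; n2 <- 19   # class sizes from the problem
t_obs <- 1.3157      # observed two-sample t statistic
nsim  <- 50000       # finite stand-in for infinitely many simulations

# Pooled-variance two-sample t statistic (hypothetical helper; assumes equal variances)
two_sample_t <- function(x1, x2) {
  m1 <- length(x1); m2 <- length(x2)
  sp2 <- ((m1 - 1) * var(x1) + (m2 - 1) * var(x2)) / (m1 + m2 - 2)
  (mean(x1) - mean(x2)) / sqrt(sp2 * (1 / m1 + 1 / m2))
}

# Steps 1-3: simulate both classes from the same N(20, 10^2) and record T each time
t_sim <- replicate(nsim, {
  class1 <- rnorm(n1, mean = 20, sd = 10)
  class2 <- rnorm(n2, mean = 20, sd = 10)
  two_sample_t(class1, class2)
})

# Step 4: two-sided p-value as twice the proportion of simulated T above the observed T
p_sim <- 2 * mean(t_sim > t_obs)
p_sim

# For comparison: the same tail area from the t distribution with n1 + n2 - 2 df
p_exact <- 2 * pt(t_obs, df = n1 + n2 - 2, lower.tail = FALSE)
p_exact
```

The choice of mean 20 and standard deviation 10 does not affect the result, since the t statistic has the same null distribution for any common normal distribution.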
17. What do you get when you run the following R code?

numbers = c(4, 4, 5, 1, 1, 2)
mean(numbers < 3)

Solution: 0.5

18. The R code is:

try = 1:3

What is in the R object called "try"?

Solution: 1 2 3

19. What does t.test(data) give you, among other things?
A. 95% confidence limits for µ
B. 90% confidence limits for µ
C. 10% confidence limits for µ
D. 5% confidence limits for µ

20. The null distribution of the likelihood ratio test statistic is
A. approximately F
B. approximately T
C. approximately chi-squared
D. approximately normal

21. The unrestricted (full) model is Y | (X1, X2, X3) = (x1, x2, x3) ~ N(β0 + β1x1 + β2x2 + β3x3, σ²). The restricted (null) model is Y | (X1, X2, X3) = (x1, x2, x3) ~ N(β0, σ²). The degrees of freedom for the likelihood ratio test statistic are df = 3.

22. What happens when you select a larger sample size n for your study? (Choose one).
A. Your standard error will decrease
B. Your standard deviation will decrease
C. Your data Y will become closer to normally distributed
D. Your 95% confidence interval will become closer to a 100% confidence interval
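The degrees-of-freedom count in problem 21 can be illustrated with a small simulated data set. Everything about the data below (sample size, intercept 5, error standard deviation 2, standard normal predictors) is an assumption for illustration; the point is only that the full model has three more free parameters than the restricted model.

```r
set.seed(21)

n  <- 200
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
y  <- 5 + rnorm(n, sd = 2)   # data generated under the restricted (null) model

full <- lm(y ~ x1 + x2 + x3)  # unrestricted model: beta0, beta1, beta2, beta3, sigma
null <- lm(y ~ 1)             # restricted model: beta0, sigma

# Likelihood ratio statistic: twice the difference in maximized log-likelihoods
lrt <- as.numeric(2 * (logLik(full) - logLik(null)))

# Degrees of freedom: difference in the number of estimated parameters (5 - 2)
df <- attr(logLik(full), "df") - attr(logLik(null), "df")
df

# Approximate p-value from the chi-squared null distribution
p_value <- pchisq(lrt, df = df, lower.tail = FALSE)
p_value
```

The three restrictions β1 = β2 = β3 = 0 are what produce df = 3, and the chi-squared reference distribution is the approximation referred to in problem 20.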