Part Possible Score Base 5 5 MC Total 50

Similar documents
Statistics 100 Exam 2 March 8, 2017

STAT 350 Final (new Material) Review Problems Key Spring 2016

Table of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X z).

Correlation and Regression (Excel 2007)

UNIVERSITY OF TORONTO Faculty of Arts and Science

Final Exam STAT On a Pareto chart, the frequency should be represented on the A) X-axis B) regression C) Y-axis D) none of the above

STAT 512 MidTerm I (2/21/2013) Spring 2013 INSTRUCTIONS

Chapter 5: Writing Linear Equations Study Guide (REG)

STATISTICS 110/201 PRACTICE FINAL EXAM

In order to carry out a study on employees wages, a company collects information from its 500 employees 1 as follows:

Chapter 3 Multiple Regression Complete Example

Correlation & Simple Regression

Conditions for Regression Inference:

Simple Linear Regression for the Climate Data

Final Exam - Solutions

PLEASE MARK YOUR ANSWERS WITH AN X, not a circle! 1. (a) (b) (c) (d) (e) 2. (a) (b) (c) (d) (e) (a) (b) (c) (d) (e) 4. (a) (b) (c) (d) (e)...

Example 1: Dear Abby. Stat Camp for the MBA Program

MS&E 226: Small Data

This gives us an upper and lower bound that capture our population mean.

Midterm 2 - Solutions

Final Exam - Solutions

Business Statistics Midterm Exam Fall 2015 Russell. Please sign here to acknowledge

Stat 101 L: Laboratory 5

Business Statistics 41000: Homework # 5

AP Final Review II Exploring Data (20% 30%)

Table of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X z).

AP STATISTICS Name: Period: Review Unit IV Scatterplots & Regressions

Psych 10 / Stats 60, Practice Problem Set 10 (Week 10 Material), Solutions

Review of Multiple Regression

Turn to Section 4 of your answer sheet to answer the questions in this section.

STATISTICS 1 REVISION NOTES

Ecn Analysis of Economic Data University of California - Davis February 23, 2010 Instructor: John Parman. Midterm 2. Name: ID Number: Section:

Algebra 1 Fall Semester Final Review Name

Inference for Regression

Ch Inference for Linear Regression

STAT 525 Fall Final exam. Tuesday December 14, 2010

Turn to Section 4 of your answer sheet to answer the questions in this section.

ISQS 5349 Final Exam, Spring 2017.

Math 2000 Practice Final Exam: Homework problems to review. Problem numbers

Unit 5. Linear equations and inequalities OUTLINE. Topic 13: Solving linear equations. Topic 14: Problem solving with slope triangles

STAT:2020 Probability and Statistics for Engineers Exam 2 Mock-up. 100 possible points

Section 2.5 from Precalculus was developed by OpenStax College, licensed by Rice University, and is available on the Connexions website.

Ch 13 & 14 - Regression Analysis

" M A #M B. Standard deviation of the population (Greek lowercase letter sigma) σ 2

Math: Question 1 A. 4 B. 5 C. 6 D. 7

MATH 2070 Test 3 (Sections , , & )

2014 Summer Review for Students Entering Geometry

Inferences for Regression

Swarthmore Honors Exam 2012: Statistics

4. The table shows the number of toll booths driven through compared to the cost of using a Toll Tag.

Regression Models. Chapter 4. Introduction. Introduction. Introduction

Test 2C AP Statistics Name:

Exam 2 (KEY) July 20, 2009

The Purpose of Hypothesis Testing

You have 3 hours to complete the exam. Some questions are harder than others, so don t spend too long on any one question.

Interactions. Interactions. Lectures 1 & 2. Linear Relationships. y = a + bx. Slope. Intercept

ANOVA - analysis of variance - used to compare the means of several populations.

Time: 1 hour 30 minutes

Be sure that your work gives a clear indication of reasoning. Use notation and terminology correctly.

Inference with Simple Regression

Ch. 1: Data and Distributions

Salt Lake Community College MATH 1040 Final Exam Fall Semester 2011 Form E

Business Statistics (BK/IBA) Tutorial 4 Full solutions

Soc 3811 Basic Social Statistics Second Midterm Exam Spring Your Name [50 points]: ID #: ANSWERS

20 Hypothesis Testing, Part I

ALGEBRA 1 FINAL EXAM 2006

appstats27.notebook April 06, 2017

MAT 111 Final Exam Fall 2013 Name: If solving graphically, sketch a graph and label the solution.

Chapter 3: Examining Relationships

Black White Total Observed Expected χ 2 = (f observed f expected ) 2 f expected (83 126) 2 ( )2 126

AIM HIGH SCHOOL. Curriculum Map W. 12 Mile Road Farmington Hills, MI (248)

1 Least Squares Estimation - multiple regression.

Block 3. Introduction to Regression Analysis

Review for Final. Chapter 1 Type of studies: anecdotal, observational, experimental Random sampling

2. A music library has 200 songs. How many 5 song playlists can be constructed in which the order of the songs matters?

Math Review Sheet, Fall 2008

Page Points Score Total: 100

Algebra I Exam Review

where Female = 0 for males, = 1 for females Age is measured in years (22, 23, ) GPA is measured in units on a four-point scale (0, 1.22, 3.45, etc.

1. A machine produces packets of sugar. The weights in grams of thirty packets chosen at random are shown below.

Machine Learning, Fall 2009: Midterm

SIMPLE REGRESSION ANALYSIS. Business Statistics

Statistics 191 Introduction to Regression Analysis and Applied Statistics Practice Exam

Practice Final Examination

Chapter 12 : Linear Correlation and Linear Regression

ALGEBRA 1 SEMESTER 1 INSTRUCTIONAL MATERIALS Courses: Algebra 1 S1 (#2201) and Foundations in Algebra 1 S1 (#7769)

Chapter 27 Summary Inferences for Regression

Business 320, Fall 1999, Final

Solutions exercises of Chapter 7

Midterm Exam Business Statistics Fall 2001 Russell

Mathematics for Economics MA course

Algebra Calculator Skills Inventory Solutions

Exercises from Chapter 3, Section 1

MINI LESSON. Lesson 2a Linear Functions and Applications

Stat 500 Midterm 2 12 November 2009 page 0 of 11

Unit 5. Linear equations and inequalities OUTLINE. Topic 13: Solving linear equations. Topic 14: Problem solving with slope triangles

Ch 2: Simple Linear Regression

Chapter Systems of Equations

LI EAR REGRESSIO A D CORRELATIO

ISP 207L Supplementary Information

Transcription:

Stat 220 Final Exam December 16, 2004 Schafer NAME: ANDREW ID: Read This First: You have three hours to work on the exam. The other questions require you to work out answers to the questions; be sure to show your work clearly. No credit will be given for illegible work, even if the answers are correct. The number of points each part is worth is indicated in parentheses after the question. Work all answers out to two decimal places, when appropriate. If a problem is asking for a probability, provide the answer to two decimal places. Do not leave answers in choose notation, work them out to the end. No sharing of calculators or note sheets during the exam. PART ONE Part Possible Score Base 5 5 MC 12 1 6 2 12 3 10 4 5 Total 50 PART TWO Part Possible Score Base 11 11 Page 6 13 Page 7 9 Page 8 3 Page 9 5 Page 10 12 Page 11 16 Page 12 6 Page 13 14* Page 14 7 Page 15 7 Total 100 *Three of these points are extra credit. 1

PART ONE Multiple Choice Questions: For each question, circle the one correct letter. Each question is worth 3 points. 1. Suppose A and B are such that P(A) = P(B) = 0.30. What is the largest P(A or B) could be? (a) 0.00 (b) 0.30 (c) 0.60 (d) 1.00 2. Random variable X has the binomial(n = 5, p = 0.2) distribution. The probability that X equals one is closest to... (a) 0.20 (b) 0.40 (c) 0.60 (d) 0.80 (e) Cannot determine with the given information. density 0.0000 0.0005 0.0010 0.0015 0.0020 0.0025 0 200 400 600 800 1000 1200 CEO Salary (in thousands of dollars) Figure 1: Histogram showing the distribution of salaries in 1993 for CEOs of 59 top small businesses. (Source: Forbes Magazine) 3. Figure 1 depicts the distribution of the salary of the CEOs of 59 leading small businesses in 1993. The median salary in this group of 59 salaries is closest to... (a) $50,000 (b) $100,000 (c) $300,000 (d) $600,000 2

4. Again, refer to Figure 1. The sample mean is larger than the sample median for this set of data. This should not be surprising since... (a) the histogram is positively skewed. (b) the sample size is fairly large. (c) the range of the data is large. (d) CEOs are usually paid more than the other employees in their company. 1. A particular system on the space shuttle consists of two components, named A and B. The probability that component A fails during launch is 0.001, and the probability that component B fails during launch is 0.01. The probability that both fail during launch is 0.0005. Both components need to function correctly for the system to work. What is the probability the system works? (6) 2. The length of the life (measured in days) of a certain brand of light bulb is modeled well by the exponential(λ = 0.01) distribution. (a) A light bulb of this brand is chosen at random. What is the probability the length of its life is between 75 and 100 days? (5) (b) The light fixture in my living room requires four of these light bulbs. If I install four new ones at the same time, what is the probability that at least one will still be working after 50 days of use? Assume that the bulbs are independent of each other. (4) 3

(c) Suppose that a light bulb of this brand has been in use for 100 days, and it has yet to fail. What is the probability it lasts at least for 25 additional days? (3) I provide a solution here. If you think it is correct, then write This solution is correct. If you believe this solution is incorrect, then explain the mistake that is made. Let X = length of life for one light bulb. Then, P(lasts at least 25 more days has lasted 100 days) = P(X 125 X 100) = P(X 125 and X 100) P(X 100) = P(X 125) P(X 100) = exp( (0.01) (125)) exp( (0.01) (100)) 0.779 3. The company has two production facilities. The one in Pittsburgh can produce 120 Model A widgets and 50 Model B widgets in a day. The facility in Philadelphia can produce 100 Model A and 60 Model B widgets in a day. Suppose all 330 widgets produced by both factories in one day are brought together. (a) If two widgets are chosen (without replacement) at random from the group of 330 widgets, what is the probability there is exactly one of Model A and exactly one of Model B? (5) (b) One widget is chosen at random from the group of 330, and it is of Model A. What is the probability it was produced in the Pittsburgh factory? (5) 4

4. The probability density function (pdf) of random variable X is 0, x < 0 0.2, 0 x 1 f(x) = k, 1 < x 3 0, x > 3 What is the probability that X is between 0.5 and 1.5? (Your answer should not include k: figure out what k is.) (5) 5

PART TWO 1. Which of the following statements describes why the central limit theorem is so important in the application of statistical methods? (a) Provided that n is sufficiently large, it allows one to approximate the distribution of the sample mean (X) without needing to know the distribution of X 1, X 2,..., X n. (b) It allows for the construction of probability plots. (c) It shows that if X is a normal random variable with mean µ and variance σ 2, then (X µ)/σ has the standard normal distribution. (d) It allows one to approximate the probability that X is less than a, for any value a, without knowing the distribution of X. Questions 2 and 3 refer to the following situation: The Brand A bolts currently in use break, on average, when the weight they hold reaches 150 pounds. The Brand B bolt is being considered. Switching the brand of bolt will come at significant expense, but if there is evidence that Brand B bolts can withstand, on average, more than 150 pounds, the switch will be made. 2. What would be the type I Error in this case? (a) 0.05 (b) Deciding to stay with Brand A when Brand B has higher average breaking point than Brand A. (c) Deciding to stay with Brand A when Brand B has lower average breaking point than Brand A. (d) Deciding to switch to Brand B when Brand B does not have higher average breaking point than Brand A. (e) Using a sample size which is too large. 3. A sample of 45 Brand B bolts are chosen and tested. The average breaking point for the sample of bolts is 156.8 pounds, and the sample standard deviation is 20.9. Is there strong evidence that Brand B has higher average breaking point than Brand A? State the P-value, or choose an acceptable probability of type I error (i.e. choose α) and come to a conclusion. Either way, write out your conclusions in one sentence. Should the company switch brands? (7) 6

Questions 4, and 5 refer to the following situation: The production department predicted the customer demand for each of four colors of shirt. A sample of 100 sales are tracked, and the colors of the shirts sold are recorded. The following table shows the predicted proportions and color counts in the sample. Color Prediction Sample Count Red 0.30 25 Blue 0.25 28 Black 0.25 28 Yellow 0.20 19 4. First, suppose the goal is to see if there is strong evidence that the actual color distribution deviated from the predicted color distribution. In this situation we should... (a) calculate the mean squared error. (b) use a goodness-of-fit hypothesis test. (c) use a two-sample t test. (d) use an analysis of variance (ANOVA) procedure. 5. Next, suppose the goal is to estimate the proportion of all of the shirt sales that were red (ignoring the prediction). We know that the sample proportion of red shirts sold, denoted p, is approximately normal with mean p and (approximate) variance p (1 p)/n. Thus, ( ) p p P z α/2 z α/2 = 1 α. p(1 p)/n This can be rearranged to show that ( P p z α/2 p(1 p) n p p + z α/2 p(1 p) n ) = 1 α. Use this to form a 95% confidence interval for the unknown p, the proportion of all the shirt sales that were red. (6) 7

Questions 6 and 7 refer to the following situation: From Devore, Exercise 12.46: The article The Incorporation of Uranium and Silver by Hydrothermally Synthesized Galena (Econ. Geology, 1964: 1003-1024) reports on the determination of silver content of galena crystals grown in a closed hydrothermal system over a range of temperature. With x =crystallization temperature in degc and y = Ag 2 S in mol%. Figure 2 shows the scatter plot of Ag 2 S versus crystallization temperature. Note that there are n = 12 data pairs. (I removed one potential outlier from the data set.) Ag2S (mol %) 0.1 0.2 0.3 0.4 0.5 0.6 300 350 400 450 500 550 600 Crystallization Temperature (C) Figure 2: Scatter plot of Ag 2 S versus crystallization temperature for Questions 6 and 7. A linear model is fit to the data, i.e. the parameters β 0 and β 1 are estimated in the following equation. Y i = β 0 + β 1 X i + ɛ i Below is a portion of the computer output (using the statistical analysis package R ). Coefficients: Estimate Std. Error t value Pr(> t ) (Intercept) -0.3973155 0.1367342-2.906 0.015678 (Slope) 0.0016663 0.0002883 5.780 0.000178 Finally, Figure 3 shows the plot of residuals versus fitted values for this case. 6. Does the linear model seem appropriate in this case? Why or why not? (a) Yes, since there is a linear pattern in both the scatter plot and the plot of residuals versus fitted values. (b) No, since there is not a linear pattern in the plot of residuals versus fitted values. (c) No, the estimate of β 1 is too small. 8

Residuals 0.10 0.05 0.00 0.05 0.10 0.15 0.1 0.2 0.3 0.4 0.5 0.6 Fitted Values Figure 3: Plot of residuals versus fitted values for Questions 6 and 7. (d) Yes, since there is a linear pattern in the scatter plot and there is no noticeable pattern in the plot of residuals versus fitted values. 7. Form a 95% confidence interval for β 1, the slope of the regression line. Assume that the model errors ɛ i are normally distributed with mean zero and variance σ 2. (5) 9

Questions 8, 9, and 10 refer to the following situation: Suppose that random variable X has the binomial(n = 10, p) distribution, with p unknown. The objective is to determine if there is strong evidence to claim p is larger than 0.4. So, we will test H 0 : p = 0.4 versus H a : p > 0.4. 8. What would the P-value for this test be if we were to observe a value of 8 for X? (6) 9. Suppose you decide that you will reject the null hypothesis only if you observe a value of either 9 or 10 for X. What is the probability of type I error? (3) 10. Again, suppose you will reject the null hypothesis only if you observe a value of either 9 or 10 for X. What is the probability of type II error if the true value of p is 0.5? (3) 10

Questions 11,12, 13, and 14 refer to the following situation: Read the following dialogue from a meeting between an engineer and her boss. Boss: What progress has been made in reducing the number of defective widgets produced on line ten? For the longest time, 10% of the widgets produced on that line have been bad. Engineer: I have implemented some improvements, and I am confident that the proportion of defectives has been reduced below 10%. Boss: That would be impressive. If you can provide me with strong evidence that the rate of defectives has been brought below 10%, I would give you a significant raise. Engineer: I will test a sample of widgets and see what proportion are defective. Boss: But I want strong proof. If there really is no improvement, I want the probability I give you a raise to be at most 5%. Engineer: I understand, you want the probability of type I error to be at most 5%. Boss: Yes, and keep in mind that your tests will cost money. In order to test a widget, it must be dismantled and inspected. It can no longer be used after that. Don t test a large sample, we can t afford it. Engineer: I know, but I want a reasonable chance of getting the raise. You want to avoid a type I error, but I want to avoid a type II error, which in this case would occur if I really did improve the rate of defectives, but you do not give me the raise. The engineer leaves and prepares a graph, shown as Figure 4. Engineer: Here I have a plot showing the probability of type II error as a function of sample size (n) and the true proportion of defectives (p). 11. Again, let p denote the true proportion of defectives. What hypothesis test is being discussed here? (a) H 0 : p = 0.10 versus H a : p < 0.10. (b) H 0 : p = 0.10 versus H a : p > 0.10. (c) H 0 : p = 0.05 versus H a : p < 0.05. (d) H 0 : p = 0.05 versus H a : p > 0.05. (e) None of the above. 12. There are three curves on Figure 4, one represents the case where p = 0.03, one where p = 0.05, and one where p = 0.09. Complete the labels on the plot. (5) 13. Suppose the boss agrees that the probability of type II error should be at most 0.20 if, in fact, p = 0.05. How large of a sample should the engineer test? (Approximate the answer using the graph.) (5) 14. The engineer believes that her boss is being unfair. She says, Using that sample size, the probability I get the raise is around 10% if, in fact, I reduced the proportion of defectives to 9%. The boss replies, I know that if I choose large enough, we can get the probability of type II error when p = 0.09 to be small, but not only would that cost us a lot of money, but also in terms p = 0.09 is not a large enough improvement for you to merit a raise. Choose the two terms which correctly completes his reply. (a) α / statistical (b) β(0.09) / objective (c) the sample size / practical (d) the size of your raise / theoretical 11

Probability of Type II Error 0.0 0.2 0.4 0.6 0.8 1.0 100 200 300 400 500 sample size (n) Figure 4: Plot of probability of type II error, as a function of sample size (n) and proportion of defective (p). Used in Questions 12, 13, and 14. 15. Suppose that X 1, X 2,...,X n are a sequence of independent random variables, each with the same distribution. What does the Law of Averages state? (a) As the sample size gets larger, the sample total T 0 will get closer and closer to the expected value of the population total. (b) As the sample size gets larger, the variance of the sample mean gets closer and closer to one. (c) As the sample size (n) gets larger, the sample mean X will get closer and closer to the population mean. (d) As the sample size gets larger, the sample standard deviation (s) will get smaller. 16. Random variables X 1 and X 2 are independent, both are normally distributed, and each has mean 12.0 and variance 4.0. The distribution of X 1 + X 2 2 is which of the following? (a) Normal with mean 24.0 and variance 2.0. (b) Normal with mean 24.0 and variance 4.0. (c) Normal with mean 12.0 and variance 2.0. (d) Standard normal (e) Cannot determine from the given information. 12

Questions 17, 18, 19 and 20 refer to the following situation: You desire to weigh an object, and have two scales to choose from. The scales are not exact, i.e. they do not return the exact weight of the object placed upon it. If the object has weight θ, then the weight returned by Scale One is a normal random variable with mean θ and variance 1 mg 2. The weight given by Scale Two is a normal random variable with mean θ and variance 4 mg 2. Label the weight given by Scale One as X 1, and the weight given by Scale Two as X 2. Estimate the weight of the object by θ = ax 1 + (1 a)x 2 where a is a number between 0 and 1. Assume X 1 and X 2 are independent. 17. Show that, for any value a, θ is an unbiased estimator of θ. (4) 18. What is the variance of θ? (It will depend on a, simplify the expression as much as possible.) (4) 19. Figure 5 shows the mean squared error (MSE) of θ as a function of a. What would be the best choice for a? (Approximate it using the plot.)(3) 20. EXTRA CREDIT: Determine the best choice for a exactly. (You need to show the derivation.) (3) 13

MSE of theta hat 1.0 1.5 2.0 2.5 3.0 3.5 4.0 0.0 0.2 0.4 0.6 0.8 1.0 a Figure 5: Plot of MSE of θ as a function of a. Used in Question 19. Questions 21 and 22 refer to the following situation: Exercise 7.13 in Devore: The article Extravisual Damage Detection? Defining the Standard Normal Tree (Photogrammetric Engr. and Remote Sensing, 1981: 515-522) discusses the use of color infrared photography in identification of normal trees in Douglas fir stands. Among data report were summary statistics for green-filter analytic optical densitometric measurements on samples of both healthy and diseased trees. For a sample of 69 healthy trees, the sample mean dye-layer density was 1.028, and the sample standard deviation was 0.163. 21. Calculate a 99% confidence interval for the true average dye-layer density for all such trees. (7) 14

22. Suppose the investigators had made a rough guess of 0.16 for the value of σ before collecting the data. What sample size would be necessary to obtain an interval width of 0.05 for a confidence level of 99%? (4) 23. In a wide range of situations, the width of the confidence interval is equal to k/ n where k is a constant whose value depends on the situation. For example, in the Question 22 above the width of the confidence interval is 0.16 2 2 (z.005 ) n, so that would mean k = 2(z.005 )(0.16). Now complete this statement: If the width of the confidence interval can be written k/ n (which it can be in many situations), then if I want to cut the width of the confidence interval in half, I need to multiply my sample size n by. (3) 15