STA220H1F Term Test, Oct 26, 2005

Last Name:    First Name:    Student #:    TA's Name or Tutorial Room:

Time allowed: 1 hour and 45 minutes.
Aids: one-sided handwritten aid sheet + non-programmable calculator.
Statistical table(s) are attached at the end. Please do not detach them.
Please check that you have all the consecutively numbered pages of this test.

Best marks go to best answers, as a general rule. Where explanations are required, you may see a guideline regarding the number of words expected in your answer; "< N words" means that you should use fewer than N words to answer and that fewer than N words are required to give an excellent, complete and concise answer; anything more will be considered a sign of confusion, and may cause a loss of marks.

This test may be long, so be sure to proportion your time carefully among the questions and limit your time spent on any single question.

Show your work and answer in the space provided, in ink. Pencil may be used, but a sample of tests will be copied and used in case of a remark request. Use the back of pages for rough work. Marks are shown in brackets at the end of each question.

Good luck!!

Question   1   2   3   4   5   6   7   8   Total
Max                                        100
Score

Question 1

Summary statistics and a stemplot of the scores of seventh grade students in a rural Midwestern school on a psychological test designed to measure children's Self Concept are given below.

[Stem-and-leaf plot of Self Concept]

Descriptive Statistics: Self Concept

Variable       N    Mean    Median   TrMean   StDev
Self Concept        55.69   xxxx     56.8     12.53

Variable       Minimum   Maximum   Q1      Q3
Self Concept   xxxx      7.00      50.25   xxxx

a) Describe the shape of the distribution of the test scores. []
The distribution is left (negatively) skewed.

b) Give the 5-number summary for these data and draw a boxplot. [4]
Min = , Q1 = 50.25, Median = 60, Q3 = 64, Max = 7

[Boxplot drawn on a score axis running from 10 to 100]

c) Would providing the mean and standard deviation be as good a set of descriptive statistics as the 5-number summary? Why? []
No. The mean and standard deviation are adequate for bell-shaped (and most symmetric) distributions, but do not completely describe skewed distributions.

d) Draw below an approximate picture of the normal quantile (probability) plot for the data. Label both axes and show at least two numbers on each axis. []

e) Suppose the scores were rescaled such that each score is now 20% higher. What would be the mean, standard deviation and first quartile of the rescaled scores? []
This is a linear transformation: x_new = 1.2 x. So the mean, standard deviation and Q1 of the rescaled scores are:
mean_new = 1.2 × 55.69 = 66.828
s_new = 1.2 × 12.53 = 15.036
Q1_new = 1.2 × 50.25 = 60.3

f) If you were to replace the scores by their logarithms, is there anything you can safely predict about the resulting distribution? []
The distribution would be even more left skewed.

g) Upon checking, one test score of 9 was entered erroneously - the correct mark was actually 4. Which of the following measures will change when you correct the error? Circle them. []
Mean   Median   StDev   Q1   Q3
Circled: Mean and StDev.
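The rule used in part e), that multiplying every score by a constant multiplies the mean, the standard deviation and the quartiles by the same constant, can be checked numerically. The following is only a sketch: the scores are a small made-up list (not the Self Concept data), and the factor 1.2 is an assumed rescaling of "20% higher".

```python
import statistics

# Hypothetical scores, for illustration only (not the Self Concept data).
scores = [21, 35, 44, 50, 55, 58, 60, 62, 64, 70]
factor = 1.2  # assumed rescaling: every score 20% higher

rescaled = [factor * x for x in scores]


def summary(data):
    """Return (mean, sample standard deviation, first quartile)."""
    q1 = statistics.quantiles(data, n=4)[0]
    return statistics.mean(data), statistics.stdev(data), q1


before = summary(scores)
after = summary(rescaled)

# Each ratio should equal the rescaling factor.
for name, b, a in zip(("mean", "stdev", "Q1"), before, after):
    print(f"{name}: {b:.3f} -> {a:.3f} (ratio {a / b:.3f})")
```

Adding a constant instead of multiplying would shift the mean and quartiles but leave the standard deviation unchanged, which is why the two kinds of linear transformation are worth keeping apart.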

Question 2

Using your normal tables, answer the following:

a) The length of a snow storm in northern Ontario is known to have a normal distribution with mean 32 hours and standard deviation 12 hours. A storm that lasts more than 48 hours is recorded as a severe one. What is the proportion of severe storms? [4]
P(X > 48) = P(Z > (48 - 32)/12) = P(Z > 1.33) = 1 - 0.9082 = 0.0918

b) Determine the length of a storm such that 5% of the storms are longer than it. [4]
P(X < x) = P(Z < (x - 32)/12) = 0.95, so (x - 32)/12 = 1.65 (or 1.64), which gives x = 51.8 hours.

Question 3

A study of sleeping disorder and smoking classifies subjects as nonsmokers, moderate smokers, or heavy smokers. Samples of 90 men and 90 women are drawn from each group. Each person reports the number of hours of sleep he or she gets on a typical night.

a) What are the explanatory and response variables? []
Explanatory: smoking habits
Response: number of hours of sleep on a typical night

b) Explain carefully why this is or is not an experiment. (<0 words) []
It is not an experiment because the study does not impose treatments on the subjects. It is purely observational in nature.

c) Do you think this study will show that smoking habits cause sleeping disorders? Explain your answer carefully, as if you were talking to someone who knows no statistics. (<60 words) []
No. Since subjects were not randomly assigned to smoking habits, differences in results may be due to confounding factors, e.g. gender, physical condition, eating habits and exercise habits.
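The normal-table calculations in Question 2 can be cross-checked in software. This is a minimal sketch using Python's standard library, assuming the storm length has mean 32 hours and standard deviation 12 hours (the values consistent with z = 1.33 and the 51.8 hour answer); the variable names are illustrative.

```python
from statistics import NormalDist

# Assumed parameters: mean 32 hours, standard deviation 12 hours.
storm = NormalDist(mu=32, sigma=12)

# a) Proportion of storms lasting more than 48 hours.
p_severe = 1 - storm.cdf(48)

# b) Storm length exceeded by only 5% of storms (the 95th percentile).
x95 = storm.inv_cdf(0.95)

print(f"P(X > 48) = {p_severe:.4f}")
print(f"95th percentile = {x95:.2f} hours")
```

The software values (about 0.091 and 51.7 hours) differ slightly from the table answers because the tables round z to two decimals (1.33 and 1.65).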

Question 4

Common sources of caffeine are coffee and tea. Suppose that 55% of the students at U of T drink coffee, 30% drink tea and 10% drink both.

a) If we randomly choose one U of T student, what is the probability that this student drinks neither coffee nor tea? []
P(coffee or tea) = 0.55 + 0.30 - 0.10 = 0.75
P(neither coffee nor tea) = 1 - P(coffee or tea) = 1 - 0.75 = 0.25

b) If we randomly choose two students, find the probability that at least one of these students drinks only tea. []
P(a student drinks only tea) = 0.30 - 0.10 = 0.20
P(at least one drinks only tea) = 1 - P(neither of them drinks only tea) = 1 - (0.8)^2 = 0.36 (or add three probabilities)

c) If we randomly choose three students, find the probability that exactly one of these three does not drink any of these beverages. []
P(exactly one student drinks neither tea nor coffee) = 3 (0.25) (0.75)^2 = 0.422

d) In addition to the information given above, suppose now that 70% of the students who drink coffee are females while only 35% of those who don't drink coffee are females. If we randomly select one student, what is the probability of selecting a female? []
Tree: Coffee (0.55) -> Female (0.70); Non-coffee (0.45) -> Female (0.35)
Pr(female) = Pr(coffee) Pr(female | coffee) + Pr(non-coffee) Pr(female | non-coffee)
           = 0.55 × 0.70 + 0.45 × 0.35 = 0.5425

e) Continuing from above (part d), suppose that our randomly selected student turns out to be a female. What is the probability that she doesn't drink coffee? []
Pr(non-coffee | female) = Pr(non-coffee and female) / Pr(female) = (0.45 × 0.35) / 0.5425 = 0.290
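The probability rules used in Question 4 (addition rule, complement rule, independence across students, total probability and Bayes' rule) can be traced step by step. The sketch below assumes the percentages 55% coffee, 30% tea, 10% both, and 70% / 35% female among coffee and non-coffee drinkers; the variable names are illustrative.

```python
from math import comb

# Assumed inputs for Question 4.
p_coffee, p_tea, p_both = 0.55, 0.30, 0.10

# a) Addition rule, then complement rule.
p_either = p_coffee + p_tea - p_both
p_neither = 1 - p_either

# b) At least one of two independent students drinks only tea.
p_only_tea = p_tea - p_both
p_at_least_one_only_tea = 1 - (1 - p_only_tea) ** 2

# c) Exactly one of three students drinks neither (binomial with n = 3).
p_exactly_one_neither = comb(3, 1) * p_neither * (1 - p_neither) ** 2

# d) Total probability of selecting a female.
p_f_given_coffee, p_f_given_non = 0.70, 0.35
p_female = p_coffee * p_f_given_coffee + (1 - p_coffee) * p_f_given_non

# e) Bayes' rule: P(non-coffee | female).
p_non_given_female = (1 - p_coffee) * p_f_given_non / p_female

print(round(p_neither, 4), round(p_at_least_one_only_tea, 4))
print(round(p_exactly_one_neither, 4), round(p_female, 4))
print(round(p_non_given_female, 4))
```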

Question 5

The GPA of students is thought to be related to their IQ and self-concept. Higher self-concept scores (based on a standard test) indicate a more positive self-concept. For a sample of 78 seventh grade students the following data were obtained:

OBS   GPA     IQ    SelfConcept
1     7.940   111   67
2     8.292   107   43
3     4.643   100   52
4     7.470   107   66
5     8.882   114   58
...
78    6.98    106   56

Below are some graphs and Minitab outputs.

Correlations: GPA, IQ, Self-Concept

               GPA     IQ
IQ             0.697
Self-Concept   0.526   0.49

Cell Contents: Pearson correlation

[Scatterplot of GPA versus SelfConcept]

[Scatterplot of GPA versus IQ]

Regression Analysis: GPA versus IQ

The regression equation is
GPA = -3.25 + 0.0992 IQ

Predictor   Coef      SE Coef   T       P
Constant    -3.254    1.284     -2.53   0.013
IQ          0.09915   0.01171   8.47    0.000

S = 1.353   R-Sq = omitted   R-Sq(adj) = 47.9%

Analysis of Variance
Source           DF   SS       MS       F       P
Regression       1    131.31   131.31   71.75   0.000
Residual Error   76   139.09   1.83
Total            77   270.40

Unusual Observations
Obs   IQ   GPA     Fit     SE Fit   Residual   St Resid
8     97   .4      6.64    0.07     -.95       -.96R
9     06   4.000   7.56    0.57     -.56       -.4R
54    7    7.95    .885    0.459    .40        .68RX
58    79   5.06    4.579   0.8      0.48       0.7X
6     74   5.7     4.08    0.47     .54        0.90X
70    77   4.885   4.8     0.404    0.504      0.9X

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large influence.

Regression Analysis: GPA versus SelfConcept

The regression equation is
GPA = 3.02 + 0.0794 SelfConcept

Predictor   Coef     SE Coef   T      P
Constant    3.02     0.8584    3.52   0.001
SelfConc    0.0794   0.0147    5.39   0.000

S = 1.604   R-Sq = 27.7%   R-Sq(adj) = 26.7%

Analysis of Variance
Source           DF   SS        MS       F       P
Regression       1    74.816    74.816   29.07   0.000
Residual Error   76   195.585   2.573
Total            77   270.401

Unusual Observations
Obs   SelfConc   GPA    Fit     SE Fit   Residual   St Resid
8     5.0        .4     7.07    0.0      -4.660     -.9R
      0.0        5.0    4.60    0.574    0.60       0.4X
46    5.0        .647   7.5     0.96     -.505      -.0R
47    46.0       .408   6.675   0.4      -.67       -.06R
48    66.0       .96    8.64    0.5      -4.8       -.7R

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large influence.

Use the above outputs to answer the following questions.

a) Which of the two variables (IQ and Self-Concept) is a better predictor of GPA? Why? []
IQ, since it has a higher correlation with GPA.

b) What percentage of the variation in GPA is explained by the linear regression on Self-concept? []
r^2 = (0.542)^2 = 0.294 = 29.4%

c) How will the GPA change if the IQ score:
i. increases by 10 points? []
It increases by 10(0.104) = 1.04 points.
ii. decreases by 7 points? []
It decreases by 7(0.104) = 0.728 points.
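Parts b) and c) rest on two facts about a least-squares line: the fraction of variation explained equals the squared correlation (equivalently SS Regression / SS Total), and a change of d units in x changes the prediction by d times the slope. The sketch below checks both on a small made-up data set; the (x, y) pairs are hypothetical and are not the GPA data.

```python
# Hypothetical (x, y) pairs, for illustration only.
x = [95, 100, 104, 107, 111, 114, 120, 128]
y = [5.1, 4.6, 7.0, 7.5, 7.9, 8.9, 8.2, 9.6]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squares and cross-products about the means.
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)  # SS Total
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# Least-squares slope and intercept, and the Pearson correlation.
slope = sxy / sxx
intercept = mean_y - slope * mean_x
r = sxy / (sxx * syy) ** 0.5

# ANOVA decomposition: SS Total = SS Regression + SS Residual.
fitted = [intercept + slope * xi for xi in x]
ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
ss_regression = syy - ss_residual
r_squared = ss_regression / syy

# Predicted change in y for a 10-unit increase in x.
change_for_10 = 10 * slope

print(f"r = {r:.3f}, r^2 = {r ** 2:.3f}, SSR/SST = {r_squared:.3f}")
print(f"slope = {slope:.4f}, change for +10 in x = {change_for_10:.3f}")
```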

d) i. Predict the GPA of someone whose Self-concept score is 0. []
GPA = .9 + 0.090 (95) = 0.9 = .
ii. Predict the GPA of someone whose Self-concept score is 65. []
GPA = .9 + 0.090 (65) = 8.7
iii. How credible or trustworthy are your two predictions above? Explain. []
The first is less trustworthy, since it is an extrapolation.

e) Attempting to make sense of the results, Ana came to the conclusion that a high GPA is the result of a high IQ. Do you agree? Defend your argument. (<0 words) []

f) For the regression of GPA on Self-concept, explain what makes the 48th student in the list somewhat unusual (do not use very technical terms like "residual" or "deviation"). []

g) Here are the residual plots for the regression of GPA on IQ. Examine them and answer the questions below.

[Residual Model Diagnostics, four panels: Normal Plot of Residuals (Residual vs. Normal Score), I Chart of Residuals (Residual vs. Observation Number), Histogram of Residuals (Frequency vs. Residual), Residuals vs. Fits (Residual vs. Fit)]

i. What do you learn from inspection of the above 4 plots? Are there any major problems with the model assumptions? []
There appear to be two different patterns, for males and for females, suggesting we cannot use just one prediction equation regardless of gender. The dispersion appears to be greater for those with low IQ than for those with higher IQ.

ii. Describe the distribution of the residuals (shape, centre, spread). []
Centred at zero. They are skewed left (negatively).

h) Based on the graph below, will the fitted model overestimate or underestimate the GPA of students who have an IQ below 80? []

[Plot of Residuals versus IQ]

We are underestimating, since the residuals are positive for those students.

Question 6

A marketing experiment compares four different types of packaging for computer disks (denoted by A, B, C and D). Each type of packaging was presented in three different colours (Blue, Red and Orange). Each combination of package type with a particular colour is shown to 40 potential customers, who rate the attractiveness of the product on a 1 to 10 scale.

a) Identify clearly and explicitly:
i. The experimental units []
Computer disks (or packages)
ii. The factors and the levels of each factor []
Type of package: levels A, B, C and D
Colour: levels Blue, Red and Orange
iii. The treatments. List all of them. []
Treatments are:
(A, Blue), (A, Red), (A, Orange)
(B, Blue), (B, Red), (B, Orange)
(C, Blue), (C, Red), (C, Orange)
(D, Blue), (D, Red), (D, Orange)
iv. The response variable []
The attractiveness of the product.
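In a two-factor experiment like the one in part a), the treatments are all combinations of one level from each factor, so 4 package types and 3 colours give 4 × 3 = 12 treatments. A short sketch that enumerates them (the variable names are illustrative):

```python
from itertools import product

# Factor levels as given in the question.
package_types = ["A", "B", "C", "D"]
colours = ["Blue", "Red", "Orange"]

# Every treatment is one (package type, colour) combination.
treatments = list(product(package_types, colours))

for treatment in treatments:
    print(treatment)
print("Number of treatments:", len(treatments))
```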

b) Is this a randomized block design or a completely randomized design? []
Randomized block design (if each of the 40 customers views all treatment combinations). [Completely randomized design if a different group of 40 views each treatment.]

Question 7

Suppose that we want to select a sample of students from your STA220 class (69 students in total). Answer the following questions, where "random" below means using the random number tables in the customary way.

a) Suppose we assign each student in the classroom a number from 00 to 69, use a random number table to pick two distinct random numbers from 00 to 69, and then take the corresponding students.
i. What do we call this type of sample? []
Simple random sample
ii. If you start on line 50, what numbers will those students carry? []
75 and 8

b) If we randomly select 40 students from the night section, then 0 at random from the morning section and finally 60 at random from the afternoon section, what type of sampling design is this? []
Stratified sample

c) Suppose that we order the students in attendance in some clear, simple fashion (say left seat to right seat, bottom row to top row) and then select every 0th student, after drawing an appropriate random number. What do we call this type of sampling design? []
Systematic sample

d) If we randomly select 8 rows in the classroom, then students randomly from each selected row, what do we call this type of sampling design? []
Multi-stage sample (or two-stage sample; or cluster sample with subsampling)
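The four designs named in Question 7 differ only in how the random draws are organised. The sketch below illustrates each one with a pseudo-random generator in place of a random number table; the class size (120), the section split, the row layout and all sample sizes are hypothetical values chosen for illustration.

```python
import random

rng = random.Random(2005)  # fixed seed so the sketch is repeatable

# Hypothetical class of 120 students, numbered 1 to 120.
students = list(range(1, 121))

# a) Simple random sample: every pair of students is equally likely.
srs = rng.sample(students, 2)

# b) Stratified sample: a separate random sample within each section.
sections = {
    "morning": students[:40],
    "afternoon": students[40:80],
    "night": students[80:],
}
stratified = {name: rng.sample(members, 5) for name, members in sections.items()}

# c) Systematic sample: random start, then every k-th student.
k = 10
start = rng.randrange(k)
systematic = students[start::k]

# d) Two-stage sample: random rows first, then random students within each row.
rows = [students[i:i + 10] for i in range(0, len(students), 10)]
chosen_rows = rng.sample(rows, 3)
two_stage = [rng.sample(row, 2) for row in chosen_rows]

print("SRS:", srs)
print("Stratified:", stratified)
print("Systematic:", systematic)
print("Two-stage:", two_stage)
```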

Question 8 - Miscellaneous

a) Ty got 90 on his STA220 final exam, where the class average was 75 and the standard deviation was 15, while in PSY6 he got 78 and the class average was 63 with a standard deviation of 9. In which course did he do better (assuming similar shapes for both test frequency distributions)?
STA220: z = (90 - 75) / 15 = +1
PSY6: z = (78 - 63) / 9 = +1.67
Better on the PSY test, since his relative standing is better.

b) Below is a scatterplot showing data on two variables, V and W.

[Scatterplot of V and W]

Suppose we are interested in predicting W from V.
i) On the plot above, draw in at least two bars showing exactly what is minimized (after squaring and summing) should we fit a least-squares line to predict W from V.
ii) Suppose the equation of the line given in the above plot is V = + .5W. The equation of the line to predict W from V is: W = + V. True / False? (circle one)
False
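The comparison in part a) standardizes each mark against its own class. A minimal sketch, assuming a standard deviation of 15 for the STA course and a class average of 63 for the PSY course (the values consistent with z = +1 and z = +1.67); the function name is illustrative.

```python
def z_score(score, mean, sd):
    """Number of standard deviations a score lies above its class mean."""
    return (score - mean) / sd


# Assumed class summaries for the two courses.
z_sta = z_score(90, mean=75, sd=15)
z_psy = z_score(78, mean=63, sd=9)

# The higher z-score indicates the better relative standing.
better = "PSY" if z_psy > z_sta else "STA"

print(f"STA z = {z_sta:+.2f}, PSY z = {z_psy:+.2f}, better in {better}")
```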

c) 1000 households are randomly selected from a list of all households for a survey. There is a 50% non-response rate. To compensate, the investigator selects an additional 1000 households randomly from the list (from which we will get about another 500 responses).
i) Adding the additional 1000 households will reduce the bias. True / False? (circle one)
False
ii) Adding the additional 1000 households will reduce sampling variation. True / False? (circle one)
True

d) For the following 10 numbers, calculate the 30% trimmed mean:
5 6 0 7 4 6 6
Average of the middle four = average of {4, 5, 6, 6} = 21 / 4 = 5.25

e) To compare the distributions of test scores for males and females, it would be most appropriate to compare: histograms, scatterplots, bar charts, pie charts, dotplots, boxplots, stem-and-leaf plots. (Circle all that apply) []
Histograms, dotplots, boxplots, stem-and-leaf plots.

f) Two types of eye drugs, type A and type B, are to be compared for curing an infection. The response variable will be an appropriate measure of cure, obtained by examining the eye after a certain amount of drug use. 40 girls are available for the study. Each child will be given both drugs. What special instruction will you give the experimenter? (<0 words)
Use A for one eye and B for the other eye. Flip a coin to decide which eye gets A for each girl (or randomly select 20 girls who get A in the left eye; the others get A in the right eye).
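The trimmed mean in part d) drops the same percentage of observations from each end of the sorted data and averages what is left. The sketch below uses a hypothetical list of 10 numbers chosen so that the middle four are {4, 5, 6, 6}; the list itself is an assumption, not the original data.

```python
def trimmed_mean(values, percent):
    """Mean after dropping `percent` % of the sorted values from each end."""
    data = sorted(values)
    k = len(data) * percent // 100  # number of values dropped at each end
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept)


# Hypothetical data: 10 numbers whose middle four are 4, 5, 6, 6.
numbers = [5, 6, 10, 7, 4, 6, 6, 1, 2, 3]

result = trimmed_mean(numbers, 30)
print(result)  # 5.25
```

With 10 values and 30% trimming, three values are dropped from each end, so only the middle four contribute; this is why one extreme value has no effect on the result.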