STA220H1F Term Test Oct 26, Last Name: First Name: Student #: TA's Name: or Tutorial Room:


STA220H1F Term Test, Oct 26, 2005

Last Name: First Name: Student #: TA's Name: or Tutorial Room:

Time allowed: 1 hour and 45 minutes.
Aids: one-sided handwritten aid sheet + non-programmable calculator.
Statistical table(s) are attached at the end. Please do not detach them.
Please check that you have all the consecutively numbered pages of this test.

Best marks go to best answers, as a general rule. Where explanations are required, you may see a guideline regarding the number of words expected in your answer; < 0 words means that you should use fewer than 0 words to answer and that fewer than 0 words are required to give an excellent, complete and concise answer; anything more will be considered a sign of confusion, and may cause a loss of marks.

This test may be long, so be sure to proportion your time carefully among the questions and limit your time spent on any single question. Show your work and answer in the space provided, in ink. Pencil may be used, but a sample of tests will be copied and used in case of a remark. Use the back of pages for rough work. Marks are shown in brackets at the end of each question. Good luck!!

Question: 1 2 3 4 5 6 7 8 Total
Max: 100 (total)
Score:

Question 1

Summary statistics and a stemplot of the scores of seventh grade students in a rural Midwestern school on a psychological test designed to measure children's Self Concept are given below.

Stem-and-leaf of Self Concept N = Leaf Unit = 1.0
8 5 579 6 4 4 8 4 69 5 444 5 5 5668 (0) 6 0000444 7 6 56799 7 0

Descriptive Statistics: Self Concept

Variable       N   Mean    Median   TrMean   StDev
Self Concept       55.69   xxxx     56.8     12.53

Variable       Minimum   Maximum   Q1      Q3
Self Concept   xxxx      7.00      50.25   xxxx

a) Describe the shape of the distribution of the test scores. []
The distribution is left (negatively) skewed.

b) Give the 5-number summary for these data and draw a boxplot. [4]
Min = , Q1 = 50.25, Median = 60, Q3 = 64, Max = 7
(Boxplot drawn on an axis running from 10 to 100.)

c) Would providing the mean and standard deviation be as good a set of descriptive statistics as the 5-number summary? Why? []
No. The mean and standard deviation are adequate for bell-shaped (and most symmetric) distributions, but do not completely describe skewed distributions.

d) Draw below an approximate picture of the normal quantile (probability) plot for the data. Label both axes and show at least two numbers on each axis. []

e) Suppose the scores were rescaled such that each score is now 20% higher. What would be the mean, standard deviation and first quartile of the rescaled scores? []
This is a linear transformation: x_new = 1.2 x. So the mean, standard deviation and Q1 of the rescaled scores are:
mean_new = 1.2 × 55.69 = 66.828, s_new = 1.2 × 12.53 = 15.036, Q1_new = 1.2 × 50.25 = 60.3

f) If you were to replace the scores by their logarithms, is there anything you can safely predict about the resulting distribution? []
The distribution would be even more left skewed.

g) Upon checking, one test score of 9 was entered erroneously - the correct mark was actually 4. Which of the following measures will change when you correct the error? Circle them. []
Mean** Median StDev** Q1 Q3
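The rule used in part e) can be checked numerically. The sketch below is only an illustration: the scores are hypothetical (they are not the test data), the factor of 1.2 is the one implied by the worked answer, and `q1` is a helper name introduced here. It shows that multiplying every score by a constant multiplies the mean, standard deviation and first quartile by that same constant.

```python
import statistics

# Hypothetical left-skewed scores; NOT the Self Concept data from Question 1.
scores = [24, 41, 50, 56, 60, 62, 64, 68, 72]
factor = 1.2  # assumed rescaling: every score becomes 20% higher

rescaled = [factor * x for x in scores]


def q1(data):
    """First quartile (helper for this sketch), using the 'inclusive' method."""
    return statistics.quantiles(data, n=4, method="inclusive")[0]


# Each ratio should equal the rescaling factor, up to floating-point error.
mean_ratio = statistics.mean(rescaled) / statistics.mean(scores)
sd_ratio = statistics.stdev(rescaled) / statistics.stdev(scores)
q1_ratio = q1(rescaled) / q1(scores)

print(mean_ratio, sd_ratio, q1_ratio)
```

An additive shift (x + c) would behave differently: it moves the mean and quartiles by c but leaves the standard deviation unchanged.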

Question 2

Using your normal tables, answer the following:

a) The length of a snow storm in northern Ontario is known to have a normal distribution with mean 32 hours and standard deviation 12 hours. A storm that lasts more than 48 hours is recorded as a severe one. What is the proportion of severe storms? [4]
P(X > 48) = P(Z > (48 − 32)/12) = P(Z > 1.33) = 1 − 0.9082 = 0.0918

b) Determine the length of a storm such that 5% of the storms are longer than it. [4]
P(X < x) = P(Z < (x − 32)/12) = 0.95, so (x − 32)/12 = 1.65 (or 1.64), which gives x = 51.8 hours.

Question 3

A study of sleeping disorder and smoking classifies subjects as nonsmokers, moderate smokers, or heavy smokers. Samples of 90 men and 90 women are drawn from each group. Each person reports the number of hours of sleep he or she gets on a typical night.

a) What are the explanatory and response variables? []
Explanatory: smoking habits. Response: number of hours of sleep on a typical night.

b) Explain carefully why this is or is not an experiment. (<0 words) []
It is not an experiment because the study does not impose treatments on the subjects. It is purely observational in nature.

c) Do you think this study will show that smoking habits cause sleeping disorder? Explain your answer carefully, as if you were talking to someone who knows no statistics. (<60 words) []
No. Since subjects were not randomly assigned to smoking habits, differences in results may be due to confounding factors, e.g. gender, physical condition, eating habits and exercise habits.
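The normal-table lookups in Question 2 can be cross-checked in software. This is a minimal sketch that assumes a mean of 32 hours and a standard deviation of 12 hours (values inferred from the worked solution, since the printed figures are not legible in the source); the variable names are illustrative.

```python
from statistics import NormalDist

# Assumed storm-length distribution: mean 32 h, standard deviation 12 h.
storm = NormalDist(mu=32, sigma=12)

# a) Proportion of storms lasting more than 48 hours.
p_severe = 1 - storm.cdf(48)

# b) Length exceeded by only 5% of storms (the 95th percentile).
x95 = storm.inv_cdf(0.95)

print(round(p_severe, 4), round(x95, 2))
```

The results differ slightly from the table-based answers because the tables round z to two decimals (1.33 and 1.65), while the code uses the unrounded values.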

Question 4

Common sources of caffeine are coffee and tea. Suppose that 55% of the students at U of T drink coffee, 30% drink tea and 10% drink both.

a) If we randomly choose one U of T student, what is the probability that this student drinks neither coffee nor tea? []
P(coffee or tea) = 0.55 + 0.30 − 0.10 = 0.75
P(neither coffee nor tea) = 1 − P(coffee or tea) = 1 − 0.75 = 0.25

b) If we randomly choose two students, find the probability that at least one of these students drinks only tea. []
P(a student drinks only tea) = 0.30 − 0.10 = 0.20
P(at least one drinks only tea) = 1 − P(neither of them drinks only tea) = 1 − (0.8)^2 = 0.36 (or add three probabilities)

c) If we randomly choose three students, find the probability that exactly one of these three does not drink either of these beverages. []
P(exactly one student drinks neither tea nor coffee) = 3 × 0.25 × (0.75)^2 ≈ 0.42

d) In addition to the information given above, suppose now that 70% of the students who drink coffee are female while only 35% of those who don't drink coffee are female. If we randomly select one student, what is the probability of selecting a female? []
Tree: Coffee 0.55 → Female 0.7; Non-coffee 0.45 → Female 0.35
Pr(female) = Pr(coffee) Pr(female | coffee) + Pr(non-coffee) Pr(female | non-coffee) = 0.55 × 0.7 + 0.45 × 0.35 = 0.5425

e) Continuing from part d), suppose that our randomly selected student turns out to be female. What is the probability that she doesn't drink coffee? []
Pr(non-coffee | female) = Pr(non-coffee and female) / Pr(female) = (0.45 × 0.35) / 0.5425 = 0.290
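The steps in Question 4 (addition rule, complement, independence across students, total probability and Bayes' rule) can be traced in a few lines. The sketch assumes the percentages 55% coffee, 30% tea, 10% both, and 70% / 35% female among coffee and non-coffee drinkers, which are the values consistent with the worked answers; all variable names are illustrative.

```python
from math import comb

# Assumed inputs (consistent with the worked answers in Question 4).
p_coffee, p_tea, p_both = 0.55, 0.30, 0.10

# a) Addition rule, then complement.
p_either = p_coffee + p_tea - p_both
p_neither = 1 - p_either

# b) "Only tea" for one student, then "at least one of two" via the complement.
p_only_tea = p_tea - p_both
p_at_least_one_only_tea = 1 - (1 - p_only_tea) ** 2

# c) Exactly one of three students drinks neither: binomial with n = 3, k = 1.
p_exactly_one_neither = comb(3, 1) * p_neither * (1 - p_neither) ** 2

# d) Law of total probability over coffee / non-coffee drinkers.
p_f_given_coffee, p_f_given_noncoffee = 0.70, 0.35
p_female = p_coffee * p_f_given_coffee + (1 - p_coffee) * p_f_given_noncoffee

# e) Bayes' rule: probability of not drinking coffee, given female.
p_noncoffee_given_f = (1 - p_coffee) * p_f_given_noncoffee / p_female

print(p_neither, p_at_least_one_only_tea, p_exactly_one_neither,
      p_female, p_noncoffee_given_f)
```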

Question 5

The GPA of students is thought to be related to their IQ and self-concept. Higher self-concept scores (based on a standard test) indicate a more positive self-concept. For a sample of 78 seventh grade students the following data were obtained:

OBS GPA IQ SelfConcept
7.940 67 8.9 07 4 4.64 00 5 4 7.470 07 66 5 8.88 4 58 ... 78 6.98 06 56

Below are some graphs and Minitab outputs.

Correlations: GPA, IQ, Self-Concept

               GPA     IQ
IQ             0.697
Self-Concept   0.56    0.49

Cell Contents: Pearson correlation

[Scatterplot: GPA versus SelfConcept]

[Scatterplot: GPA versus IQ]

Regression Analysis: GPA versus IQ

The regression equation is GPA = -.5 + 0.099 IQ

Predictor   Coef     SE Coef   T      P
Constant    -.54     .84       -.5    0.0
IQ          0.0995   0.07      8.47   0.000

S = .5   R-Sq = omitted   R-Sq(adj) = 47.9%

Analysis of Variance
Source           DF   SS      MS    F      P
Regression            .       .     7.75   0.000
Residual Error   76   9.09    .8
Total            77   70.40

Unusual Observations
Obs   IQ   GPA     Fit     SE Fit   Residual   St Resid
8     97   .4      6.64    0.07     -.95       -.96R
9     06   4.000   7.56    0.57     -.56       -.4R
54    7    7.95    .885    0.459    .40        .68RX
58    79   5.06    4.579   0.8      0.48       0.7X
6     74   5.7     4.08    0.47     .54        0.90X
70    77   4.885   4.8     0.404    0.504      0.9X

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large influence.

Regression Analysis: GPA versus SelfConcept

The regression equation is GPA = .0 + 0.0794 SelfConcept

Predictor   Coef     SE Coef   T     P
Constant    .0       0.8584    .5    0.00
SelfConc    0.0794   0.047     5.9   0.000

S = .604   R-Sq = 7.7%   R-Sq(adj) = 6.7%

Analysis of Variance
Source           DF   SS       MS      F      P
Regression            74.86    74.86   9.07   0.000
Residual Error   76   95.585   .57
Total            77   70.40

Unusual Observations
Obs   SelfConc   GPA    Fit     SE Fit   Residual   St Resid
8     5.0        .4     7.07    0.0      -4.660     -.9R
      0.0        5.0    4.60    0.574    0.60       0.4X
46    5.0        .647   7.5     0.96     -.505      -.0R
47    46.0       .408   6.675   0.4      -.67       -.06R
48    66.0       .96    8.64    0.5      -4.8       -.7R

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large influence.

Use the above outputs to answer the following questions.

a) Which of the two variables (IQ and Self-Concept) is a better predictor of GPA? Why? []
IQ, since it has a higher correlation with GPA.

b) What percentage of the variation in GPA is explained by the linear regression on Self-concept? []
r^2 = (0.542)^2 = 0.294 = 29.4%

c) How will the GPA change if the IQ score:
i. increases by 10 points? []
It increases by 10(0.104) = 1.04 points.
ii. decreases by 7 points? []
It decreases by 7(0.104) = 0.728 points.
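Parts b) and c) both rest on simple arithmetic with fitted quantities: the predicted change in the response is the slope times the change in the predictor, and the share of variation explained is the squared correlation. The sketch below assumes a slope of 0.104 GPA points per IQ point and a correlation of 0.542, the values the worked answers appear to use (they differ from the garbled Minitab printout); `gpa_change` is a hypothetical helper name.

```python
# Assumed values taken from the worked answers, not from the printed output.
slope_per_iq_point = 0.104
r_gpa_selfconcept = 0.542


def gpa_change(delta_iq, slope=slope_per_iq_point):
    """Predicted change in GPA for a change of delta_iq points in IQ."""
    return slope * delta_iq


rise = gpa_change(10)   # IQ goes up by 10 points
drop = gpa_change(-7)   # IQ goes down by 7 points

# Proportion of variation in GPA explained by a simple linear regression.
explained = r_gpa_selfconcept ** 2

print(rise, drop, explained)
```

This describes the change in the predicted GPA only; it says nothing about whether raising a student's IQ would cause the GPA to change.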

d) i. Predict the GPA of someone whose Self-concept score is 0. []
GPA = .9 + 0.090 (95) = 0.9 =.
ii. Predict the GPA of someone whose Self-concept is 65. []
GPA = .9 + 0.090 (65) = 8.7
iii. How credible or trustworthy are your two predictions above? Explain. []
The first is less trustworthy, since it is an extrapolation.

e) Attempting to make sense of the results, Ana came to the conclusion that a high GPA is the result of a high IQ. Do you agree? Defend your argument. (<0 words) []

f) For the regression of GPA on Self-concept, explain what makes the 48th student in the list somewhat unusual (do not use very technical terms like residual or deviation). []

g) Here are the residual plots for the regression of GPA on IQ. Examine them and answer the questions below.

[Residual Model Diagnostics, four panels: Normal Plot of Residuals (Residual versus Normal Score); I Chart of Residuals (Residual versus Observation Number); Histogram of Residuals; Residuals vs. Fits]

i. What do you learn from inspection of the above 4 plots? Are there any major problems with the model assumptions? []
There appear to be two different patterns, for males and for females, suggesting we cannot use just one prediction equation regardless of gender. The dispersion appears to be greater for those with low IQ than for those with higher IQ.

ii. Describe the distribution of the residuals (shape, center, spread). []
Centred at zero. They are skewed left (negatively).

h) Based on the graph below, will the fitted model overestimate or underestimate the GPA of students that have an IQ below 80? []

[Plot: Residuals versus IQ]

We are underestimating, since the residuals are positive for those students.

Question 6

A marketing experiment compares four different types of packaging for computer disks (denoted by A, B, C and D). Each type of packaging was presented in three different colors (Blue, Red and Orange). Each combination of package type with a particular color is shown to 40 potential customers, who rate the attractiveness of the product on a to 0 scale.

a) Identify clearly and explicitly:
i. The experimental units []
Computer disks (or packages)
ii. The factors and the levels of each factor []
Type of package, with levels A, B, C and D
Colour, with levels Blue, Red and Orange
iii. The treatments. List all of them. []
Treatments are:
(A, Blue), (A, Red), (A, Orange)
(B, Blue), (B, Red), (B, Orange)
(C, Blue), (C, Red), (C, Orange)
(D, Blue), (D, Red), (D, Orange)
iv. The response variable []
The attractiveness of the product.

b) Is this a randomized block design or a completely randomized design? []
Randomized block design (if each of the 40 customers views all treatment combinations). [CRD if different groups of 40 view each treatment]

Question 7

Suppose that we want to select a sample of students from your STA220 class (69 students in total). Answer the following questions, where "random" below means by using the Random Number tables the customary way.

a) Suppose we assign each student in the classroom a number from 00 to 69, use a RN table to pick two distinct random numbers from 00 to 69, and then take the corresponding students.
i. What do we call this type of sample? []
Simple random sample
ii. If you start on line 50, what numbers will those students carry? []
75 and 8

b) If we randomly select 40 students from the night section, then 0 at random from the morning section and finally 60 at random from the afternoon section, what type of sampling design is this? []
Stratified sample

c) Suppose that we order the students in attendance in some clear, simple fashion (say left seat to right seat, bottom row to top row) and then select every 0th student, after drawing an appropriate random number. What do we call this type of sampling design? []
Systematic sample

d) If we randomly select 8 rows in the classroom, then students randomly from each selected row, what do we call this type of sampling design? []
Multi-stage sample (or two-stage sample; or cluster sample with subsampling)
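The four designs named in Question 7 differ only in how the random draws are organised. The sketch below illustrates that difference on a hypothetical roster; the class size (120), the three sections, the row layout and every sample size are made-up values, and Python's pseudo-random generator stands in for the random number table.

```python
import random

rng = random.Random(220)  # fixed seed so the sketch is repeatable

# Hypothetical roster: students labelled 1..120, three sections, ten rows of 12.
students = list(range(1, 121))
sections = {
    "morning": students[:40],
    "afternoon": students[40:80],
    "night": students[80:],
}
rows = [students[i:i + 12] for i in range(0, 120, 12)]

# Simple random sample: two distinct students, every pair equally likely.
srs = rng.sample(students, 2)

# Stratified sample: a separate simple random sample inside each section.
stratified = {name: rng.sample(group, 5) for name, group in sections.items()}

# Systematic sample: random start among the first k, then every k-th student.
k = 10
start = rng.randrange(k)
systematic = students[start::k]

# Two-stage sample: pick rows at random, then students within each chosen row.
chosen_rows = rng.sample(rows, 4)
two_stage = [s for row in chosen_rows for s in rng.sample(row, 3)]

print(srs, systematic, two_stage)
```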

Question 8 - Miscellaneous

a) Ty got 90 on his STA220 final exam, where the class average was 75 and the standard deviation was 15, while in PSY6 he got 78 and the class average was 63 with a standard deviation of 9. In which course did he do better (assuming similar shapes for both test frequency distributions)?
STA220: z = (90 − 75) / 15 = +1
PSY6: z = (78 − 63) / 9 = +1.67
Better on the PSY test, since his relative standing is better.

b) Below is a scatter plot showing data on two variables, V and W.

[Scatterplot of W against V]

Suppose we are interested in predicting W from V.
i) On the plot above, draw in at least two bars showing exactly what is minimized (after squaring and summing) should we fit a least-squares line to predict W from V.
ii) Suppose the equation of the line given in the above plot is V = +.5W. The equation of the line to predict W from V is: W = + V. True / False? (circle one)
False
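The comparison in part a) is a standardisation: each mark is expressed as a number of standard deviations above its class mean. A minimal sketch, assuming a standard deviation of 15 for the STA exam and a class mean of 63 for the PSY exam (the values consistent with the stated conclusion); `z_score` is a helper name introduced here.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (or below) the mean."""
    return (x - mean) / sd


# Assumed class summaries; see the lead-in for which values are inferred.
z_sta = z_score(90, mean=75, sd=15)
z_psy = z_score(78, mean=63, sd=9)

better = "PSY" if z_psy > z_sta else "STA"
print(z_sta, round(z_psy, 2), better)
```

The comparison is only meaningful if the two mark distributions have similar shapes, as the question itself assumes.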

c) 000 households are randomly selected from a list of all households for a survey. There is a 50% non-response rate. To compensate, the investigator selects an additional 000 households randomly from the list (from which we will get about another 500 responses).
i) Adding the additional 000 households will reduce the bias. True / False? (circle one)
False
ii) Adding the additional 000 households will reduce sampling variation. True / False? (circle one)
True

d) For the following 10 numbers, calculate the 30% trimmed mean:
5 6 0 7 4 6 6
Average of the middle four = average of {4, 5, 6, 6} = 21 / 4 = 5.25

e) To compare the distribution of test scores for males and females, it would be most appropriate to compare: histograms, scatterplots, bar charts, pie charts, dotplots, boxplots, stem-and-leaf plots. (Circle all that apply) []
Histograms, dotplots, boxplots, stem-and-leaf plots.

f) Two types of eye drugs, type A and type B, are to be compared for curing an infection. The response variable will be an appropriate measure of cure, obtained by examining the eye after a certain amount of drug usage. 40 girls are available for the study. Each child will be given both drugs. What special instruction will you give the experimenter? (<0 words)
Use A for one eye and B for the other eye. Flip a coin to decide which eye gets A for each child (or randomly select 20 children who get A in the left eye; the others get A in the right eye).
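The trimmed mean in part d) drops a fixed share of the observations from each end of the sorted list and averages what remains. The sketch below uses hypothetical data chosen so that the middle four values are 4, 5, 6 and 6 (the full list of ten numbers is not legible in the source); `trimmed_mean` is a helper written for this illustration, and it trims the given percentage from each end.

```python
def trimmed_mean(values, percent):
    """Mean after dropping `percent`% of the observations from EACH end."""
    data = sorted(values)
    k = len(data) * percent // 100        # count removed from each end
    kept = data[k:len(data) - k]
    if not kept:
        raise ValueError("trimming removed every observation")
    return sum(kept) / len(kept)


# Hypothetical list of 10 numbers; only the middle four match the solution.
example = [1, 2, 3, 4, 5, 6, 6, 7, 10, 25]

result = trimmed_mean(example, 30)   # drops 3 values from each end
plain = sum(example) / len(example)  # ordinary mean, pulled up by the 25

print(result, plain)
```

With these made-up numbers the ordinary mean is 6.9 while the trimmed mean is 5.25, which shows how trimming reduces the pull of extreme values.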