M 140 Test 1 B Name (1 point) SHOW YOUR WORK FOR FULL CREDIT! Problem Max. Points Your Points Total 75


Problem    Max. Points    Your Points
1-10       10
11         10
12         3
13         4
14         18
15         8
16         7
17         14
Total      75

Multiple choice questions (1 point each)

For questions 1-3 consider the following: The distribution of scores on a certain statistics test is strongly skewed to the right.

1. How would the mean and the median compare for this distribution?
a) mean < median b) mean = median c) mean > median d) Can't tell without more information
When a distribution is skewed to the right, the tail is on the right. Since the tail pulls the mean, the mean is greater than the median.

2. Which set of measures of center and spread is more appropriate for the distribution of scores?
a) Mean and standard deviation b) Median and interquartile range c) Mean and interquartile range d) Median and standard deviation
For skewed distributions the median and the interquartile range are the appropriate measures of center and spread because they are resistant to outliers.

3. What does this suggest about the difficulty of the test?
a) It was an easy test b) It was a hard test c) It wasn't too hard or too easy d) It is impossible to tell
If the distribution is skewed to the right, then most of the values are to the left, toward the lower values. Thus, more students got lower scores, which suggests that the test was hard.

4. Which one of the following variables is NOT categorical?
a) whether or not an individual has glasses b) weight of an elephant c) religion d) social security number
All the other variables are categorical; the average doesn't make sense for those.

5. What percent of the observations in a distribution lie between the first quartile Q1 and the median?
a) Approximately 25% b) Approximately 50% c) Approximately 75% d) Approximately 100%

Since Q1 is the first quartile (25%), and the median is the second quartile (50%), between them we have approximately 25% of the data.

6. Which of these statements are FALSE?
a) There is a very weak linear relationship between gender and height because we found a correlation of 0.89.
b) Plant height and leaf height are strongly correlated because the correlation coefficient is -1.41.
c) If we change the units of measurement, the correlation coefficient will change.
d) All of the above.
e) None of the above.
All of the statements are false. a) is false because a correlation of 0.89 means a strong, not a very weak, relationship. b) is false because the correlation coefficient cannot be less than -1. c) is false because the correlation coefficient wouldn't change if we changed the units of measurement.

7. Consider two data sets A and B with more than 3 values. The sets are identical except that the maximum value of data set A is three times greater than the maximum value of set B. Which of the following statements is NOT true?
a) The median of set A is the same as the median of set B.
b) The mean of set A is smaller than the mean of set B.
c) The standard deviation of set A is the same as the standard deviation of set B.
d) The range of set A is smaller than the range of set B.
This question had a typo. It should have asked "Which of the following statements is true?" But since the question asked which one is NOT true, b, c, and d are all correct choices.

8. A vending machine automatically pours coffee into cups. The amount of coffee dispensed into a cup is roughly bell shaped with a mean of 7.6 ounces. From the Standard Deviation Rule we know that 68% of the cups contain an amount of coffee between 6.8 and 8.4 ounces. Thus, the standard deviation of the distribution is approximately equal to:
a) 0.8 ounces b) 0.4 ounces c) 1.2 ounces d) 0.2 ounces
From the Standard Deviation Rule we know that 68% of the data lies within one standard deviation below and above the mean. Thus, the standard deviation must be 8.4 - 7.6 = 0.8 (or 7.6 - 6.8 = 0.8).

9. A study of the salaries of professors at Smart University shows that the median salary for female professors is considerably less than the median male salary. Further investigation shows that the median salaries for male and female professors are about the same in every department (English, Physics, etc.) of the university. This apparent contradiction is an example of
a) correlation b) Simpson's paradox c) causation d) extrapolation

Simpson's paradox demonstrates that a great deal of care has to be taken when combining small data sets into a large one. Sometimes conclusions from the large data set are exactly the opposite of the conclusions from the smaller sets, and this is the case here. The conclusion was reversed once we included the lurking variable, department.

10. Which of the following statements is FALSE?
a) The only way the standard deviation can be 0 is when all the observations have the same value.
b) If you interchange the explanatory variable and the response variable, the correlation coefficient remains the same.
c) If the correlation coefficient between two variables is 0, that means that there is no relationship between the two variables.
d) The correlation coefficient has no units.
e) The standard deviation has the same units as the data.
c) is false because r = 0 only means that there is no linear relationship between the two variables. There could still be a very strong non-linear relationship.

11. TRUE or FALSE? Circle either T or F for each statement below (1 point each)

T F Consider the following data set: 2 3 5 7 8 10 12 15 20 31. According to the 1.5(IQR) rule, 31 is an outlier.
Q1 = 5, Q3 = 15, so IQR = 10. According to the 1.5(IQR) rule, Q3 + 1.5(IQR) = 15 + 1.5(10) = 30. Since 31 is higher than 30, it is considered an outlier according to this rule.

T F Two variables have a negative linear correlation. That means that the response variable decreases as the explanatory variable increases.
Yes, that's exactly what negative linear correlation means.

T F If in each of the three schools in a certain school district, a higher percentage of girls than boys gave out Valentines, then it must be true for the district as a whole that a higher percentage of girls than boys gave out Valentines.
It could be, but it is not necessarily true. This goes back to Simpson's paradox.

T F We can display one quantitative variable graphically with a pie chart or a bar graph.
No, these are used to display a categorical variable.

T F A correlation of -0.6 indicates a stronger relationship than a correlation of 0.5.
Yes, assuming a linear relationship, a correlation of -0.6 is stronger than 0.5. The sign only shows the direction.
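The outlier check in the first true/false statement can be sketched in a few lines of Python. This is only an illustration: the helper name `iqr_fences` is made up here, and the quartiles are taken as stated in the solution (Q1 = 5, Q3 = 15) rather than recomputed, since software packages use slightly different quartile conventions.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return the (lower, upper) fences of the k*IQR outlier rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


data = [2, 3, 5, 7, 8, 10, 12, 15, 20, 31]
q1, q3 = 5, 15  # quartiles as given in the solution, not recomputed

lower, upper = iqr_fences(q1, q3)

# Any observation outside the fences is flagged as a suspected outlier.
outliers = [x for x in data if x < lower or x > upper]

print("fences:", lower, upper)
print("suspected outliers:", outliers)
```

With these quartiles the upper fence is 30, so 31 is the only value flagged.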

For the following five statements consider this: Infant mortality rates per 1000 live births for the following regions are summarized with side-by-side boxplots (www.worldbank.org):
Region 1: Asia (South and East) and the Pacific
Region 2: Europe and Central Asia
Region 3: Middle East and North Africa
Region 4: North and Central America and the Caribbean
Region 5: South America
Region 6: Sub-Saharan Africa

T F About 75% of the countries in Region 1 have an infant mortality rate less than about 75, while about 75% of the countries in Region 6 have an infant mortality rate more than about 75.

T F The distribution of infant mortality rates in Region 4 is left skewed.
No, it's right skewed. You can see the longer tail in the higher values.

T F The median infant mortality rate in Region 4 is about the same as the first quartile of Region 3.

T F The variability in the infant mortality rate is the smallest in Regions 2 and 5.

T F The infant mortality rate in all countries except for one in Region 2 is lower than the mortality rate of 50% of the countries in Region 1.

12. (3 points) Match each of the five scatterplots with its correlation.
Correlations: 0.75, 0.7, 0.95, 0, 0.95
Matching scatterplots: a, b, e, d, c

13. (4 points) Answer the following questions:
a) Which measure of spread indicates variation about the mean? The standard deviation
b) Which graphical display shows the median and data spread about the median? Boxplot

14. (18 points) For each of the situations described below, identify the explanatory variable and the response variable, and indicate if they are quantitative or categorical. Also, write the appropriate graphical display for each situation.

a) You want to explore the relationship between the weight of the car and its mpg (miles per gallon).
The explanatory variable is: weight of the car.
The response variable is: mpg.
The explanatory variable is: Categorical / Quantitative
The response variable is: Categorical / Quantitative
Therefore, this is an example of Case III.
An appropriate graphical display would be: scatterplot.

b) You want to explore the relationship between nationality and IQ scores.
The explanatory variable is: nationality.
The response variable is: IQ scores.
The explanatory variable is: Categorical / Quantitative
The response variable is: Categorical / Quantitative
Therefore, this is an example of Case I.
An appropriate graphical display would be: side-by-side boxplots.

c) You want to explore the relationship between gender and party affiliation.
The explanatory variable is: gender.
The response variable is: party affiliation.
The explanatory variable is: Categorical / Quantitative
The response variable is: Categorical / Quantitative
Therefore, this is an example of Case II.
An appropriate graphical display would be: double bar graph.

15. (8 points) Thirty students in an experimental psychology class used various techniques to train rats to move through a maze. The running times of the rats (in minutes) were recorded at the end of the experiment. The following stemplot shows the results, where 3 | 1 represents 3.1 minutes. The summary statistics are also provided.

0 | 6
1 | 0167799
2 | 0047
3 | 123678
4 | 02455
5 | 236
6 | 0
7 | 6
8 | 0
9 | 1

N = 30
Mean = 3.45
Median = 3.698
StDev = 2.118
Minimum = 0.6
Maximum = 9.1
Q1 = 1.9
Q3 = 4.5

a) Briefly describe the shape of the distribution from the stemplot.
Skewed to the right

b) Check the mean and the median. What do you think, did I switch them to trick you, or do they seem to be correct? Explain.
They are incorrect. When the distribution is skewed to the right, the mean is usually higher than the median. So I did switch them.

c) Find the value of the interquartile range, and interpret it clearly in the context of the experiment.
IQR = Q3 - Q1 = 4.5 - 1.9 = 2.6
The interquartile range is the range of the middle 50% of the data. In this example, it means that 16 of the rats completed the maze in between 1.9 and 4.5 minutes. Seven rats completed the maze in less than 1.9 minutes, and seven completed it in more than 4.5 minutes.

d) Which one of the following boxplots represents the above data? Circle your answer.
9.1 is a suspected outlier. According to the 1.5(IQR) rule it is an outlier: Q3 + 1.5(IQR) = 4.5 + 1.5(2.6) = 8.4. Since 9.1 is above 8.4, it is an outlier. That leaves C2 or C3 as the possible correct answers. But since the next highest number in the list is 8.0 minutes, the correct boxplot must be C2.

16. (7 points) Consider the following two distributions. The first one (A) shows the distribution of the number of houseplants owned by a sample of 30 households in Los Angeles. The second one (B) shows, for a sample of 30 freshmen, the distribution of the number of girlfriends/boyfriends they have ever had.

a) What percent of households have eight or more houseplants?
One household has 8 plants, and one household has 10 plants (see yellow highlights on the graph). That is 2 out of 30 households, 2/30 = 6.67%.

b) Which distribution has the higher standard deviation and why?
B has the higher standard deviation because the values are farther from the mean on average.

c) Select the statement below that gives the most complete and correct statistical description of graph A.
A. The bars go from 0 to 10, increasing in height to 4, then decreasing to 10. The tallest bar is at 4. There is a gap between 8 and 10.
B. The distribution is approximately normal, with a mean of about 4 and a standard deviation of about 1.
C. Most households seem to have about 4 houseplants, but some have more or less. One household has 10 plants.
D. The distribution of the number of houseplants is somewhat symmetric and bell-shaped, with one possible outlier at 10. The typical number of houseplants owned is about 4, and the overall range is 10 plants.
A description of a distribution should always be in context. So answers A and B are out. Between C and D, D is the most complete and correct description.

17. The number of hours 21 students spent watching TV during the weekend and their scores on the test they took on the following Monday were recorded.

Hours spent watching TV: 0.0 1.0 2.5 3.0 3.5 5.0 5.5 5.0 6.0 7.5 7.0 10.0 2.0 4.5 6.0 8.0 2.0 1.5 8.0 8.5 9.0
Test scores:             96 85 82 54 93 68 76 84 58 65 75 50 91 69 61 52 92 84 62 57 58

The mean hours spent watching TV is: 5.02 hours
The standard deviation of the hours spent watching TV is: 2.90 hours
The mean test score is: 72.00
The standard deviation of the test scores is: 14.97
The correlation coefficient is: -0.80
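As a cross-check, the summary statistics for problem 17 can be recomputed from the 21 data pairs listed above. The sketch below uses only the Python standard library; the helper `correlation` is written out by hand for illustration and assumes the sample (n - 1) standard deviation, which is what the stated values agree with.

```python
from statistics import mean, stdev

hours = [0.0, 1.0, 2.5, 3.0, 3.5, 5.0, 5.5, 5.0, 6.0, 7.5, 7.0,
         10.0, 2.0, 4.5, 6.0, 8.0, 2.0, 1.5, 8.0, 8.5, 9.0]
scores = [96, 85, 82, 54, 93, 68, 76, 84, 58, 65, 75,
          50, 91, 69, 61, 52, 92, 84, 62, 57, 58]


def correlation(xs, ys):
    """Sample correlation: sum of products of deviations / ((n-1) * s_x * s_y)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / ((len(xs) - 1) * stdev(xs) * stdev(ys))


r = correlation(hours, scores)

print(f"mean hours  = {mean(hours):.2f}, s_x = {stdev(hours):.2f}")
print(f"mean scores = {mean(scores):.2f}, s_y = {stdev(scores):.2f}")
print(f"r = {r:.2f}")
```

The computed correlation comes out negative, which matches the downward trend described in part a) of the solution.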

a) Describe the scatterplot. Make sure you mention all four features.
Form: linear
Direction: negative
Strength: moderately strong
Outliers: maybe one at (3, 54)

b) Find the equation of the least squares line, and sketch the line on the plot. Use three decimal digits in your answers.
The equation of the LSRL is $\hat{Y} = a + bX$, where $b = r \frac{s_y}{s_x}$ and $a = \bar{Y} - b\bar{X}$.
In this case, $b = r \frac{s_y}{s_x} = -0.80 \cdot \frac{14.97}{2.90} = -4.130$ and $a = \bar{Y} - b\bar{X} = 72.00 - (-4.130)(5.02) = 92.733$.
Thus, the equation of the LSRL is $\hat{Y} = 92.733 - 4.130X$.

c) Provide an interpretation of the slope in this context.
The slope is -4.130 points per hour of TV watched. That means that students who watched TV for an extra hour scored 4.130 points lower on average on their tests.

d) Predict the test score for a student who watched TV for 4 hours on the weekend.
$\hat{Y} = 92.733 - 4.130(4) = 76.213$
Using the LSRL, we can predict that a student who watched TV for 4 hours on the weekend got about 76 points on the test.

e) Would it be OK to use the regression line to predict the test score for a student who watched TV on the weekend for 12 hours? Explain.
No. That would be extrapolation: 12 hours is outside the range of the observed data (0 to 10 hours).

f) Does the data support the notion that test scores are associated with the number of hours students watch TV on the weekend? Explain briefly, giving all evidence that supports your contention.
Yes. There seems to be a negative association between the number of hours spent watching TV on the weekend and the test scores. The scatterplot shows a negative relationship, and the correlation coefficient is high in absolute value. In fact, $r^2 = 0.64$, meaning that about 64% of the variation in the test scores can be explained by the number of hours spent watching TV, while 36% of the variation is attributable to factors other than the number of hours of TV watched.

g) Can you name one lurking variable that might affect the test scores?
Maybe students' interest in the material, number of hours slept, test anxiety, gender, IQ, students' previous knowledge of the material, etc.
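The slope, intercept, and prediction from parts b) and d) can be reproduced from the summary statistics alone. The following is a minimal sketch, assuming the rounded values given in the problem (with the correlation taken as negative, in line with the negative direction of the scatterplot) and rounding the slope to three decimals before computing the intercept, as the worked solution does. The function name `predict` is chosen here for illustration.

```python
# Summary statistics from problem 17 (rounded values as stated there).
r = -0.80                   # correlation coefficient
s_x, s_y = 2.90, 14.97      # standard deviations of hours and scores
x_bar, y_bar = 5.02, 72.00  # means of hours and scores

# Slope b = r * s_y / s_x, rounded to three decimals first.
b = round(r * s_y / s_x, 3)

# Intercept a = y_bar - b * x_bar, using the rounded slope.
a = round(y_bar - b * x_bar, 3)


def predict(hours_watched):
    """Predicted test score from the least squares line Y = a + bX."""
    return a + b * hours_watched


print(f"LSRL: Y = {a:.3f} + ({b:.3f}) X")
print(f"Predicted score for 4 hours: {predict(4):.3f}")
```

Rounding the slope before computing the intercept reproduces the intercept shown in the solution. Carrying full precision through instead would shift the intercept slightly in the third decimal place.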