Sample Statistics 5021 First Midterm Examination with solutions

Similar documents
Lecture 26 Section 8.4. Mon, Oct 13, 2008

Your Galactic Address

Nursing Facilities' Life Safety Standard Survey Results Quarterly Reference Tables

Use your text to define the following term. Use the terms to label the figure below. Define the following term.

Parametric Test. Multiple Linear Regression Spatial Application I: State Homicide Rates Equations taken from Zar, 1984.

Analyzing Severe Weather Data

C Further Concepts in Statistics

What Lies Beneath: A Sub- National Look at Okun s Law for the United States.

Summary of Natural Hazard Statistics for 2008 in the United States

Evolution Strategies for Optimizing Rectangular Cartograms

Smart Magnets for Smart Product Design: Advanced Topics

SAMPLE AUDIT FORMAT. Pre Audit Notification Letter Draft. Dear Registrant:

Class business PS is due Wed. Lecture 20 (QPM 2016) Multivariate Regression November 14, / 44

Drought Monitoring Capability of the Oklahoma Mesonet. Gary McManus Oklahoma Climatological Survey Oklahoma Mesonet

Forecasting the 2012 Presidential Election from History and the Polls

Module 19: Simple Linear Regression

Annual Performance Report: State Assessment Data

Multiway Analysis of Bridge Structural Types in the National Bridge Inventory (NBI) A Tensor Decomposition Approach

Swine Enteric Coronavirus Disease (SECD) Situation Report June 30, 2016

Appendix 5 Summary of State Trademark Registration Provisions (as of July 2016)

EXST 7015 Fall 2014 Lab 08: Polynomial Regression

Statistical Mechanics of Money, Income, and Wealth

2006 Supplemental Tax Information for JennisonDryden and Strategic Partners Funds

Further Concepts in Statistics

Further Concepts in Statistics

Meteorology 110. Lab 1. Geography and Map Skills

AIR FORCE RESCUE COORDINATION CENTER

Cluster Analysis. Part of the Michigan Prosperity Initiative

Swine Enteric Coronavirus Disease (SECD) Situation Report Sept 17, 2015

Data Visualization (DSC 530/CIS )

Regression Diagnostics

The veto as electoral stunt

AFRCC AIR FORCE RESCUE COORDINATION CENTER

Swine Enteric Coronavirus Disease (SECD) Situation Report Mar 5, 2015

Kari Lock. Department of Statistics, Harvard University Joint Work with Andrew Gelman (Columbia University)

Combinatorics. Problem: How to count without counting.

REGRESSION ANALYSIS BY EXAMPLE

Some concepts are so simple

Introduction to Mathematical Statistics and Its Applications Richard J. Larsen Morris L. Marx Fifth Edition

Final Exam. 1. Definitions: Briefly Define each of the following terms as they relate to the material covered in class.

Data Visualization (CIS 468)

Week 3 Linear Regression I

Empirical Application of Panel Data Regression

Analysis of the USDA Annual Report (2015) of Animal Usage by Research Facility. July 4th, 2017

Discontinuation of Support for Field Chemistry Measurements in the National Atmospheric Deposition Program National Trends Network (NADP/NTN)

A Test of Racial Bias in Capital Sentencing

Announcements. Lecture 10: Relationship between Measurement Variables. Poverty vs. HS graduate rate. Response vs. explanatory

Estimating Dynamic Games of Electoral Competition to Evaluate Term Limits in U.S. Gubernatorial Elections: Online Appendix

AC/RC Regional Councils of Colonels & Partnerships

Test of Convergence in Agricultural Factor Productivity: A Semiparametric Approach

Sociology 6Z03 Review I

A Second Opinion Correlation Coefficient. Rudy A. Gideon and Carol A. Ulsafer. University of Montana, Missoula MT 59812

Effects of Various Uncertainty Sources on Automatic Generation Control Systems

CS-11 Tornado/Hail: To Model or Not To Model

Time-Series Trends of Mercury Deposition Network Data

Resources. Amusement Park Physics With a NASA Twist EG GRC

Draft Report. Prepared for: Regional Air Quality Council 1445 Market Street, Suite 260 Denver, Colorado Prepared by:

$3.6 Billion GNMA Servicing Offering

APRIL 1999 THE WALL STREET JOURNAL CLASSROOM EDITION Hurricanes, typhoons, coastal storms Earthquake (San Francisco area) Flooding

Teachers Curriculum Institute Map Skills Toolkit 411

The Observational Climate Record

STP 420 INTRODUCTION TO APPLIED STATISTICS NOTES

Daily Operations Briefing June 9, 2012 As of 8:30 a.m. EDT

Clear Roads Overview. AASHTO Committee on Maintenance July 23, 2018 Charlotte, North Carolina

Blueline Tilefish Assessment Approach: Stock structure, data structure, and modeling approach

Chapter 2: Tools for Exploring Univariate Data

Estimations of Copper Roof Runoff Rates in the United States

The Vulnerable: How Race, Age and Poverty Contribute to Tornado Casualties

Daily Operations Briefing. August 2, 2012 As of 8:30 a.m. EDT

Authors: Antonella Zanobetti and Joel Schwartz

TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics

Chapter 2. Data Analysis

MINERALS THROUGH GEOGRAPHY

Revised: 2/19/09 Unit 1 Pre-Algebra Concepts and Operations Review

Announcements. Lecture 18: Simple Linear Regression. Poverty vs. HS graduate rate

Daily Operations Briefing Tuesday, May 14, 2013 As of 8:30 a.m. EDT

SUPPLEMENTAL NUTRITION ASSISTANCE PROGRAM QUALITY CONTROL ANNUAL REPORT FISCAL YEAR 2008

Practice Questions for Exam 1

Daily Operations Briefing April 15, 2012 As of 8:30 a.m. EDT

AP Final Review II Exploring Data (20% 30%)

Homework Example Chapter 1 Similar to Problem #14

Correlation and Regression

Algebra Calculator Skills Inventory Solutions

Chapter 5: Exploring Data: Distributions Lesson Plan

Describing distributions with numbers

Chinle USD CURRICULUM GUIDE SUBJECT: MATH GRADE: 8th TIMELINE: 3 rd quarter

Daily Operations Briefing May 30, 2012 As of 8:30 a.m. EDT

Outline. Administrivia and Introduction Course Structure Syllabus Introduction to Data Mining

Drake Petroleum Company, Inc. Accutest Job Number: JB Total number of pages in report:

Samples and Surveys pp

M 140 Test 1 B Name (1 point) SHOW YOUR WORK FOR FULL CREDIT! Problem Max. Points Your Points Total 75

download instant at

Lecture 6: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries)

Lecture 3B: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries)

Sample. Test Booklet. Subject: MA, Grade: HS PSSA 2013 Keystone Algebra 1. - signup at to remove - Student name:

Descriptive statistics

Towards a Bankable Solar Resource

FLOOD/FLASH FLOOD. Lightning. Tornado

Section 619. Profile. TA Center. early childhood. national. IDEAs partnerships results. The National Early Childhood Technical Assistance Center

Chapter 4. Displaying and Summarizing. Quantitative Data

Transcription:

THE UNIVERSITY OF MINNESOTA Statistics 5021 February 12, 2003 Sample First Midterm Examination (with solutions) 1. Baseball pitcher Nolan Ryan played in 20 games or more in the 24 seasons from 1968 through 1992. Here are the numbers of games he played in during these years. 21 39 39 35 30 33 25 41 37 21 35 32 27 42 31 35 30 30 30 28 34 29 34 27 (a) Make a stemplot of these data, ordering the leaves on each stem Note: Use 4 or more stems Five stems is probably the best choice here, using the first digit twice. 2* 11 2. 57789 3* 000012344 3. 555799 4* 12 1* 1 represents 11 Leaf digit unit = 1 The next larger size is 10 stems, using the digit 5 times. 2* 11 2t 2f 5 2s 77 This is really too many 2. 89 stems for a data set with n = 24. 3* 00001 It's hard to make out the shape 3t 23 3f 44555 3s 7 3. 99 4* 1 4t 2 1* 1 represents 11 Leaf digit unit = 1 Note: When there are two stems per first digit, the lower one gets leaves 0, 1, 2, 3 and 4 and the upper one gets leaves 5, 6, 7, 8 and 9. When there are 5 stems per first digit, the first gets leaves 0 and 1, the second 2 and 3, and so on. Thus neither of the following is a proper stemplot. 2* 115 2* 1157789 2. 7789 3. 000012344 3* 000012344555 3* 555799 3. 799 4. 12 4* 12 1

(b) Find the quartiles of these data (first or lower quartile, third or upper quartile, and median). Here is some MacAnova output: Cmd> ryan <- vector(21,39,39,35,30,33,25,41,37,21,35,32,\ 27,42,31,35,30,30,30,28,34,29,34,27) # enter data Cmd> sort(ryan) (1) 21 21 25 27 27 (6) 28 29 30 30 30 (11) 30 31 32 33 34 (16) 34 35 35 35 37 (21) 39 39 41 42 Having the data ordered makes this pretty easy. n = 24 is even, so the median is half way between 31 and 32, the 12 th and 13 th values in order of size. Thus M = 31.5. Q 1 is the median of the lower half (lower 12 values), that is half way between 28 and 29, the 6 th and 7 th values. Q 1 = 28.5. Q 3 is the median of the upper half (upper 12 values), that is half way between 35 and 35, the 6 th and 7 th values counting from the maximum. Q 3 = 35 More MacAnova output Cmd> n <- 24 Cmd> sum(ryan) (1) 765 Cmd> write(sum((ryan - sum(ryan)/n)^2)) NUMBER: (1) 722.625 (c) Use these values to calculate the sample mean x and standard deviation s. Do not enter the data in your calculator. Show your work. x = 31.875 s = 5.6052 x = x / n = 765/24 = 31.875 i 2 s = ( x x) / ( n 1 ) = 722.625 / 23 = 31.418 = 5.6052 i 2

2. Fifty boats, all of the same design, competed in a six mile sailboat race on Lake Superior. (a) Suppose you are told (not necessarily correctly) that the distribution of finishing times (times from start to finish) of these boats was approximately normal, with mean 110 minutes and standard deviation 20 minutes. Approximately what proportion of the finishes would you guess were between 90 and 130 minutes? 110 σ = 90 and 110 + σ = 130 so you are asked to guess the proportion of boats withing one standard deviation of the mean. By the 68-95-99.7 rule, about 68 of the data should be withing σ of the mean. More precisely, if the sailboat finishing times were exactly normal, Table A gives the proportion as 2(.8413.5) =.6826 = 68.26 or.8413.1587 Here is a relative frequency histogram of the actual finishing times of the 50 boats. The number above each bar is the relative frequency of sailboats in that time class. 0.4 0.35 0.3 0.25 Relative frequency histogram of sailboat finishing times 0.36 0.2 0.2 0.15 0.14 0.1 0.1 0.1 0.06 0.05 0.02 0.02 0 80 100 120 140 160 180 Finishing time (minutes) (b) Describe in no more than two sentences the shape of the distribution of finishing times. Would you expect the median to be larger than the mean, smaller than the mean, or about the same? The data are strongly positively skewed and positively skewed; possibly there are two outliers. You would expect the median to be smaller than the mean. This is characteristic of positively skewed data. 3

(c) What percent of the sail boats actually finished between 90 and 130 minutes after the start. Summing the relative frequencies of the corresponding bars, the proportion is.36 +.20 +.14 +.10 =.80 or 80 This is substantially more than the 68 predicted by the 68-95-99.7 rule. 3. The following table lists data for the 50 states and the District of Columbia from the 1993-1994 Statistical Abstract of the United states. There are two variables, the percent of the population living in areas in 1990 ( ) and the average 1991 hospital cost per day ( ). Also included are the two letter abbreviations for the states. AK 67.5 1130 ID 57.4 565 MT 52.5 437 RI 86.0 730 AL 60.4 637 IL 84.6 770 NC 50.4 651 SC 54.6 684 AR 53.5 571 IN 64.9 745 ND 53.3 454 SD 50.0 436 AZ 87.5 955 KS 69.1 585 NE 66.1 546 TN 60.9 716 CA 92.6 1037 KY 51.8 616 NH 51.0 713 TX 80.3 846 CO 82.4 801 LA 68.1 764 NJ 89.4 680 UT 87.0 915 CT 79.1 910 MA 84.3 880 NM 73.0 840 VA 69.4 701 DC 100.0 1038 MD 81.3 740 NV 88.3 907 VT 32.2 621 DE 73.0 845 ME 44.6 617 NY 84.3 694 WA 76.4 904 FL 84.8 836 MI 70.5 792 OH 74.1 782 WI 65.7 611 GA 63.2 652 MN 69.9 582 OK 67.7 684 WV 36.1 612 HI 89.0 686 MO 68.7 737 OR 70.5 896 WY 65.0 489 IA 60.6 538 MS 47.1 479 PA 68.9 732 4

Here is a scatterplot of these data. 1991 ital cost per day (dollars) vs 1990 percent popu 1200 AK Dollars 1000 800 600 400 VT WV CA AZ OR WA CT UT NV MA DE NM TX FL MI CO IN LA OH IL MO MD NH TN PA RI SC OK VA NY HI NJ NC AL GA ME KY WI AR ID KS MN IA NE MS WY SD MT ND DC 30 40 50 60 70 80 90 100 Percent 1990 population (a) Describe in about two sentences any regular pattern to the data and any apparent deviations from the pattern. There is a positive relationship, somewhat linear. There is one striking outlier in the y direction, namely Alaska where hospital costs are well out of line with other states. (b) Alaska (AK) has the highest hospital cost of any state. In a regression of hospital cost on percent rural population, is Alaska likely to be an influential case? Why or why not. No, Alaska is not likely to be an influential case. Although it appears to be an outlier in the y-direction, it is not an x-outlier. With x = 67.5, and x = 68.8 its percent population is very close to the overall mean. (c) Here are various summary numbers (x =, y = ). n = 51 x = 3509.0 ( x x) 2 = 11,539.94 ( y y) 2 = 1,272,643.7 y = 36,789 ( x x)( y y) = 80,509.43 s x = 15.192 s y = 159.54 r xy = 0.6643 Find the intercept and slope of the least squares regression line. Use the summary numbers given. Do not key in the data in your calculator. b = r xy s y /s x = 0.6643 159.54/15.192 = 6.976 or b = S xy /S xx = ( x x)( y y) / ( x x) 2 = 80,509.43/11,539.94 = 6.9766 a = y - bx Since y = 36,789/51 = 721.35 and x = 3509.0/51 = 68.804, a = 721.35 (6.9766) (68.804) = 241.33 5

4. Here are counts from a study of all gun related deaths in Milwaukee, between 1990 and 1994. Handgun Shotgun Rifle Unknown Totals Homicides 468 28 15 13 524 Suicides 124 22 24 5 175 Totals 592 50 39 18 699 (a) Find the conditional distribution of gun type given cause of death. This would have been more precise if it had asked to find the conditional distributions, since there are 2, the distribution of gun type given homicides and the distribution of gun type given suicides. Handgun Shotgun Rifle Unknown Total Homicides 468/524 =.893 28/524 =.053 15/524 =.029 13/524 =.025 1.000 Handgun Shotgun Rifle Unknown Total Suicides 124/175 =.709 22/175 =.126 24/175 =.137 5/175 =.029 1.001 (b) On the basis of these distributions, in not more than 2 or 3 sentences, describe the differences in types of weapons used in gun related homicides and suicides. It appears that proportion of suicides committed with a handgun is about 79 of the proportion of murders committed with a handgun. Conversely, shotguns were more than twice a popular and rifles more than 4 times as popular suicides than among murderers. 6