Stat 139 Homework 2 Solutions, Spring 2015

Problem 1. A pharmaceutical company is screening 50 different targeted compounds to determine whether any of them may be useful in treating migraine headaches. From previous experiments like this, they believe that each compound independently has a 1/100 chance of truly being effective and a 99/100 chance of having zero effect. For each compound, they perform a hypothesis test at α = 0.05 to determine whether it is effective. An effective drug will be statistically significant based on this hypothesis test 80% of the time (this is called statistical power).

(a) What is the expected number of compounds that will be shown to be statistically significant based on these fifty separate hypothesis tests?

Let $E_i$ be the event that the $i$th compound is truly effective, and let $R_i$ be the event that its test rejects the null hypothesis (i.e., it is statistically significant). For any one compound, the probability of being statistically significant follows from the Law of Total Probability:

$$P(R_i) = P(R_i \cap E_i) + P(R_i \cap E_i^C) = P(R_i \mid E_i)P(E_i) + P(R_i \mid E_i^C)P(E_i^C) = (0.80)(0.01) + (0.05)(0.99) = 0.0575$$

Let $X_i$ be the indicator r.v. for whether the $i$th compound is statistically significant, and let $T$ be the total number of compounds shown to be statistically significant. By linearity of expectation, the expected number is:

$$E(T) = E(X_1 + X_2 + \cdots + X_{50}) = E(X_1) + E(X_2) + \cdots + E(X_{50}) = 50(0.0575) = 2.875$$

(b) Given that a compound is flagged as statistically significant, what is the probability that it is actually effective in treating migraine headaches?

Here we want $P(E_i \mid R_i)$, which we can calculate with Bayes' Rule (since we are flipping which event is conditioned on):

$$P(E_i \mid R_i) = \frac{P(R_i \mid E_i)P(E_i)}{P(R_i \mid E_i)P(E_i) + P(R_i \mid E_i^C)P(E_i^C)} = \frac{(0.80)(0.01)}{(0.80)(0.01) + (0.05)(0.99)} = \frac{0.008}{0.0575} \approx 0.139$$

(c) After testing the 50 potential compounds, the company has exactly 1 compound that was deemed statistically significant based on the tests.
Let π be the probability that it is actually effective in treating migraine headaches. How does π compare to your result in part (b)? Explain briefly.

It depends on whether you treat the characteristics of the testing and the compounds given in the problem statement as fixed. Intuitively, if you see fewer significant compounds than expected (1 versus 2.875), there is a good chance that the assumed 1/100 rate of truly effective compounds is an overestimate, so one could argue that π should be even lower than 0.139. If you treat the 1/100 probability as a known, fixed value, then π should be the same as in part (b). The key is that the characteristics of the test and the compounds have not changed, so for any selected compound the probability that it is truly effective is still 0.139, no matter how many statistically significant compounds you find out of the 50 tested.

Problem 2. ACT scores of high school seniors. The scores of high school seniors on the ACT college entrance examination in a recent year had mean µ = 19.2 and standard deviation σ = 5.1. The distribution of individual scores is only roughly Normal.
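The arithmetic in Problem 1 above, and the Normal probabilities for parts (a) and (c) of Problem 2 below, can be reproduced with a short R sketch. The Problem 1(c) portion is a Monte Carlo check that assumes the 1/100 rate is fixed and known: it simulates many batches of 50 compounds, keeps only the batches with exactly one significant result, and records how often that compound is truly effective. The variable names, the seed, and the 20,000-batch count are illustrative choices, not part of the original solutions.

```r
## Problem 1: total probability, linearity of expectation, Bayes' rule
p_eff  <- 0.01   # P(E_i): prior chance a compound is truly effective
pwr    <- 0.80   # P(R_i | E_i): statistical power
alpha  <- 0.05   # P(R_i | E_i^C): significance level
n_comp <- 50     # number of compounds tested

p_sig     <- pwr * p_eff + alpha * (1 - p_eff)  # P(R_i) = 0.0575
exp_sig   <- n_comp * p_sig                     # E(T) = 2.875
p_eff_sig <- pwr * p_eff / p_sig                # P(E_i | R_i), about 0.139

## Part (c), assuming the 1/100 rate is fixed: simulate batches of 50 compounds,
## keep batches with exactly one significant result, and record how often
## that single flagged compound is truly effective
set.seed(2015)   # arbitrary seed for reproducibility
nb  <- 20000     # number of simulated batches (illustrative choice)
eff <- matrix(rbinom(nb * n_comp, 1, p_eff), nrow = nb)
sig <- matrix(rbinom(nb * n_comp, 1, ifelse(eff == 1, pwr, alpha)), nrow = nb)
one <- rowSums(sig) == 1                 # batches with exactly one flagged compound
pi_hat <- mean(rowSums(eff * sig)[one])  # share of those flagged compounds truly effective

## Problem 2: Normal approximations
mu <- 19.2; sigma <- 5.1; n_students <- 25
p_single <- pnorm((23 - mu) / sigma, lower.tail = FALSE)   # part (a)
se_mean  <- sigma / sqrt(n_students)                        # part (b): SD of Xbar
p_mean   <- pnorm((23 - mu) / se_mean, lower.tail = FALSE)  # part (c), via the CLT

print(c(p_sig = p_sig, exp_sig = exp_sig, p_eff_sig = p_eff_sig, pi_hat = pi_hat))
print(c(p_single = p_single, se_mean = se_mean, p_mean = p_mean))
```

Under the fixed-rate assumption, `pi_hat` should land near `p_eff_sig`, which matches the second reading of part (c); the simulation cannot speak to the first reading, where the 1/100 rate itself is in doubt.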

(a) What is the approximate probability that a single student randomly chosen from all those taking the test scores 23 or higher?

Let $X$ be the test score of a single student. Then:

$$P(X \ge 23) \approx P\left(Z > \frac{23 - 19.2}{5.1}\right) = P(Z > 0.745) \approx 0.228$$

(b) Now take an SRS of 25 students who took the test. What would be the mean and standard deviation of the sampling distribution of the sample mean score, $\bar{X}$, for n = 25 students?

By linearity of expectation and the variance of a sum of independent r.v.s, $E(\bar{X}) = \mu$ and $Var(\bar{X}) = \sigma^2/n$, where µ and σ are the mean and standard deviation for a single student. Then $E(\bar{X}) = 19.2$ and $SD(\bar{X}) = \sqrt{Var(\bar{X})} = 5.1/\sqrt{25} = 1.02$ in this problem.

(c) What is the approximate probability that the mean score of these 25 students is 23 or higher?

By the Central Limit Theorem, $\bar{X}$ is approximately Normal with the mean and standard deviation found in part (b). Then:

$$P(\bar{X} \ge 23) \approx P\left(Z > \frac{23 - 19.2}{1.02}\right) = P(Z > 3.73) < 0.0001$$

(d) Which of your two Normal probability calculations in parts (a) and (c) is more accurate? Why?

The distribution of single-student scores is only roughly Normal (it is very discretized, after all, since individual ACT scores can only be whole numbers), but the sampling distribution of $\bar{X}$ is closer to Normal (although still approximate, by the CLT), and $\bar{X}$ can take values in fractions of 1/25. So we believe that the calculation in part (c) is more accurate.

Problem 3. The sum of squares of a sample of data is minimized when the sample mean, $\bar{X} = \sum X_i / n$, is used as the basis of the calculation. Define $g(c)$ as a function of $c$: $g(c) = \sum (X_i - c)^2$. Show that this function is minimized at $c = \bar{X}$.

To minimize the function, take the first derivative (w.r.t. $c$) and set it to zero. Then check that the second derivative is positive (concave up), so the critical point is a minimum:

$$g'(c) = -2\sum (x_i - c) = 0 \implies \sum x_i = nc \implies c = \frac{\sum x_i}{n} = \bar{x}$$

$$g''(c) = 2\sum 1 = 2n > 0$$

Problem 4. Let $X_1, \ldots, X_i, \ldots, X_n$ be independent random variables drawn from a population with mean µ and variance σ². Let $\bar{X}$ be the sample average.
Recall that σ² can be estimated by $S^2$, the usual sample variance, defined as:

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} X_i^2 - n\bar{X}^2\right)$$
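The two expressions for $S^2$ above are algebraically identical, and part (b) shows that $S^2$ is unbiased. A quick numerical sanity check in R, using simulated data that is purely illustrative (the mean 5, SD 2, sample size 10, and seed are arbitrary choices, not from the homework), might look like this:

```r
set.seed(139)  # arbitrary seed, for reproducibility only

# Definition: sum of squared deviations from the sample mean, over n - 1
s2_definition <- function(x) sum((x - mean(x))^2) / (length(x) - 1)

# Computational form: (sum of squares - n * xbar^2) / (n - 1)
s2_shortcut <- function(x) {
  n <- length(x)
  (sum(x^2) - n * mean(x)^2) / (n - 1)
}

x <- rnorm(10, mean = 5, sd = 2)  # illustrative simulated data
print(c(definition = s2_definition(x), shortcut = s2_shortcut(x), var = var(x)))

# Unbiasedness (proved in part (b)): average S^2 over many samples with sigma^2 = 4
s2_sims <- replicate(10000, s2_shortcut(rnorm(10, mean = 5, sd = 2)))
mean(s2_sims)  # should land close to 4
```

Both forms should agree with R's built-in `var()` up to floating-point rounding; the simulated average is only approximately 4, with Monte Carlo error of a few hundredths.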

(a) Show that $E(X_i^2) = \sigma^2 + \mu^2$, using the fact that $\sigma^2 = E\left((X_i - \mu)^2\right)$.

$$E(X_i^2) = E(X_i^2) - 2\mu^2 + 2\mu^2 = E(X_i^2 - 2\mu X_i + \mu^2) + \mu^2 = E\left((X_i - \mu)^2\right) + \mu^2 = \sigma^2 + \mu^2$$

Note: $E(\mu^2) = \mu^2 = \mu E(X_i) = E(\mu X_i)$, which justifies replacing $-2\mu^2$ by $E(-2\mu X_i)$.

(b) Show that $E(S^2) = \sigma^2$, i.e., $S^2$ is an unbiased estimator of the population variance.

$$E(S^2) = E\left[\frac{1}{n-1}\left(\sum X_i^2 - n\bar{X}^2\right)\right] = \frac{1}{n-1}\left(\sum E(X_i^2) - nE(\bar{X}^2)\right) = \frac{1}{n-1}\left(n(\sigma^2 + \mu^2) - n(\sigma^2/n + \mu^2)\right) = \frac{1}{n-1}(n-1)\sigma^2 = \sigma^2$$

Note: $E(\bar{X}^2) = \sigma^2_{\bar{X}} + \mu^2_{\bar{X}} = \sigma^2/n + \mu^2$, by applying part (a) to $\bar{X}$, which has mean µ and variance σ²/n.

Problem 5. Let $X_1, X_2, \ldots, X_{25}$ be i.i.d. Normal r.v.s with mean µ = 1 and variance σ² = 3² = 9. Let $S^2$ be the usual variance estimate, $S^2 = \sum (X_i - \bar{X})^2/(n-1)$, and let $\hat{\sigma}^2$ be the estimate using µ in the calculation instead: $\hat{\sigma}^2 = \sum (X_i - \mu)^2/n$. Write a simulation in R, using a for-loop based on at least 10,000 iterations, to determine the following (be sure to include the relevant R code and output):

(a) That both estimators ($S^2$ and $\hat{\sigma}^2$) are unbiased.

Based on 10,000 iterations, the observed means of both estimators were within 0.01 units of the true variance of 9. We could formally test whether each mean is significantly different from 9 (based on n = 10,000 realizations), but that is overkill. Here is the relevant R code:

> nsims=10000
> mu=1
> sigma=3
> n=25
> sigma2.hat=s2=rep(NA,nsims)
>
> for(i in 1:nsims){
+ sample=rnorm(n,mean=mu,sd=sigma)
+ xbar=mean(sample)
+ sigma2.hat[i]=sum((sample-mu)^2)/n
+ s2[i]=var(sample)
+ }
> mean(sigma2.hat)
[1]
> mean(s2)
[1]

(b) Provide a separate histogram for each of the two sampling distributions. Which has lower spread?

Based on the R output below, $\hat{\sigma}^2$ has slightly smaller spread than $S^2$ (about 3% lower standard deviation).

> sd(sigma2.hat)
[1]
> sd(s2)
[1]

[Figure: histograms of sigma2.hat and s2 (Frequency vs. value)]

(c) Which estimator is closer to the true value more often?

Based on the R output below, $\hat{\sigma}^2$ is as close as or closer than $S^2$ about 52.4% of the time.

> mean(abs(sigma2.hat-sigma^2)<=abs(s2-sigma^2))
[1]

(d) Are you sure of your answers above? What could you do to be more certain?

No, I am not certain of the answers above, since they are based on random simulations. We could be more certain if we based this study on more iterations, or if we performed a formal test to see whether the results above are statistically significant.

Problem 6. The BOSsnowfall.csv data set on the course website has weather measurements made at Logan Airport. There are two variables in this data set, measured annually from winter until winter: totalsnow, the total snowfall for a winter season, in inches; and avgmaxtemp, the average daily high temperature for the previous calendar year, in degrees F.

(a) Calculate the following summary statistics for both the totalsnow and avgmaxtemp variables: sample mean, sample SD, min, median, max, 1st and 3rd quartiles.

> summary(snow)
   season       totalsnow        avgmaxtemp
      : 1    Min.   : 9.00    Min.   :
      : 1    1st Qu.:         1st Qu.:
      : 1    Median :         Median :

      : 1    Mean   :         Mean   :
      : 1    3rd Qu.:         3rd Qu.:
      : 1    Max.   :         Max.   :61.27
 (Other):85
> sd(snow$totalsnow)
[1]
> sd(snow$avgmaxtemp)
[1]

(b) Split the observations into two groups: the winters with avgmaxtemp at or below the 3rd quartile, as calculated in part (a), vs. the winters above the 3rd quartile. Plot side-by-side boxplots of totalsnow for the two groups and describe the shapes of their distributions. Are there any visible differences?

[Figure: side-by-side boxplots of totalsnow for the High and Low groups (left); histogram of meandiff.sim (right)]

The boxplot on the left above shows the annual snowfall for years when the average maximum temperature is in the top quartile (High) vs. the bottom 75% (Low). Both boxplots appear right-skewed. When the temperature is cooler, there appears to be more snowfall, on average. There also seems to be more spread in the cooler group, but this may just be because there are more observations in that group (3-to-1). More details below.

(c) Comment on whether you think the group means are very different or not (without conducting any formal tests).

Based on the side-by-side boxplots above, it appears that the High group (years when the average temperature is above 59.63, the 3rd quartile) typically has lower snowfall. The median is lower (the line inside the box), the middle 50% of the distribution (the box) is shifted down, and the highest values are lower for the High group compared to the Low group.

(d) Perform a permutation test based on 10,000 iterations to determine whether totalsnow differs between winters where the temperature was at or below the 3rd quartile vs. above the 3rd quartile. Please refer to the Unit 2 lecture notes for useful R code. Be sure to state the hypotheses, calculate the test statistic, produce a histogram of the reference distribution, calculate the p-value based on this distribution, and state the conclusion of the procedure (be sure to mention the scope of the inference).
Here is some relevant R output (see HW 2 Solutions R Code.R for the remaining R commands used).

> meandiff.obs
[1]
> mean(meandiff.sim)
[1]
> sd(meandiff.sim)
[1]
> #two-sided p-value
> mean(abs(meandiff.sim) >= abs(meandiff.obs))
[1]

Based on the R output and the histogram above (the reference distribution for the test statistic), we can perform the following hypothesis test (a permutation test at the α = 0.05 level), where $Y_{high} = Y_{low} + \delta$:

$$H_0: \delta = 0 \quad \text{vs.} \quad H_A: \delta \neq 0, \qquad T = \bar{Y}_{high} - \bar{Y}_{low}$$

Since our estimated two-sided p-value is less than α = 0.05, we have just enough evidence to conclude that the average snowfall in Boston differs between the two groups; in fact, snowfall tends to be lower in years with high temperature. This is certainly not a causal relationship (there is no way to randomly assign temperature to years), and this is not a random sample of years, so the trend does not necessarily generalize outside the years studied or to other locations.
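The permutation test above can be sketched as a small R function. This is a minimal illustration, not the course's own code (the remaining commands live in HW 2 Solutions R Code.R, which is not reproduced here); the function name `perm_test_meandiff` and the toy data are hypothetical, and the commented lines for the real data assume BOSsnowfall.csv has the totalsnow and avgmaxtemp columns described in the problem.

```r
# Two-sided permutation test for a difference in group means (hypothetical helper).
# y: numeric response; high: logical vector, TRUE for the "High" group.
perm_test_meandiff <- function(y, high, nsims = 10000) {
  obs <- mean(y[high]) - mean(y[!high])        # observed T = Ybar_high - Ybar_low
  sims <- numeric(nsims)
  for (i in 1:nsims) {
    shuffled <- sample(high)                   # permute labels; group sizes stay fixed
    sims[i] <- mean(y[shuffled]) - mean(y[!shuffled])
  }
  list(observed = obs,
       sims = sims,                            # reference distribution under H0
       p.value = mean(abs(sims) >= abs(obs)))  # two-sided p-value
}

# Toy check with made-up data that has a large built-in group difference
set.seed(1)
toy_y    <- c(rnorm(30, mean = 50, sd = 5), rnorm(10, mean = 35, sd = 5))
toy_high <- rep(c(FALSE, TRUE), times = c(30, 10))
res <- perm_test_meandiff(toy_y, toy_high, nsims = 2000)
res$p.value

# Hypothetical usage on the homework data (column names as described in Problem 6):
# snow <- read.csv("BOSsnowfall.csv")
# high <- snow$avgmaxtemp > quantile(snow$avgmaxtemp, 0.75)
# res  <- perm_test_meandiff(snow$totalsnow, high)
# hist(res$sims)   # reference distribution for the test statistic
```

Shuffling the group labels (rather than resampling with replacement) keeps the 3-to-1 group sizes fixed, which is what makes this a permutation test rather than a bootstrap test.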

More information

Dover- Sherborn High School Mathematics Curriculum Probability and Statistics

Dover- Sherborn High School Mathematics Curriculum Probability and Statistics Mathematics Curriculum A. DESCRIPTION This is a full year courses designed to introduce students to the basic elements of statistics and probability. Emphasis is placed on understanding terminology and

More information

Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals

Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals (SW Chapter 5) Outline. The standard error of ˆ. Hypothesis tests concerning β 3. Confidence intervals for β 4. Regression

More information

Permutation tests. Patrick Breheny. September 25. Conditioning Nonparametric null hypotheses Permutation testing

Permutation tests. Patrick Breheny. September 25. Conditioning Nonparametric null hypotheses Permutation testing Permutation tests Patrick Breheny September 25 Patrick Breheny STA 621: Nonparametric Statistics 1/16 The conditioning idea In many hypothesis testing problems, information can be divided into portions

More information

An inferential procedure to use sample data to understand a population Procedures

An inferential procedure to use sample data to understand a population Procedures Hypothesis Test An inferential procedure to use sample data to understand a population Procedures Hypotheses, the alpha value, the critical region (z-scores), statistics, conclusion Two types of errors

More information

EC2001 Econometrics 1 Dr. Jose Olmo Room D309

EC2001 Econometrics 1 Dr. Jose Olmo Room D309 EC2001 Econometrics 1 Dr. Jose Olmo Room D309 J.Olmo@City.ac.uk 1 Revision of Statistical Inference 1.1 Sample, observations, population A sample is a number of observations drawn from a population. Population:

More information

Stat 135 Fall 2013 FINAL EXAM December 18, 2013

Stat 135 Fall 2013 FINAL EXAM December 18, 2013 Stat 135 Fall 2013 FINAL EXAM December 18, 2013 Name: Person on right SID: Person on left There will be one, double sided, handwritten, 8.5in x 11in page of notes allowed during the exam. The exam is closed

More information

CHAPTER 18 SAMPLING DISTRIBUTION MODELS STAT 203

CHAPTER 18 SAMPLING DISTRIBUTION MODELS STAT 203 1 CHAPTER 18 SAMPLING DISTRIBUTION MODELS STAT 203 Outline 2 Sampling Distribution for Proportions Sample Proportions The mean The standard deviation The Distribution Model Assumptions and Conditions Sampling

More information

are the objects described by a set of data. They may be people, animals or things.

are the objects described by a set of data. They may be people, animals or things. ( c ) E p s t e i n, C a r t e r a n d B o l l i n g e r 2016 C h a p t e r 5 : E x p l o r i n g D a t a : D i s t r i b u t i o n s P a g e 1 CHAPTER 5: EXPLORING DATA DISTRIBUTIONS 5.1 Creating Histograms

More information

2008 Winton. Statistical Testing of RNGs

2008 Winton. Statistical Testing of RNGs 1 Statistical Testing of RNGs Criteria for Randomness For a sequence of numbers to be considered a sequence of randomly acquired numbers, it must have two basic statistical properties: Uniformly distributed

More information

STOCKHOLM UNIVERSITY Department of Economics Course name: Empirical Methods Course code: EC40 Examiner: Lena Nekby Number of credits: 7,5 credits Date of exam: Saturday, May 9, 008 Examination time: 3

More information

STAT100 Elementary Statistics and Probability

STAT100 Elementary Statistics and Probability STAT100 Elementary Statistics and Probability Exam, Sample Test, Summer 014 Solution Show all work clearly and in order, and circle your final answers. Justify your answers algebraically whenever possible.

More information

(Re)introduction to Statistics Dan Lizotte

(Re)introduction to Statistics Dan Lizotte (Re)introduction to Statistics Dan Lizotte 2017-01-17 Statistics The systematic collection and arrangement of numerical facts or data of any kind; (also) the branch of science or mathematics concerned

More information

Final Exam Bus 320 Spring 2000 Russell

Final Exam Bus 320 Spring 2000 Russell Name Final Exam Bus 320 Spring 2000 Russell Do not turn over this page until you are told to do so. You will have 3 hours minutes to complete this exam. The exam has a total of 100 points and is divided

More information

COSC 341 Human Computer Interaction. Dr. Bowen Hui University of British Columbia Okanagan

COSC 341 Human Computer Interaction. Dr. Bowen Hui University of British Columbia Okanagan COSC 341 Human Computer Interaction Dr. Bowen Hui University of British Columbia Okanagan 1 Last Class Introduced hypothesis testing Core logic behind it Determining results significance in scenario when:

More information

Exam Empirical Methods VU University Amsterdam, Faculty of Exact Sciences h, February 12, 2015

Exam Empirical Methods VU University Amsterdam, Faculty of Exact Sciences h, February 12, 2015 Exam Empirical Methods VU University Amsterdam, Faculty of Exact Sciences 18.30 21.15h, February 12, 2015 Question 1 is on this page. Always motivate your answers. Write your answers in English. Only the

More information

Lecture 8. October 22, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University.

Lecture 8. October 22, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University. Lecture 8 Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University October 22, 2007 1 2 3 4 5 6 1 Define convergent series 2 Define the Law of Large Numbers

More information

Stat Lecture 20. Last class we introduced the covariance and correlation between two jointly distributed random variables.

Stat Lecture 20. Last class we introduced the covariance and correlation between two jointly distributed random variables. Stat 260 - Lecture 20 Recap of Last Class Last class we introduced the covariance and correlation between two jointly distributed random variables. Today: We will introduce the idea of a statistic and

More information

ST 371 (IX): Theories of Sampling Distributions

ST 371 (IX): Theories of Sampling Distributions ST 371 (IX): Theories of Sampling Distributions 1 Sample, Population, Parameter and Statistic The major use of inferential statistics is to use information from a sample to infer characteristics about

More information

Estimating a population mean

Estimating a population mean Introductory Statistics Lectures Estimating a population mean Confidence intervals for means Department of Mathematics Pima Community College Redistribution of this material is prohibited without written

More information

Harvard University. Rigorous Research in Engineering Education

Harvard University. Rigorous Research in Engineering Education Statistical Inference Kari Lock Harvard University Department of Statistics Rigorous Research in Engineering Education 12/3/09 Statistical Inference You have a sample and want to use the data collected

More information

Math 101: Elementary Statistics Tests of Hypothesis

Math 101: Elementary Statistics Tests of Hypothesis Tests of Hypothesis Department of Mathematics and Computer Science University of the Philippines Baguio November 15, 2018 Basic Concepts of Statistical Hypothesis Testing A statistical hypothesis is an

More information

University of California, Berkeley, Statistics 131A: Statistical Inference for the Social and Life Sciences. Michael Lugo, Spring 2012

University of California, Berkeley, Statistics 131A: Statistical Inference for the Social and Life Sciences. Michael Lugo, Spring 2012 University of California, Berkeley, Statistics 3A: Statistical Inference for the Social and Life Sciences Michael Lugo, Spring 202 Solutions to Exam Friday, March 2, 202. [5: 2+2+] Consider the stemplot

More information

Example continued. Math 425 Intro to Probability Lecture 37. Example continued. Example

Example continued. Math 425 Intro to Probability Lecture 37. Example continued. Example continued : Coin tossing Math 425 Intro to Probability Lecture 37 Kenneth Harris kaharri@umich.edu Department of Mathematics University of Michigan April 8, 2009 Consider a Bernoulli trials process with

More information

Lecture 4: Types of errors. Bayesian regression models. Logistic regression

Lecture 4: Types of errors. Bayesian regression models. Logistic regression Lecture 4: Types of errors. Bayesian regression models. Logistic regression A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting more generally COMP-652 and ECSE-68, Lecture

More information

Inference for the mean of a population. Testing hypotheses about a single mean (the one sample t-test). The sign test for matched pairs

Inference for the mean of a population. Testing hypotheses about a single mean (the one sample t-test). The sign test for matched pairs Stat 528 (Autumn 2008) Inference for the mean of a population (One sample t procedures) Reading: Section 7.1. Inference for the mean of a population. The t distribution for a normal population. Small sample

More information

Final Exam STAT On a Pareto chart, the frequency should be represented on the A) X-axis B) regression C) Y-axis D) none of the above

Final Exam STAT On a Pareto chart, the frequency should be represented on the A) X-axis B) regression C) Y-axis D) none of the above King Abdul Aziz University Faculty of Sciences Statistics Department Final Exam STAT 0 First Term 49-430 A 40 Name No ID: Section: You have 40 questions in 9 pages. You have 90 minutes to solve the exam.

More information

What Is a Sampling Distribution? DISTINGUISH between a parameter and a statistic

What Is a Sampling Distribution? DISTINGUISH between a parameter and a statistic Section 8.1A What Is a Sampling Distribution? Learning Objectives After this section, you should be able to DISTINGUISH between a parameter and a statistic DEFINE sampling distribution DISTINGUISH between

More information

Open book, but no loose leaf notes and no electronic devices. Points (out of 200) are in parentheses. Put all answers on the paper provided to you.

Open book, but no loose leaf notes and no electronic devices. Points (out of 200) are in parentheses. Put all answers on the paper provided to you. ISQS 5347 Final Exam Spring 2017 Open book, but no loose leaf notes and no electronic devices. Points (out of 200) are in parentheses. Put all answers on the paper provided to you. 1. Recall the commute

More information

AP Final Review II Exploring Data (20% 30%)

AP Final Review II Exploring Data (20% 30%) AP Final Review II Exploring Data (20% 30%) Quantitative vs Categorical Variables Quantitative variables are numerical values for which arithmetic operations such as means make sense. It is usually a measure

More information

Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics

Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics A short review of the principles of mathematical statistics (or, what you should have learned in EC 151).

More information

Math 10 - Compilation of Sample Exam Questions + Answers

Math 10 - Compilation of Sample Exam Questions + Answers Math 10 - Compilation of Sample Exam Questions + Sample Exam Question 1 We have a population of size N. Let p be the independent probability of a person in the population developing a disease. Answer the

More information

Statistics and Quantitative Analysis U4320. Segment 5: Sampling and inference Prof. Sharyn O Halloran

Statistics and Quantitative Analysis U4320. Segment 5: Sampling and inference Prof. Sharyn O Halloran Statistics and Quantitative Analysis U4320 Segment 5: Sampling and inference Prof. Sharyn O Halloran Sampling A. Basics 1. Ways to Describe Data Histograms Frequency Tables, etc. 2. Ways to Characterize

More information

IEOR 165 Lecture 7 1 Bias-Variance Tradeoff

IEOR 165 Lecture 7 1 Bias-Variance Tradeoff IEOR 165 Lecture 7 Bias-Variance Tradeoff 1 Bias-Variance Tradeoff Consider the case of parametric regression with β R, and suppose we would like to analyze the error of the estimate ˆβ in comparison to

More information

Numerical Measures of Central Tendency

Numerical Measures of Central Tendency ҧ Numerical Measures of Central Tendency The central tendency of the set of measurements that is, the tendency of the data to cluster, or center, about certain numerical values; usually the Mean, Median

More information

Lecture 3B: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries)

Lecture 3B: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries) Lecture 3B: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries) Summarize with Shape, Center, Spread Displays: Stemplots, Histograms Five Number Summary, Outliers, Boxplots Mean vs.

More information

Lecture 30. DATA 8 Summer Regression Inference

Lecture 30. DATA 8 Summer Regression Inference DATA 8 Summer 2018 Lecture 30 Regression Inference Slides created by John DeNero (denero@berkeley.edu) and Ani Adhikari (adhikari@berkeley.edu) Contributions by Fahad Kamran (fhdkmrn@berkeley.edu) and

More information