Chapter 6 Part 4. Confidence Intervals


October 1, 2008

Goal: To clearly understand the link between probability distributions and confidence intervals.

Skills:
- Be able to calculate a $100(1-\alpha)\%$ confidence interval for a sample mean, both when the population variance is known and when it is not known.
- Be able to interpret a confidence interval accurately.

Contents:
- Central Limit Theorem
- Confidence interval using the normal distribution
- Formula
- What impacts the length of a CI
- Stata commands: invnormal

Usually we study samples from a population rather than the population itself because it is not possible to get our hands on the whole population (e.g., it is too big, the process is too costly, or some of the members of the population we are interested in haven't even been born yet). We have agreed that, when possible, we should select a random sample. We also know that when we select a random sample of size $n$ for a study, it is just one of many possible samples of size $n$ that could have been selected from the population.

Suppose we want to know the average fasting triglycerides of the entire U.S. population aged 55 or older (55+). Some of the reasons why we'll have to select a sample are: 1) usually the whole population is simply not available (e.g., the ALLHAT investigators hoped that the results of their study would apply not only to those who were 55+ at the time of entry into the study but also to those who will later become 55+), and 2) even in the unusual case where the whole population is available, the cost and time involved in studying it tend to be prohibitive. So we've decided to select a random sample from the population and use the mean fasting triglycerides of that sample to estimate the mean of the entire population.

What we learned earlier when studying the sampling distribution of means is the following. Let $X$ be a random variable representing the distribution of fasting triglycerides in the population of people aged 55+. Let $\mu$ represent the population mean and $\sigma^2$ the population variance of the fasting triglycerides. Then we usually denote the random variable representing the sampling distribution of means of samples of size $n$ by $\bar{X}$, the mean of the population of means by $\mu_{\bar{X}}$, and the variance of the population of means by $\sigma^2_{\bar{X}}$.
For a random sample of size $n$ we have the following (only the third fact requires $n$ to be large):

1) (Fact 1 from before) $\mu_{\bar{X}} = \mu$: the mean of the original distribution is equal to the mean of the sampling distribution.
2) (Fact 2 from before) $\sigma^2_{\bar{X}} = \sigma^2/n$, so $\sigma_{\bar{X}} = \sigma/\sqrt{n}$, where $\sigma_{\bar{X}}$ is called the standard error of the mean (SEM). Note that $\sigma$ refers to variation among individual observations, such as those within a single sample, while $\sigma/\sqrt{n}$, the SEM, refers to variation among the means of different samples.
3) We also noticed that the larger the sample size $n$ got, the more the distribution of those sample means looked like a normal distribution.

The Central Limit Theorem states the following. Given the notation we have used above for the original population of fasting triglycerides and for the sampling distribution of means, for large $n$, $\bar{X}$ is approximately normally distributed with mean $\mu_{\bar{X}} = \mu$ (Fact 1) and variance $\sigma^2_{\bar{X}} = \sigma^2/n$ (Fact 2), regardless of the distribution of $X$. If $X$ is normally distributed, then $\bar{X}$ is also normally distributed (exactly, as opposed to approximately), whatever the value of $n$.

Now our problem is: how do we know whether the sample mean is a good estimate of the population mean? Let us say that the graph below is the distribution of the mean fasting triglycerides (AFTRIG) of all samples of size $n$ from the U.S. population aged 55+. Looking at the histogram of the sampling distribution below, we would probably be willing to say that the means represented by the bar on the far right end of the distribution (the bar with square dots) are not good estimates of the mean of the distribution of the original AFTRIG values, because they are probably not what we would be willing to call close to the mean of the distribution of sample means (i.e., $\mu_{\bar{X}}$). But what about the means represented by the striped bar in the graph below? This is where our problems begin.

We are clearly going to need some measure of how certain we are that the mean of our sample is a reasonable estimate of the population mean. This is where confidence intervals come in. Confidence intervals are defined so that, given a 95% confidence interval, we are 95% confident that $\mu_{\bar{X}}$ (and hence $\mu$) lies within our interval. So in obtaining a 95% confidence interval for $\mu_{\bar{X}}$, we will also have obtained an interval for the original population mean $\mu$. Just as we have only one sample and one sample mean, we will have only one confidence interval based on that sample and its mean. If, however, we had all possible samples, we could compute a confidence interval from the mean of each sample. The interpretation of the 95% confidence interval is then that 95% of these intervals would contain the original population mean $\mu$.
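A quick simulation can make Facts 1 and 2 and the Central Limit Theorem concrete. The sketch below is in Python rather than Stata and assumes a made-up, right-skewed exponential population with mean 150 (so its standard deviation is also 150); nothing here comes from the ALLHAT data. It draws many samples of size $n$ and compares the mean and spread of the sample means with $\mu$ and $\sigma/\sqrt{n}$.

```python
import random
import statistics

def sample_means(draw, n, reps, seed=1):
    """Draw `reps` samples of size n using draw(rng) and return their means."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]

# Hypothetical, deliberately skewed population: exponential with mean 150,
# whose standard deviation is also 150. Not real triglyceride data.
mu = sigma = 150.0
n, reps = 50, 5000
means = sample_means(lambda rng: rng.expovariate(1 / mu), n, reps)

# Fact 1: the mean of the sample means should be close to mu.
print(f"mean of sample means: {statistics.fmean(means):7.2f}   mu = {mu}")
# Fact 2: their SD should be close to the SEM, sigma / sqrt(n).
print(f"SD of sample means:   {statistics.stdev(means):7.2f}   sigma/sqrt(n) = {sigma / n ** 0.5:.2f}")
```

A histogram of `means` would also show the roughly normal shape the CLT describes, even though the population itself is strongly skewed.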

Looking at the graph below of the confidence intervals, we notice that 3 of the intervals (the dashed ones) do not contain the population mean. The very top confidence interval does not contain the mean because its endpoint falls exactly on the mean and confidence intervals will be defined as open intervals (i.e., intervals that do not contain their endpoints). The other two dashed confidence intervals don't even come particularly close to the mean.

[Figure: 95% CIs for the sample means, assuming $\sigma$ is known. Each interval is centered about a sample mean, and each interval has the same length because $\sigma$ is known.]

The intervals are all of the same length because (as we will show) the length of each interval depends only on the sample size $n$ (remember that all samples in the sampling distribution have the same size) and on the size of $\sigma$, when $\sigma$ is known. We'll show later that when $\sigma$ is not known, we can calculate the confidence interval using the sample estimate of $\sigma$, namely $s$. In this case the lengths of the intervals will vary as $s$ varies from sample to sample.

There are actually 3 kinds of intervals that we can use: prediction, confidence, and tolerance intervals. We won't do much with prediction and tolerance intervals until we get to regression, but I will describe all three kinds here.
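The picture of many intervals can be imitated with a short simulation. This Python sketch assumes a hypothetical normal population with $\mu = 100$ and $\sigma = 15$ and samples of size 25 (illustrative values, not from these notes). With $\sigma$ known, every interval has the same length and about 95% of them contain $\mu$. The last lines plug in $s$ for $\sigma$ only to show that the length then varies from sample to sample; the usual interval for unknown $\sigma$ also replaces 1.96 with a t critical value, which this sketch does not do.

```python
import random
import statistics

# Hypothetical population and design (illustrative values only).
MU, SIGMA, N, REPS, Z = 100.0, 15.0, 25, 2000, 1.96
rng = random.Random(2008)

covered = 0
known_lengths = set()  # interval lengths when sigma is known
s_lengths = []         # lengths if the sample SD s is plugged in for sigma

for _ in range(REPS):
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    xbar = statistics.fmean(sample)
    half = Z * SIGMA / N ** 0.5        # depends only on sigma and n, not on the sample
    lo, hi = xbar - half, xbar + half
    covered += lo < MU < hi            # open interval, as in these notes
    known_lengths.add(round(hi - lo, 9))
    s_lengths.append(2 * Z * statistics.stdev(sample) / N ** 0.5)

coverage = covered / REPS
print(f"share of intervals containing mu: {coverage:.3f}")
print(f"distinct lengths with sigma known: {len(known_lengths)}")
print(f"lengths with s plugged in: {min(s_lengths):.2f} to {max(s_lengths):.2f}")
```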

This example is taken from Forthofer and Lee's (2007) book Biostatistics. Dairies add vitamin D to milk for the purpose of fortification. The recommended amount of vitamin D to be added to a quart of milk is 400 IUs (10 µg). If a dairy adds too much vitamin D, perhaps over 5000 IUs, the amount of vitamin D could be toxic.

A prediction interval focuses on a single observation of the variable, for example, the amount of vitamin D in the next bottle of milk. A confidence interval focuses on a population parameter, for example, the mean or median amount of vitamin D in a population of bottles of milk. Thus, the prediction interval is of more interest to the consumer of the next bottle of milk, whereas the confidence interval is of more interest to the dairy. A tolerance interval provides limits such that there is a high level of confidence that a large proportion of the values of the variable will fall within them. For example, besides being interested in the mean, the dairy owner or regulatory agency also wants to be confident that, for a large proportion of the bottles, the vitamin D content is within a specified tolerance of 400 IUs.

So back to confidence intervals. The picture of the confidence intervals above is a nice graphic, but how do we actually calculate the confidence interval for our sample mean?

Confidence Intervals

Below we give the confidence interval for the mean of the random variable $X$ under the conditions that $X$ is normally distributed, has an unknown mean $\mu$, and has a known variance $\sigma^2$. It is not usually the case that we know $\sigma^2$, but we present this simplest version of the confidence interval first.

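So let $\bar{X}$ be the random variable associated with the sampling distribution of means of samples of size $n$ drawn from the distribution with random variable $X \sim N(\mu, \sigma^2)$. The Central Limit Theorem says that, for $n$ large enough (regardless of the distribution of $X$), $\bar{X} \sim N(\mu_{\bar{X}}, \sigma^2_{\bar{X}})$ approximately, where $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \sigma/\sqrt{n}$.

[Figure: density of $N(\mu_{\bar{X}}, \sigma^2_{\bar{X}})$ with $\sigma_{\bar{X}} = \sigma/\sqrt{n}$. The central 95% of the area lies between $\mu_{\bar{X}} - 1.96\,\sigma_{\bar{X}}$ and $\mu_{\bar{X}} + 1.96\,\sigma_{\bar{X}}$, with 2.5% in each tail.]

Note that the areas and standard deviations in the graph above were derived under the assumption that $\bar{X}$ is close enough to being normally distributed that the approximation does not make any practical difference.

How did I decide that the area under the normal density, above the x-axis and between $\mu_{\bar{X}} - 1.96\,\sigma_{\bar{X}}$ and $\mu_{\bar{X}} + 1.96\,\sigma_{\bar{X}}$, is 95% of the total area under the curve? Well, $\mu_{\bar{X}} - 1.96\,\sigma_{\bar{X}}$ is 1.96 standard deviations ($\sigma_{\bar{X}}$) below the mean ($\mu_{\bar{X}}$) of the normal distribution $N(\mu_{\bar{X}}, \sigma^2_{\bar{X}})$, and $\mu_{\bar{X}} + 1.96\,\sigma_{\bar{X}}$ is 1.96 standard deviations above the mean. We learned earlier that the region from 1.96 standard deviations below the mean to 1.96 standard deviations above the mean contains 95% of the area under the curve for any normal distribution. (This is part of what we learned when we showed that any normal distribution can be mapped into the standard normal distribution $Z \sim N(0, 1)$.)

So for $n$ large enough we have

Equation 1: $\Pr\left(\mu_{\bar{X}} - 1.96\,\sigma_{\bar{X}} < \bar{X} < \mu_{\bar{X}} + 1.96\,\sigma_{\bar{X}}\right) = 0.95$

[Aside: Notice above that I have used $<$ rather than $\le$. For a continuous distribution it doesn't make any difference to the probability which you use, and in these notes confidence intervals are always written as open intervals.]

But according to the Central Limit Theorem, $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \sigma/\sqrt{n}$, so Equation 1 becomes

Equation 2: $\Pr\left(\mu - 1.96\,\frac{\sigma}{\sqrt{n}} < \bar{X} < \mu + 1.96\,\frac{\sigma}{\sqrt{n}}\right) = 0.95$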
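The claim that $\pm 1.96$ standard deviations captures 95% of the area for any normal distribution is easy to check numerically. This Python sketch uses the standard library's `statistics.NormalDist` (Python 3.8+); the means and standard deviations are arbitrary illustrative choices. Each area should come out at about 0.95, because standardizing maps every such interval onto $(-1.96, 1.96)$ for $Z$.

```python
from statistics import NormalDist

# Illustrative (mean, SD) pairs; the last one mimics sigma / sqrt(n) = 17 / 4.
areas = []
for mean, sd in [(0.0, 1.0), (150.0, 40.0), (76.8, 17 / 4)]:
    dist = NormalDist(mean, sd)
    # Area between 1.96 SDs below and 1.96 SDs above the mean.
    area = dist.cdf(mean + 1.96 * sd) - dist.cdf(mean - 1.96 * sd)
    areas.append(area)
    print(f"N({mean}, {sd:.2f}^2): area within 1.96 SD = {area:.4f}")
```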

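But we want $\mu$ in the middle and $\bar{X}$ on the ends, so we subtract $\mu$ across all parts of the inequality in Equation 2 and get

Equation 3: $\Pr\left(-1.96\,\frac{\sigma}{\sqrt{n}} < \bar{X} - \mu < 1.96\,\frac{\sigma}{\sqrt{n}}\right) = 0.95$

Now subtract $\bar{X}$ across all parts of the inequality in Equation 3 and get

Equation 4: $\Pr\left(-\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}} < -\mu < -\bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}\right) = 0.95$

Now multiply by $-1$ across all parts of the inequality in Equation 4 (note that this reverses the inequalities):

Equation 5: $\Pr\left(\bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}} > \mu > \bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}}\right) = 0.95$

Now just put the smaller endpoint of Equation 5 on the left and the larger on the right:

Equation 6: $\Pr\left(\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}\right) = 0.95$

Below we switch from probability to confidence because $\bar{X}$ is a random variable, for which probability is appropriate, but $\bar{x}$ is the mean of a particular sample. Once we use the sample mean $\bar{x}$, probability is no longer appropriate: the population mean $\mu$ either is or is not in the interval.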
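The algebra from Equation 2 to Equation 6 only rearranges one event, so the two statements should be true or false together for every value of $\bar{X}$. The Python sketch below checks this for simulated sample means, assuming illustrative values $\mu = 50$ and $\sigma/\sqrt{n} = 2$; the helper names `in_eq2` and `in_eq6` are hypothetical. It also shows the event holding about 95% of the time.

```python
import random

def in_eq2(xbar, mu, se, z=1.96):
    """Event in Equation 2: mu - z*se < xbar < mu + z*se."""
    return mu - z * se < xbar < mu + z * se

def in_eq6(xbar, mu, se, z=1.96):
    """Event in Equation 6: xbar - z*se < mu < xbar + z*se."""
    return xbar - z * se < mu < xbar + z * se

rng = random.Random(0)
mu, se = 50.0, 2.0                       # illustrative mu and sigma / sqrt(n)
draws = [rng.gauss(mu, se) for _ in range(10_000)]  # simulated values of Xbar

agree = all(in_eq2(x, mu, se) == in_eq6(x, mu, se) for x in draws)
share = sum(in_eq6(x, mu, se) for x in draws) / len(draws)
print(agree, round(share, 3))
```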

So our 95% confidence interval is

$\left(\bar{x} - 1.96\,\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\,\frac{\sigma}{\sqrt{n}}\right)$

On the $N(0,1)$ curve the area to the right of 1.96 is 0.025, or 2.5%. Equivalently, the area to the left of 1.96 is 0.975, so we could denote 1.96 as $z_{0.975}$. If we let $\alpha = 0.05$, so that $\alpha/2 = 0.025$, then $1.96 = z_{1-\alpha/2}$. This pattern works regardless of the value of $\alpha$. Well, what do we do about $-1.96$? By the symmetry of the standard normal distribution it equals $-z_{1-\alpha/2}$, so we use that.

Therefore, the general form of the $100(1-\alpha)\%$ confidence interval is

$\left(\bar{x} - z_{1-\alpha/2}\,\frac{\sigma_X}{\sqrt{n}},\ \bar{x} + z_{1-\alpha/2}\,\frac{\sigma_X}{\sqrt{n}}\right)$

Usually we don't have to work so hard to distinguish between $X$ and $\bar{X}$ and their means and variances. This is because the random variable $\bar{X}$ is not usually part of the conversation; we have only used it to derive the formula for the confidence interval. This means we can just say that the distribution of the random variable $X$ has mean $\mu$ and standard deviation $\sigma$. So the commonly used form of the $100(1-\alpha)\%$ confidence interval is

$\left(\bar{x} - z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right)$
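As a sketch of the general formula, the Python functions below compute $z_{1-\alpha/2}$ from the inverse CDF of $N(0,1)$ and the resulting interval for known $\sigma$. The names `z_crit` and `ci_known_sigma` are hypothetical helpers, not part of any library; Python's `statistics.NormalDist().inv_cdf` plays the same role here as Stata's invnormal.

```python
from statistics import NormalDist

def z_crit(alpha):
    """Hypothetical helper: z_{1-alpha/2}, the N(0,1) cutoff with area 1 - alpha/2 to its left."""
    return NormalDist().inv_cdf(1 - alpha / 2)

def ci_known_sigma(xbar, sigma, n, alpha=0.05):
    """Hypothetical helper: (xbar - z*sigma/sqrt(n), xbar + z*sigma/sqrt(n)), z = z_{1-alpha/2}."""
    half = z_crit(alpha) * sigma / n ** 0.5
    return xbar - half, xbar + half

# By symmetry of N(0,1), the lower cutoff -1.96 is the alpha/2 quantile, i.e. -z_{1-alpha/2}.
print(round(z_crit(0.05), 2), round(NormalDist().inv_cdf(0.025), 2))  # 1.96 -1.96
print(ci_known_sigma(10.0, 2.0, 16))  # illustrative xbar, sigma, n
```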

In the above formula $\bar{x}$ is the mean of a single sample and is not a random variable.

The confidence for the interval above is $1 - \alpha$. So if $\alpha = 0.10$, then $1 - \alpha = 0.90$ and we would have a 90% confidence interval. Also $\alpha/2 = 0.05$, so an area equal to 0.05 is cut off each end of the distribution.

The length of the confidence interval is

$2\,z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}}$

As we select different samples of size $n$, we get different values of $\bar{x}$, so the location of the confidence interval changes. However, the length of the confidence interval remains the same, because $\sigma$ is known and the samples are all of size $n$.

Find the 95% confidence interval for the baseline heart rate in beats/min for the Propranolol treatment group (Cardiology Problem 6.81 in Rosner; also see the original description of the problem under Cardiovascular Disease on page 157). Let us suppose that the standard deviation of the baseline heart rate for Propranolol is known and is equal to 17 beats/minute. The Stata data set for this problem is nifed.dta.
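The Propranolol calculation can be reproduced in a few lines of Python. The sample size $n = 16$ and $\sigma = 17$ come from these notes; the sample mean 76.8125 is an assumption, taken as the midpoint of the interval (68.48, 85.14) reported for this example.

```python
# Propranolol group: n and sigma are from the notes; xbar is assumed to be the
# midpoint of the reported interval (68.48, 85.14).
xbar, sigma, n = 76.8125, 17.0, 16
z = 1.96                            # z_{0.975}, rounded as in the notes

half = z * sigma / n ** 0.5         # 1.96 * 17 / 4 = 8.33
ci = (xbar - half, xbar + half)
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f}), length {2 * half:.2f}")
# prints: 95% CI: (68.48, 85.14), length 16.66

# Double-check suggested in the notes: the interval is centered at the sample mean.
assert abs((ci[0] + ci[1]) / 2 - xbar) < 1e-9
```

Changing `xbar` moves the interval but leaves its length of 16.66 unchanged, which is the point made above about known $\sigma$.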

. des

Contains data from C:\Stata\StataData\Myfiles\BiostatFall003\Data\nifed.dta
  obs:    34
 vars:    10                              10 Oct 00 0:53
 size: 1,496 (99.9% of memory free)

variable name  storage type  display format  value label  variable label
id             float         %1.0g
trtgrp         float         %11.0g          trt          Treatment Group
heartlv0       float         %1.0g                        Baseline Heart Rate beats/min
heartlv1       float         %1.0g                        Level 1 Heart Rate beats/min
heartlv2       float         %1.0g                        Level 2 Heart Rate beats/min
heartlv3       float         %1.0g                        Level 3 Heart Rate beats/min
syslv0         float         %1.0g                        Baseline Systolic Blood Pressure mmhg
syslv1         float         %1.0g                        Level 1 Systolic Blood Pressure mmhg
syslv2         float         %1.0g                        Level 2 Systolic Blood Pressure mmhg
syslv3         float         %1.0g                        Level 3 Systolic Blood Pressure mmhg

. tab trtgrp

  Treatment Group |   Freq.   Percent     Cum.
       nifedipine |      18     52.94    52.94
      propranolol |      16     47.06   100.00
            Total |      34    100.00

. label list trt
trt:
           0 nifedipine
           1 propranolol

Since we have not used this data set before, I have run codebook for the treatment group and for the baseline heart rate so we can see what we have.

. codebook trtgrp

trtgrp                                                   Treatment Group
                  type:  numeric (float)
                 label:  trt
                 range:  [0,1]                    units:  1
         unique values:  2                    missing .:  0/34
            tabulation:  Freq.   Numeric  Label
                            18         0  nifedipine
                            16         1  propranolol

heartlv0                                   Baseline Heart Rate beats/min
                  type:  numeric (float)
                 range:  [51,116]                 units:  1
         unique values:                       missing .:  0/34
                  mean:
             std. dev:
           percentiles:  10%   25%   50%   75%   90%

The baseline heart rate in beats/minute is denoted heartlv0, and trtgrp = 1 is the propranolol treatment group.

. sum heartlv0 if trtgrp == 1

    Variable |   Obs      Mean   Std. Dev.   Min   Max
    heartlv0 |    16   76.8125

So $\bar{x} = 76.8125$ and $\sigma = 17$ (i.e., we don't use the sample standard deviation $s$, because $\sigma$ is known). Since we are assuming $n$ is large enough for $\bar{X}$ to be approximately normal, the 95% confidence interval is

$\left(76.8125 - 1.96\,\frac{17}{\sqrt{16}},\ 76.8125 + 1.96\,\frac{17}{\sqrt{16}}\right) = (68.48, 85.14)$

If we repeated the study many times, 95% of all such confidence intervals would cover $\mu$, the mean baseline heart rate of the population (i.e., all people treated with Propranolol). That is what we mean when we say we are 95% confident that $\mu$ lies between 68.48 and 85.14.

When we assume normality, our equation for the confidence interval implies that the confidence interval is centered about the sample mean. So when you are carefully double-checking your work, you'll want to make sure that the confidence interval you have gotten actually contains the sample mean (indeed, has it as its midpoint).

What impacts the length of the confidence interval? Remember that the length of the confidence interval is $2\,z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}}$.

1) Sample size $n$. As $n$ increases, the length of the confidence interval decreases, so there is an inverse relationship between the sample size $n$ and the length of the confidence interval. (x and y are inversely related if one increases as the other decreases.) Note that, at a fixed confidence level, shorter confidence intervals are better.

2) The standard deviation $\sigma$ or variance $\sigma^2$.

As the standard deviation or variance increases, the length of the confidence interval increases, so there is a direct relationship between the size of $\sigma$ and the length of the confidence interval. (x and y are directly related if they both increase or they both decrease.)

3) The $\alpha$-level. As $\alpha$ increases (meaning the confidence decreases), the length of the confidence interval decreases, so there is an inverse relationship between the size of $\alpha$ and the length of the confidence interval.

Let us use the Stata function invnormal(p) = z, where p is the probability (the area to the left of z) and z is the cutoff. We can write the equation as invnormal(1 - (α/2)) = $z_{1-\alpha/2}$. Suppose that $\alpha = 0.05$ (i.e., we are talking about a 95% confidence interval). This means that an area of 0.025 will be cut off each end of the normal distribution, so 1 - (α/2) = 0.975:

. di invnormal(1-(0.05/2))

or

. di invnormal(0.975)

So for $\alpha = 0.05$, $z_{1-\alpha/2} = z_{0.975} = 1.96$.
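The three effects can be seen side by side in a small Python sketch, where `statistics.NormalDist().inv_cdf` stands in for Stata's invnormal and `ci_length` is a hypothetical helper implementing the length formula. The baseline values $\sigma = 17$, $n = 16$, $\alpha = 0.05$ are borrowed from the heart-rate example; the other settings change one factor at a time.

```python
from statistics import NormalDist

def ci_length(sigma, n, alpha):
    """Hypothetical helper: length 2 * z_{1-alpha/2} * sigma / sqrt(n) of the known-sigma CI."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 * z * sigma / n ** 0.5

# Baseline from the heart-rate example; then change one factor at a time.
cases = [
    ("baseline: sigma=17, n=16, alpha=0.05", 17, 16, 0.05),
    ("larger n: n=64", 17, 64, 0.05),               # expect shorter
    ("larger sigma: sigma=34", 34, 16, 0.05),       # expect longer
    ("larger alpha: alpha=0.10", 17, 16, 0.10),     # expect shorter
]
for label, sigma, n, alpha in cases:
    print(f"{label:40s} length = {ci_length(sigma, n, alpha):6.2f}")
```

Quadrupling $n$ only halves the length, since $n$ enters the formula through $\sqrt{n}$.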

If $\alpha = 0.10$, then 1 - (α/2) = 0.95:

. di invnormal(1-(0.10/2))

or

. di invnormal(0.95)

So $z_{1-\alpha/2} = z_{0.95} = 1.64$ (more precisely, 1.645).

So $\alpha = 0.05$ produces a z value of 1.96 and $\alpha = 0.10$ produces a z value of 1.64: the larger of the two $\alpha$'s produces the smaller z value and hence the shorter confidence interval. If $\alpha = 0.05$, then we have a 95% [i.e., $100(1-\alpha)\%$] confidence interval; if $\alpha = 0.10$, then we have a 90% confidence interval. So less confidence and shorter confidence intervals go together.
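For readers working outside Stata, the same two cutoffs can be obtained in Python, where `statistics.NormalDist().inv_cdf(p)` returns the z with area p to its left, like invnormal(p). This is an equivalent computation offered as a sketch, not part of the original Stata session.

```python
from statistics import NormalDist

std_normal = NormalDist()                       # Z ~ N(0, 1)
z_values = {}
for alpha in (0.05, 0.10):
    p = 1 - alpha / 2                           # area to the left of the cutoff
    z_values[alpha] = std_normal.inv_cdf(p)     # analogue of Stata's invnormal(p)
    print(f"alpha = {alpha:.2f}: {100 * (1 - alpha):.0f}% CI uses z = {z_values[alpha]:.3f}")
```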


More information

Algebra I Chapter 6: Solving and Graphing Linear Inequalities

Algebra I Chapter 6: Solving and Graphing Linear Inequalities Algebra I Chapter 6: Solving and Graphing Linear Inequalities Jun 10 9:21 AM Chapter 6 Lesson 1 Solve Inequalities Using Addition and Subtraction Vocabulary Words to Review: Inequality Solution of an Inequality

More information

Practical Algebra. A Step-by-step Approach. Brought to you by Softmath, producers of Algebrator Software

Practical Algebra. A Step-by-step Approach. Brought to you by Softmath, producers of Algebrator Software Practical Algebra A Step-by-step Approach Brought to you by Softmath, producers of Algebrator Software 2 Algebra e-book Table of Contents Chapter 1 Algebraic expressions 5 1 Collecting... like terms 5

More information

STATISTICS 1 REVISION NOTES

STATISTICS 1 REVISION NOTES STATISTICS 1 REVISION NOTES Statistical Model Representing and summarising Sample Data Key words: Quantitative Data This is data in NUMERICAL FORM such as shoe size, height etc. Qualitative Data This is

More information

20 Hypothesis Testing, Part I

20 Hypothesis Testing, Part I 20 Hypothesis Testing, Part I Bob has told Alice that the average hourly rate for a lawyer in Virginia is $200 with a standard deviation of $50, but Alice wants to test this claim. If Bob is right, she

More information

Two sided, two sample t-tests. a) IQ = 100 b) Average height for men = c) Average number of white blood cells per cubic millimeter is 7,000.

Two sided, two sample t-tests. a) IQ = 100 b) Average height for men = c) Average number of white blood cells per cubic millimeter is 7,000. Two sided, two sample t-tests. I. Brief review: 1) We are interested in how a sample compares to some pre-conceived notion. For example: a) IQ = 100 b) Average height for men = 5 10. c) Average number

More information

A C E. Answers Investigation 4. Applications

A C E. Answers Investigation 4. Applications Answers Applications 1. 1 student 2. You can use the histogram with 5-minute intervals to determine the number of students that spend at least 15 minutes traveling to school. To find the number of students,

More information

Section 1.4 Solving Other Types of Equations

Section 1.4 Solving Other Types of Equations M141 - Chapter 1 Lecture Notes Page 1 of 27 Section 1.4 Solving Other Types of Equations Objectives: Given a radical equation, solve the equation and check the solution(s). Given an equation that can be

More information

Math Fundamentals for Statistics I (Math 52) Unit 7: Connections (Graphs, Equations and Inequalities)

Math Fundamentals for Statistics I (Math 52) Unit 7: Connections (Graphs, Equations and Inequalities) Math Fundamentals for Statistics I (Math 52) Unit 7: Connections (Graphs, Equations and Inequalities) By Scott Fallstrom and Brent Pickett The How and Whys Guys This work is licensed under a Creative Commons

More information

Solving Polynomial and Rational Inequalities Algebraically. Approximating Solutions to Inequalities Graphically

Solving Polynomial and Rational Inequalities Algebraically. Approximating Solutions to Inequalities Graphically 10 Inequalities Concepts: Equivalent Inequalities Solving Polynomial and Rational Inequalities Algebraically Approximating Solutions to Inequalities Graphically (Section 4.6) 10.1 Equivalent Inequalities

More information

Describing distributions with numbers

Describing distributions with numbers Describing distributions with numbers A large number or numerical methods are available for describing quantitative data sets. Most of these methods measure one of two data characteristics: The central

More information

Last few slides from last time

Last few slides from last time Last few slides from last time Example 3: What is the probability that p will fall in a certain range, given p? Flip a coin 50 times. If the coin is fair (p=0.5), what is the probability of getting an

More information

Name: Biostatistics 1 st year Comprehensive Examination: Applied in-class exam. June 8 th, 2016: 9am to 1pm

Name: Biostatistics 1 st year Comprehensive Examination: Applied in-class exam. June 8 th, 2016: 9am to 1pm Name: Biostatistics 1 st year Comprehensive Examination: Applied in-class exam June 8 th, 2016: 9am to 1pm Instructions: 1. This is exam is to be completed independently. Do not discuss your work with

More information

Section 20: Arrow Diagrams on the Integers

Section 20: Arrow Diagrams on the Integers Section 0: Arrow Diagrams on the Integers Most of the material we have discussed so far concerns the idea and representations of functions. A function is a relationship between a set of inputs (the leave

More information

Lecture 7 Time-dependent Covariates in Cox Regression

Lecture 7 Time-dependent Covariates in Cox Regression Lecture 7 Time-dependent Covariates in Cox Regression So far, we ve been considering the following Cox PH model: λ(t Z) = λ 0 (t) exp(β Z) = λ 0 (t) exp( β j Z j ) where β j is the parameter for the the

More information

LECTURE 15: SIMPLE LINEAR REGRESSION I

LECTURE 15: SIMPLE LINEAR REGRESSION I David Youngberg BSAD 20 Montgomery College LECTURE 5: SIMPLE LINEAR REGRESSION I I. From Correlation to Regression a. Recall last class when we discussed two basic types of correlation (positive and negative).

More information

Properties of Arithmetic

Properties of Arithmetic Excerpt from "Prealgebra" 205 AoPS Inc. 4 6 7 4 5 8 22 23 5 7 0 Arithmetic is being able to count up to twenty without taking o your shoes. Mickey Mouse CHAPTER Properties of Arithmetic. Why Start with

More information

BIOSTATS 540 Fall 2016 Exam 1 (Unit 1 Summarizing Data) Page 1 of 7

BIOSTATS 540 Fall 2016 Exam 1 (Unit 1 Summarizing Data) Page 1 of 7 BIOSTATS 540 Fall 2016 Exam 1 (Unit 1 Summarizing Data) Page 1 of 7 BIOSTATS 540 - Introductory Biostatistics Fall 2016 Examination 1 (Unit 1 Summarizing Data) Due: Monday September 26, 2016 Last Date

More information

One sided tests. An example of a two sided alternative is what we ve been using for our two sample tests:

One sided tests. An example of a two sided alternative is what we ve been using for our two sample tests: One sided tests So far all of our tests have been two sided. While this may be a bit easier to understand, this is often not the best way to do a hypothesis test. One simple thing that we can do to get

More information

Alex s Guide to Word Problems and Linear Equations Following Glencoe Algebra 1

Alex s Guide to Word Problems and Linear Equations Following Glencoe Algebra 1 Alex s Guide to Word Problems and Linear Equations Following Glencoe Algebra 1 What is a linear equation? It sounds fancy, but linear equation means the same thing as a line. In other words, it s an equation

More information

High-dimensional regression

High-dimensional regression High-dimensional regression Advanced Methods for Data Analysis 36-402/36-608) Spring 2014 1 Back to linear regression 1.1 Shortcomings Suppose that we are given outcome measurements y 1,... y n R, and

More information

Manipulating Equations

Manipulating Equations Manipulating Equations Now that you know how to set up an equation, the next thing you need to do is solve for the value that the question asks for. Above all, the most important thing to remember when

More information

6.5 Systems of Inequalities

6.5 Systems of Inequalities 6.5 Systems of Inequalities Linear Inequalities in Two Variables: A linear inequality in two variables is an inequality that can be written in the general form Ax + By < C, where A, B, and C are real numbers

More information

Performance Evaluation

Performance Evaluation Performance Evaluation David S. Rosenberg Bloomberg ML EDU October 26, 2017 David S. Rosenberg (Bloomberg ML EDU) October 26, 2017 1 / 36 Baseline Models David S. Rosenberg (Bloomberg ML EDU) October 26,

More information

determine whether or not this relationship is.

determine whether or not this relationship is. Section 9-1 Correlation A correlation is a between two. The data can be represented by ordered pairs (x,y) where x is the (or ) variable and y is the (or ) variable. There are several types of correlations

More information

Math 120 Introduction to Statistics Mr. Toner s Lecture Notes 3.1 Measures of Central Tendency

Math 120 Introduction to Statistics Mr. Toner s Lecture Notes 3.1 Measures of Central Tendency Math 1 Introduction to Statistics Mr. Toner s Lecture Notes 3.1 Measures of Central Tendency The word average: is very ambiguous and can actually refer to the mean, median, mode or midrange. Notation:

More information

Quadratic Equations Part I

Quadratic Equations Part I Quadratic Equations Part I Before proceeding with this section we should note that the topic of solving quadratic equations will be covered in two sections. This is done for the benefit of those viewing

More information

Working with Stata Inference on the mean

Working with Stata Inference on the mean Working with Stata Inference on the mean Nicola Orsini Biostatistics Team Department of Public Health Sciences Karolinska Institutet Dataset: hyponatremia.dta Motivating example Outcome: Serum sodium concentration,

More information

#29: Logarithm review May 16, 2009

#29: Logarithm review May 16, 2009 #29: Logarithm review May 16, 2009 This week we re going to spend some time reviewing. I say re- view since you ve probably seen them before in theory, but if my experience is any guide, it s quite likely

More information

Lecture 10: The Normal Distribution. So far all the random variables have been discrete.

Lecture 10: The Normal Distribution. So far all the random variables have been discrete. Lecture 10: The Normal Distribution 1. Continuous Random Variables So far all the random variables have been discrete. We need a different type of model (called a probability density function) for continuous

More information

Chapter 5 Formulas Distribution Formula Characteristics n. π is the probability Function. x trial and n is the. where x = 0, 1, 2, number of trials

Chapter 5 Formulas Distribution Formula Characteristics n. π is the probability Function. x trial and n is the. where x = 0, 1, 2, number of trials SPSS Program Notes Biostatistics: A Guide to Design, Analysis, and Discovery Second Edition by Ronald N. Forthofer, Eun Sul Lee, Mike Hernandez Chapter 5: Probability Distributions Chapter 5 Formulas Distribution

More information

Mathematician Preemptive Strike 2 Due on Day 1 of Chapter 2

Mathematician Preemptive Strike 2 Due on Day 1 of Chapter 2 Mathematician Preemptive Strike Due on Day 1 of Chapter Your first experience with a Wiggle chart. Not to be confused with anything related to the Wiggles Note: It s also called a Sign chart, but since

More information

Chapter 9: Roots and Irrational Numbers

Chapter 9: Roots and Irrational Numbers Chapter 9: Roots and Irrational Numbers Index: A: Square Roots B: Irrational Numbers C: Square Root Functions & Shifting D: Finding Zeros by Completing the Square E: The Quadratic Formula F: Quadratic

More information

Functions. If x 2 D, then g(x) 2 T is the object that g assigns to x. Writing the symbols. g : D! T

Functions. If x 2 D, then g(x) 2 T is the object that g assigns to x. Writing the symbols. g : D! T Functions This is the second of three chapters that are meant primarily as a review of Math 1050. We will move quickly through this material. A function is a way of describing a relationship between two

More information

The Basics COPYRIGHTED MATERIAL. chapter. Algebra is a very logical way to solve

The Basics COPYRIGHTED MATERIAL. chapter. Algebra is a very logical way to solve chapter 1 The Basics Algebra is a very logical way to solve problems both theoretically and practically. You need to know a number of things. You already know arithmetic of whole numbers. You will review

More information

Central Limit Theorem and the Law of Large Numbers Class 6, Jeremy Orloff and Jonathan Bloom

Central Limit Theorem and the Law of Large Numbers Class 6, Jeremy Orloff and Jonathan Bloom Central Limit Theorem and the Law of Large Numbers Class 6, 8.5 Jeremy Orloff and Jonathan Bloom Learning Goals. Understand the statement of the law of large numbers. 2. Understand the statement of the

More information

Regression, part II. I. What does it all mean? A) Notice that so far all we ve done is math.

Regression, part II. I. What does it all mean? A) Notice that so far all we ve done is math. Regression, part II I. What does it all mean? A) Notice that so far all we ve done is math. 1) One can calculate the Least Squares Regression Line for anything, regardless of any assumptions. 2) But, if

More information

ALGEBRAIC PRINCIPLES

ALGEBRAIC PRINCIPLES ALGEBRAIC PRINCIPLES Numbers and Operations Standard: 1 Understands and applies concepts of numbers and operations Power 1: Understands numbers, ways of representing numbers, relationships among numbers,

More information

Chapter 7. Practice Exam Questions and Solutions for Final Exam, Spring 2009 Statistics 301, Professor Wardrop

Chapter 7. Practice Exam Questions and Solutions for Final Exam, Spring 2009 Statistics 301, Professor Wardrop Practice Exam Questions and Solutions for Final Exam, Spring 2009 Statistics 301, Professor Wardrop Chapter 6 1. A random sample of size n = 452 yields 113 successes. Calculate the 95% confidence interval

More information

BIOSTATISTICS NURS 3324

BIOSTATISTICS NURS 3324 Simple Linear Regression and Correlation Introduction Previously, our attention has been focused on one variable which we designated by x. Frequently, it is desirable to learn something about the relationship

More information

MATH 308 COURSE SUMMARY

MATH 308 COURSE SUMMARY MATH 308 COURSE SUMMARY Approximately a third of the exam cover the material from the first two midterms, that is, chapter 6 and the first six sections of chapter 7. The rest of the exam will cover the

More information

Introduction to Survey Analysis!

Introduction to Survey Analysis! Introduction to Survey Analysis! Professor Ron Fricker! Naval Postgraduate School! Monterey, California! Reading Assignment:! 2/22/13 None! 1 Goals for this Lecture! Introduction to analysis for surveys!

More information

Stat 135, Fall 2006 A. Adhikari HOMEWORK 10 SOLUTIONS

Stat 135, Fall 2006 A. Adhikari HOMEWORK 10 SOLUTIONS Stat 135, Fall 2006 A. Adhikari HOMEWORK 10 SOLUTIONS 1a) The model is cw i = β 0 + β 1 el i + ɛ i, where cw i is the weight of the ith chick, el i the length of the egg from which it hatched, and ɛ i

More information

Analytical Graphing. lets start with the best graph ever made

Analytical Graphing. lets start with the best graph ever made Analytical Graphing lets start with the best graph ever made Probably the best statistical graphic ever drawn, this map by Charles Joseph Minard portrays the losses suffered by Napoleon's army in the Russian

More information

SESSION 5 Descriptive Statistics

SESSION 5 Descriptive Statistics SESSION 5 Descriptive Statistics Descriptive statistics are used to describe the basic features of the data in a study. They provide simple summaries about the sample and the measures. Together with simple

More information

Business Statistics. Lecture 9: Simple Regression

Business Statistics. Lecture 9: Simple Regression Business Statistics Lecture 9: Simple Regression 1 On to Model Building! Up to now, class was about descriptive and inferential statistics Numerical and graphical summaries of data Confidence intervals

More information

Exam: practice test 1 MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

Exam: practice test 1 MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. Exam: practice test MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. Solve the problem. ) Using the information in the table on home sale prices in

More information

Honors Advanced Mathematics Determinants page 1

Honors Advanced Mathematics Determinants page 1 Determinants page 1 Determinants For every square matrix A, there is a number called the determinant of the matrix, denoted as det(a) or A. Sometimes the bars are written just around the numbers of the

More information

Objectives: Review open, closed, and mixed intervals, and begin discussion of graphing points in the xyplane. Interval notation

Objectives: Review open, closed, and mixed intervals, and begin discussion of graphing points in the xyplane. Interval notation MA 0090 Section 18 - Interval Notation and Graphing Points Objectives: Review open, closed, and mixed intervals, and begin discussion of graphing points in the xyplane. Interval notation Last time, we

More information

Math Lecture 3 Notes

Math Lecture 3 Notes Math 1010 - Lecture 3 Notes Dylan Zwick Fall 2009 1 Operations with Real Numbers In our last lecture we covered some basic operations with real numbers like addition, subtraction and multiplication. This

More information