Math 143: Introduction to Biostatistics


R Pruim

Spring 2012

Last Modified: April 2, 2012


Contents

10 More About Random Variables
   10.1 The Mean of a Random Variable
   10.2 The Variance of a Random Variable
   10.3 Expected Value and Variance for Combinations
11 Inference for One Mean
   11.1 The Long Way
   11.2 Super Short Cuts
   11.3 Confidence Intervals for Proportions
12 Comparing Two Means
   12.1 Summary of Methods
   12.2 Paired t
   12.3 Two Sample t


10 More About Random Variables

10.1 The Mean of a Random Variable

A motivating example: GPA computation

Let's begin with a motivating example. Suppose a student has taken 10 courses and received 5 A's, 4 B's, and 1 C. Using the traditional numerical scale where an A is worth 4, a B is worth 3, and a C is worth 2, what is this student's GPA (grade point average)?

The first thing to notice is that (4 + 3 + 2)/3 = 3 is not correct. We cannot simply add up the values and divide by the number of values. Clearly this student should have a GPA that is higher than 3.0, since there were more A's than C's.

Consider now a correct way to do this calculation:

   GPA = (5·4 + 4·3 + 1·2)/10 = (5/10)(4) + (4/10)(3) + (1/10)(2) = (0.5)(4) + (0.4)(3) + (0.1)(2) = 2.0 + 1.2 + 0.2 = 3.4.

Our definition of the mean of a random variable follows the example above. Notice that we can think of the GPA as a sum of products:

   GPA = sum of (grade)(probability of getting that grade).

Such a sum is often called a weighted sum or weighted average of the grades (the probabilities are the weights). The expected value of a discrete random variable is a similar weighted average of its possible values.
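We can check this weighted-average arithmetic in R. This is just an illustrative sketch; the grades and weights come from the example above.

# grade values (A, B, C) and the fraction of courses with each grade
grade <- c(4, 3, 2)
weight <- c(0.5, 0.4, 0.1)
sum(grade * weight)    # the weighted average (GPA)
[1] 3.4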

Let X be a discrete random variable with pmf f. The mean (also called expected value) of X is denoted µ_X or E(X) and is defined by

   µ_X = E(X) = Σ x · Pr(X = x),

where the sum is taken over all possible values of X.

Example: Daily 3 Lottery

If you play the Daily 3 lottery game "straight", you pick a three-digit number, and if it exactly matches the three-digit number randomly chosen by the lottery commission, you win $500. What is the expected value of a lottery ticket?

Let X = the value of the ticket. Then

   value of X:    0          500
   probability:   999/1000   1/1000

because there is only one winning combination. So

   E(X) = 0 · (999/1000) + 500 · (1/1000) = 0.50.

So the expected value of the ticket is 50 cents. The lottery commission charges $1 to play the game. This means the lottery commission averages a gain of 50 cents per play. (And those who play lose 50 cents on average per play.)

Another Example: Four Coins

If we flip four fair coins and let X count the number of heads, what is E(X)? Recall that the distribution of X is described by the following table:

   value of X:    0      1      2      3      4
   probability:   1/16   4/16   6/16   4/16   1/16

So the expected value is

   E(X) = 0 · (1/16) + 1 · (4/16) + 2 · (6/16) + 3 · (4/16) + 4 · (1/16) = 32/16 = 2.

On average we get 2 heads in 4 tosses.
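R can do this bookkeeping for us. Here is a small sketch (not from the text) that recomputes the four-coin example, letting dbinom() supply the probabilities:

x <- 0:4                               # possible numbers of heads
p <- dbinom(x, size = 4, prob = 0.5)   # their probabilities: 1/16, 4/16, 6/16, 4/16, 1/16
sum(x * p)                             # expected value
[1] 2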

10.2 The Variance of a Random Variable

The variance of a random variable is computed very much like the mean. We just replace the values of the distribution with

   (value − expected value)².

In other words, the variance is the expected value of (X − E(X))². The standard deviation is the square root of the variance.

Example: GPA

Returning to our GPA example:

   value:         4     3     2
   probability:   0.5   0.4   0.1

Recall that the expected value is 3.4. We can compute the variance as follows:

   variance = (4 − 3.4)²(0.5) + (3 − 3.4)²(0.4) + (2 − 3.4)²(0.1) = 0.18 + 0.064 + 0.196 = 0.44.

Example: Binom(2, 0.5)

Let X ∼ Binom(2, 0.5). We can compute the expected value and the variance, recalling our probability table:

   value of X:    0     1     2
   probability:   1/4   1/2   1/4

   E(X) = 0 · (1/4) + 1 · (1/2) + 2 · (1/4) = 0 + 0.5 + 0.5 = 1.0

and

   Var(X) = (0 − 1)²(1/4) + (1 − 1)²(1/2) + (2 − 1)²(1/4) = 1/4 + 0 + 1/4 = 0.5.

Example

Find the mean and variance of the random variable Y described below:

   value of Y:
   probability:
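These weighted sums are easy to hand off to R. Here is a sketch (not from the text) that recomputes the Binom(2, 0.5) example:

x <- 0:2
p <- dbinom(x, size = 2, prob = 0.5)   # 1/4, 1/2, 1/4
m <- sum(x * p); m                     # mean
[1] 1
sum((x - m)^2 * p)                     # variance
[1] 0.5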

Bernoulli Random Variables

If X ∼ Binom(1, p), then X is called a Bernoulli random variable with probability p. For a Bernoulli random variable X, what are E(X) and Var(X)?

We begin by building the probability table for this variable:

   value of X:    0       1
   probability:   1 − p   p

Now we do the calculations:

   E(X) = 0 · (1 − p) + 1 · p = p

and, directly from the definition,

   Var(X) = (0 − p)²(1 − p) + (1 − p)² p
          = p²(1 − p) + p(1 − p)²
          = p(1 − p)[p + (1 − p)]
          = p(1 − p).

As a function of p, Var(X) is quadratic. Its graph is a parabola that opens downward, and since Var(X) = 0 when p = 0 and when p = 1, the largest variance occurs when p = 1/2.

plotfun(p * (1 - p) ~ p, p.lim = c(0, 1))

Any random variable that only takes on the values 0 and 1 is a Bernoulli random variable. Bernoulli random variables will play an important role in determining the variance of a general binomial random variable.
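Before moving on, here is a quick numerical check of the formula Var(X) = p(1 − p). This is an illustrative sketch for one arbitrarily chosen value, p = 0.3:

p <- 0.3
x <- c(0, 1)
probs <- c(1 - p, p)
sum((x - p)^2 * probs)    # variance computed from the definition
[1] 0.21
p * (1 - p)               # matches the formula
[1] 0.21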

10.3 Expected Value and Variance for Combinations

The following rules often make calculating means and variances much simpler. Let X and Y be random variables and let a be a constant. Then

1. E(a + X) = a + E(X)
2. E(aX) = a E(X)
3. E(X + Y) = E(X) + E(Y)
4. E(X − Y) = E(X) − E(Y)
5. Var(a + X) = Var(X)
6. Var(aX) = a² Var(X)
7. Var(X + Y) = Var(X) + Var(Y), provided X and Y are independent. (We'll call this the Pythagorean theorem for variance.)

Example

Suppose X and Y are independent, the standard deviation of X is 3, the standard deviation of Y is 4, and the two means add up to 110. Then

   E(X + Y) = E(X) + E(Y) = 110
   Var(X + Y) = Var(X) + Var(Y) = 3² + 4² = 9 + 16 = 25

So the standard deviation of X + Y is √25 = 5.

Binomial Random Variables

We can now determine the mean and variance of any binomial random variable. Let X ∼ Binom(n, p). Then

   X = X_1 + X_2 + ··· + X_n,

where the X_i are independent and each X_i ∼ Binom(1, p). So

   E(X) = E(X_1) + E(X_2) + ··· + E(X_n) = p + p + ··· + p = np

and

   Var(X) = Var(X_1) + Var(X_2) + ··· + Var(X_n) = p(1 − p) + p(1 − p) + ··· + p(1 − p) = np(1 − p).
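We can check the formulas E(X) = np and Var(X) = np(1 − p) numerically for a particular binomial distribution. This sketch (not from the text) uses n = 20 and p = 0.8, the distribution that appears in the exercises below:

n <- 20; p <- 0.8
x <- 0:n
probs <- dbinom(x, n, p)
sum(x * probs)                 # mean; should equal n * p = 16
[1] 16
sum((x - n * p)^2 * probs)     # variance; should equal n * p * (1 - p) = 3.2
[1] 3.2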

Normal Distributions in RStudio

The two main functions we need for working with normal distributions are pnorm() and qnorm(). pnorm() works just like pbinom():

   pnorm(x, mean = µ, sd = σ) = Pr(X ≤ x)   when X ∼ Norm(µ, σ).

# P( X <= 700); X ~ Norm(500, 100)
pnorm(700, 500, 100)
[1] 0.9772499

# P( X >= 700); X ~ Norm(500, 100)
1 - pnorm(700, 500, 100)
[1] 0.02275013

qnorm() goes the other direction. You provide the quantile (percentile expressed as a decimal) and R gives you the value.

# find 80th percentile in Norm(500, 100)
qnorm(0.8, 500, 100)
[1] 584.1621

The xpnorm() function gives a bit more verbose output and also gives you a picture.

xpnorm(700, 500, 100)

If X ~ N(500,100), then
   P(X <= 700) = P(Z <= 2) = 0.9772
   P(X >  700) = P(Z >  2) = 0.0228
[1] 0.9772499
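pnorm() can also give the probability that X lands in an interval, by subtracting two of its values. A small sketch (not from the text):

# P(400 < X < 600); X ~ Norm(500, 100)
pnorm(600, 500, 100) - pnorm(400, 500, 100)
[1] 0.6826895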

Exercises

1. Consider the random variable X with probabilities given in the table below:

      x:
      P(X = x):

   a) What is the expected value of X?
   b) What is the variance of X?

2. Two fair dice (6-sided) are to be rolled. Let Y be the larger of the two values rolled.

   a) What is P(Y = 1)?
   b) What is P(Y = 6)?
   c) What is P(Y = 1 | Y ≠ 6)?
   d) Compute E(Y), the expected value of Y.

3. Two fair four-sided dice are to be rolled. Let Y be the larger of the two values rolled.

   a) What is P(Y = 1)?
   b) What is P(Y = 4)?
   c) What is P(Y = 1 | Y ≠ 4)?
   d) Compute E(Y), the expected value of Y.

4. Consider the game of roulette. An American roulette wheel has slots numbered 1 through 36 on it, half red and half black. In addition there are two green slots (numbered 0 and 00). That makes 38 slots altogether. A $1 wager on black returns the wager plus an additional $1 if a black number comes up; otherwise the wager is lost. (So a player either wins $1 or loses $1.)

   a) What is the expected profit per roulette play for a casino? (This is, of course, also the expected loss per play for the players.)
   b) Suppose a casino estimates that it costs $50 per hour to run the roulette table. (They have to pay for heat and lights, for people to run the table, etc.) How much money must be bet per hour for the casino to break even?

5. Weird Willy offers you the following choice. You may have 1/3.5 dollars, or you may roll a fair die and he will give you 1/X dollars, where X is the value of the roll. Which is the better deal? Compute E(1/X) to decide.

6. Suppose X and Y are independent random variables and E(X) = 24, Var(X) = 4, E(Y) = 22, and Var(Y) = 9.

   a) What is the standard deviation of X?
   b) What is the standard deviation of Y?
   c) What is E(2X)?
   d) What is Var(2X)?
   e) What is E(X + Y)?
   f) What is Var(X + Y)?
   g) What is E(X − Y)?
   h) What is Var(X − Y)?

7. Suppose X and Y are independent random variables satisfying

                           X     Y
      mean                 100   90
      standard deviation   6     8

   a) What is E(2X)?
   b) What is Var(2X)?
   c) What is E(X + Y)?
   d) What is Var(X + Y)?
   e) What is E(X − Y)?
   f) What is Var(X − Y)?

8. Let X ∼ Binom(20, 0.8).

   a) What is the probability that X = 16?
   b) What is the probability that X ≥ 16?
   c) Compute E(X).
   d) Compute Var(X).

Solutions

1.
# Expected value
m <- 0 * ... + 1 * ... + 2 * ... + 3 * 0.4; m
[1]

# Variance
(0 - m)^2 * ... + (1 - m)^2 * ... + (2 - m)^2 * ... + (3 - m)^2 * 0.4
[1]

2.
   value of Y:    1      2      3      4      5      6
   probability:   1/36   3/36   5/36   7/36   9/36   11/36

   P(Y = 1 | Y ≠ 6) = (1/36) / (25/36) = 1/25.

# Expected value
1 * 1/36 + 2 * 3/36 + 3 * 5/36 + 4 * 7/36 + 5 * 9/36 + 6 * 11/36
[1] 4.472222

3.
   value of Y:    1      2      3      4
   probability:   1/16   3/16   5/16   7/16

   P(Y = 1 | Y ≠ 4) = (1/16) / (9/16) = 1/9.

4.
# Expected value
m <- -1 * 20/38 + 1 * 18/38
m
[1] -0.05263158

# How many plays to cover $50?
50/abs(m)
[1] 950

5.
   value of 1/X:   1     1/2   1/3   1/4   1/5   1/6
   probability:    1/6   1/6   1/6   1/6   1/6   1/6

# Expected value
m <- 1 * 1/6 + 1/2 * 1/6 + 1/3 * 1/6 + 1/4 * 1/6 + 1/5 * 1/6 + 1/6 * 1/6
m
[1] 0.4083333

1/3.5
[1] 0.2857143

# Should we roll the dice?
m > 1/3.5
[1] TRUE

6.
   a) Standard deviation of X: 2
   b) Standard deviation of Y: 3
   c) E(2X) = 2 · 24 = 48
   d) Var(2X) = 4 · 4 = 16
   e) E(X + Y) = 24 + 22 = 46
   f) Var(X + Y) = 4 + 9 = 13
   g) E(X − Y) = 24 − 22 = 2
   h) Var(X − Y) = 4 + 9 = 13

7.
   a) E(2X) = 2 · 100 = 200
   b) Var(2X) = (2 · 6)² = 4 · 36 = 144
   c) E(X + Y) = 100 + 90 = 190
   d) Var(X + Y) = 36 + 64 = 100
   e) E(X − Y) = 100 − 90 = 10
   f) Var(X − Y) = 36 + 64 = 100

8.
# Pr(X = 16)
dbinom(16, 20, 0.8)
[1] 0.2182

# Pr(X >= 16)
1 - pbinom(15, 20, 0.8)
[1] 0.6296

   E(X) = 20 · 0.8 = 16.
   Var(X) = 20 · 0.8 · 0.2 = 3.2.


11 Inference for One Mean

This is mostly covered in the text. Note that we are skipping section 11.5, at least for now. That section covers how to compute a confidence interval for the variance, and you are welcome to read it if you like. Here is some R code related to t tests and confidence intervals.

11.1 The Long Way

The test statistic for a null hypothesis of H_0: µ = µ_0 is

   t = (x̄ − µ_0) / (s/√n) = (estimate − hypothesized value) / (standard error).

This is easily computed in RStudio:

# some ingredients
x.bar <- mean(HumanBodyTemp$temp); x.bar
[1]
sd <- sd(HumanBodyTemp$temp); sd
[1]
n <- nrow(HumanBodyTemp); n
[1] 25
se <- sd/sqrt(n); se
[1]
# test statistic
t <- ( x.bar - 98.6 ) / se; t

[1]

# 2-sided p-value
2 * pt( - abs(t), df=24 )
[1]

Similarly, we can compute a 95% confidence interval:

t.star <- qt(0.975, df = 24)
t.star
[1] 2.063899

# lower limit
x.bar - t.star * se
[1]

# upper limit
x.bar + t.star * se
[1]

11.2 Super Short Cuts

Of course, RStudio can do all of the calculations for you if you give it the raw data:

t.test(HumanBodyTemp$temp, mu = 98.6)

        One Sample t-test

data:  HumanBodyTemp$temp
t = , df = 24, p-value =
alternative hypothesis: true mean is not equal to 98.6
95 percent confidence interval:

sample estimates:
mean of x

11.3 Confidence Intervals for Proportions

Note: This topic is covered in section 7.3 of the textbook.

The sampling distribution for a sample proportion (assuming samples of size n) is

   p̂ ∼ Norm( p, √(p(1 − p)/n) ).

The standard deviation of a sampling distribution is called the standard error (abbreviated SE), so in this case we have

   SE = √(p(1 − p)/n).

This means that for 95% of samples, our estimated proportion p̂ will be within (1.96)SE of p. This tells us how well p̂ approximates p. It says that usually p̂ is between p − 1.96 SE and p + 1.96 SE.

Key idea: If p̂ is close to p, then p is close to p̂. That is, for most samples p is between p̂ − 1.96 SE and p̂ + 1.96 SE.

Our only remaining difficulty is that we don't know SE, because it depends on p, which we don't know, so we will have to estimate SE. Several methods have been proposed to work around this difficulty. Some of them work better than others.

The Wald Interval

The old traditional method goes back to a statistician named Wald. Unfortunately, it is not very accurate when n is small (how large n must be depends on p), but it is simple:

   SE ≈ √(p̂(1 − p̂)/n).

We will call the interval between p̂ − 1.96 √(p̂(1 − p̂)/n) and p̂ + 1.96 √(p̂(1 − p̂)/n) a 95% confidence interval for p. Often we will write this very succinctly as

   p̂ ± 1.96 SE.

Notice that this fits our general pattern for confidence intervals

   estimate ± (critical value)(standard error)

where

   estimate = p̂ = x/n
   SE = √(p̂(1 − p̂)/n)
   interval: p̂ ± z* SE

The Plus 4 Interval

Recently, due to a paper by Agresti and Coull, a new method has become popular. It is almost as simple as the Wald interval, and it does a much better job of obtaining the desired 95% coverage rate. The idea is simple: pretend you have 2 extra successes and 2 extra failures, and use the Wald method with the modified data. The resulting method again fits our general pattern for confidence intervals with

   estimate = p* = (x + 2)/(n + 4)
   SE = √(p*(1 − p*)/(n + 4))
   interval: p* ± z* SE

The addition of 2 and 4 comes from the fact that z* = 1.96 ≈ 2 and (z*)² ≈ 4 for a 95% confidence interval. It can safely be used for confidence levels between 90% and 99%, and that covers most confidence intervals used in practice.

Example

Q. Suppose we flip a coin 100 times and obtain 42 heads. What is a 95% confidence interval for p?

A. Using the Wald method:

   p̂ = 42/100 = 0.42
   SE = √((0.42)(0.58)/100) = 0.0494
   z* = 1.96
   interval: 0.42 ± 1.96(0.0494), or 0.42 ± 0.097

Using the Plus 4 method:

   p* = 44/104 = 0.423
   SE = √(p*(1 − p*)/104) = 0.0484
   z* = 1.96
   interval: 0.423 ± 1.96(0.0484), or 0.423 ± 0.095

Using R. Of course, R can do all of this for us. Both binom.test() and prop.test() report confidence intervals as well as p-values. The interval produced by prop.test() will be quite close to the one produced by the Plus 4 method. The interval produced by binom.test() will typically be a little bit wider because it guarantees to have a coverage rate of at least 95%, whereas the one produced by prop.test() may have a coverage rate a bit above or below 95%.

prop.test(42, 100)

        1-sample proportions test with continuity correction

data:  x and n
X-squared = 2.25, df = 1, p-value = 0.1336
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:

sample estimates:
   p
0.42
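For comparison, the by-hand Wald and Plus 4 arithmetic above is easy to reproduce in R. This is an illustrative sketch using the same 42 heads in 100 flips:

x <- 42; n <- 100
z.star <- qnorm(0.975)                        # 1.96

# Wald interval
p.hat <- x / n
SE <- sqrt(p.hat * (1 - p.hat) / n)
p.hat + c(-1, 1) * z.star * SE                # roughly 0.323 to 0.517

# Plus 4 interval
p.tilde <- (x + 2) / (n + 4)
SE4 <- sqrt(p.tilde * (1 - p.tilde) / (n + 4))
p.tilde + c(-1, 1) * z.star * SE4             # roughly 0.328 to 0.518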

And here is the binom.test() output for the same data:

binom.test(42, 100)

        Exact binomial test

data:  x and n
number of successes = 42, number of trials = 100, p-value =
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:

sample estimates:
probability of success
                  0.42

Which denominator for SE?

If you look in the book on page 162, you will see that it uses n − 1 in the denominator for SE_p̂. This is another variation on the theme, but it is much less commonly used than any of the other methods discussed here. It performs about as well (or poorly) as the Wald method.

Other confidence levels

We can use any confidence level we like if we replace 1.96 with the appropriate critical value z*.

Example

Q. Suppose we flip a coin 100 times and obtain 42 heads. What is a 99% confidence interval for p?

A. The only thing that changes is our value of z*, which must now be selected so that 99% of a normal distribution is between z* standard deviations below the mean and z* standard deviations above the mean. R can calculate this value for us:

qnorm(0.995)  # .995 because we need BELOW z*
[1] 2.575829

Now we proceed as before. SE = √((0.42)(0.58)/100) = 0.0494, so

   Wald confidence interval: 0.42 ± 2.576(0.0494), or 0.42 ± 0.127
   Plus 4 interval: 0.423 ± 2.576(0.0484), or 0.423 ± 0.125

R computes these confidence intervals for us when we use prop.test() and binom.test() with the extra argument conf.level = 0.99.

prop.test(42, 100, conf.level = 0.99)

        1-sample proportions test with continuity correction

data:  x and n
X-squared = 2.25, df = 1, p-value = 0.1336
alternative hypothesis: true p is not equal to 0.5
99 percent confidence interval:

sample estimates:
   p
0.42

binom.test(42, 100, conf.level = 0.99)

        Exact binomial test

data:  x and n
number of successes = 42, number of trials = 100, p-value =
alternative hypothesis: true probability of success is not equal to 0.5
99 percent confidence interval:

sample estimates:
probability of success
                  0.42

Determining Sample Size

An important part of designing a study is deciding how large the sample needs to be for the intended purposes of the study.

Q. You have been asked to conduct a public opinion survey to determine what percentage of the residents of a city are in favor of the mayor's new deficit reduction efforts. You need to have a margin of error of ±3%. How large must your sample be? (Assume a 95% confidence interval.)

A. The margin of error will be 1.96 √(p̂(1 − p̂)/n), so our method will be to make a reasonable guess about what p̂ will be, and then determine how large n must be to make the margin of error small enough. Since SE is largest when p̂ = 0.50, one safe estimate for p̂ is 0.50, and that is what we will use unless we are quite sure that p is close to 0 or 1. (In those latter cases, we will make a best guess, erring on the side of being too close to 0.50, to avoid doing the work of getting a sample much larger than we need.)

We can solve 0.03 = 1.96 √((0.5)(0.5)/n) for n algebraically, we can play a simple game of higher and lower until we get our value of n, or we can graph the margin of error as a function of n and use the graph to estimate n. Here's the higher/lower guessing method:

1.96 * sqrt(0.5 * 0.5/400)
[1] 0.049

1.96 * sqrt(0.5 * 0.5/800)
[1] 0.03465

1.96 * sqrt(0.5 * 0.5/1200)
[1] 0.02829

1.96 * sqrt(0.5 * 0.5/1000)
[1] 0.03099

1.96 * sqrt(0.5 * 0.5/1100)
[1] 0.02955

1.96 * sqrt(0.5 * 0.5/1050)
[1] 0.03024

So we need a sample size a bit larger than 1050, but not as large as 1100. We can continue this process to get a tighter estimate if we like:

1.96 * sqrt(0.5 * 0.5/1075)
[1] 0.02989

1.96 * sqrt(0.5 * 0.5/1065)
[1] 0.03003

1.96 * sqrt(0.5 * 0.5/1070)
[1] 0.02996

1.96 * sqrt(0.5 * 0.5/1068)
[1] 0.02999

1.96 * sqrt(0.5 * 0.5/1067)
[1] 0.03

We see that a sample of size 1067 is guaranteed to give us a margin of error of at most 3%. It isn't really important to get this down to the nearest whole number, however. Our goal is to know roughly what size sample we need (tens? hundreds? thousands? tens of thousands?).

Knowing the answer to 2 significant figures is usually sufficient for planning purposes.

Side note: The R function uniroot() can automate this guessing for us, but it requires a bit of programming to use it:

# uniroot finds when a function is 0, so we need to build such a function
f <- function(n) {
    1.96 * sqrt(0.5 * 0.5/n) - 0.03
}
# uniroot needs a function and a lower bound and upper bound to search between
uniroot(f, c(1, 50000))$root
[1] 1067

Example

Q. How would things change in the previous problem if

1. we wanted a 98% confidence interval instead of a 95% confidence interval?
2. we wanted a 95% confidence interval with a margin of error of at most 0.5%?
3. we wanted a 95% confidence interval with a margin of error of at most 0.5% and we are pretty sure that p < 10%?

A. We'll use uniroot() here. You should use the higher-lower method or algebra and compare your results.

f1 <- function(n) {
    qnorm(0.99) * sqrt(0.5 * 0.5/n) - 0.03
}
uniroot(f1, c(1, 50000))$root
[1] 1503

f2 <- function(n) {
    qnorm(0.975) * sqrt(0.5 * 0.5/n) - 0.005
}
uniroot(f2, c(1, 50000))$root
[1]

f3 <- function(n) {
    qnorm(0.975) * sqrt(0.1 * 0.9/n) - 0.005
}
uniroot(f3, c(1, 50000))$root
[1]
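For the algebra route: solving m = z* √(p(1 − p)/n) for n gives n = (z*/m)² p(1 − p). A one-line sketch for the original problem (95% confidence, 3% margin of error, p = 0.5):

(qnorm(0.975) / 0.03)^2 * 0.5 * 0.5    # about 1067, matching the answers above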

12 Comparing Two Means

12.1 Summary of Methods

Methods for One Variable

So far, our inference methods have dealt with just one variable. We can divide the methods we know into three situations, depending on the type of variable we have:

   Type of Variable                   Parameter of Interest   Method                        R function
   1 Quantitative                     mean                    1-sample t                    t.test()
   1 Categorical (2 levels)           proportion              1-proportion                  binom.test(), prop.test()
   1 Categorical (3 or more levels)   multiple proportions    Chi-squared goodness of fit   chisq.test()

Table 12.1: A summary of one-variable statistical inference methods.

For each row in Table 12.1, we have learned a method for computing p-values, and for each row except the last we have a method for computing confidence intervals. These intervals (and several more that we will encounter) follow a common pattern:

   data value ± critical value · standard error

Quantitative variable:
   data value: x̄ (sample mean)
   critical value: t* (using n − 1 degrees of freedom)
   standard error: SE = s/√n

Categorical variable (Wald interval):
   data value: p̂ (sample proportion)
   critical value: z* (using the standard normal distribution)
   standard error: SE = √(p̂(1 − p̂)/n)

For a Plus 4 interval, use p* = (x + 2)/(n + 4) and n* = n + 4 in place of p̂ and n.

There are methods (that we mostly won't cover in this course) for investigating other parameters (like the median, standard deviation, etc.), so it is important to note that both the type of data and the parameter are important in determining the statistical method.

Methods for Two Variables

Now we are ready to think about multiple variables. Let's start with two. If we have an explanatory variable and a response variable, we can map the type of data to the method of analysis as follows:

                                     Explanatory Variable
   Response Variable                 Cat (2 levels)                 Cat (3 or more levels)         Quant
   Categorical (2 levels)            Chi-squared for 2-way tables   Chi-squared for 2-way tables   logistic regression
   Categorical (3 or more levels)    Chi-squared for 2-way tables   Chi-squared for 2-way tables
   Quantitative                      2-sample t, paired t           ANOVA                          simple linear regression

Table 12.2: A summary of two-variable statistical inference methods.

We will learn about each of the methods in this table. We will also learn a little bit about methods that can handle more than two variables. As we go along, always keep in mind where we are in Table 12.2.

12.2 Paired t

Chapter 12 of our text covers two methods that are similar because they both use the t distribution, but otherwise they really don't belong together, since they sit in different cells of Table 12.2. This means they are for situations where you have different kinds of data.

The paired t situation

For a paired t test, we will have two quantitative measurements on the same scale for each observational unit. One common situation is a pre/post study where subjects are measured before and after some treatment, intervention, or time delay. Other examples of paired situations include:

- Measurements of husbands and wives on some quantitative scale (like satisfaction with their marriage).
- Measurements of speed of performance of a task with subjects' left and right hands.
- Measurements of the distance a subject kicks two footballs, one filled with air, the other with helium. (Yes, this study has actually been done.)

- Crop studies where quantitative measures of crops (like yield per acre) are recorded for crops experiencing two treatments (two kinds of fertilizer, for example), if a portion of each plot gets each treatment. In this case, the plots become the observational units, and for each plot we have two variables (yield with fertilizer A, yield with fertilizer B).

Paired t procedures are just a 2-step version of 1-sample t procedures:

Step 1: Combine the two measurements into one (usually by taking their difference).
Step 2: Use a 1-sample t procedure on the new variable.

Blackbirds example

The Blackbirds example compares the amount of antibodies in male blackbirds (on a log scale) before and after injection with testosterone. You can read the details of the study in the text. Here is the R code to compute the paired t test and confidence interval for this example.

t.test(Blackbirds$log.after - Blackbirds$log.before)  # explicit subtraction

        One Sample t-test

data:  Blackbirds$log.after - Blackbirds$log.before
t = 1.271, df = 12, p-value =
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:

sample estimates:
mean of x

We can also give R two variables and let it do the subtraction for us by setting paired=TRUE. The main advantage of this is that the output shows that you are doing a paired test.

t.test(Blackbirds$log.after, Blackbirds$log.before, paired = TRUE)  # using paired = TRUE

        Paired t-test

data:  Blackbirds$log.after and Blackbirds$log.before
t = 1.271, df = 12, p-value =
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:

sample estimates:
mean of the differences

If the data set did not have the log-transformed values already calculated, we could also do

t.test(log(Blackbirds$after), log(Blackbirds$before), paired = TRUE)

        Paired t-test

data:  log(Blackbirds$after) and log(Blackbirds$before)
t = 1.244, df = 12, p-value =
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:

sample estimates:
mean of the differences

We could also calculate the p-value and confidence interval from numerical summaries of the data:

m <- mean(log(after) - log(before), data=Blackbirds); m
[1]

s <- sd(log(after) - log(before), data=Blackbirds); s
[1]

n <- nrow(Blackbirds); n   # sample size
[1] 13

SE <- s/ sqrt(n); SE       # standard error
[1]

t <- ( m - 0 ) / SE; t     # test statistic
[1]

2 * pt( -abs(t), df=n-1 )  # 2-sided p-value
[1]

t.star <- qt(.975, df=n-1); t.star  # critical value for 95% CI
[1] 2.178813

t.star * SE                # margin of error

[1]

m - t.star * SE   # lower bound on CI
[1]

m + t.star * SE   # upper bound on CI
[1]

Side note: Paired t is a data reduction method

Paired t procedures are an example of data reduction: we are combining multiple variables in our data set into one variable to make the analysis easier. There are many other examples of data reduction. One famous example in biology is body mass index (bmi). The formula used for bmi is

   body mass index = bmi = weight / height²,

where weight is measured in kilograms and height in meters. This is a more complicated way of turning two measurements (height and weight) into a single measurement.

12.3 Two Sample t

The 2-sample Method

In a paired t situation we have two quantitative measurements for each observational unit. In the two sample t situation we have only one quantitative measurement for each observational unit, but the observational units are in two groups. We use a categorical variable to indicate which group each observational unit is in.

If we have two populations that are normal with means µ_1 and µ_2 and standard deviations σ_1 and σ_2, then the sampling distributions of the sample means, using samples of sizes n_1 and n_2, are given by

   X̄ ∼ Norm( µ_1, σ_1/√n_1 )
   Ȳ ∼ Norm( µ_2, σ_2/√n_2 )

So

   X̄ − Ȳ ∼ Norm( µ_1 − µ_2, √(σ_1²/n_1 + σ_2²/n_2) ).

So the 2-sample t procedures follow our typical pattern, using

   data value: x̄ − ȳ

   SE: √(s_1²/n_1 + s_2²/n_2)  (the sample standard deviations stand in for σ_1 and σ_2)

   degrees of freedom for t given by a messy formula, but satisfying

      min(n_1, n_2) − 1  ≤  df  ≤  n_1 + n_2 − 2.

The degrees of freedom will be closer to the upper bound when the two standard deviations and the two sample sizes are close in value.

Horned Lizards

This example is described in the text. Below is R code to compute the p-values and confidence intervals.

By Hand

For each group we need the mean, standard deviation, and sample size. We can get them all at once using favstats():

favstats(horn.length ~ group, data = HornedLizards)

         min  Q1  median  Q3  max  mean  sd     n    missing
killed                                   2.709  30
living                                          154

We can use this information to compute SE:

SE <- sqrt( 2.709^2 / 30 + ...^2 / 154 ); SE
[1]

To test H_0: µ_1 − µ_2 = 0, we use the test statistic

t <- ( ... - ... ) / SE; t   # difference of the two sample means, divided by SE
[1]

The degrees of freedom will be between 30 − 1 = 29 and 30 + 154 − 2 = 182, so our p-value (for a two-sided test) is between the two results below:

2 * pt(t, df = 29)
[1]

2 * pt(t, df = 182)
[1] 3.334e-05

In this example, the p-values are quite close and lead to the same conclusions. We can give a 95% confidence interval for the difference in the mean horn lengths using a margin of error computed as follows:

t.star <- qt(.975, df=29); t.star
[1] 2.04523

MofE <- t.star * SE; MofE
[1]

Super Short Cut

Of course, R can automate the entire process:

t.test(horn.length ~ group, HornedLizards)

        Welch Two Sample t-test

data:  horn.length by group
t = , df = 40.37, p-value =
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:

sample estimates:
mean in group killed  mean in group living

Notice that the degrees of freedom (40.37) is indeed between 29 and 182.


More information

Ismor Fischer, 11/5/ # tablets AM probability PM probability

Ismor Fischer, 11/5/ # tablets AM probability PM probability Ismor Fischer, 11/5/017 4.4-1 4.4 Problems 1. Patient noncompliance is one of many potential sources of bias in medical studies. Consider a study where patients are asked to take tablets of a certain medication

More information

Math 141. Lecture 10: Confidence Intervals. Albyn Jones 1. jones/courses/ Library 304. Albyn Jones Math 141

Math 141. Lecture 10: Confidence Intervals. Albyn Jones 1.   jones/courses/ Library 304. Albyn Jones Math 141 Math 141 Lecture 10: Confidence Intervals Albyn Jones 1 1 Library 304 jones@reed.edu www.people.reed.edu/ jones/courses/141 Inference Suppose X Binomial(n, p). Inference about p includes the topics: Inference

More information

Exam details. Final Review Session. Things to Review

Exam details. Final Review Session. Things to Review Exam details Final Review Session Short answer, similar to book problems Formulae and tables will be given You CAN use a calculator Date and Time: Dec. 7, 006, 1-1:30 pm Location: Osborne Centre, Unit

More information

CISC 1100/1400 Structures of Comp. Sci./Discrete Structures Chapter 7 Probability. Outline. Terminology and background. Arthur G.

CISC 1100/1400 Structures of Comp. Sci./Discrete Structures Chapter 7 Probability. Outline. Terminology and background. Arthur G. CISC 1100/1400 Structures of Comp. Sci./Discrete Structures Chapter 7 Probability Arthur G. Werschulz Fordham University Department of Computer and Information Sciences Copyright Arthur G. Werschulz, 2017.

More information

GEOMETRIC -discrete A discrete random variable R counts number of times needed before an event occurs

GEOMETRIC -discrete A discrete random variable R counts number of times needed before an event occurs STATISTICS 4 Summary Notes. Geometric and Exponential Distributions GEOMETRIC -discrete A discrete random variable R counts number of times needed before an event occurs P(X = x) = ( p) x p x =,, 3,...

More information

Math 2311 TEST 2 REVIEW SHEET KEY

Math 2311 TEST 2 REVIEW SHEET KEY Math 2311 TEST 2 REVIEW SHEET KEY #1 25, Define the following: 1. Continuous random variable 2. Discrete random variable 3. Density curve 4. Uniform density curve 5. Normal distribution 6. Sampling distribution

More information

16.400/453J Human Factors Engineering. Design of Experiments II

16.400/453J Human Factors Engineering. Design of Experiments II J Human Factors Engineering Design of Experiments II Review Experiment Design and Descriptive Statistics Research question, independent and dependent variables, histograms, box plots, etc. Inferential

More information

Section 4.6 Simple Linear Regression

Section 4.6 Simple Linear Regression Section 4.6 Simple Linear Regression Objectives ˆ Basic philosophy of SLR and the regression assumptions ˆ Point & interval estimation of the model parameters, and how to make predictions ˆ Point and interval

More information

Exam Empirical Methods VU University Amsterdam, Faculty of Exact Sciences h, February 12, 2015

Exam Empirical Methods VU University Amsterdam, Faculty of Exact Sciences h, February 12, 2015 Exam Empirical Methods VU University Amsterdam, Faculty of Exact Sciences 18.30 21.15h, February 12, 2015 Question 1 is on this page. Always motivate your answers. Write your answers in English. Only the

More information

ACCESS TO SCIENCE, ENGINEERING AND AGRICULTURE: MATHEMATICS 2 MATH00040 SEMESTER / Probability

ACCESS TO SCIENCE, ENGINEERING AND AGRICULTURE: MATHEMATICS 2 MATH00040 SEMESTER / Probability ACCESS TO SCIENCE, ENGINEERING AND AGRICULTURE: MATHEMATICS 2 MATH00040 SEMESTER 2 2017/2018 DR. ANTHONY BROWN 5.1. Introduction to Probability. 5. Probability You are probably familiar with the elementary

More information

Sociology 6Z03 Review II

Sociology 6Z03 Review II Sociology 6Z03 Review II John Fox McMaster University Fall 2016 John Fox (McMaster University) Sociology 6Z03 Review II Fall 2016 1 / 35 Outline: Review II Probability Part I Sampling Distributions Probability

More information

Topic 3: Sampling Distributions, Confidence Intervals & Hypothesis Testing. Road Map Sampling Distributions, Confidence Intervals & Hypothesis Testing

Topic 3: Sampling Distributions, Confidence Intervals & Hypothesis Testing. Road Map Sampling Distributions, Confidence Intervals & Hypothesis Testing Topic 3: Sampling Distributions, Confidence Intervals & Hypothesis Testing ECO22Y5Y: Quantitative Methods in Economics Dr. Nick Zammit University of Toronto Department of Economics Room KN3272 n.zammit

More information

Bernoulli Trials, Binomial and Cumulative Distributions

Bernoulli Trials, Binomial and Cumulative Distributions Bernoulli Trials, Binomial and Cumulative Distributions Sec 4.4-4.6 Cathy Poliak, Ph.D. cathy@math.uh.edu Office in Fleming 11c Department of Mathematics University of Houston Lecture 9-3339 Cathy Poliak,

More information