Statistics Workshop Ramsey A. Foty, Ph.D. Department of Surgery UMDNJ-RWJMS

1 Statistics Workshop 2012 Ramsey A. Foty, Ph.D. Department of Surgery UMDNJ-RWJMS

2 "An unsophisticated forecaster uses statistics as a drunkard uses lamp-posts: for support rather than for illumination." Andrew Lang, Scottish poet and novelist.
"Then there is the man who drowned crossing a stream with an average depth of six inches." W.I.E. Gates, German author.
"Statistics: the only science that enables different experts using the same figures to draw different conclusions." Evan Esar, American humorist.

3 Topics Why do we need statistics? Sample vs population. Gaussian/normal distribution. Descriptive Statistics. Measures of location. Mean, Median, Mode. Measures of dispersion. Range, Variance, Standard Deviation. Precision of the mean. Standard Error, Confidence Interval. Outliers. Grubbs' test. The null hypothesis. Significance testing. Variability. Comparing two means. T-test. Group Exercise. Comparing 3 or more groups. ANOVA. Group Exercise. Linear Regression. Power Analysis.

4 Hypothesis Testing Observe Phenomenon Propose Hypothesis Design Study Statistics are an important Part of the study design Collect and Analyze Data Interpret Results Draw Conclusions

5 Why do we need statistics? Variability can obscure important findings. We naturally assume that observed differences are real and not due to natural variability. Variability is the norm. Statistics allow us to draw from the sample, conclusions about the general population.

6 Sample vs Population Taking samples of information can be an efficient way to draw conclusions when the cost of gathering all the data is impractical. If you measure the concentration of factor X in the blood of 10 people, does that accurately reflect the concentration of Factor X of the human race in general? How about from 100, 1000, or 10,000 people? How about if you sampled everyone on the planet?

7 Statistical methods were developed based on a simple model: Assume that an infinitely large population of values exists and that your sample was randomly selected from a large subset of that population. Now, use the rules of probability to make inferences about the general population.

8 The Gaussian Distribution If samples are large enough, the sample distribution will be bell-shaped. The Gaussian function describing this shape is defined as follows: $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$, where $\mu$ represents the population mean and $\sigma$ the standard deviation.
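As a small illustration, the standard Gaussian density can be evaluated directly in Python. This is only a sketch: the function name `gaussian_pdf` and the example values (mean 0, standard deviation 1) are assumptions made for the illustration and do not come from the workshop.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


# The curve peaks at the mean and is symmetric around it.
peak = gaussian_pdf(0.0, mu=0.0, sigma=1.0)
left = gaussian_pdf(-1.0, mu=0.0, sigma=1.0)
right = gaussian_pdf(1.0, mu=0.0, sigma=1.0)
print(round(peak, 4), round(left, 4), round(right, 4))
```

A larger `sigma` lowers the peak and widens the curve, which is the "flat" shape associated with a large standard deviation.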

9 An example of a Gaussian distribution

10 Descriptive Statistics Measures of Location: a typical or central value that best describes the data (Mean, Median, Mode). Measures of Dispersion: describe the spread (variation) of the data around that central value (Range, Variance, Standard Deviation, Standard Error, Confidence Interval). No single parameter can fully describe the distribution of data in the sample. Most statistics software will provide a comprehensive table describing the distribution.

11 Measures of Location: Mean More commonly referred to as the average. It is the sum of the data points divided by the number of data points. For the migration assay, the distances travelled by the nine cells (microns) are 49, 27, 132, 24, 78, 80, 62, 39, and 200, so $M = \frac{\sum X}{N} = \frac{691}{9} = 76.78$ microns, or about 77 microns.

12 Measures of Location: Median (odd sample size) The median is the value which has half the data smaller than that point and half the data larger. For an odd number of values, first rank-order them, then pick the middle one. For the migration assay, the ranked distances are 24, 27, 39, 49, 62, 78, 80, 132, and 200 microns, so the 5th number in the sequence is the median = 62 microns.

13 Measures of Location: Median (even sample size) Rank the values, find the middle two numbers, then find the value that lies halfway between them: add the two middle values together and divide by 2. Here the two middle values are 7 and 13, so the median is (7 + 13)/2 = 10. The median is less sensitive to extreme scores than the mean and is useful for skewed data.

14 Measures of Location: Mode The mode is the value of the sample which occurs most frequently. The mode for this data set is Purple, since this is the color with the highest frequency (72) in the data set. Not all data sets have a single mode; data sets can be bi-modal. The mode is a measure of central tendency, but it is only useful in very limited situations.
Marble Color   Frequency
Black          6
Brown          2
Blue           34
Purple         72
Pink           71
Green          58
Rainbow        34
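The three measures of location can be checked with Python's standard `statistics` module. This is a minimal sketch, assuming the nine migration distances are 49, 27, 132, 24, 78, 80, 62, 39, and 200 microns (values reconstructed from the deviations listed on the variance slide, so treat them as an assumption); the variable names are illustrative.

```python
import statistics

# Distances (microns) for the nine cells; reconstructed values, see the note above.
distances = [49, 27, 132, 24, 78, 80, 62, 39, 200]

mean_distance = statistics.mean(distances)      # sum of values / number of values
median_distance = statistics.median(distances)  # middle value after ranking

# Mode example: marble color counts from the slide.
marble_counts = {"Black": 6, "Brown": 2, "Blue": 34, "Purple": 72,
                 "Pink": 71, "Green": 58, "Rainbow": 34}
modal_color = max(marble_counts, key=marble_counts.get)  # most frequent category

print(round(mean_distance, 2), median_distance, modal_color)
```

The mean (about 77) sits well above the median (62) because the single 200 micron value pulls the average upward, which is the sensitivity to extreme scores mentioned above.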

15 Boxplots A boxplot marks, from top to bottom: the largest observed value that is not an outlier, the 75th percentile, the median, the 25th percentile, and the smallest observed value that is not an outlier. Example data, unranked: 12, 13, 5, 8, 9, 20, 16, 14, 14, 6, 9, 12, 12. Ranked: 5, 6, 8, 9, 9, 12, 12, 12, 13, 14, 14, 16, 20.
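The summary values that a boxplot displays can be computed from the example data above. The sketch below uses `statistics.quantiles`, whose default "exclusive" method is only one of several quartile conventions, so other software may report slightly different 25th and 75th percentiles.

```python
import statistics

values = [12, 13, 5, 8, 9, 20, 16, 14, 14, 6, 9, 12, 12]
ranked = sorted(values)

# Quartiles with Python's default ("exclusive") method; an assumption about
# convention, since the slide does not say which method it uses.
q1, median, q3 = statistics.quantiles(ranked, n=4)

print("smallest:", ranked[0])
print("25th percentile:", q1)
print("median:", median)
print("75th percentile:", q3)
print("largest:", ranked[-1])
```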

16 Boxplots are used to display summary statistics

17 Measures of Location do not provide information on spread or variability of the data

18 Measures of Dispersion Describe the spread or variability within the data. Two distinct samples can have the same mean but completely different levels of variability. Which mean has a higher level of variability? 110 ± 5 or 110 ± 25 Typical measures of dispersion include Range, Variance, and Standard Deviation.

19 Measures of Dispersion: Range The range is the difference between the largest and smallest sample values. It depends only on the extreme values and provides no information about how the remaining data are distributed. For the cell migration data (ranked distances traveled, in microns: 24, 27, 39, 49, 62, 78, 80, 132, 200): Largest distance = 200 microns. Smallest distance = 24 microns. Range = 200 − 24 = 176 microns. The range is NOT a reliable measure of dispersion of the whole data set.

20 Measures of Dispersion: Variance Defined as the average of the squared distance of each value from the mean. To calculate variance, it is first necessary to calculate the mean, then measure the amount that each value deviates from the mean. The formula for calculating the sample variance is: $S^2 = \frac{\sum (X - M)^2}{N - 1}$

21 Why Square? Squaring makes all the deviations positive (negative deviations would otherwise cancel positive ones and reduce the variance). It also makes the bigger differences stand out: $100^2$ (10,000) is a lot bigger than $50^2$ (2,500).

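22 N vs N−1 Divide by N when the data are the entire population (N is the size of the population). Divide by N − 1 when the data are a sample drawn from it (N is the size of the sample).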

23 For the cell migration data, the sample variance is: $S^2 = \frac{(-28)^2 + (-50)^2 + (55)^2 + (-53)^2 + (1)^2 + (3)^2 + (-15)^2 + (-38)^2 + (123)^2}{8} = \frac{25926}{8} \approx 3241$ microns$^2$. NOT a very user-friendly statistic.

24 Measures of Dispersion: Standard Deviation Standard Deviation The most common and useful measure of dispersion. Tells you how tightly each sample is clustered around the mean. When the samples are tightly bunched together, the Gaussian curve is narrow and the standard deviation is small. When the samples are spread apart, the Gaussian curve is flat and the standard deviation is large. The formula to calculate standard deviation is: SD = square root of the variance.

25 For this data set, the mean and standard deviation are: 77 ± 57 microns. Conclusion: There's lots of scatter in this data set.
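The variance and standard deviation quoted above can be reproduced step by step. This is a sketch under the assumption that the nine migration distances are 49, 27, 132, 24, 78, 80, 62, 39, and 200 microns (reconstructed from the listed deviations); it uses the unrounded mean, so the variance differs slightly from a hand calculation with rounded deviations.

```python
import statistics

# Reconstructed migration distances in microns (an assumption, see above).
distances = [49, 27, 132, 24, 78, 80, 62, 39, 200]

mean_distance = statistics.mean(distances)

# Squared deviation of each value from the mean.
squared_deviations = [(x - mean_distance) ** 2 for x in distances]

# Sample variance divides by N - 1 because these cells are a sample.
sample_variance = sum(squared_deviations) / (len(distances) - 1)
sample_sd = sample_variance ** 0.5

print(round(sample_variance, 1), round(sample_sd, 1))
```

The standard deviation is in the same units as the data (microns), which is why it is easier to interpret than the variance.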

26 But then again, this is a fairly small sample (n=9). What if we were to count the migration of 90, or 900, or 9000 cells? Would this give us a better sense of what the average migration distance is? In other words, how can we determine whether our mean is precise?

27 Precision of the Mean: Standard Error A measure of how far the sample mean is likely to be from the population mean. For our data set: $SEM = \frac{SD}{\sqrt{N}} = \frac{57}{\sqrt{9}} = \frac{57}{3} = 19$. SEM gets smaller as sample size increases, since the mean of a larger sample is likely to be closer to the population mean. Increasing sample size does not predictably change the scatter in the data: SD may increase or decrease. Increasing sample size will, however, predictably reduce the standard error.

28 Should we show standard deviation or standard error? Use Standard Deviation If the scatter is caused by biological variability and you want to show that variability. For example: You aliquot 10 plates each with a different cell line and measure integrin expression of each. Use standard error If the variability is caused by experimental imprecision and you want to show the precision of the calculated mean. For example: You aliquot 10 plates of the same cell line and measure integrin expression of each.

29 Precision of the Mean: Confidence Intervals A confidence interval combines the scatter in a sample with the size of that sample. It generates an interval that has a high probability of containing the population mean. The formula for calculating the CI: $CI = \bar{X} \pm (SEM \times Z)$, where $\bar{X}$ is the sample mean and Z is the critical value for the normal distribution. For the 95% CI, Z = 1.96. For our data set: 95% CI = 77 ± (19 × 1.96) = 77 ± 37, i.e. from 40 to 114 microns. This means that there's a 95% chance that the CI you calculated contains the population mean.
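The standard error and the 95% confidence interval can be computed together. This is a sketch assuming the reconstructed migration distances (49, 27, 132, 24, 78, 80, 62, 39, 200 microns) and the normal critical value Z = 1.96 given above; note that 19 × 1.96 evaluates to about 37. For a sample this small, many packages would use a t critical value instead, which gives a wider interval.

```python
import math
import statistics

# Reconstructed migration distances in microns (an assumption, see above).
distances = [49, 27, 132, 24, 78, 80, 62, 39, 200]

n = len(distances)
mean_distance = statistics.mean(distances)
sem = statistics.stdev(distances) / math.sqrt(n)  # SEM = SD / sqrt(N)

z_95 = 1.96  # critical value of the normal distribution for a 95% interval
half_width = z_95 * sem
ci_low = mean_distance - half_width
ci_high = mean_distance + half_width

print(round(sem, 1), round(half_width, 1), round(ci_low, 1), round(ci_high, 1))
```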

30 CI: A Practical Example [Table: Mean, SD, SEM, Low 95% CI, and High 95% CI for Data set A and Data set B] Between these two data sets, which mean do you think best reflects the population mean and why?

31 SD/SEM/95% CI error bars [Figure: error bars drawn as SD, SEM, and 95% CI]

32 Outliers An observation that is numerically distant from the rest of the data. Can be caused by systematic error, flaw in the theory that generated the data point, or by natural variability.

33 How to deal with outliers? In general, we first quantify the difference between the mean and the outlier, then we divide by the scatter (usually SD). Grubbs' test: $Z = \frac{\text{mean} - \text{value}}{SD}$. For the cell migration data set: the mean is 77 microns, the sample furthest from the mean is the 200 micron point, and the SD is 57. So: $Z = \frac{77 - 200}{57} \approx -2.15$

34 What does a Z value of 2.15 mean? In order to answer this question, we must compare this number to a probability value (P) to answer the following question: If all the values were really sampled from a normal population, what is the chance of randomly obtaining an outlier so far from the other values? To do this, we compare the Z value obtained with a table listing the critical value of Z at the 95% probability level. If the computed Z is larger than the critical value of Z in the table, then the P value is less than 5% and you can delete the outlier.

35 For our data set: the magnitude of Z calc (2.15) is less than Z tab (2.21), so P is greater than 5% and the outlier must be retained.
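The outlier check described above can be sketched in a few lines. The data are the reconstructed migration distances (an assumption), and the critical value 2.21 is simply the table value quoted above for this data set. The slides report Z as 2.15 from rounded inputs; the unrounded calculation gives about 2.16, which leads to the same conclusion.

```python
import statistics

# Reconstructed migration distances in microns (an assumption, see above).
distances = [49, 27, 132, 24, 78, 80, 62, 39, 200]

mean_distance = statistics.mean(distances)
sd = statistics.stdev(distances)

# The suspect point is the value furthest from the mean.
suspect = max(distances, key=lambda x: abs(x - mean_distance))
z_calc = abs(suspect - mean_distance) / sd

z_crit = 2.21  # critical value quoted in the workshop (n = 9, 5% level)
is_outlier = z_calc > z_crit

print(suspect, round(z_calc, 2), is_outlier)
```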

36 Topics Why do we need statistics? Sample vs population. Gaussian/normal distribution. Descriptive Statistics. Measures of location. Mean, Median, Mode. Measures of dispersion. Range, Variance, Standard Deviation. Precision of the mean. Standard Error, Confidence Interval. Outliers. Grubbs' test. The null hypothesis. Significance testing. Variability. Comparing two means. T-test. Group Exercise. Comparing 3 or more groups. ANOVA. Group Exercise. Linear Regression. Power Analysis.

37 The Null Hypothesis Appears in the form $H_0: \mu_1 = \mu_2$, where $H_0$ = null hypothesis, $\mu_1$ = mean of population 1, and $\mu_2$ = mean of population 2. An alternate form is $H_0: \mu_1 - \mu_2 = 0$. The null hypothesis is presumed true until statistical evidence in the form of a hypothesis test indicates otherwise.

38 Statistical Significance When a statistic is significant, it simply means that the statistic is reliable. It does not mean that it is biologically important or interesting. When testing the relationship between two parameters we might be sure that the relationship exists, but is it weak or strong?

39 Strong vs weak relationships [Figure: two scatter plots contrasting a weaker relationship with a perfect one, $r^2 = 1.000$]

40 Sources of Variability Random Error Caused by inherently unpredictable fluctuations in the readings of a measurement apparatus or in the experimenter's interpretation of the instrumental reading. Can occur in either direction. Systematic Error Is predictable, and typically constant or proportional to the true value. Systematic errors are caused by imperfect calibration of measurement instruments or imperfect methods of observation. Typically occurs only in 1 direction.

41 Some Examples
Random error. Example: You measure the mass of a ring three times using the same balance and get slightly different values (in g). How to minimize: Take more data. Random errors can be evaluated through statistical analysis and can be reduced by averaging over a large number of observations.
Systematic error. Example: The electronic scale you use reads 0.05 g too high for all your mass measurements (because it is improperly tared throughout your experiment). How to minimize: Systematic errors are difficult to detect and cannot be analyzed statistically, because all of the data is off in the same direction (either too high or too low).

42 Repeatability/Reproducibility Repeatability The variation in measurements taken by a single person or instrument on the same item and under the same conditions. An experiment, if performed by the same person, using the same equipment, reagents, and supplies, must yield the same result. Reproducibility The ability of a test or experiment to be accurately reproduced or replicated by someone else working independently. Cold fusion is an example of an unreproducible experiment.

43 Hypothesis Testing Observe Phenomenon Propose Hypothesis Design Study Statistics are an important Part of the study design Collect and Analyze Data Interpret Results Draw Conclusions

44 Comparing Two Means Are these two means significantly different? Variability can strongly influence whether the means are different. Consider these 3 scenarios: Which of these will likely yield significant differences?

45 Comparing Two Means: Student's t-test Introduced in 1908 by William Sealy Gosset. Gosset was a chemist working for the Guinness Brewery in Dublin. He devised the t-test as a way to cheaply monitor the quality of stout. He was forced by his employer to use a pen name; he chose the name Student. Requirements and notes: N < 30. Independent data points, except when using a paired t-test. Normal distribution, for equal and unequal variance. Random sampling. Equal sample size. Degrees of freedom are important. Most useful when comparing 2 sample means.

46 The Student t-test Given two data sets, each characterized by its mean, standard deviation, and number of samples, we can determine whether the means are significantly different by using a t-test. Note below that the difference between the means is the same but the variability is very different.

47 An Example [Table: Drop #, Sample 1, Sample 2] The null hypothesis states that there is no difference in the means between samples. 1) Calculate means. 2) Calculate SDs. 3) Calculate SEs. 4) Calculate t-value. 5) Compare t calc to t tab. 6) Accept/reject $H_0$.

48 Plot Data [Figure: the two samples shown as a box plot and as a bar graph]

49 1) Calculate Mean $M_1 = \frac{\sum X_1}{N_1} = 341$ and $M_2 = \frac{\sum X_2}{N_2} = 156$, with $N_1 = N_2 = 10$.

50 2) Calculate SD $SD_1 = \sqrt{\frac{\sum (x_i - M_1)^2}{N - 1}} = \sqrt{1631} \approx 40$ and $SD_2 = \sqrt{\frac{\sum (x_i - M_2)^2}{N - 1}} = \sqrt{854} \approx 29$, with $N - 1 = 9$ for each sample.

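51 3) Calculate SE $SE_1 = \frac{SD_1}{\sqrt{N}} = \frac{40}{\sqrt{10}} \approx 12.6 \approx 13$ and $SE_2 = \frac{SD_2}{\sqrt{N}} = \frac{29}{\sqrt{10}} \approx 9.2 \approx 9$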

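52 4) Calculate the t-statistic Summary: Sample 1 has mean 341, SD 40, SE 13, N 10; Sample 2 has mean 156, SD 29, SE 9, N 10. $t = \frac{M_1 - M_2}{\sqrt{SE_1^2 + SE_2^2}} = \frac{341 - 156}{\sqrt{(13)^2 + (9)^2}} = \frac{185}{\sqrt{250}} \approx 11.7$ Now we have to compare our t-value to a table of critical t-values to determine whether the sample means differ. But...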

53 We first have to determine the degrees of freedom. Degrees of freedom describe the number of values in the final calculation of a statistic that are free to vary. For our data set, the degrees of freedom is 2N − 2, or 2(10) − 2 = 18.

54 Why 18 degrees of freedom? To calculate SD we must first calculate the mean and then compute the sum of the squared deviations from that mean. While there are n deviations, only n − 1 are actually free to assume any value whatsoever. This is because the mean was calculated from the same n values, so once n − 1 deviations are known the last one is fixed. Since we have 2 data sets, df = 2n − 2 = 18.

55 Did you hear the one about the statistician who was thrown in jail? He now has zero degrees of freedom.

56 5) Compare t calc to t tab for 18 df For the 95% confidence level and a df of 18, t tab = 2.101 (two-tailed). Our calculated t-value is far larger. Since t calc > t tab, we must reject $H_0$ and conclude that the sample means are significantly different.
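The steps of this example can be strung together as a short calculation from the summary statistics (means 341 and 156, SDs 40 and 29, N = 10 each). This is a sketch: the function name `t_from_summary` is hypothetical, and the critical value 2.101 is the standard two-tailed table value for 18 degrees of freedom at the 5% level. Because the code does not round the standard errors to whole numbers, its t-value (about 11.8) is slightly larger than the hand-calculated one; the conclusion is unchanged.

```python
import math


def t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """t-statistic and degrees of freedom from two samples' summary statistics."""
    se1 = sd1 / math.sqrt(n1)  # standard error of sample 1
    se2 = sd2 / math.sqrt(n2)  # standard error of sample 2
    t = (mean1 - mean2) / math.sqrt(se1 ** 2 + se2 ** 2)
    df = n1 + n2 - 2
    return t, df


t_calc, df = t_from_summary(341, 40, 10, 156, 29, 10)

t_tab = 2.101  # two-tailed critical t for df = 18 at the 5% level
reject_null = abs(t_calc) > t_tab

print(round(t_calc, 2), df, reject_null)
```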

57 Your Turn

58 One-tailed vs two-tailed t-test One-tailed t-test: A one-tailed test will test either if the mean is significantly greater than x or if the mean is significantly less than x, but not both. The one-tailed test provides more power to detect an effect in one direction by not testing the effect in the other direction. Two-tailed t-test: A two-tailed test will test both if the mean is significantly greater than x and if the mean is significantly less than x. The mean is considered significantly different from x if the test statistic is in the top 2.5% or bottom 2.5% of its probability distribution, resulting in a p-value less than 0.05.

59 Paired vs Unpaired t-test Paired The observed data are from the same subject or from a matched subject and are drawn from a population with a normal distribution. Example: Measuring glucose concentration in diabetic patients before and after insulin injection. Unpaired The observed data are from two independent, random samples from a population with a normal distribution. Example: Measuring glucose concentration of diabetic patients versus nondiabetics.

60 Comparing Three or More Means Why not just do multiple t-tests? If you set the significance level at 5% and do repeated t-tests, you will eventually reject the null hypothesis when you shouldn't, i.e. you increase your chance of making a Type I error. [Table: Number of Groups, Number of Comparisons, α]

61 Frog Germ Layer Experiment: Germ Layer Surface Tensions Multiple t-test results for significance:
Endo vs Meso: Yes
Endo vs Ecto: Yes
Endo vs Ecto under: Yes
Meso vs Ecto: No
Meso vs Ecto under: Yes
Ecto vs Ecto under: Yes
4 groups, 6 possible comparisons, 26% chance of detecting a significant difference when none exists.
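The 26% figure can be reproduced with a short calculation. The sketch below assumes each pairwise comparison is an independent test at α = 0.05, which is a simplification (pairwise t-tests on shared groups are not fully independent), and the function name `familywise_error` is hypothetical.

```python
from math import comb


def familywise_error(groups, alpha=0.05):
    """Number of pairwise comparisons and chance of at least one false positive."""
    comparisons = comb(groups, 2)  # k groups give k(k-1)/2 pairs
    error_rate = 1 - (1 - alpha) ** comparisons
    return comparisons, error_rate


for k in (2, 3, 4, 5):
    pairs, rate = familywise_error(k)
    print(k, pairs, round(rate, 3))
```

With 4 groups there are 6 comparisons and the chance of at least one false positive is about 0.26, which is why a single ANOVA followed by a post hoc test is preferred over repeated t-tests.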

62 To compare three or more means we must use Analysis of Variance (ANOVA) In ANOVA we don't actually measure variance. We measure a term called sum of squares. There are 3 sums of squares we need to measure. 1) Total sum of squares. Total scatter around the grand mean. 2) Between-group sum of squares. Total scatter of the group means with respect to the grand mean. 3) Within-group sum of squares. The scatter of the scores within each group around their group mean.

63 Frog Germ Layer Experiment Germ Layer Surface Tensions Anova/MCT

64 Frog Germ Layer Experiment: Germ Layer Surface Tensions
Comparison           t-test   ANOVA/MCT
Endo vs Meso         Yes      No
Endo vs Ecto         Yes      No
Endo vs Ecto under   Yes      Yes
Meso vs Ecto         No       No
Meso vs Ecto under   Yes      Yes
Ecto vs Ecto under   Yes      Yes

65 ANOVA The fundamental equation for ANOVA is: $SS_{Tot} = SS_{BG} + SS_{WG}$. From this we can calculate the mean sum of squares by dividing the sum of squares by the degrees of freedom: $MS_{BG} = \frac{SS_{BG}}{df_{BG}}$; $MS_{WG} = \frac{SS_{WG}}{df_{WG}}$. We can then calculate the F statistic: $F = \frac{MS_{BG}}{MS_{WG}}$

66 To calculate sums of squares we first need to calculate two types of means: 1) the group means ($\bar{X}$) and 2) the grand mean ($\bar{\bar{X}}$). $SS_{Tot} = \sum (X - \bar{\bar{X}})^2$: the sum of squares of each sample (X) minus the grand mean. $SS_{BG} = \sum n\,(\bar{X} - \bar{\bar{X}})^2$: the sum of squares of each group mean minus the grand mean, multiplied by the number of samples in each group (n). $SS_{WG} = \sum (X - \bar{X})^2$: the sum of squares of each sample (X) minus its group mean.

67 df for ANOVA To calculate $MS_{BG}$ and $MS_{WG}$, we need to know the df for each: $MS_{BG} = \frac{SS_{BG}}{df_{BG}}$ and $MS_{WG} = \frac{SS_{WG}}{df_{WG}}$. df of $SS_{BG}$ = number of groups minus 1. Therefore for 3 groups, df = 2. df of $SS_{WG}$ = the sum over all groups of (n − 1), i.e. the total number of samples minus the number of groups. Therefore for 30 samples (10 in each of the 3 groups), df = 27. We can then compare F calc to F tab to determine whether significant differences exist in the entire data set.
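The sums of squares, degrees of freedom, and F statistic can be traced on a tiny data set. The three groups below are made-up numbers chosen only to keep the arithmetic easy; they are not the frog germ layer data. The between-group term weights each squared deviation by the number of samples in that group, which is what makes the total equal the between-group plus within-group sums.

```python
# Hypothetical data: three groups of four measurements each.
groups = {
    "A": [4.0, 5.0, 6.0, 5.0],
    "B": [6.0, 7.0, 8.0, 7.0],
    "C": [9.0, 10.0, 11.0, 10.0],
}


def mean(values):
    return sum(values) / len(values)


all_values = [x for values in groups.values() for x in values]
grand_mean = mean(all_values)

# Total, between-group, and within-group sums of squares.
ss_total = sum((x - grand_mean) ** 2 for x in all_values)
ss_bg = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
ss_wg = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)

df_bg = len(groups) - 1                # number of groups minus 1
df_wg = len(all_values) - len(groups)  # total samples minus number of groups

f_calc = (ss_bg / df_bg) / (ss_wg / df_wg)

print(round(ss_total, 3), round(ss_bg, 3), round(ss_wg, 3), df_bg, df_wg, round(f_calc, 2))
```

The resulting F would then be compared with the tabulated F for (df_bg, df_wg) degrees of freedom.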

68 Your Turn.

69 One-way versus two-way ANOVA One-Way ANOVA 1 measurement variable and 1 nominal variable. For example, you might measure glycogen content for multiple samples of heart, liver, kidney, lung etc Two-Way ANOVA 1 measurement variable and 2 nominal variables. For example, you might measure a response to three different drugs in both men and women. Drug treatment is one factor and gender is the other.

70 ANOVA only tells us that the smallest and largest means likely differ from each other. But what about other means? In order to test other means, we have to run post hoc multiple comparisons tests.

71 Post hoc tests Are only used if the null hypothesis is rejected. There are many, including Tukey's, Bonferroni's, Scheffé's, Dunn's, and Newman-Keuls. All test whether any of the group means differ significantly. These tests don't suffer from the same issues as performing multiple t-tests. They all apply different corrections to account for the multiple comparisons. Accordingly, some post hoc tests are more stringent than others.

72 Linear Regression The goal of linear regression is to adjust the values of slope and intercept to find the line that best predicts Y from X.

73 More precisely, the goal is to minimize the sum of the squares of the vertical distances of the points from the line. Note that linear regression does not test whether your data are linear. It assumes that your data are linear, and finds the slope and intercept that make a straight line that best fits your data.

74 $r^2$, a measure of goodness-of-fit of linear regression The value $r^2$ is a fraction between 0.0 and 1.0, and has no units. An $r^2$ value of 0.0 means that knowing X does not help you predict Y. When $r^2$ equals 1.0, all points lie exactly on a straight line with no scatter. Knowing X lets you predict Y perfectly.

75 How is $r^2$ calculated? The left panel shows the best-fit linear regression line; the sum of squares of the vertical distances of the points from that line is SSreg. The right panel shows the null hypothesis, a horizontal line through the mean of all the Y values; the goodness-of-fit of this model, the sum of squares of the distances of the points from that horizontal line, is SStot. Then $r^2 = 1 - \frac{SS_{reg}}{SS_{tot}}$.
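A least-squares fit and its $r^2$ can be worked through on a handful of points. The X and Y values below are made up for illustration, and `fit_line` is a hypothetical helper. Following the naming used above, `ss_reg` is the sum of squared vertical distances from the fitted line and `ss_tot` is the sum of squared distances from the horizontal line through the mean of Y.

```python
def fit_line(xs, ys):
    """Slope and intercept that minimize the sum of squared vertical distances."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying close to a straight line.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
mean_y = sum(ys) / len(ys)

# Scatter around the fitted line versus scatter around the mean of Y.
ss_reg = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
ss_tot = sum((y - mean_y) ** 2 for y in ys)
r_squared = 1 - ss_reg / ss_tot

print(round(slope, 2), round(intercept, 2), round(r_squared, 4))
```

An $r^2$ close to 1 here reflects that the fitted line leaves very little scatter compared with the horizontal line.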

76 An Example

77 Power Analysis: How many samples are enough? If sample size is too low, the experiment will lack the precision to provide reliable answers to the questions it is investigating. If sample size is too large, time and resources will be wasted, often for minimal gain.

78 Calculation of power requires 3 pieces of information: 1) A research hypothesis. This will determine how many control and treatment groups are required. 2) The variability of the outcomes measure. Standard Deviation is the best option. 3) An estimate of the clinically (or biologically) relevant difference. A difference between groups that is large enough to be considered important. By convention, this is set at 0.8 SD.

79 An Example We would like to design a study to measure two skin barriers for burn patients. We are interested in pain as the clinical outcome, using the Oucher scale (1-5). We know from previous studies that the Oucher scale has an SD of 1.5. What sample size is needed to detect a difference of 1 unit (D) on the Oucher scale? Here's the equation: $n = \frac{(\sigma_1^2 + \sigma_2^2)(z_{1-\alpha/2} + z_{1-\beta})^2}{D^2}$ Here, $z_{1-\alpha/2}$ is the critical value of z at α = 0.05 (1.96) and $z_{1-\beta}$ is the z value for a power of 80% (0.84).

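80 $n = \frac{(1.5^2 + 1.5^2)(1.96 + 0.84)^2}{1^2} = 35.3$, rounded up to 36. What would happen to n if our clinically relevant difference was set at 2 Oucher units? Here: $n = \frac{(1.5^2 + 1.5^2)(1.96 + 0.84)^2}{2^2} = 8.8$, rounded up to 9. What would happen to n if our clinically relevant difference was set at 0.5 Oucher units? Here: $n = \frac{(1.5^2 + 1.5^2)(1.96 + 0.84)^2}{0.5^2} = 141.1$, rounded up to 142.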

81 Another Example You want to measure whether aggregates of invasive cell lines are less cohesive than those generated from noninvasive counterparts. You know that the SD for the control group is 3 dynes/cm and for the invasive group is 2 dynes/cm. You set α at 0.05 (z = 1.96), power at 80% (z = 0.84), and D at 2 dynes/cm. How many aggregates from each group would you need? $n = \frac{(3^2 + 2^2)(1.96 + 0.84)^2}{2^2} = \frac{(9 + 4)(2.80)^2}{4} = \frac{101.9}{4} = 25.5$, rounded up to 26. Therefore, we need 26 aggregates in each group to be able to reliably detect a difference of 2 dynes/cm cohesivity between invasive and non-invasive cells.
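The sample size formula used in both examples can be wrapped in a small function. This is a sketch: the name `sample_size_per_group` and its parameter names are hypothetical, and the defaults are the z values used above (1.96 for α = 0.05 and 0.84 for 80% power). The result is rounded up because a fraction of a subject cannot be recruited.

```python
import math


def sample_size_per_group(sd1, sd2, difference, z_alpha=1.96, z_power=0.84):
    """Return the raw n from the formula and n rounded up to a whole number."""
    n = (sd1 ** 2 + sd2 ** 2) * (z_alpha + z_power) ** 2 / difference ** 2
    return n, math.ceil(n)


# Oucher scale example: SD of 1.5 in both groups, three candidate differences.
for d in (1.0, 2.0, 0.5):
    raw, needed = sample_size_per_group(1.5, 1.5, d)
    print(d, round(raw, 1), needed)

# Aggregate cohesivity example: SDs of 3 and 2 dynes/cm, difference of 2 dynes/cm.
raw, needed = sample_size_per_group(3.0, 2.0, 2.0)
print(round(raw, 1), needed)
```

Halving the difference to be detected quadruples the required n, because the difference enters the formula squared.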

82 In general, how do variability, detection difference, and power influence n?
More variability in the data: higher n required
Less variability in the data: fewer n required
Detect small differences between groups: higher n required
Detect large differences between groups: fewer n required
Smaller α (0.01): higher n required
Less power: fewer n required


More information

Statistics: revision

Statistics: revision NST 1B Experimental Psychology Statistics practical 5 Statistics: revision Rudolf Cardinal & Mike Aitken 29 / 30 April 2004 Department of Experimental Psychology University of Cambridge Handouts: Answers

More information

AP Statistics Cumulative AP Exam Study Guide

AP Statistics Cumulative AP Exam Study Guide AP Statistics Cumulative AP Eam Study Guide Chapters & 3 - Graphs Statistics the science of collecting, analyzing, and drawing conclusions from data. Descriptive methods of organizing and summarizing statistics

More information

Analysis of Variance (ANOVA)

Analysis of Variance (ANOVA) Analysis of Variance (ANOVA) Two types of ANOVA tests: Independent measures and Repeated measures Comparing 2 means: X 1 = 20 t - test X 2 = 30 How can we Compare 3 means?: X 1 = 20 X 2 = 30 X 3 = 35 ANOVA

More information

One-way between-subjects ANOVA. Comparing three or more independent means

One-way between-subjects ANOVA. Comparing three or more independent means One-way between-subjects ANOVA Comparing three or more independent means Data files SpiderBG.sav Attractiveness.sav Homework: sourcesofself-esteem.sav ANOVA: A Framework Understand the basic principles

More information

ANALYSIS OF VARIANCE OF BALANCED DAIRY SCIENCE DATA USING SAS

ANALYSIS OF VARIANCE OF BALANCED DAIRY SCIENCE DATA USING SAS ANALYSIS OF VARIANCE OF BALANCED DAIRY SCIENCE DATA USING SAS Ravinder Malhotra and Vipul Sharma National Dairy Research Institute, Karnal-132001 The most common use of statistics in dairy science is testing

More information

Psych 230. Psychological Measurement and Statistics

Psych 230. Psychological Measurement and Statistics Psych 230 Psychological Measurement and Statistics Pedro Wolf December 9, 2009 This Time. Non-Parametric statistics Chi-Square test One-way Two-way Statistical Testing 1. Decide which test to use 2. State

More information

13: Additional ANOVA Topics. Post hoc Comparisons

13: Additional ANOVA Topics. Post hoc Comparisons 13: Additional ANOVA Topics Post hoc Comparisons ANOVA Assumptions Assessing Group Variances When Distributional Assumptions are Severely Violated Post hoc Comparisons In the prior chapter we used ANOVA

More information

1-1. Chapter 1. Sampling and Descriptive Statistics by The McGraw-Hill Companies, Inc. All rights reserved.

1-1. Chapter 1. Sampling and Descriptive Statistics by The McGraw-Hill Companies, Inc. All rights reserved. 1-1 Chapter 1 Sampling and Descriptive Statistics 1-2 Why Statistics? Deal with uncertainty in repeated scientific measurements Draw conclusions from data Design valid experiments and draw reliable conclusions

More information

Business Statistics. Lecture 10: Course Review

Business Statistics. Lecture 10: Course Review Business Statistics Lecture 10: Course Review 1 Descriptive Statistics for Continuous Data Numerical Summaries Location: mean, median Spread or variability: variance, standard deviation, range, percentiles,

More information

Population Variance. Concepts from previous lectures. HUMBEHV 3HB3 one-sample t-tests. Week 8

Population Variance. Concepts from previous lectures. HUMBEHV 3HB3 one-sample t-tests. Week 8 Concepts from previous lectures HUMBEHV 3HB3 one-sample t-tests Week 8 Prof. Patrick Bennett sampling distributions - sampling error - standard error of the mean - degrees-of-freedom Null and alternative/research

More information

STAT 200 Chapter 1 Looking at Data - Distributions

STAT 200 Chapter 1 Looking at Data - Distributions STAT 200 Chapter 1 Looking at Data - Distributions What is Statistics? Statistics is a science that involves the design of studies, data collection, summarizing and analyzing the data, interpreting the

More information

Chapter 27 Summary Inferences for Regression

Chapter 27 Summary Inferences for Regression Chapter 7 Summary Inferences for Regression What have we learned? We have now applied inference to regression models. Like in all inference situations, there are conditions that we must check. We can test

More information

Biostatistics 4: Trends and Differences

Biostatistics 4: Trends and Differences Biostatistics 4: Trends and Differences Dr. Jessica Ketchum, PhD. email: McKinneyJL@vcu.edu Objectives 1) Know how to see the strength, direction, and linearity of relationships in a scatter plot 2) Interpret

More information

Questions 3.83, 6.11, 6.12, 6.17, 6.25, 6.29, 6.33, 6.35, 6.50, 6.51, 6.53, 6.55, 6.59, 6.60, 6.65, 6.69, 6.70, 6.77, 6.79, 6.89, 6.

Questions 3.83, 6.11, 6.12, 6.17, 6.25, 6.29, 6.33, 6.35, 6.50, 6.51, 6.53, 6.55, 6.59, 6.60, 6.65, 6.69, 6.70, 6.77, 6.79, 6.89, 6. Chapter 7 Reading 7.1, 7.2 Questions 3.83, 6.11, 6.12, 6.17, 6.25, 6.29, 6.33, 6.35, 6.50, 6.51, 6.53, 6.55, 6.59, 6.60, 6.65, 6.69, 6.70, 6.77, 6.79, 6.89, 6.112 Introduction In Chapter 5 and 6, we emphasized

More information

1.0 Continuous Distributions. 5.0 Shapes of Distributions. 6.0 The Normal Curve. 7.0 Discrete Distributions. 8.0 Tolerances. 11.

1.0 Continuous Distributions. 5.0 Shapes of Distributions. 6.0 The Normal Curve. 7.0 Discrete Distributions. 8.0 Tolerances. 11. Chapter 4 Statistics 45 CHAPTER 4 BASIC QUALITY CONCEPTS 1.0 Continuous Distributions.0 Measures of Central Tendency 3.0 Measures of Spread or Dispersion 4.0 Histograms and Frequency Distributions 5.0

More information

9 Correlation and Regression

9 Correlation and Regression 9 Correlation and Regression SW, Chapter 12. Suppose we select n = 10 persons from the population of college seniors who plan to take the MCAT exam. Each takes the test, is coached, and then retakes the

More information

Multiple Comparisons

Multiple Comparisons Multiple Comparisons Error Rates, A Priori Tests, and Post-Hoc Tests Multiple Comparisons: A Rationale Multiple comparison tests function to tease apart differences between the groups within our IV when

More information

MATH 1150 Chapter 2 Notation and Terminology

MATH 1150 Chapter 2 Notation and Terminology MATH 1150 Chapter 2 Notation and Terminology Categorical Data The following is a dataset for 30 randomly selected adults in the U.S., showing the values of two categorical variables: whether or not the

More information

Comparing Several Means: ANOVA

Comparing Several Means: ANOVA Comparing Several Means: ANOVA Understand the basic principles of ANOVA Why it is done? What it tells us? Theory of one way independent ANOVA Following up an ANOVA: Planned contrasts/comparisons Choosing

More information

Ordinary Least Squares Regression Explained: Vartanian

Ordinary Least Squares Regression Explained: Vartanian Ordinary Least Squares Regression Eplained: Vartanian When to Use Ordinary Least Squares Regression Analysis A. Variable types. When you have an interval/ratio scale dependent variable.. When your independent

More information

Week 8: Correlation and Regression

Week 8: Correlation and Regression Health Sciences M.Sc. Programme Applied Biostatistics Week 8: Correlation and Regression The correlation coefficient Correlation coefficients are used to measure the strength of the relationship or association

More information

appstats8.notebook October 11, 2016

appstats8.notebook October 11, 2016 Chapter 8 Linear Regression Objective: Students will construct and analyze a linear model for a given set of data. Fat Versus Protein: An Example pg 168 The following is a scatterplot of total fat versus

More information

Statistics and parameters

Statistics and parameters Statistics and parameters Tables, histograms and other charts are used to summarize large amounts of data. Often, an even more extreme summary is desirable. Statistics and parameters are numbers that characterize

More information

Black White Total Observed Expected χ 2 = (f observed f expected ) 2 f expected (83 126) 2 ( )2 126

Black White Total Observed Expected χ 2 = (f observed f expected ) 2 f expected (83 126) 2 ( )2 126 Psychology 60 Fall 2013 Practice Final Actual Exam: This Wednesday. Good luck! Name: To view the solutions, check the link at the end of the document. This practice final should supplement your studying;

More information

ANCOVA. Lecture 9 Andrew Ainsworth

ANCOVA. Lecture 9 Andrew Ainsworth ANCOVA Lecture 9 Andrew Ainsworth What is ANCOVA? Analysis of covariance an extension of ANOVA in which main effects and interactions are assessed on DV scores after the DV has been adjusted for by the

More information

Slides for Data Mining by I. H. Witten and E. Frank

Slides for Data Mining by I. H. Witten and E. Frank Slides for Data Mining by I. H. Witten and E. Frank Predicting performance Assume the estimated error rate is 5%. How close is this to the true error rate? Depends on the amount of test data Prediction

More information

Unit Two Descriptive Biostatistics. Dr Mahmoud Alhussami

Unit Two Descriptive Biostatistics. Dr Mahmoud Alhussami Unit Two Descriptive Biostatistics Dr Mahmoud Alhussami Descriptive Biostatistics The best way to work with data is to summarize and organize them. Numbers that have not been summarized and organized are

More information

9/2/2010. Wildlife Management is a very quantitative field of study. throughout this course and throughout your career.

9/2/2010. Wildlife Management is a very quantitative field of study. throughout this course and throughout your career. Introduction to Data and Analysis Wildlife Management is a very quantitative field of study Results from studies will be used throughout this course and throughout your career. Sampling design influences

More information

Interpret Standard Deviation. Outlier Rule. Describe the Distribution OR Compare the Distributions. Linear Transformations SOCS. Interpret a z score

Interpret Standard Deviation. Outlier Rule. Describe the Distribution OR Compare the Distributions. Linear Transformations SOCS. Interpret a z score Interpret Standard Deviation Outlier Rule Linear Transformations Describe the Distribution OR Compare the Distributions SOCS Using Normalcdf and Invnorm (Calculator Tips) Interpret a z score What is an

More information

How do we compare the relative performance among competing models?

How do we compare the relative performance among competing models? How do we compare the relative performance among competing models? 1 Comparing Data Mining Methods Frequent problem: we want to know which of the two learning techniques is better How to reliably say Model

More information

BIOSTATISTICS NURS 3324

BIOSTATISTICS NURS 3324 Simple Linear Regression and Correlation Introduction Previously, our attention has been focused on one variable which we designated by x. Frequently, it is desirable to learn something about the relationship

More information

Chapter 23. Inference About Means

Chapter 23. Inference About Means Chapter 23 Inference About Means 1 /57 Homework p554 2, 4, 9, 10, 13, 15, 17, 33, 34 2 /57 Objective Students test null and alternate hypotheses about a population mean. 3 /57 Here We Go Again Now that

More information

Keppel, G. & Wickens, T. D. Design and Analysis Chapter 4: Analytical Comparisons Among Treatment Means

Keppel, G. & Wickens, T. D. Design and Analysis Chapter 4: Analytical Comparisons Among Treatment Means Keppel, G. & Wickens, T. D. Design and Analysis Chapter 4: Analytical Comparisons Among Treatment Means 4.1 The Need for Analytical Comparisons...the between-groups sum of squares averages the differences

More information

Section 9.4. Notation. Requirements. Definition. Inferences About Two Means (Matched Pairs) Examples

Section 9.4. Notation. Requirements. Definition. Inferences About Two Means (Matched Pairs) Examples Objective Section 9.4 Inferences About Two Means (Matched Pairs) Compare of two matched-paired means using two samples from each population. Hypothesis Tests and Confidence Intervals of two dependent means

More information

Inferential Statistics

Inferential Statistics Inferential Statistics Part 1 Sampling Distributions, Point Estimates & Confidence Intervals Inferential statistics are used to draw inferences (make conclusions/judgements) about a population from a sample.

More information

Fundamentals to Biostatistics. Prof. Chandan Chakraborty Associate Professor School of Medical Science & Technology IIT Kharagpur

Fundamentals to Biostatistics. Prof. Chandan Chakraborty Associate Professor School of Medical Science & Technology IIT Kharagpur Fundamentals to Biostatistics Prof. Chandan Chakraborty Associate Professor School of Medical Science & Technology IIT Kharagpur Statistics collection, analysis, interpretation of data development of new

More information

BIOL Biometry LAB 6 - SINGLE FACTOR ANOVA and MULTIPLE COMPARISON PROCEDURES

BIOL Biometry LAB 6 - SINGLE FACTOR ANOVA and MULTIPLE COMPARISON PROCEDURES BIOL 458 - Biometry LAB 6 - SINGLE FACTOR ANOVA and MULTIPLE COMPARISON PROCEDURES PART 1: INTRODUCTION TO ANOVA Purpose of ANOVA Analysis of Variance (ANOVA) is an extremely useful statistical method

More information

Confidence Intervals with σ unknown

Confidence Intervals with σ unknown STAT 141 Confidence Intervals and Hypothesis Testing 10/26/04 Today (Chapter 7): CI with σ unknown, t-distribution CI for proportions Two sample CI with σ known or unknown Hypothesis Testing, z-test Confidence

More information

Basic Statistical Analysis

Basic Statistical Analysis indexerrt.qxd 8/21/2002 9:47 AM Page 1 Corrected index pages for Sprinthall Basic Statistical Analysis Seventh Edition indexerrt.qxd 8/21/2002 9:47 AM Page 656 Index Abscissa, 24 AB-STAT, vii ADD-OR rule,

More information

Introduction to Basic Statistics Version 2

Introduction to Basic Statistics Version 2 Introduction to Basic Statistics Version 2 Pat Hammett, Ph.D. University of Michigan 2014 Instructor Comments: This document contains a brief overview of basic statistics and core terminology/concepts

More information

Data Analysis and Statistical Methods Statistics 651

Data Analysis and Statistical Methods Statistics 651 Data Analysis and Statistical Methods Statistics 65 http://www.stat.tamu.edu/~suhasini/teaching.html Suhasini Subba Rao Review In the previous lecture we considered the following tests: The independent

More information

Note that we are looking at the true mean, μ, not y. The problem for us is that we need to find the endpoints of our interval (a, b).

Note that we are looking at the true mean, μ, not y. The problem for us is that we need to find the endpoints of our interval (a, b). Confidence Intervals 1) What are confidence intervals? Simply, an interval for which we have a certain confidence. For example, we are 90% certain that an interval contains the true value of something

More information

22s:152 Applied Linear Regression. Chapter 8: 1-Way Analysis of Variance (ANOVA) 2-Way Analysis of Variance (ANOVA)

22s:152 Applied Linear Regression. Chapter 8: 1-Way Analysis of Variance (ANOVA) 2-Way Analysis of Variance (ANOVA) 22s:152 Applied Linear Regression Chapter 8: 1-Way Analysis of Variance (ANOVA) 2-Way Analysis of Variance (ANOVA) We now consider an analysis with only categorical predictors (i.e. all predictors are

More information

Objectives Simple linear regression. Statistical model for linear regression. Estimating the regression parameters

Objectives Simple linear regression. Statistical model for linear regression. Estimating the regression parameters Objectives 10.1 Simple linear regression Statistical model for linear regression Estimating the regression parameters Confidence interval for regression parameters Significance test for the slope Confidence

More information

The t-distribution. Patrick Breheny. October 13. z tests The χ 2 -distribution The t-distribution Summary

The t-distribution. Patrick Breheny. October 13. z tests The χ 2 -distribution The t-distribution Summary Patrick Breheny October 13 Patrick Breheny Biostatistical Methods I (BIOS 5710) 1/25 Introduction Introduction What s wrong with z-tests? So far we ve (thoroughly!) discussed how to carry out hypothesis

More information

Relationships between variables. Visualizing Bivariate Distributions: Scatter Plots

Relationships between variables. Visualizing Bivariate Distributions: Scatter Plots SFBS Course Notes Part 7: Correlation Bivariate relationships (p. 1) Linear transformations (p. 3) Pearson r : Measuring a relationship (p. 5) Interpretation of correlations (p. 10) Relationships between

More information

Final Exam - Solutions

Final Exam - Solutions Ecn 102 - Analysis of Economic Data University of California - Davis March 19, 2010 Instructor: John Parman Final Exam - Solutions You have until 5:30pm to complete this exam. Please remember to put your

More information

The t-statistic. Student s t Test

The t-statistic. Student s t Test The t-statistic 1 Student s t Test When the population standard deviation is not known, you cannot use a z score hypothesis test Use Student s t test instead Student s t, or t test is, conceptually, very

More information

The entire data set consists of n = 32 widgets, 8 of which were made from each of q = 4 different materials.

The entire data set consists of n = 32 widgets, 8 of which were made from each of q = 4 different materials. One-Way ANOVA Summary The One-Way ANOVA procedure is designed to construct a statistical model describing the impact of a single categorical factor X on a dependent variable Y. Tests are run to determine

More information

Salt Lake Community College MATH 1040 Final Exam Fall Semester 2011 Form E

Salt Lake Community College MATH 1040 Final Exam Fall Semester 2011 Form E Salt Lake Community College MATH 1040 Final Exam Fall Semester 011 Form E Name Instructor Time Limit: 10 minutes Any hand-held calculator may be used. Computers, cell phones, or other communication devices

More information

Correlation & Simple Regression

Correlation & Simple Regression Chapter 11 Correlation & Simple Regression The previous chapter dealt with inference for two categorical variables. In this chapter, we would like to examine the relationship between two quantitative variables.

More information

Chapter 7. Inference for Distributions. Introduction to the Practice of STATISTICS SEVENTH. Moore / McCabe / Craig. Lecture Presentation Slides

Chapter 7. Inference for Distributions. Introduction to the Practice of STATISTICS SEVENTH. Moore / McCabe / Craig. Lecture Presentation Slides Chapter 7 Inference for Distributions Introduction to the Practice of STATISTICS SEVENTH EDITION Moore / McCabe / Craig Lecture Presentation Slides Chapter 7 Inference for Distributions 7.1 Inference for

More information

What is Statistics? Statistics is the science of understanding data and of making decisions in the face of variability and uncertainty.

What is Statistics? Statistics is the science of understanding data and of making decisions in the face of variability and uncertainty. What is Statistics? Statistics is the science of understanding data and of making decisions in the face of variability and uncertainty. Statistics is a field of study concerned with the data collection,

More information

Chapter 2: Tools for Exploring Univariate Data

Chapter 2: Tools for Exploring Univariate Data Stats 11 (Fall 2004) Lecture Note Introduction to Statistical Methods for Business and Economics Instructor: Hongquan Xu Chapter 2: Tools for Exploring Univariate Data Section 2.1: Introduction What is

More information

Correlation and Simple Linear Regression

Correlation and Simple Linear Regression Correlation and Simple Linear Regression Sasivimol Rattanasiri, Ph.D Section for Clinical Epidemiology and Biostatistics Ramathibodi Hospital, Mahidol University E-mail: sasivimol.rat@mahidol.ac.th 1 Outline

More information

Chapter 23. Inferences About Means. Monday, May 6, 13. Copyright 2009 Pearson Education, Inc.

Chapter 23. Inferences About Means. Monday, May 6, 13. Copyright 2009 Pearson Education, Inc. Chapter 23 Inferences About Means Sampling Distributions of Means Now that we know how to create confidence intervals and test hypotheses about proportions, we do the same for means. Just as we did before,

More information

Background to Statistics

Background to Statistics FACT SHEET Background to Statistics Introduction Statistics include a broad range of methods for manipulating, presenting and interpreting data. Professional scientists of all kinds need to be proficient

More information

CONTENTS OF DAY 2. II. Why Random Sampling is Important 10 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE

CONTENTS OF DAY 2. II. Why Random Sampling is Important 10 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE 1 2 CONTENTS OF DAY 2 I. More Precise Definition of Simple Random Sample 3 Connection with independent random variables 4 Problems with small populations 9 II. Why Random Sampling is Important 10 A myth,

More information

STATISTICS 141 Final Review

STATISTICS 141 Final Review STATISTICS 141 Final Review Bin Zou bzou@ualberta.ca Department of Mathematical & Statistical Sciences University of Alberta Winter 2015 Bin Zou (bzou@ualberta.ca) STAT 141 Final Review Winter 2015 1 /

More information

Review. One-way ANOVA, I. What s coming up. Multiple comparisons

Review. One-way ANOVA, I. What s coming up. Multiple comparisons Review One-way ANOVA, I 9.07 /15/00 Earlier in this class, we talked about twosample z- and t-tests for the difference between two conditions of an independent variable Does a trial drug work better than

More information

Introduction to Analysis of Variance. Chapter 11

Introduction to Analysis of Variance. Chapter 11 Introduction to Analysis of Variance Chapter 11 Review t-tests Single-sample t-test Independent samples t-test Related or paired-samples t-test s m M t ) ( 1 1 ) ( m m s M M t M D D D s M t n s s M 1 )

More information

Preview from Notesale.co.uk Page 3 of 63

Preview from Notesale.co.uk Page 3 of 63 Stem-and-leaf diagram - vertical numbers on far left represent the 10s, numbers right of the line represent the 1s The mean should not be used if there are extreme scores, or for ranks and categories Unbiased

More information

Elementary Statistics

Elementary Statistics Elementary Statistics Q: What is data? Q: What does the data look like? Q: What conclusions can we draw from the data? Q: Where is the middle of the data? Q: Why is the spread of the data important? Q:

More information

Chapter 16. Simple Linear Regression and Correlation

Chapter 16. Simple Linear Regression and Correlation Chapter 16 Simple Linear Regression and Correlation 16.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will

More information

Sleep data, two drugs Ch13.xls

Sleep data, two drugs Ch13.xls Model Based Statistics in Biology. Part IV. The General Linear Mixed Model.. Chapter 13.3 Fixed*Random Effects (Paired t-test) ReCap. Part I (Chapters 1,2,3,4), Part II (Ch 5, 6, 7) ReCap Part III (Ch

More information

Mathematics for Economics MA course

Mathematics for Economics MA course Mathematics for Economics MA course Simple Linear Regression Dr. Seetha Bandara Simple Regression Simple linear regression is a statistical method that allows us to summarize and study relationships between

More information

Module 03 Lecture 14 Inferential Statistics ANOVA and TOI

Module 03 Lecture 14 Inferential Statistics ANOVA and TOI Introduction of Data Analytics Prof. Nandan Sudarsanam and Prof. B Ravindran Department of Management Studies and Department of Computer Science and Engineering Indian Institute of Technology, Madras Module

More information

Lecture 3. The Population Variance. The population variance, denoted σ 2, is the sum. of the squared deviations about the population

Lecture 3. The Population Variance. The population variance, denoted σ 2, is the sum. of the squared deviations about the population Lecture 5 1 Lecture 3 The Population Variance The population variance, denoted σ 2, is the sum of the squared deviations about the population mean divided by the number of observations in the population,

More information

Confidence Intervals. - simply, an interval for which we have a certain confidence.

Confidence Intervals. - simply, an interval for which we have a certain confidence. Confidence Intervals I. What are confidence intervals? - simply, an interval for which we have a certain confidence. - for example, we are 90% certain that an interval contains the true value of something

More information

Statistics in medicine

Statistics in medicine Statistics in medicine Lecture 1- part 1: Describing variation, and graphical presentation Outline Sources of variation Types of variables Fatma Shebl, MD, MS, MPH, PhD Assistant Professor Chronic Disease

More information