Everything is not normal

According to the dictionary, a thing is considered normal when it is in its natural state or conforms to standards set in advance. That is its normal meaning. But, like many other words, normal has several other meanings. In statistics, when we talk about normal we refer to a particular probability distribution called the normal distribution, the famous Gauss bell. This distribution is characterized by its symmetry around its mean, which coincides with its median, in addition to other features that we discussed in a previous post.

The great advantage of the normal distribution is that it allows us to calculate the probability of occurrence of data from this distribution, which makes it possible to infer population values from a sample drawn from that population. For this reason, virtually all parametric techniques of hypothesis testing require the data to follow a normal distribution.

One might think that this is not a big problem. If it's called normal, it must be because biological data usually follow, more or less, this distribution. Big mistake: many data follow a distribution that deviates from the normal. Consider, for example, the consumption of alcohol. The data will not be grouped symmetrically around a mean. Instead, the distribution will have a positive (right) skew: there will be a large number of values around zero (abstainers or very occasional drinkers) and a long right tail formed by the consumption values of heavier drinkers, stretched out by those who have bourbon with their breakfast.

How does it affect our statistical calculations that the variable doesn't follow a normal distribution? What should we do if the data are not normal?

The first thing to do is to realize that the variable is not normally distributed. We have already seen that there are a number of graphical methods that let us assess visually whether the data follow a normal distribution.
Histograms and box plots let us see whether the distribution is skewed, whether it is too flat or too peaked, or whether it has extreme values. But the most specific graph for this purpose is the normal probability plot (q-q plot), in which the values fall along the diagonal line if they are normally distributed.

Another possibility is to use numerical contrast tests such as the Shapiro-Wilk test or the Kolmogorov-Smirnov test. The problem with these tests is that they are very sensitive to sample size. If the sample is large, they can flag minor deviations from normality as significant. Conversely, if the sample is small, they may fail to detect large deviations from normality.

But these tests have another drawback that you will understand better after a small clarification. We know that in any hypothesis test we set a null hypothesis that usually states the opposite of what we want to show. Thus, if the p-value is lower than a preset value (usually 0.05), we reject the null hypothesis and keep the alternative, which says what we want to prove. The problem is that the null hypothesis is only falsifiable; it can never be shown to be true. If the p-value is high, we simply cannot say that the null hypothesis is false, but that does not mean it is true. It may be that the study did not have enough power to reject a null hypothesis that is, in fact, false.

Well, it happens that normality contrasts are set up with the null hypothesis that the data follow a normal distribution. Therefore, if the p-value is small, we can reject the null and say that the data are not normal. But if the p-value is high, we simply cannot reject it, and all we can say is that we have been unable to show that the data are not normal, which is not the same as saying that they fit a normal distribution. For these reasons, it is always advisable to complement numerical contrasts with some graphical method when assessing the normality of a variable.

Once we know that the data are not normal, we must take this into account when describing them. If the distribution is highly skewed, the mean is not a good measure of central tendency and we must resort to robust estimators such as the median. Furthermore, the absence of normality may discourage the use of parametric contrast tests. Student's t test and the analysis of variance (ANOVA) assume that the distribution is normal. Student's t is quite robust in this regard, so if the sample is large (n > 80) it can be used with some confidence.
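
To make the point about describing skewed data more concrete, here is a minimal sketch in Python (our choice of language, using only the standard library). The log-normal sample is purely an assumption for illustration, standing in for a right-skewed variable such as alcohol consumption. It shows the mean being dragged towards the long right tail while the median stays with the bulk of the data.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical right-skewed variable: a log-normal sample, chosen only
# because it has many small values and a long right tail.
sample = [random.lognormvariate(0.0, 1.0) for _ in range(2000)]

mean = statistics.fmean(sample)
median = statistics.median(sample)
sd = statistics.pstdev(sample)

# Moment-based skewness: positive values indicate a long right tail.
skewness = sum(((x - mean) / sd) ** 3 for x in sample) / len(sample)

print(f"mean     = {mean:.2f}")
print(f"median   = {median:.2f}")
print(f"skewness = {skewness:.2f}")
```

With data like these, the median is the more sensible summary of the typical value.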
If the sample is small or the distribution is very far from normal, however, we cannot use parametric contrast tests. One possible solution is to attempt a data transformation. The one most frequently used in biology is the logarithmic transformation, which helps to approximate a normal distribution when the data are right-skewed. We mustn't forget to undo the transformation once the contrast in question has been made.

The other possibility is to use nonparametric tests, which do not require the variable to follow a normal distribution. Thus, to compare two unpaired samples we will use Wilcoxon's rank-sum test (also called the Mann-Whitney U test). If the data are paired, we will have to use Wilcoxon's signed-rank test. If we compare more than two groups, the Kruskal-Wallis test is the nonparametric equivalent of ANOVA. Finally, remember that the nonparametric equivalent of Pearson's correlation coefficient is Spearman's correlation coefficient. The drawback is that nonparametric tests are more demanding than their parametric equivalents when it comes to reaching statistical significance, but they should be used as soon as there is any doubt about the normality of the variable we are contrasting.

And here we will stop for today. We could have talked about a third way of facing a non-normal variable, much more exotic than those mentioned: resampling techniques such as bootstrapping, which consists of building an empirical distribution of the means of many samples drawn from our data and making inferences from it, thus preserving the original units of the variable and avoiding the back-and-forth of data transformations. But that's another story.

Some comparisons are not odious

It's often said that comparisons are odious. And it is true that it is not appropriate to compare people or things with one another, since each has its own value and nobody needs to be slighted for doing something differently. So it's not surprising that even Don Quixote said that comparisons are always odious. Of course, this applies to everyday life, because in medicine we are always comparing things, sometimes to rather beneficial effect.

Today we are going to talk about how to compare two data distributions graphically, and we'll look at an application of this type of comparison that helps us check whether our data follow a normal distribution.

Imagine for a moment that we have a hundred serum cholesterol values from schoolchildren. What will we get if we plot the values against themselves? Simple: a perfect straight line along the diagonal of the graph. Now think about what would happen if, instead of comparing them with themselves, we compared them with a different distribution.
If the two data distributions are very similar, the dots on the graph will lie very close to the diagonal. If the distributions differ, the dots will move away from the diagonal, the further away the more the two distributions differ. Let's look at an example.

Let's suppose we divide our distribution into two parts, the cholesterol of the boys and that of the girls. According to what our imagination tells us, the boys eat more industrial pastries than the girls, so their cholesterol levels are higher, as you can see if you compare the curve for the girls (black) with that for the boys (blue). Now, if we plot the values of the girls against the values of the boys, as can be seen in the figure, the dots lie away from the diagonal, all of them on the same side of it. What is the reason for this? The values of the boys are higher than those of the girls.

You will tell me that all this is fine, but a bit unnecessary. After all, if we want to know who has the highest values, all we have to do is look at the curves. And you will be right, but this type of graph was designed for something different, which is to compare a distribution with its normal equivalent.

Imagine that we take our first, overall distribution and want to know whether it follows a normal distribution. We only have to calculate its mean and standard deviation and plot its quantiles against the theoretical quantiles of a normal distribution with the same mean and standard deviation. If our data are normally distributed, the dots will line up along the diagonal of the graph. The further they stray from it, the less likely it is that our data follow a normal distribution. This type of graph is called a quantile-quantile plot or, more commonly, a q-q plot.

Let's see an example of a q-q plot to understand it better. In the second graph you can see two curves, a blue one representing a normal distribution and a black one following a Student's t distribution.
On the right side you can see the q-q plot of the Student's t distribution. The central data fit the diagonal quite well, but the extreme data fit it worse, and the slope of the line changes. This indicates that there are more data in the tails of the distribution than there would be if it were a normal distribution. Of course, this should not surprise us, since we know that heavy tails are a feature of the Student's t distribution.

Finally, in the third graph you can see a normal distribution and its q-q plot, in which the dots fit the diagonal of the graph quite well.

As you can see, the q-q plot is a simple graphical method for assessing whether the data follow a normal distribution. You may say that it would be a bit tedious to calculate the quantiles of our distribution and those of the equivalent normal distribution, but remember that most statistical software can do it effortlessly. For instance, R has a function called qqnorm() that draws a q-q plot in the blink of an eye.

And here we are going to leave normal fitting for now. Just remember that there are other, more formal numerical methods to find out whether data fit a normal distribution, such as the Kolmogorov-Smirnov test or the Shapiro-Wilk test. But that's another story.
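
For those who prefer to see the calculation itself, the following is a rough sketch of how the points of a q-q plot could be computed, written in Python with only the standard library. The function names qq_points, student_t and max_gap, the sample sizes, and the cholesterol-like mean and standard deviation are all made up for the example; this is not how qqnorm() is implemented.

```python
import random
from statistics import NormalDist, fmean, stdev


def qq_points(data):
    """Pair each observed quantile with the matching normal quantile."""
    xs = sorted(data)
    n = len(xs)
    # Reference normal with the same mean and standard deviation as the data.
    ref = NormalDist(fmean(xs), stdev(xs))
    # Plotting positions (i + 0.5) / n stay strictly between 0 and 1.
    return [(ref.inv_cdf((i + 0.5) / n), x) for i, x in enumerate(xs)]


def student_t(df, rng):
    """Draw one Student's t value as Z / sqrt(chi-square / df)."""
    z = rng.gauss(0, 1)
    chi2 = sum(rng.gauss(0, 1) ** 2 for _ in range(df))
    return z / (chi2 / df) ** 0.5


def max_gap(points):
    """Largest distance from the diagonal, in standard deviations of the data."""
    sd = stdev([obs for _, obs in points])
    return max(abs(obs - theo) for theo, obs in points) / sd


rng = random.Random(1)  # fixed seed so the sketch is reproducible
normal_sample = [rng.gauss(200, 30) for _ in range(500)]  # hypothetical cholesterol values
t_sample = [student_t(3, rng) for _ in range(500)]        # heavy-tailed sample

gap_normal = max_gap(qq_points(normal_sample))
gap_t = max_gap(qq_points(t_sample))
print(f"largest gap, normal sample:      {gap_normal:.2f} SD")
print(f"largest gap, Student's t sample: {gap_t:.2f} SD")
```

Plotting the pairs returned by qq_points with any charting tool gives the q-q plot. The heavy-tailed sample should show the larger gap, mostly at the extremes.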

The big family

Moviegoers, do not be mistaken. We are not going to talk about the 1962 movie in which little Chencho gets lost in the Plaza Mayor at Christmas and it takes until summer to find him, largely thanks to the tenacious search of his grandpa. Today we're going to talk about another large family, that of probability density functions, and I hope nobody ends up as lost as poor Chencho in the film.

No doubt the queen of density functions is the normal distribution, the bell-shaped one. This is a probability distribution that is characterized by its mean and standard deviation and lies at the core of probability calculus and statistical inference. But there are other continuous probability functions that look somewhat, or a lot, like the normal distribution and that are also widely used in hypothesis contrasts.

The first one we're going to talk about is the Student's t distribution. For those curious about the history of science, I'll say that the inventor of this statistic was actually William Sealy Gosset but, as he must have liked his name very little, he used to sign his writings under the pseudonym Student. Hence the name of this statistic.

This density function is bell-shaped and symmetric around its mean. It's very similar to the normal curve, although with heavier tails; this is why estimates based on this distribution are less precise when the sample is small, since more data in the tails always implies the possibility of more results far from the mean. There are an infinite number of Student's t distributions, all of them characterized by their mean, variance and degrees of freedom, but when the sample size is greater than 30 (as the degrees of freedom increase), the t distribution can be approximated by a normal distribution, so we can use the latter without making big mistakes.

Student's t is used to compare the means of normally distributed populations when the sample sizes are small or when the population variances are unknown. And this works because if we subtract the population mean from the sample mean and divide the result by the standard error estimated from the sample, the value we get follows a Student's t distribution.

Another member of this family of continuous distributions is the chi-square, which also plays an important role in statistics. If we have a set of independent variables that follow a standard normal distribution and we square them, their sum will follow a chi-square distribution with a number of degrees of freedom equal to the number of variables. In practice, when we have a series of values of a variable, we can subtract the values expected under the null hypothesis from the observed ones, square these differences, divide each by its expected value, and add them up to check the probability of coming up with that total according to the density function of a chi-square. That is how we decide whether or not to reject our null hypothesis. This technique can be used with three aims: to determine the goodness of fit to a theoretical population, to test the homogeneity of two populations, and to contrast the independence of two variables.

Unlike the normal distribution, the chi-square density function only has positive values, so it is asymmetric with a long right tail. What happens is that the curve becomes gradually more symmetric as the degrees of freedom increase, increasingly resembling a normal distribution.

The last distribution we are going to talk about is Snedecor's F distribution. Its name holds no surprises about who invented it, although it seems that a certain Fisher was also involved in the creation of this statistic.

This distribution is more closely related to the chi-square than to the normal distribution, because it is the density function of the ratio of two chi-square variables, each divided by its degrees of freedom. As is easy to understand, it only has positive values, and its shape depends on the degrees of freedom of the two chi-square distributions that determine it. This distribution is used for the contrast of means in the analysis of variance (ANOVA).

In summary, we can see that there are several very similar density functions that serve to calculate probabilities and that are useful in various hypothesis contrasts. But there are many more distributions, such as the bivariate normal, the negative binomial, the uniform, and the beta and gamma distributions, to name a few. But that's another story.

The most famous of bells

The dictionary says that a bell is a simple device that makes a sound. But a bell can be much more. I think there's even a plant with that name and a flower with its diminutive. But undoubtedly, the most famous of all bells is the renowned Gauss bell curve, the most beloved and revered by statisticians and other species of scientists.

But what is a bell curve? It's nothing more, nor less, than a probability density function. Put another way, it is a continuous probability distribution with a symmetrical bell shape, hence the first part of its name. And I say the first part because the second one is more controversial, since it is not quite clear that Gauss is the father of the child.

It seems that the first to use this density function was somebody named de Moivre, who was studying what happens to a binomial distribution when the sample size is large. In yet another of the many injustices of history, the name of the function is associated with Gauss, who used it decades later to analyze data from his astronomical studies. Of course, in defense of Gauss, some people say the two of them discovered the density function independently.
To avoid controversy, from now on we will call it by its other name, different from Gauss bell: the normal distribution. It seems that it was so named because people used to think that most natural phenomena were consistent with this distribution. Later on, it was found that other distributions are also very common in biology, such as the binomial and the Poisson.

As with any other density function, the usefulness of the normal curve is that it represents the probability distribution of the random variable we are measuring. For example, if we measure the weights of a population of individuals and plot them, the graph will approximate a normal distribution. Thus, the area under the curve between two given points on the x axis represents the probability that the variable takes a value between them. The total area under the curve is equal to one, which means that there's a 100% chance (a probability of one) of occurrence of any of the possible values of the distribution.

There are infinitely many normal distributions, each of them fully characterized by its mean and standard deviation. Thus, any point on the horizontal axis can be expressed as the mean plus or minus a number of standard deviations, and its probability can be calculated using the formula of the density function, which I dare not show you here. We can also use a computer to calculate the probability of a variable within a normal distribution, but what we do in practice is something simpler: we standardize.

The standard normal distribution is the one that has a mean of zero and a standard deviation of one. The advantage of the standard normal distribution is twofold. First, we know how its probability is distributed between different points on the horizontal axis. Approximately 68% of the population lies within one standard deviation of the mean, 95% within two standard deviations, and 99.7% within three.
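
These three figures can be checked with a few lines of Python. This is just a small sketch using NormalDist from the standard library, added here for illustration; it is not something from the original post.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)  # the standard normal distribution

within = {}
for k in (1, 2, 3):
    # Probability of falling within k standard deviations of the mean.
    within[k] = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within {k} SD: {within[k]:.4f}")

# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```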
The second advantage is that any normal distribution can be transformed into a standard one, simply by subtracting the mean from the value and dividing the result by the standard deviation of the distribution. We thus arrive at the z score, which is the equivalent of the value of our variable in a standard normal distribution with a mean of zero and a standard deviation of one.

So you can see how useful it is. We do not need software to calculate the probability. We just standardize and use a simple probability table, if we do not know the value by heart. Moreover, the thing goes further. Thanks to the magic of the central limit theorem, other distributions can be approximated by a normal one and standardized to calculate the probabilities of their variables. For example, if our variable follows a binomial distribution, we can approximate it by a normal distribution when the sample size is large; in practice, when np and n(1-p) are both greater than five. The same applies to the Poisson distribution, which can be approximated by a normal when its mean is greater than 10.

And the magic is twofold because, besides sparing us the use of complex tools and letting us easily calculate probabilities and confidence intervals, it should be noted that both the binomial and the Poisson are discrete mass functions, while the normal distribution is a continuous density function.

And that's all for now. I only want to add that there are other continuous density functions different from the normal distribution and that they can also be approximated by a normal when the sample is large. But that's another story.
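
As a rough illustration of standardizing and of the normal approximation to the binomial, here is a minimal Python sketch using only the standard library. The values n = 100, p = 0.3 and k = 35 are hypothetical, and the 0.5 continuity correction is a common refinement that the text above does not mention, so treat it as an assumption of this example.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.3   # hypothetical binomial: np = 30 and n(1-p) = 70, both above five
k = 35            # we want P(X <= 35)

# Exact probability, adding up the binomial mass function.
exact = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

# Normal approximation: same mean and standard deviation, then standardize.
mu = n * p
sigma = sqrt(n * p * (1 - p))
z = (k + 0.5 - mu) / sigma       # z score, with the assumed continuity correction
approx = NormalDist().cdf(z)     # probability read from the standard normal

print(f"z score              = {z:.2f}")
print(f"exact binomial       = {exact:.4f}")
print(f"normal approximation = {approx:.4f}")
```

The two probabilities should come out very close to each other, which is the whole point of the approximation: one standardization and a standard normal table replace a sum of 36 binomial terms.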


More information

Lecture 30. DATA 8 Summer Regression Inference

Lecture 30. DATA 8 Summer Regression Inference DATA 8 Summer 2018 Lecture 30 Regression Inference Slides created by John DeNero (denero@berkeley.edu) and Ani Adhikari (adhikari@berkeley.edu) Contributions by Fahad Kamran (fhdkmrn@berkeley.edu) and

More information

THE ROYAL STATISTICAL SOCIETY HIGHER CERTIFICATE

THE ROYAL STATISTICAL SOCIETY HIGHER CERTIFICATE THE ROYAL STATISTICAL SOCIETY 004 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE PAPER II STATISTICAL METHODS The Society provides these solutions to assist candidates preparing for the examinations in future

More information

CENTRAL LIMIT THEOREM (CLT)

CENTRAL LIMIT THEOREM (CLT) CENTRAL LIMIT THEOREM (CLT) A sampling distribution is the probability distribution of the sample statistic that is formed when samples of size n are repeatedly taken from a population. If the sample statistic

More information

Biostatistics and Design of Experiments Prof. Mukesh Doble Department of Biotechnology Indian Institute of Technology, Madras

Biostatistics and Design of Experiments Prof. Mukesh Doble Department of Biotechnology Indian Institute of Technology, Madras Biostatistics and Design of Experiments Prof. Mukesh Doble Department of Biotechnology Indian Institute of Technology, Madras Lecture - 39 Regression Analysis Hello and welcome to the course on Biostatistics

More information

Statistics Primer. ORC Staff: Jayme Palka Peter Boedeker Marcus Fagan Trey Dejong

Statistics Primer. ORC Staff: Jayme Palka Peter Boedeker Marcus Fagan Trey Dejong Statistics Primer ORC Staff: Jayme Palka Peter Boedeker Marcus Fagan Trey Dejong 1 Quick Overview of Statistics 2 Descriptive vs. Inferential Statistics Descriptive Statistics: summarize and describe data

More information

Lecture Slides. Elementary Statistics. by Mario F. Triola. and the Triola Statistics Series

Lecture Slides. Elementary Statistics. by Mario F. Triola. and the Triola Statistics Series Lecture Slides Elementary Statistics Tenth Edition and the Triola Statistics Series by Mario F. Triola Slide 1 Chapter 13 Nonparametric Statistics 13-1 Overview 13-2 Sign Test 13-3 Wilcoxon Signed-Ranks

More information

Lecture Slides. Section 13-1 Overview. Elementary Statistics Tenth Edition. Chapter 13 Nonparametric Statistics. by Mario F.

Lecture Slides. Section 13-1 Overview. Elementary Statistics Tenth Edition. Chapter 13 Nonparametric Statistics. by Mario F. Lecture Slides Elementary Statistics Tenth Edition and the Triola Statistics Series by Mario F. Triola Slide 1 Chapter 13 Nonparametric Statistics 13-1 Overview 13-2 Sign Test 13-3 Wilcoxon Signed-Ranks

More information

Review of Statistics 101

Review of Statistics 101 Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods

More information

What Are Nonparametric Statistics and When Do You Use Them? Jennifer Catrambone

What Are Nonparametric Statistics and When Do You Use Them? Jennifer Catrambone What Are Nonparametric Statistics and When Do You Use Them? Jennifer Catrambone First, a bit about Parametric Statistics Data are expected to be randomly drawn from a normal population Minimum sample size

More information

Introduction to Statistical Analysis

Introduction to Statistical Analysis Introduction to Statistical Analysis Changyu Shen Richard A. and Susan F. Smith Center for Outcomes Research in Cardiology Beth Israel Deaconess Medical Center Harvard Medical School Objectives Descriptive

More information

Chapter 18 Resampling and Nonparametric Approaches To Data

Chapter 18 Resampling and Nonparametric Approaches To Data Chapter 18 Resampling and Nonparametric Approaches To Data 18.1 Inferences in children s story summaries (McConaughy, 1980): a. Analysis using Wilcoxon s rank-sum test: Younger Children Older Children

More information

Big Data Analysis with Apache Spark UC#BERKELEY

Big Data Analysis with Apache Spark UC#BERKELEY Big Data Analysis with Apache Spark UC#BERKELEY This Lecture: Relation between Variables An association A trend» Positive association or Negative association A pattern» Could be any discernible shape»

More information

AP Statistics Cumulative AP Exam Study Guide

AP Statistics Cumulative AP Exam Study Guide AP Statistics Cumulative AP Eam Study Guide Chapters & 3 - Graphs Statistics the science of collecting, analyzing, and drawing conclusions from data. Descriptive methods of organizing and summarizing statistics

More information

My data doesn t look like that..

My data doesn t look like that.. Testing assumptions My data doesn t look like that.. We have made a big deal about testing model assumptions each week. Bill Pine Testing assumptions Testing assumptions We have made a big deal about testing

More information

Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institute of Technology, Kharagpur

Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institute of Technology, Kharagpur Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institute of Technology, Kharagpur Lecture No. # 38 Goodness - of fit tests Hello and welcome to this

More information

Dealing with the assumption of independence between samples - introducing the paired design.

Dealing with the assumption of independence between samples - introducing the paired design. Dealing with the assumption of independence between samples - introducing the paired design. a) Suppose you deliberately collect one sample and measure something. Then you collect another sample in such

More information

Foundations of Probability and Statistics

Foundations of Probability and Statistics Foundations of Probability and Statistics William C. Rinaman Le Moyne College Syracuse, New York Saunders College Publishing Harcourt Brace College Publishers Fort Worth Philadelphia San Diego New York

More information

Textbook Examples of. SPSS Procedure

Textbook Examples of. SPSS Procedure Textbook s of IBM SPSS Procedures Each SPSS procedure listed below has its own section in the textbook. These sections include a purpose statement that describes the statistical test, identification of

More information

Parametric versus Nonparametric Statistics-when to use them and which is more powerful? Dr Mahmoud Alhussami

Parametric versus Nonparametric Statistics-when to use them and which is more powerful? Dr Mahmoud Alhussami Parametric versus Nonparametric Statistics-when to use them and which is more powerful? Dr Mahmoud Alhussami Parametric Assumptions The observations must be independent. Dependent variable should be continuous

More information

Probability and Statistics

Probability and Statistics Probability and Statistics Kristel Van Steen, PhD 2 Montefiore Institute - Systems and Modeling GIGA - Bioinformatics ULg kristel.vansteen@ulg.ac.be CHAPTER 4: IT IS ALL ABOUT DATA 4a - 1 CHAPTER 4: IT

More information

Selection should be based on the desired biological interpretation!

Selection should be based on the desired biological interpretation! Statistical tools to compare levels of parasitism Jen_ Reiczigel,, Lajos Rózsa Hungary What to compare? The prevalence? The mean intensity? The median intensity? Or something else? And which statistical

More information

Sampling Distributions

Sampling Distributions Sampling Distributions Sampling Distribution of the Mean & Hypothesis Testing Remember sampling? Sampling Part 1 of definition Selecting a subset of the population to create a sample Generally random sampling

More information

Nonparametric statistic methods. Waraphon Phimpraphai DVM, PhD Department of Veterinary Public Health

Nonparametric statistic methods. Waraphon Phimpraphai DVM, PhD Department of Veterinary Public Health Nonparametric statistic methods Waraphon Phimpraphai DVM, PhD Department of Veterinary Public Health Measurement What are the 4 levels of measurement discussed? 1. Nominal or Classificatory Scale Gender,

More information

Week 1 Quantitative Analysis of Financial Markets Distributions A

Week 1 Quantitative Analysis of Financial Markets Distributions A Week 1 Quantitative Analysis of Financial Markets Distributions A Christopher Ting http://www.mysmu.edu/faculty/christophert/ Christopher Ting : christopherting@smu.edu.sg : 6828 0364 : LKCSB 5036 October

More information

Probability Methods in Civil Engineering Prof. Rajib Maity Department of Civil Engineering Indian Institute of Technology, Kharagpur

Probability Methods in Civil Engineering Prof. Rajib Maity Department of Civil Engineering Indian Institute of Technology, Kharagpur Probability Methods in Civil Engineering Prof. Rajib Maity Department of Civil Engineering Indian Institute of Technology, Kharagpur Lecture No. # 12 Probability Distribution of Continuous RVs (Contd.)

More information

Biostatistics: Correlations

Biostatistics: Correlations Biostatistics: s One of the most common errors we find in the press is the confusion between correlation and causation in scientific and health-related studies. In theory, these are easy to distinguish

More information

Chapter Fifteen. Frequency Distribution, Cross-Tabulation, and Hypothesis Testing

Chapter Fifteen. Frequency Distribution, Cross-Tabulation, and Hypothesis Testing Chapter Fifteen Frequency Distribution, Cross-Tabulation, and Hypothesis Testing Copyright 2010 Pearson Education, Inc. publishing as Prentice Hall 15-1 Internet Usage Data Table 15.1 Respondent Sex Familiarity

More information

Robustness and Distribution Assumptions

Robustness and Distribution Assumptions Chapter 1 Robustness and Distribution Assumptions 1.1 Introduction In statistics, one often works with model assumptions, i.e., one assumes that data follow a certain model. Then one makes use of methodology

More information

SEVERAL μs AND MEDIANS: MORE ISSUES. Business Statistics

SEVERAL μs AND MEDIANS: MORE ISSUES. Business Statistics SEVERAL μs AND MEDIANS: MORE ISSUES Business Statistics CONTENTS Post-hoc analysis ANOVA for 2 groups The equal variances assumption The Kruskal-Wallis test Old exam question Further study POST-HOC ANALYSIS

More information

The Normal Distribution. Chapter 6

The Normal Distribution. Chapter 6 + The Normal Distribution Chapter 6 + Applications of the Normal Distribution Section 6-2 + The Standard Normal Distribution and Practical Applications! We can convert any variable that in normally distributed

More information

Do not copy, post, or distribute

Do not copy, post, or distribute 14 CORRELATION ANALYSIS AND LINEAR REGRESSION Assessing the Covariability of Two Quantitative Properties 14.0 LEARNING OBJECTIVES In this chapter, we discuss two related techniques for assessing a possible

More information

Chapter 26: Comparing Counts (Chi Square)

Chapter 26: Comparing Counts (Chi Square) Chapter 6: Comparing Counts (Chi Square) We ve seen that you can turn a qualitative variable into a quantitative one (by counting the number of successes and failures), but that s a compromise it forces

More information

Statistical Inference Theory Lesson 46 Non-parametric Statistics

Statistical Inference Theory Lesson 46 Non-parametric Statistics 46.1-The Sign Test Statistical Inference Theory Lesson 46 Non-parametric Statistics 46.1 - Problem 1: (a). Let p equal the proportion of supermarkets that charge less than $2.15 a pound. H o : p 0.50 H

More information

Correlation and regression

Correlation and regression NST 1B Experimental Psychology Statistics practical 1 Correlation and regression Rudolf Cardinal & Mike Aitken 11 / 12 November 2003 Department of Experimental Psychology University of Cambridge Handouts:

More information

determine whether or not this relationship is.

determine whether or not this relationship is. Section 9-1 Correlation A correlation is a between two. The data can be represented by ordered pairs (x,y) where x is the (or ) variable and y is the (or ) variable. There are several types of correlations

More information

Distribution-Free Procedures (Devore Chapter Fifteen)

Distribution-Free Procedures (Devore Chapter Fifteen) Distribution-Free Procedures (Devore Chapter Fifteen) MATH-5-01: Probability and Statistics II Spring 018 Contents 1 Nonparametric Hypothesis Tests 1 1.1 The Wilcoxon Rank Sum Test........... 1 1. Normal

More information

Table of Contents. Advanced Statistics. Paolo Coletti A.Y. 2010/11 Free University of Bolzano Bozen

Table of Contents. Advanced Statistics. Paolo Coletti A.Y. 2010/11 Free University of Bolzano Bozen Advanced Statistics Paolo Coletti A.Y. 2010/11 Free University of Bolzano Bozen Table of Contents 1. Statistical inference... 2 1.1 Population and sampling... 2 2. Data organization... 4 2.1 Variable s

More information

Chapter 7. Inference for Distributions. Introduction to the Practice of STATISTICS SEVENTH. Moore / McCabe / Craig. Lecture Presentation Slides

Chapter 7. Inference for Distributions. Introduction to the Practice of STATISTICS SEVENTH. Moore / McCabe / Craig. Lecture Presentation Slides Chapter 7 Inference for Distributions Introduction to the Practice of STATISTICS SEVENTH EDITION Moore / McCabe / Craig Lecture Presentation Slides Chapter 7 Inference for Distributions 7.1 Inference for

More information

PROFESSOR: WELCOME BACK TO THE LAST LECTURE OF THE SEMESTER. PLANNING TO DO TODAY WAS FINISH THE BOOK. FINISH SECTION 6.5

PROFESSOR: WELCOME BACK TO THE LAST LECTURE OF THE SEMESTER. PLANNING TO DO TODAY WAS FINISH THE BOOK. FINISH SECTION 6.5 1 MATH 16A LECTURE. DECEMBER 9, 2008. PROFESSOR: WELCOME BACK TO THE LAST LECTURE OF THE SEMESTER. I HOPE YOU ALL WILL MISS IT AS MUCH AS I DO. SO WHAT I WAS PLANNING TO DO TODAY WAS FINISH THE BOOK. FINISH

More information

QT (Al Jamia Arts and Science College, Poopalam)

QT (Al Jamia Arts and Science College, Poopalam) QUANTITATIVE TECHNIQUES Quantitative techniques may be defined as those techniques which provide the decision makes a systematic and powerful means of analysis, based on quantitative data. It is a scientific

More information

Kumaun University Nainital

Kumaun University Nainital Kumaun University Nainital Department of Statistics B. Sc. Semester system course structure: 1. The course work shall be divided into six semesters with three papers in each semester. 2. Each paper in

More information

Business Statistics. Lecture 10: Course Review

Business Statistics. Lecture 10: Course Review Business Statistics Lecture 10: Course Review 1 Descriptive Statistics for Continuous Data Numerical Summaries Location: mean, median Spread or variability: variance, standard deviation, range, percentiles,

More information

3. Nonparametric methods

3. Nonparametric methods 3. Nonparametric methods If the probability distributions of the statistical variables are unknown or are not as required (e.g. normality assumption violated), then we may still apply nonparametric tests

More information

Inferential statistics

Inferential statistics Inferential statistics Inference involves making a Generalization about a larger group of individuals on the basis of a subset or sample. Ahmed-Refat-ZU Null and alternative hypotheses In hypotheses testing,

More information

Contents Kruskal-Wallis Test Friedman s Two-way Analysis of Variance by Ranks... 47

Contents Kruskal-Wallis Test Friedman s Two-way Analysis of Variance by Ranks... 47 Contents 1 Non-parametric Tests 3 1.1 Introduction....................................... 3 1.2 Advantages of Non-parametric Tests......................... 4 1.3 Disadvantages of Non-parametric Tests........................

More information

t-test for b Copyright 2000 Tom Malloy. All rights reserved. Regression

t-test for b Copyright 2000 Tom Malloy. All rights reserved. Regression t-test for b Copyright 2000 Tom Malloy. All rights reserved. Regression Recall, back some time ago, we used a descriptive statistic which allowed us to draw the best fit line through a scatter plot. We

More information

Biostatistics and Design of Experiments Prof. Mukesh Doble Department of Biotechnology Indian Institute of Technology, Madras. Lecture 11 t- Tests

Biostatistics and Design of Experiments Prof. Mukesh Doble Department of Biotechnology Indian Institute of Technology, Madras. Lecture 11 t- Tests Biostatistics and Design of Experiments Prof. Mukesh Doble Department of Biotechnology Indian Institute of Technology, Madras Lecture 11 t- Tests Welcome to the course on Biostatistics and Design of Experiments.

More information

Wed, June 26, (Lecture 8-2). Nonlinearity. Significance test for correlation R-squared, SSE, and SST. Correlation in SPSS.

Wed, June 26, (Lecture 8-2). Nonlinearity. Significance test for correlation R-squared, SSE, and SST. Correlation in SPSS. Wed, June 26, (Lecture 8-2). Nonlinearity. Significance test for correlation R-squared, SSE, and SST. Correlation in SPSS. Last time, we looked at scatterplots, which show the interaction between two variables,

More information

Nonparametric tests. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 704: Data Analysis I

Nonparametric tests. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 704: Data Analysis I 1 / 16 Nonparametric tests Timothy Hanson Department of Statistics, University of South Carolina Stat 704: Data Analysis I Nonparametric one and two-sample tests 2 / 16 If data do not come from a normal

More information

One-way ANOVA Model Assumptions

One-way ANOVA Model Assumptions One-way ANOVA Model Assumptions STAT:5201 Week 4: Lecture 1 1 / 31 One-way ANOVA: Model Assumptions Consider the single factor model: Y ij = µ + α }{{} i ij iid with ɛ ij N(0, σ 2 ) mean structure random

More information

Keller: Stats for Mgmt & Econ, 7th Ed July 17, 2006

Keller: Stats for Mgmt & Econ, 7th Ed July 17, 2006 Chapter 17 Simple Linear Regression and Correlation 17.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will

More information

Statistical Intervals (One sample) (Chs )

Statistical Intervals (One sample) (Chs ) 7 Statistical Intervals (One sample) (Chs 8.1-8.3) Confidence Intervals The CLT tells us that as the sample size n increases, the sample mean X is close to normally distributed with expected value µ and

More information

Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institute of Technology Kharagpur

Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institute of Technology Kharagpur Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institute of Technology Kharagpur Lecture No. #13 Probability Distribution of Continuous RVs (Contd

More information

GS Analysis of Microarray Data

GS Analysis of Microarray Data GS01 0163 Analysis of Microarray Data Keith Baggerly and Kevin Coombes Section of Bioinformatics Department of Biostatistics and Applied Mathematics UT M. D. Anderson Cancer Center kabagg@mdanderson.org

More information

Data analysis and Geostatistics - lecture VII

Data analysis and Geostatistics - lecture VII Data analysis and Geostatistics - lecture VII t-tests, ANOVA and goodness-of-fit Statistical testing - significance of r Testing the significance of the correlation coefficient: t = r n - 2 1 - r 2 with

More information

CHAPTER 1. Introduction

CHAPTER 1. Introduction CHAPTER 1 Introduction A typical Modern Geometry course will focus on some variation of a set of axioms for Euclidean geometry due to Hilbert. At the end of such a course, non-euclidean geometries (always

More information

Introduction to inferential statistics. Alissa Melinger IGK summer school 2006 Edinburgh

Introduction to inferential statistics. Alissa Melinger IGK summer school 2006 Edinburgh Introduction to inferential statistics Alissa Melinger IGK summer school 2006 Edinburgh Short description Prereqs: I assume no prior knowledge of stats This half day tutorial on statistical analysis will

More information

Inferential Statistics

Inferential Statistics Inferential Statistics Eva Riccomagno, Maria Piera Rogantin DIMA Università di Genova riccomagno@dima.unige.it rogantin@dima.unige.it Part G Distribution free hypothesis tests 1. Classical and distribution-free

More information

Hypothesis testing, part 2. With some material from Howard Seltman, Blase Ur, Bilge Mutlu, Vibha Sazawal

Hypothesis testing, part 2. With some material from Howard Seltman, Blase Ur, Bilge Mutlu, Vibha Sazawal Hypothesis testing, part 2 With some material from Howard Seltman, Blase Ur, Bilge Mutlu, Vibha Sazawal 1 CATEGORICAL IV, NUMERIC DV 2 Independent samples, one IV # Conditions Normal/Parametric Non-parametric

More information