Everything is not normal
- Nicholas Smith
- 5 years ago
According to the dictionary, a thing is considered normal when it is in its natural state or conforms to standards set in advance. And this is its normal meaning. But, like many other words, normal has many other meanings. In statistics, when we talk about normal we refer to a particular probability distribution called the normal distribution, the famous Gauss bell. This distribution is characterized by its symmetry around its mean, which coincides with its median, in addition to other features which we discussed in a previous post. The great advantage of the normal distribution is that it allows us to calculate the probability of occurrence of data drawn from it, which makes it possible to infer population data from a sample. Thus, virtually all parametric techniques of hypothesis testing need the data to follow a normal distribution.

One might think that this is not a big problem. If it's called normal, it must be because biological data usually follow, more or less, this distribution. Big mistake: many data follow a distribution that deviates from normal. Consider, for example, the consumption of alcohol. The data will not be grouped symmetrically around a mean. Instead, the distribution will have a positive skew (to the right): there will be a large number of values around zero (abstainers or very occasional drinkers) and a long right tail formed by people with higher consumption, stretching out to the values of those who have bourbon for breakfast.

And how does the fact that the variable doesn't follow a normal distribution affect our statistical calculations? What should we do if the data are not normal? The first thing to do is realize that the variable is not normally distributed. We have already seen that there are a number of graphical methods that allow us to check visually whether the data follow the normal.
Histograms and box-plots allow us to see whether the distribution is skewed, too flat or too peaked, or has extreme values. But the most specific graph for this purpose is the normal probability plot (q-q plot), in which the values fall along the diagonal line if they are normally distributed.

Another possibility is to use numerical contrast tests such as the Shapiro-Wilk or Kolmogorov-Smirnov tests. The problem with these tests is that they are very sensitive to sample size. If the sample is large, they can be affected by minor deviations from normality. Conversely, if the sample is small, they may fail to detect large deviations from normality. But these tests also have another drawback that you will understand better after a small clarification.

We know that in any hypothesis test we set a null hypothesis that usually states the opposite of what we want to show. Thus, if the value of statistical significance (the p-value) is lower than a preset threshold (usually 0.05), we reject the null hypothesis and keep the alternative, which says what we want to prove. The problem is that the null hypothesis is only falsifiable; it can never be said to be true. If the p-value is high, we simply cannot say it is untrue, but that does not mean it is true. It may be that the study did not have enough power to reject a null hypothesis that is, in fact, false.

Well, it happens that contrasts for normality are set up with the null hypothesis that the data follow a normal distribution. Therefore, if the p-value is small, we can reject the null and say that the data are not normal. But if it is high, we simply cannot reject it, and we can only say that we are unable to show that the data are not normal, which is not the same as saying that they fit a normal distribution. For these reasons, it is always advisable to complement numerical contrasts with some graphical method to test the normality of the variable.

Once we know that the data are not normal, we must take this into account when describing them. If the distribution is highly skewed, we cannot use the mean as a measure of central tendency and must resort to robust estimators such as the median or other parameters available for these situations. Furthermore, the absence of normality may discourage the use of parametric contrast tests. Student's t test and the analysis of variance (ANOVA) require that the distribution be normal. Student's t is quite robust in this regard, so if the sample is large (n > 80) it can be used with some confidence.
But if the sample is small or the distribution is very different from the normal, we cannot use parametric contrast tests. One possible solution is to attempt a data transformation. The most frequently used in biology is the logarithmic transformation, useful for approximating a normal distribution when the data are right-skewed. We mustn't forget to undo the transformation once the contrast in question has been made.

The other possibility is to use nonparametric tests, which require no assumption about the distribution of the variable. Thus, to compare two means of unpaired data we will use the Wilcoxon rank-sum test (also called the Mann-Whitney U test). If the data are paired we will have to use the Wilcoxon signed-rank test. If we compare more than two means, the Kruskal-Wallis test will be the nonparametric equivalent of ANOVA. Finally, remember that the nonparametric equivalent of Pearson's correlation coefficient is Spearman's correlation coefficient. The problem is that nonparametric tests are more demanding than their parametric equivalents when it comes to reaching statistical significance, but they must be used as soon as there's any doubt about the normality of the variable we're contrasting.

And here we will stop for today. We could have talked about a third way of dealing with a non-normal variable, much more exotic than those mentioned: resampling techniques such as bootstrapping, which consists of building an empirical distribution of the means of many samples drawn from our data and making inferences from the results, thus preserving the original units of the variable and avoiding the back and forth of data transformation. But that's another story.

Some comparisons are not odious

It's often said that comparisons are odious. And the truth is that it is not appropriate to compare people or things with one another, since each has its own merits and nobody should be slighted for doing something differently. So it's not surprising that even Don Quixote said that comparisons are always odious. Of course, this may hold for everyday life, because in medicine we are always comparing things, sometimes in a rather beneficial way.

Today we are going to talk about how to compare two data distributions graphically, and we'll look at an application of this type of comparison that helps us check whether our data follow a normal distribution.

Imagine for a moment that we have a hundred serum cholesterol values from schoolchildren. What will we get if we plot the values against themselves? Simple: a perfect straight line along the diagonal of the graph. Now think about what would happen if, instead of comparing them with themselves, we compared them with a different distribution.
If the two data distributions are very similar, the dots on the graph will lie very close to the diagonal. If the distributions differ, the dots will move away from the diagonal, and the further away, the more different the two distributions are.

Let's look at an example. Suppose we divide our distribution into two parts, the cholesterol of boys and that of girls. According to what our imagination tells us, boys eat more industrial bakery products than girls, so their cholesterol levels are higher, as you can see if you compare the curve for girls (black) with that for boys (blue). Now, if we plot the values of the girls against the values of the boys, as can be seen in the figure, the dots are far from the diagonal, all lying above it. Why? Because the values of the boys are higher than those of the girls.

You will tell me that all this is fine, but a bit unnecessary. After all, if we want to know who has the highest values, all we have to do is look at the curves. And you would be right, but this type of graph was designed for something different, which is to compare a distribution with its normal equivalent.

Imagine that we have our first overall distribution and we want to know whether it follows a normal distribution. We only have to calculate its mean and standard deviation and plot its quantiles against the theoretical quantiles of a normal distribution with the same mean and standard deviation. If our data are normally distributed, the dots will line up along the diagonal of the graph. The further they stray from it, the less likely it is that our data follow a normal distribution. This type of graph is called a quantile-quantile plot or, more commonly, a q-q plot.

Let's see an example of a q-q plot to understand it better. In the second graph you can see two curves, a blue one representing a normal distribution and a black one following a Student's t distribution.
On the right side you can see the q-q plot of the Student's t distribution. The central data fit the diagonal quite well, but the extreme data fit it worse, changing the slope of the line. This indicates that there are more data under the tails of the distribution than there would be if it were a normal distribution. Of course, this should not surprise us, since we know that heavy tails are a feature of the Student's t distribution.

Finally, in the third graph you can see a normal distribution and its q-q plot, in which the dots fit the diagonal of the graph quite well.

As you can see, the q-q plot is a simple graphical method to determine whether data follow a normal distribution. You may say that it would be a bit tedious to calculate the quantiles of our distribution and those of the equivalent normal distribution, but remember that most statistical software can do it effortlessly. For instance, R has a function called qqnorm() that draws a q-q plot in a blink.

And here we are going to end with normal fitting for now. Just remember that there are other, more accurate numerical methods to find out whether data fit a normal distribution, such as the Kolmogorov-Smirnov test or the Shapiro-Wilk test. But that's another story.
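For readers who would like to see what lies behind a q-q plot, here is a minimal sketch in Python (the post itself only mentions R's qqnorm()). It pairs each sorted observation with the quantile of a normal distribution that has the same mean and standard deviation, and then measures how far the dots stray from the diagonal. The function names, the simulated "cholesterol" values, the plotting positions (i - 0.5)/n and the "largest gap" summary are all assumptions made for illustration; they are not taken from the original figures.

```python
import random
from statistics import NormalDist, mean, stdev


def qq_points(data):
    """Return (theoretical, observed) quantile pairs for a normal q-q plot."""
    observed = sorted(data)
    n = len(observed)
    # Reference normal with the same mean and standard deviation as the data.
    reference = NormalDist(mean(observed), stdev(observed))
    # Plotting positions (i - 0.5) / n stay strictly inside (0, 1).
    theoretical = [reference.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, observed))


def max_gap_in_sd(data):
    """Largest distance between the dots and the diagonal, in SD units."""
    points = qq_points(data)
    return max(abs(obs - theo) for theo, obs in points) / stdev(data)


rng = random.Random(42)  # fixed seed so the sketch is reproducible
normal_sample = [rng.gauss(170, 25) for _ in range(500)]        # made-up values
skewed_sample = [rng.expovariate(1 / 170) for _ in range(500)]  # right-skewed on purpose

normal_gap = max_gap_in_sd(normal_sample)
skewed_gap = max_gap_in_sd(skewed_sample)
print(f"normal sample: largest gap from the diagonal = {normal_gap:.2f} SD")
print(f"skewed sample: largest gap from the diagonal = {skewed_gap:.2f} SD")
```

The pairs returned by qq_points() are exactly the dots one would draw; the skewed sample should show a clearly larger gap, because its tails cannot follow the diagonal.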

The big family

Moviegoers, do not be mistaken. We are not going to talk about the 1962 movie in which little Chencho gets lost in the Plaza Mayor at Christmas and is not found until summer, largely thanks to the tenacious searching of his grandpa. Today we're going to talk about another big family, that of probability density functions, and I hope nobody ends up as lost as poor Chencho in the film.

No doubt the queen of density functions is the normal distribution, the bell-shaped one. This is a probability distribution that is characterized by its mean and standard deviation and lies at the core of probability calculus and statistical inference. But there are other continuous probability functions that resemble the normal distribution to a greater or lesser extent and that are also widely used in hypothesis testing.

The first one we're going to talk about is the Student's t distribution. For those curious about the history of science, I'll say that the inventor of this statistic was actually William Sealy Gosset but, as he must have liked his name very little, he used to sign his writings under the pseudonym Student. Hence the name of this statistic.

This density function is bell-shaped and distributed symmetrically around its mean. It's very similar to the normal curve, although with heavier tails; this is why estimates based on this distribution are less precise when the sample is small, since more data under the tails always implies the possibility of more results far from the mean. There are an infinite number of Student's t distributions, all of them characterized by their mean, variance and degrees of freedom, but when the sample size is greater than 30 (as the degrees of freedom increase), the t distribution can be approximated by a normal distribution, so we can use the latter without making big mistakes.

Student's t is used to compare the means of normally distributed populations when the sample sizes are small or when the population variances are unknown. This works because if we subtract the population mean from the sample mean and divide the result by the estimated standard error, the value we get follows a Student's t distribution.

Another member of this family of continuous distributions is the chi-square, which also plays an important role in statistics. If we take a sample of independent standard normal variables and square them, their sum will follow a chi-square distribution with a number of degrees of freedom equal to the sample size. In practice, when we have a series of values of a variable, we can subtract the values expected under the null hypothesis from the observed ones, square these differences, divide each by its expected value, and add them up to check the probability of obtaining that total according to the chi-square density function. We then decide whether or not to reject our null hypothesis.

This technique can be used with three aims: to determine the goodness of fit to a theoretical population, to test the homogeneity of two populations, and to test the independence of two variables.

Unlike the normal distribution, the chi-square density function takes only positive values, so it is asymmetric with a long right tail. The curve, however, becomes gradually more symmetric as the degrees of freedom increase, increasingly resembling a normal distribution.

The last distribution we are going to talk about is Snedecor's F distribution. Its name leaves no surprise about who invented it, although it seems that a certain Fisher was also involved in the creation of this statistic.

This distribution is more closely related to the chi-square than to the normal distribution, because it's the density function of the ratio of two chi-square variables, each divided by its degrees of freedom. As is easy to understand, it takes only positive values, and its shape depends on the degrees of freedom of the two chi-square distributions that determine it. This distribution is used for the contrast of means in the analysis of variance (ANOVA).

In summary, we can see that there are several rather similar density functions that serve to calculate probabilities and that are useful in various hypothesis contrasts. But there are many more distributions, such as the bivariate normal, the negative binomial, the uniform, and the beta and gamma distributions, to name a few. But that's another story.

The most famous of bells

The dictionary says that a bell is a simple device that makes a sound. But a bell can be much more. I think there's even a plant with that name and a flower with its diminutive. But undoubtedly the most famous of all bells is the renowned Gauss bell curve, the most beloved and revered by statisticians and other scientific species.

But what is a bell curve? It's nothing more, nor less, than a probability density function. Put another way, it is a continuous probability distribution with a symmetrical bell shape, hence the first part of its name. And I say the first part because the second one is more controversial, since it is not quite clear that Gauss is the father of the child.

It seems that the first to use this density function was somebody named de Moivre, who was studying what happens to a binomial distribution when the sample size is large. In yet another of the many injustices of history, the name of the function is associated with Gauss, who used it several decades later to record data from his astronomical studies. Of course, in defense of Gauss, some people say the two of them discovered the density function independently.
To avoid controversy, from now on we will call it by its other name, different from Gauss bell: the normal distribution. It seems that it was so named because people used to think that most natural phenomena were consistent with this distribution. Later it was found that there are other distributions that are very common in biology, such as the binomial and the Poisson.

As with any other density function, the usefulness of the normal curve is that it represents the probability distribution of the random variable we are measuring. For example, if we measure the weights of a population of individuals and plot them, the graph will approximate a normal distribution. Thus, the area under the curve between two given points on the x axis represents the probability of occurrence of values in that interval. The total area under the curve is equal to one, which means that there's a 100% chance (or a probability of one) of occurrence of any of the possible values of the distribution.

There are infinitely many different normal distributions, each perfectly characterized by its mean and standard deviation. Thus, any point on the horizontal axis can be expressed as the mean plus or minus a number of standard deviations, and its probability can be calculated using the formula of the density function, which I dare not show you here. We can also use a computer to calculate the probability of a variable within a normal distribution, but what we do in practice is something simpler: standardize.

The standard normal distribution is the one that has a mean of zero and a standard deviation of one. The advantage of the standard normal distribution is twofold. First, we know how probability is distributed between different points on the horizontal axis. Thus, the mean plus or minus one standard deviation covers about 68% of the population, the mean plus or minus two standard deviations about 95%, and the mean plus or minus three standard deviations about 99.7%.
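The three areas just quoted can be checked in a couple of lines. The following is only a sketch in Python, using the standard library's statistics.NormalDist; the helper name coverage() is made up for the example.

```python
from statistics import NormalDist

standard = NormalDist(mu=0.0, sigma=1.0)  # the standard normal distribution


def coverage(k):
    """Probability that a normal value lies within k SDs of the mean."""
    return standard.cdf(k) - standard.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {coverage(k):.4f}")
# prints 0.6827, 0.9545 and 0.9973 respectively
```

Because the areas depend only on the number of standard deviations, the same three figures hold for any normal distribution, whatever its mean and standard deviation.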
The second advantage is that any normal distribution can be transformed into a standard one, simply by subtracting the mean from the value and dividing the result by the standard deviation of the distribution. We thus arrive at the z score, which is the equivalent of the value of our variable in a standard normal distribution with a mean of zero and a standard deviation of one.

So you can see how useful it is. We do not need software to calculate the probability. We just standardize and use a simple probability table, if we do not know the value by heart. Moreover, the thing goes further. Thanks to the magic of the central limit theorem, other distributions can be approximated by a normal one and standardized to calculate the probability distribution of their variables. For example, if our variable follows a binomial distribution, we can approximate it by a normal distribution when the sample size is large; in practice, when np and n(1-p) are both greater than five. The same applies to the Poisson distribution, which can be approximated by a normal when its mean is greater than 10.

And the magic is twofold because, besides sparing us the use of complex tools and allowing us to calculate probabilities and confidence intervals easily, it should be noted that both the binomial and the Poisson distributions are discrete mass functions, while the normal distribution is a continuous density function.

And that's all for now. I only want to say that there are other continuous density functions different from the normal distribution and that they can also be approximated by a normal when the sample is large. But that's another story.
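To make the approximation of a binomial by a normal a little more tangible, here is a minimal sketch in Python that compares the exact binomial probability with the one obtained by standardizing. The values n = 100, p = 0.3 and k = 35 are hypothetical, chosen only so that np and n(1-p) are both well above five. The +0.5 continuity correction is an assumption added here, a common refinement when a discrete distribution is approximated by a continuous one; the text above does not mention it.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X following a binomial distribution (n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def normal_approx_cdf(k, n, p):
    """Approximate P(X <= k) through the z score of k."""
    mu = n * p                       # mean of the binomial
    sigma = sqrt(n * p * (1 - p))    # standard deviation of the binomial
    z = (k + 0.5 - mu) / sigma       # +0.5 is the continuity correction (an addition)
    return NormalDist().cdf(z)       # NormalDist() is the standard normal


n, p, k = 100, 0.3, 35  # hypothetical example: np = 30 and n(1-p) = 70
exact = binomial_cdf(k, n, p)
approx = normal_approx_cdf(k, n, p)
print(f"exact binomial: {exact:.4f}  normal approximation: {approx:.4f}")
```

With these values the two probabilities should agree to about two decimal places; trying a small n, or a p close to zero or one, shows how the approximation degrades when the rule of thumb is not met.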
More informationIntroduction to Statistical Analysis
Introduction to Statistical Analysis Changyu Shen Richard A. and Susan F. Smith Center for Outcomes Research in Cardiology Beth Israel Deaconess Medical Center Harvard Medical School Objectives Descriptive
More informationChapter 18 Resampling and Nonparametric Approaches To Data
Chapter 18 Resampling and Nonparametric Approaches To Data 18.1 Inferences in children s story summaries (McConaughy, 1980): a. Analysis using Wilcoxon s rank-sum test: Younger Children Older Children