Statistics and Data Analysis in Geology


6. Normal Distribution: probability plots, central limit theorem
Dr. Franz J Meyer, Earth and Planetary Remote Sensing, University of Alaska Fairbanks


An Enormously Important Distribution

The normal distribution is the most commonly used distribution in statistics. Partly this is because the normal distribution is a reasonable description of many processes, from industrial processes to intelligence test scores. Also, under specific conditions, one can assume that sampling distributions are normally distributed even if the samples are drawn from populations that are not normally distributed (this is discussed further when we talk about the Central Limit Theorem). The normal distribution is also referred to as the bell curve, and you see a few examples below. There are an infinite number of normal distributions, which differ according to their mean (μ) and variance (σ²).

Many natural processes are well described by the normal distribution. The shape of a normal distribution corresponds to a binomial distribution with p = 0.5 (compare to the coin-toss example of lecture 5). As N becomes large, the function becomes continuous and can be represented by the following equation:

f(X) = (1 / (σ√(2π))) · e^(−(X − μ)² / (2σ²))   for −∞ < X < ∞

It can also be thought of as

(N! / (X!(N − X)!)) · p^X · q^(N−X)   for p = 0.5

A normal distribution can be characterized by only two parameters, μ and σ.

[Figures: approximation of a histogram by a normal distribution; two normal distributions with different means but the same standard deviation; two normal distributions with the same mean but different standard deviations]
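The correspondence between the binomial with p = 0.5 and the normal curve can be checked numerically. A minimal sketch in Python (not part of the lecture; the choice N = 100 is only an illustration), using just the standard library:

```python
import math

def normal_pdf(x, mu, sigma):
    """Normal density f(x) = 1/(sigma*sqrt(2*pi)) * exp(-(x-mu)^2 / (2*sigma^2))."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def binom_pmf(x, n, p=0.5):
    """Binomial probability N!/(X!(N-X)!) * p^X * q^(N-X)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

# For large N, the binomial with p = 0.5 is well approximated by a
# normal with mu = N*p and sigma = sqrt(N*p*q).
n = 100
mu, sigma = n * 0.5, math.sqrt(n * 0.25)
for x in (40, 50, 60):
    print(x, binom_pmf(x, n), normal_pdf(x, mu, sigma))
```

For each x the two probabilities agree to three decimal places, which is why the coin-toss histogram of lecture 5 looks bell-shaped.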

The Standard Normal Distribution or Z Distribution

It is often useful to standardize the variables so that populations can be compared. Standardization means setting the mean μ = 0 and the standard deviation σ = 1. Then the equation becomes:

f(X) = (1 / √(2π)) · e^(−X² / 2)   for −∞ < X < ∞

and the curve is expressed in numbers of standard deviations from the mean.

The Standard Normal Distribution or Z Distribution

So you convert the normal distribution to the Z distribution by converting the original values to standard scores, which allows comparison among populations with different means and variances. That is useful because all normal distributions share the following characteristics:
- Symmetry
- Unimodality
- A continuous range from −∞ to +∞
- A total area under the curve of 1
- A common value for the mean, median, and mode

We can therefore say how the data are distributed within any normal distribution:
- About 68% of the data fall within 1σ of the mean
- About 95% of all data fall within 2σ
- About 99.7% of all data fall within 3σ
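These three percentages can be computed directly from the error function, which gives the area under the standard normal curve. A minimal sketch (not from the lecture; standard library only):

```python
import math

def area_within(k):
    """P(mu - k*sigma < X < mu + k*sigma) for any normal distribution.
    Uses the identity Phi(k) - Phi(-k) = erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} sigma: {area_within(k):.4f}")
# within 1 sigma: 0.6827
# within 2 sigma: 0.9545
# within 3 sigma: 0.9973
```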

The Standard Normal Distribution or Z Distribution

[Figure: standardization of normal random variables]

The Standard Normal Distribution or Z Distribution

For any sample, the way to standardize the data is called the Z-transformation. For every point we calculate a Z-score, which is a measure of how many standard deviations a point lies from the mean:

Z_i = (X_i − μ) / σ   or   Z_i = (X_i − X̄) / S

depending on whether you are dealing with a population or a sample. Z-scores can be positive or negative.

The Standard Normal Distribution or Z Distribution

Example: A shell specimen with a value of 12 mm (X = 12) is drawn from a population with μ = 10, σ = 2. What is that sample's Z-score?

Z = (12 − 10) / 2 = 2/2 = 1

so the specimen is one standard deviation longer than the mean.

What if that same specimen is drawn from a population with μ = 10, σ = 1 (same mean, different variance)?

Z = (12 − 10) / 1 = 2/1 = 2

In absolute terms the specimen is the same distance from the mean; however, relative to the population as a whole, it is further away (more anomalous).
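The two shell-specimen calculations above reduce to one small helper. A sketch (the function name z_score is my own, not from the lecture):

```python
def z_score(x, mu, sigma):
    """Number of standard deviations a value x lies from the mean mu."""
    return (x - mu) / sigma

# Same 12 mm specimen against two populations with the same mean:
print(z_score(12, 10, 2))  # 1.0 -> one standard deviation above the mean
print(z_score(12, 10, 1))  # 2.0 -> more anomalous in the tighter population
```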

The Standard Normal Distribution or Z Distribution

Example cont.: What if a different specimen (X = 14) is drawn from the population in example 1, with μ = 10, σ = 2?

Z = (14 − 10) / 2 = 4/2 = 2

So this sample is in the same position relative to its population as the second case above.

[Figure: Z-scores plotted against specimen length, 4 to 16 mm]

The Standard Normal Distribution or Z Distribution

For each normal distribution, the area under the curve is equal to 1. That is, the total probability is equal to 1 (as it was with the binomial distribution). Mathematically we can express this as:

∫ from −∞ to +∞ of f(X) dX = 1

For Z-transformed data this is:

∫ from −∞ to +∞ of (1 / √(2π)) · e^(−X² / 2) dX = 1

The Standard Normal Distribution or Z Distribution

Similarly, we can calculate the probability of a sample being less than or equal to some preset value Z as

P(X ≤ Z) = ∫ from −∞ to Z of (1 / √(2π)) · e^(−X² / 2) dX

A different way to represent the normal distribution is by cumulative probability: plots of the area under the curve versus X. They can be made for any distribution. These types of plots are called OGIVE PLOTS, and I will come back to them later.
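This integral has no closed form in elementary functions, but it can be evaluated with the error function or by direct numerical summation. A sketch comparing the two (function names and the step count are illustrative choices, not from the lecture):

```python
import math

def phi(z):
    """Cumulative standard normal probability P(Z <= z), via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def phi_numeric(z, steps=100_000, lower=-10.0):
    """The same probability by summing the density over small steps
    (midpoint rule); -10 is far enough left that the missing tail is negligible."""
    dx = (z - lower) / steps
    total = 0.0
    for i in range(steps):
        x = lower + (i + 0.5) * dx
        total += math.exp(-x * x / 2) / math.sqrt(2 * math.pi) * dx
    return total

print(phi(1.88))         # about 0.97, matching the table lookup used later
print(phi_numeric(1.88))
```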

For the normal distribution, it is a pain in the neck to calculate this integral for every problem that we are going to do, so tables have been constructed.


The numbers in the table below are answers to the question: What is the Z value corresponding to a particular area under the curve?

Example of Cumulative Probability

Grades of chip samples from a body of ore have a normal distribution with a mean of 12% (μ) and a standard deviation of 1.6% (σ). (The curve to the right helps to visualize the distribution.)

Problem 1: Find the probability of a specimen grading 15% or less.
- Calculate the Z-score: (15 − 12) / 1.6 = +1.88
- The chart on slide 13 gives the cumulative probability from −∞ up to this value: for +1.88 it is 0.97 (we have to interpolate between +1.8 and +1.9)
- Make a sketch to see if this makes sense

So the probability of finding a sample with 15% ore or less is 97%.

Example of Cumulative Probability

Problem 2: What is the probability of finding an ore grade greater than 14%?
- Z = (14 − 12) / 1.6 = +1.25; the cumulative probability associated with this Z-score is 0.895. This is the probability of 14% or less.
- The probability of 14% or more is 1 − 0.895 = 0.105

So the probability of finding a sample with more than 14% ore is 10.5%.

Example of Cumulative Probability

Problem 3: What is the probability of finding an ore grade of less than 8%?
- Z = (8 − 12) / 1.6 = −2.5; the cumulative probability associated with this Z-score is 0.0062

So the probability of finding a sample with less than 8% ore is 0.62%, not very likely.

Example of Cumulative Probability

Problem 4: What is the probability of a sample being between 8% and 15%?
- Calculate the Z-scores for each value:
  Z(8) = (8 − 12) / 1.6 = −2.5 → 0.62%
  Z(15) = (15 − 12) / 1.6 = +1.88 → 97%
- Subtract the smaller from the larger: 97 − 0.62 = 96.38%

So about 96% of all samples fall in that range.
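All four ore-grade problems can be checked against the cumulative standard normal function instead of a printed Z-table. A sketch (the helper name phi is my own; the lecture uses table lookups):

```python
import math

def phi(z):
    """Cumulative standard normal probability P(Z <= z)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma = 12.0, 1.6  # ore grades: mean 12%, standard deviation 1.6%

p_le_15 = phi((15 - mu) / sigma)      # Problem 1: grade <= 15%
p_gt_14 = 1 - phi((14 - mu) / sigma)  # Problem 2: grade > 14%
p_lt_8  = phi((8 - mu) / sigma)       # Problem 3: grade < 8%
p_8_15  = p_le_15 - p_lt_8            # Problem 4: 8% < grade <= 15%

print(f"P(<=15%) = {p_le_15:.4f}")    # ~0.97
print(f"P(>14%)  = {p_gt_14:.4f}")    # ~0.105
print(f"P(<8%)   = {p_lt_8:.4f}")     # ~0.0062
print(f"P(8-15%) = {p_8_15:.4f}")     # ~0.96
```

The small differences from the slide values come from table rounding (the exact Z for 15% is 1.875, not 1.88).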

Example of Cumulative Probability

Area under the curve:
- μ ± 1σ: 0.8413 − 0.1587 = 0.6826, about 68%
- μ ± 2σ: 0.9773 − 0.0228 = 0.9545, about 95.5% (μ ± 1.96σ covers exactly 95%)
- μ ± 3σ: 0.9987 − 0.0014 = 0.9973, about 99.7%

The Central Limit Theorem

The Central Limit Theorem

If you draw a number of samples from a normally distributed population, we find that the sample means will form a normal distribution. BUT we don't always know the distribution of the population.

Central Limit Theorem: the CLT states that, independent of their original statistical distribution, the averaged sum of a sufficiently large number of identically distributed independent random variables will be approximately normally distributed. In other words, if sufficiently large sets of random samples are taken from any population, and the means are calculated for those samples, then these sample means will tend to be normally distributed.

The Central Limit Theorem

Central Limit Theorem, again in other words: if we take all possible samples of size n from any population with a mean of μ and a standard deviation of σ, the distribution of sample means
- has a mean equal to the population mean: μ(X̄) = μ
- has a standard deviation s(X̄) = σ / √n, also called the standard error of the mean, s_e
- will be normally distributed when the parent population is normal
- will approach a normal distribution as n approaches infinity, regardless of the distribution of the parent population.
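The first two properties above (mean of the sample means equals μ; standard error equals σ/√n) can be checked by simulation. A sketch using only the Python standard library; the uniform parent population and the sample size n = 25 are arbitrary illustrative choices:

```python
import math
import random

random.seed(42)

# Parent population: uniform on [0, 1], which is clearly not normal.
# Its mean is 0.5 and its standard deviation is sqrt(1/12).
mu, sigma = 0.5, math.sqrt(1 / 12)
n = 25           # sample size
trials = 20_000  # number of sample means to draw

means = [sum(random.random() for _ in range(n)) / n for _ in range(trials)]

grand_mean = sum(means) / trials
se_observed = math.sqrt(sum((m - grand_mean) ** 2 for m in means) / trials)
se_theory = sigma / math.sqrt(n)

print(f"mean of sample means:    {grand_mean:.4f}  (expected {mu})")
print(f"observed standard error: {se_observed:.4f}  (theory {se_theory:.4f})")
```

A histogram of `means` would also look bell-shaped, even though the parent population is flat.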

The Central Limit Theorem

[Figure: distributions of sample means for sample sizes 1, 2, 4, and 25]

The Central Limit Theorem

Some animated examples (uniform, log-normal, and parabolic distributions): http://www.statisticalengineering.com/central_limit_theorem.htm

The Central Limit Theorem

This means that if we average enough, we can always reduce data of unknown statistics to data of known properties. Practically, we can use our Z-statistic

Z_i = (X_i − μ) / σ   (1)

which is useful when we want to infer something from single values taken from a normal population (X_i drawn from the population), and adapt it for the CLT for a sample of size n drawn from a population with known mean and standard deviation:

Z = (X̄ − μ) / (σ / √n)   (2)

You can see that equation (1) is the same as (2) if n = 1 (a single sample). So both equations are just more specific forms of the general equation Z = (X̄ − μ) / s_e, where s_e = σ / √n is the standard deviation of the means.

The Central Limit Theorem

For the example from earlier: a sample with a value of 14% (X = 14) is drawn from a population with μ = 12, σ = 1.6. What is the probability of finding a single sample equal to or greater than 14% ore? First calculate that sample's Z-score:

Z = (14 − 12) / 1.6 = 2/1.6 = 1.25

As before, the probability of finding one such sample or greater is about 10.5%.

The Central Limit Theorem

For the example from earlier: now, what if we selected 4 samples (n = 4) and the mean of those specimens was 14%?

Z = (14 − 12) / (1.6 / √4) = 2 / (1.6 · 1/2) = 2/0.8 = 2.5

The probability of a sample mean that high is much smaller; in fact, it is only 0.62%!
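The contrast between the single-specimen case and the four-specimen mean can be verified numerically. A sketch under the same assumptions as the earlier examples (phi is an illustrative helper standing in for the Z-table):

```python
import math

def phi(z):
    """Cumulative standard normal probability P(Z <= z)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma = 12.0, 1.6

# Single specimen at 14%: Z = (14 - 12) / 1.6
z1 = (14 - mu) / sigma
# Mean of n = 4 specimens at 14%: Z = (14 - 12) / (1.6 / sqrt(4))
z4 = (14 - mu) / (sigma / math.sqrt(4))

print(f"single sample: Z = {z1:.2f}, P(>=14%) = {1 - phi(z1):.4f}")  # ~0.105
print(f"mean of four:  Z = {z4:.2f}, P(>=14%) = {1 - phi(z4):.4f}")  # ~0.0062
```

Averaging shrinks the spread of the sampling distribution by √n, so the same 14% value moves from 1.25 to 2.5 standard errors out.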