Summary statistics, distributions of sums and means

Summary statistics, distributions of sums and means
Joe Felsenstein
(p.1/17)

Quantiles

In both empirical distributions and in the underlying distribution, it may help us to know the points where a given fraction of the distribution lies below (or above) that point. In particular:
- The 2.5% point
- The 5% point
- The 25% point (the first quartile)
- The 50% point (the median)
- The 75% point (the third quartile)
- The 95% point (or upper 5% point)
- The 97.5% point (or upper 2.5% point)
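These cut points can be read off an empirical sample with `numpy.quantile`. A minimal sketch in Python (the sample here is invented for illustration; it is not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=0.0, scale=1.0, size=100_000)  # a large illustrative sample

# The quantiles listed above, as fractions:
fractions = [0.025, 0.05, 0.25, 0.50, 0.75, 0.95, 0.975]
cutpoints = np.quantile(x, fractions)

# For a standard normal sample these land near the familiar table values,
# e.g. the 2.5% point near -1.96 and the median near 0.
for f, c in zip(fractions, cutpoints):
    print(f"{f:5.1%} point: {c:+.3f}")
```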

The mean

The mean is the average of the points. If the distribution is the theoretical one, it is called the expectation: it's the theoretical mean we would expect to get if we drew infinitely many points from that distribution.

For a sample of points x_1, x_2, ..., x_100 the mean is simply their average:

    x̄ = (x_1 + x_2 + x_3 + ... + x_100) / 100

For a distribution with possible values 0, 1, 2, 3, ... where value k has occurred a fraction f_k of the time, the mean weights each of these by the fraction of times it has occurred (then in effect divides by the sum of these fractions, which however is actually 1):

    x̄ = 0·f_0 + 1·f_1 + 2·f_2 + ...
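The two formulas give the same number, which is easy to check numerically. A small Python sketch with a made-up sample:

```python
import numpy as np

x = np.array([2.0, 3.0, 5.0, 5.0, 7.0])  # a made-up sample

# Sample mean: just the average of the points.
mean_direct = x.sum() / len(x)

# Equivalent frequency-weighted form: each distinct value k weighted by
# the fraction f_k of times it occurred (the fractions sum to 1).
values, counts = np.unique(x, return_counts=True)
fractions = counts / counts.sum()
mean_weighted = (values * fractions).sum()

print(mean_direct, mean_weighted)  # the two computations agree
```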

The expectation

For a distribution that has a density function f(x), we add up the value x times its probability of occurring, over the zillions of tiny vertical slices, each of width dx, each of which holds a fraction f(x) dx of all points. The result is the integral

    E[x] = ∫ x f(x) dx

The expectations are known for the common distributions:

    Binomial(N, p)         N p
    Geometric(p)           1/p
    Poisson(λ)             λ
    Uniform(0, 1)          1/2
    Uniform(a, b)          (a + b)/2
    Exponential(rate λ)    1/λ
    Normal(0, 1)           0, of course
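These tabulated expectations can be checked by simulation: draw many points from each distribution and compare the sample mean to the theoretical value. A Python sketch (parameter values invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Draw from each distribution and compare the sample mean to the
# theoretical expectation from the table above.
checks = {
    "Binomial(10, 0.3)":     (rng.binomial(10, 0.3, n),    10 * 0.3),
    "Geometric(0.25)":       (rng.geometric(0.25, n),      1 / 0.25),
    "Poisson(4.5)":          (rng.poisson(4.5, n),         4.5),
    "Uniform(2, 6)":         (rng.uniform(2, 6, n),        (2 + 6) / 2),
    "Exponential(rate 2)":   (rng.exponential(1 / 2, n),   1 / 2),
}
for name, (draws, expectation) in checks.items():
    print(f"{name}: sample mean {draws.mean():.3f}, expectation {expectation}")
```

(Note that numpy parameterizes the exponential by its scale 1/λ, not its rate.)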

The variance

The most useful measure of the spread of a distribution is gotten by taking each point, getting its deviation from the mean, which is x − x̄, squaring that, and averaging the squares:

    [ (x_1 − x̄)² + (x_2 − x̄)² + (x_3 − x̄)² + ... + (x_100 − x̄)² ] / 100

The result is known as the variance. Note these are squares of how far each point is from the mean, so they do not measure spread in the most straightforward way. Of course there is an integral version for the true theoretical distribution as well, and we can talk of its variance.
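The formula translates directly into code. A Python sketch with a made-up sample (note that `numpy.var` uses this same 1/n divisor by default):

```python
import numpy as np

x = np.array([2.0, 3.0, 5.0, 5.0, 7.0])  # a made-up sample

xbar = x.mean()
# Average of squared deviations from the mean, exactly as in the formula.
variance = ((x - xbar) ** 2).sum() / len(x)

print(variance)      # 3.04 for this sample
print(np.var(x))     # numpy's default matches (divisor n, not n - 1)
```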

The standard deviation

The standard deviation is the square root of the variance. This gets us back on the original scale: if a distribution is of how many milligrams something weighs, the variance is in units of square milligrams, which is slightly weird. The standard deviation is just in milligrams.

Sometimes the ratio of the standard deviation to the mean is useful (especially if all values are positive, such as weights). This is called the coefficient of variation.

Trick question: why not instead just compute the deviations of each point from the mean (which could be positive or negative) and average those?

Less obvious question: why not average the absolute values of the deviations from the mean? (Because its mathematical properties are uglier, even though it seems simpler.)
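A quick numerical nudge toward the trick question's answer, plus the coefficient of variation, in Python (invented weights; not from the slides):

```python
import numpy as np

weights_mg = np.array([12.0, 15.0, 9.0, 14.0, 10.0])  # invented weights, in mg

# The raw deviations always average to zero: positives and negatives cancel,
# which is why averaging them tells us nothing about spread.
deviations = weights_mg - weights_mg.mean()
print(deviations.mean())

sd = np.sqrt(np.var(weights_mg))   # back on the original scale (mg)
cv = sd / weights_mg.mean()        # coefficient of variation (unitless)
print(sd, cv)
```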

Distribution of multiples of a variable

Just staring at the formula for calculating the mean, you can immediately see that if you multiply all the sampled values (or all the theoretical values) by, say, 3, the mean is 3 times as big as a result. The same holds for any constant c, even including negative values and zero. It is also true for the expectation: the constant comes outside of the integral, and that proves it.
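A one-line check of this scaling rule in Python (distribution and constants invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.exponential(2.0, size=50_000)   # any distribution will do

c = 3.0
print(x.mean(), (c * x).mean())         # the second is c times the first

# Also works for negative constants and zero:
print((-2.0 * x).mean() / x.mean())     # ≈ -2.0
print((0.0 * x).mean())                 # exactly 0.0
```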

Means of sums of variables

When we draw a pair of points (x, y), where the x's come from one distribution and the y's from another, and we compute for each pair their sum, the mean of x + y is simply the mean of the x's plus the mean of the y's. That's true for samples of pairs, and it's true for the theoretical distribution of pairs, even if the two values are not independent. In fact it's true for sums of more quantities as well: the means just add up, and the expectations just add up.
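This additivity needs no independence, which a deliberately dependent pair shows. A Python sketch (the dependence here is invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
x = rng.normal(10, 2, n)
y = 0.5 * x + rng.uniform(0, 1, n)   # deliberately NOT independent of x

# The mean of the pairwise sums equals the sum of the means;
# independence is not needed:
print((x + y).mean(), x.mean() + y.mean())
```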

Variances of sums of variables

It will be less obvious that variances also add up. In fact they don't, really, but they do in expectation if the quantity x is drawn independently of the quantity y. In a sample, the variance of the sum isn't exactly the sum of the variances, even in that case. If we add more (independent) variables, their theoretical variances add up. If they aren't independent, they don't add up. (Let's test the additivity of the variances for independent draws during the R exercise period.)
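The slides defer this test to the R exercise period; the same experiment can be sketched in Python (distributions and seed invented here). With a large sample the variance of the sum comes out close to, but not exactly equal to, the sum of the variances:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1_000_000
x = rng.exponential(3.0, n)     # theoretical variance 3^2 = 9
y = rng.uniform(0, 6, n)        # theoretical variance 6^2 / 12 = 3, independent of x

# Both quantities are near the theoretical value 9 + 3 = 12,
# but they differ slightly, as expected in a sample:
print(np.var(x + y), np.var(x) + np.var(y))
```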

Variances of multiples of variables

You can see from the formula for computing the variance that if we multiply all the values x by the same constant, say 7, then since the mean is multiplied by 7 too, the squared deviations from the mean are all multiplied by 7² = 49. So the variance is multiplied by the square of that common multiple (in our case, 49). It follows that the standard deviation is just multiplied by 7, or whatever the multiplier is.
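Numerically, with c = 7 (a Python sketch; the distribution is invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(5, 2, 100_000)

c = 7.0
print(np.var(c * x) / np.var(x))   # ≈ 49: variances scale by c squared
print(np.std(c * x) / np.std(x))   # ≈ 7:  standard deviations scale by |c|
```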

Variances of means of variables

Put the preceding two slides together, and you get that (for sums of 12 independent draws from the same distribution) the variance of the mean of the 12 of them is 1/12 of the variance of one of them. (The variance of the sum is 12 times as big, and dividing the sum by 12 divides its variance by 12².)

Implication: the standard deviation of a mean of 12 independent draws is 1/√12 as big as the standard deviation of the distribution from which they are drawn.
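Simulating many means of 12 independent Uniform(0,1) draws (variance 1/12 each) shows the 1/12 reduction directly. A Python sketch:

```python
import numpy as np

rng = np.random.default_rng(7)
# 200,000 samples, each the mean of 12 independent Uniform(0,1) draws:
draws = rng.uniform(0, 1, size=(200_000, 12))
means = draws.mean(axis=1)

var_one = 1 / 12                        # variance of a single Uniform(0,1) draw
print(np.var(means), var_one / 12)      # variance of the mean is 1/12 of that
print(np.std(means), np.sqrt(var_one) / np.sqrt(12))
```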

Standard deviation of sum of variables

If they're independent, it's the square root of the sum of their variances, so it's the square root of the sum of squares of their standard deviations.

Sums of independent variables get more normal

There is a theorem (the Central Limit Theorem), provable under pretty general conditions, that for independent draws, sums of n quantities are more normally distributed than the original quantities are. This happens startlingly quickly:

[Figure: histograms of a uniform distribution and of the sums of 2, 3, and 4 uniform variables; by the sum of 4 the shape is already close to a normal curve.]
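One way to watch this happen numerically (a Python sketch, not the slides' own figure): track the fraction of values within one standard deviation of the mean, which is about 0.577 for a flat uniform distribution and about 0.683 for a normal:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 500_000

def frac_within_1sd(z):
    """Fraction of values within one standard deviation of the mean."""
    return np.mean(np.abs(z - z.mean()) < z.std())

fracs = []
for k in [1, 2, 3, 4]:
    s = rng.uniform(0, 1, (n, k)).sum(axis=1)   # sums of k uniform variables
    f = frac_within_1sd(s)
    fracs.append(f)
    print(k, f)   # climbs from ~0.577 toward the normal value ~0.683
```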

So do means

Means are just scaled versions of sums, so it works for them too: they are distributed nearly normally if there are more than a few. We can predict that the mean weight of the next 100 vehicles that cross the I-5 bridge is nearly normally distributed, even if the original distribution of vehicle weights is strange. It's even true if the quantities aren't fully independent, provided there is enough lack of correlation between quantities far apart in the sum. (But there are some weird distributions with heavy tails whose averages won't become more normal.)
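A Python sketch of the vehicle example, with an invented, strongly two-humped weight distribution (the numbers are made up, not from the slides). Means of 100 vehicles still behave roughly normally, with about 68% of them within one standard deviation of the overall mean:

```python
import numpy as np

rng = np.random.default_rng(9)

def vehicle_weights(size):
    """Invented two-humped weight distribution: mostly cars, some trucks."""
    cars = rng.normal(1.5, 0.3, size)     # tonnes
    trucks = rng.normal(20.0, 5.0, size)
    is_truck = rng.random(size) < 0.1
    return np.where(is_truck, trucks, cars)

# 50,000 means, each over 100 vehicles:
means = vehicle_weights(50_000 * 100).reshape(50_000, 100).mean(axis=1)
frac = np.mean(np.abs(means - means.mean()) < means.std())
print(frac)   # near 0.68 despite the strange underlying distribution
```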

What about the sum of independent Poissons?

Think about this one: business calls arrive on a telephone line (say, at a telephone switching center) independently, a Poisson number coming in each hour (with expectation 8.2 calls), and personal calls arrive in a Poisson process too, but with expectation 17.1 calls per hour... what is the distribution of the total number of calls in an hour?
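The answer the question is leading toward (a sum of independent Poissons is itself Poisson, here with mean 8.2 + 17.1 = 25.3) can be checked by simulation. A Python sketch: a Poisson distribution has mean equal to its variance, and the total shows both near 25.3, matching a direct Poisson(25.3) simulation:

```python
import numpy as np

rng = np.random.default_rng(10)
n = 500_000

business = rng.poisson(8.2, n)
personal = rng.poisson(17.1, n)
total = business + personal

print(total.mean(), total.var())    # both near 25.3, as a Poisson should show
direct = rng.poisson(25.3, n)
print(direct.mean(), direct.var())  # a direct Poisson(25.3) sample for comparison
```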

... of independent binomials?

If we toss a coin 200 times, with heads probability p, the number of Heads is a draw from a binomial distribution. Now if we make (say) 300 more tosses, what is the distribution of the total number of Heads over both sets? I thought you would say that. Now, is this true if the second set of tosses is made with a different coin, with a different probability of Heads? (Think about the extreme case where the first coin almost never comes up Heads, and the second one almost always does.)
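Both halves of the question can be explored numerically. With the same coin, 200 + 300 tosses behave like Binomial(500, p); with very different coins, the total's mean still adds up but its variance is far smaller than any single binomial on 500 trials would give. A Python sketch (the probabilities are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(11)
n = 200_000
p = 0.3

# Same coin: 200 tosses plus 300 more behaves like 500 tosses of that coin.
heads = rng.binomial(200, p, n) + rng.binomial(300, p, n)
print(heads.mean(), heads.var())        # ≈ 150 and ≈ 105 = 500·p·(1−p)

# Different coins (the slide's extreme case): one almost never Heads,
# the other almost always.
mixed = rng.binomial(200, 0.01, n) + rng.binomial(300, 0.99, n)
q = mixed.mean() / 500
print(mixed.var(), 500 * q * (1 - q))   # actual variance is far below 500·q·(1−q)
```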

How it was done

This projection was produced as a PDF, not a PowerPoint file, and viewed using the Full Screen mode (in the View menu of Adobe Acrobat Reader):
- using the prosper style in LaTeX,
- using latex to make a .dvi file,
- using dvips to turn this into a PostScript file,
- using ps2pdf to make that into a PDF file, and
- displaying the slides in Adobe Acrobat Reader.