Confidence Intervals

Quantitative Foundations Project 3
Instructor: Linwei Wang

Contents

1 Introduction
  1.1 Warning
  1.2 Goals of Statistics
  1.3 Random Variables
  1.4 Distributions
  1.5 Expectations
  1.6 Variance
  1.7 Exercise
  1.8 More Information

2 The Central Limit Theorem
  2.1 Averaging Variables
  2.2 Entropy
  2.3 The Normal Distribution
  2.4 Central Limit Theorem
  2.5 Summary
  2.6 More Information

3 Confidence Intervals
  3.1 Definition
  3.2 Exact Intervals Using Hoeffding's Inequality
  3.3 Asymptotic Intervals Using the Normal Distribution
    3.3.1 Known σ
    3.3.2 Unknown σ
  3.4 More Information

1 Introduction

1.1 Warning

By necessity, we will need to use many concepts here, such as "random variable", "independent", or "probability density", without defining them in a mathematically rigorous way. (Doing so would take the entire time allotted for this section!) Feel free to ask questions, and we will try to convey the meanings of these concepts at the level needed for a working knowledge, but in the end it is your responsibility to fill gaps in your own background as needed.

1.2 Goals of Statistics

Let's begin by talking about the basic goals of statistics, using a very simple example. Suppose we have a weighted 4-sided die, which has some probability of showing each number in 1, 2, ..., 4. Thus, we can think of that die as a simple probability distribution:

$$p(x) = \begin{cases} p_1 & x = 1 \\ p_2 & x = 2 \\ p_3 & x = 3 \\ p_4 & x = 4 \end{cases}$$

So, if we knew the values of $p_1, \dots, p_4$, we would know everything we care about the die. Now, in order for this to be a valid probability distribution, a couple of obvious conditions must be satisfied. First, probabilities must be non-negative, that is, $\forall i,\ p_i \ge 0$. Second, the die must show some number, which means that $p_1 + p_2 + p_3 + p_4 = 1$. Any set of numbers $p_1, \dots, p_4$ satisfying these two conditions constitutes a valid probability distribution.

Now, suppose that we roll the die a bunch of times and get the following results: 4, 1, 3, 2, 1, 2, 4, 2, 1, 1.

Example statistical problem 1: What is $p$? Can you think of a way to estimate it?

An obvious solution would be to make a histogram, with each probability proportional to the number of times that outcome occurred. This yields the estimated distribution

$$\hat p(x) = \begin{cases} 0.4 & x = 1 \\ 0.3 & x = 2 \\ 0.1 & x = 3 \\ 0.2 & x = 4 \end{cases}$$
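As a quick illustration, this histogram estimate can be computed directly from the data. A minimal MATLAB/Octave sketch (the variable names are ours, not part of the notes):

rolls = [4 1 3 2 1 2 4 2 1 1];      % the ten observed rolls from above
p_hat = zeros(1, 4);
for x = 1:4
    p_hat(x) = sum(rolls == x) / numel(rolls);   % fraction of rolls equal to x
end
disp(p_hat)                          % 0.4000  0.3000  0.1000  0.2000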

This is a reasonable guess, but there are some obvious problems here. In this particular case, perhaps we just happened to get more results of $x = 1$ by chance. Obviously, we can't expect that the above probabilities are the true ones. For example, simulating another dataset of size 10, I get the data 2, 1, 1, 2, 4, 4, 2, 4, 1, 3, along with the estimated distribution

$$\hat p(x) = \begin{cases} 0.3 & x = 1 \\ 0.3 & x = 2 \\ 0.1 & x = 3 \\ 0.3 & x = 4 \end{cases}$$

If we really consider the situation, we can't make any rigorous guarantees on the difference between our estimated distribution and the true one. Maybe the dice rolls we got just happened to be highly unusual!

Let's consider another example. Suppose we are interested in the probability $p(x)$ that a person has a given height. Notice a worrisome technical difficulty here: if we pick any particular height, say 152.92125243 cm, it seems exceedingly unlikely that we will ever find a person with exactly that height. Rather, for continuous variables, we should formally speak of probability densities, not probability distributions. This means we are looking for a function $p(x)$ such that

$$\Pr[a \le X \le b] = \int_{a}^{b} p(x)\, dx.$$

That is, we get real probabilities by integrating a probability density. In particular, notice that it is possible for a probability density to be greater than one. (For example, the density $p(x) = a\, \mathbb{I}[0 \le x \le 1/a]$ can take an arbitrarily high value $a$.) In any case, probability densities obey rules similar to those for probability distributions, namely

$$\forall x,\ p(x) \ge 0 \quad \text{and} \quad \int_{-\infty}^{+\infty} p(x)\, dx = 1.$$
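To make the point about densities concrete, here is a small sketch of our own (the choice a = 5 is ours, for illustration) that numerically integrates the example density $a\,\mathbb{I}[0 \le x \le 1/a]$: the density takes the value 5 on its support, yet it still integrates to 1.

a = 5;
p = @(x) a * (x >= 0 & x <= 1/a);    % density a*I[0 <= x <= 1/a]; equals 5 on its support
x = linspace(-1, 2, 300001);         % fine grid covering the support
trapz(x, p(x))                       % approximately 1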

Anyway, suppose that we go out onto the street, gather some random people, and measure their heights in centimeters. We might get data like the following: 154, 192, 145, 101, 155, 167.23. Now, we might ask to recover the original probability density $p(x)$. However, we might also be interested in only certain aspects of the distribution. For example, we might only care about the mean of $p$,

$$\mu = \int_{-\infty}^{+\infty} x\, p(x)\, dx.$$

Example statistical problem 2: What is $\mu$? Can you think of a way to estimate it?

The mean of the above dataset is 152.3717. But, of course, we want to know the true mean, which is presumably different. So, rather than simply reporting the sample mean, we should report some sort of guarantee of its reliability. It would be really nice if we could make a statement like the following:

  The true mean is in the range 149-155.

The problem is, we can't do that! We could have gotten really unlucky in our dataset. In principle (knowing nothing about the real heights of humans on earth), the true mean height could be 50, and we just happened to be very unlucky and get unusually tall people when we collected our data. Thus, in statistics, we will have to resign ourselves to fundamentally weaker guarantees. Roughly speaking, we will make guarantees of the following type:

  Unless we were unlucky, the true mean is in the range 149-155.

We will even go on to quantify exactly what "unlucky" means and how unlucky we would have to be. That is, we will ultimately make a guarantee like this:

  A 95% confidence interval for the true mean is the range 149-155.

Now, notice: this does NOT mean that there is a 95% probability that the true mean is in the range 149-155. (If you remember one thing about statistics from this course, let it be this!) The true mean is a fixed number. We don't happen to know it, but it is out there in the world, and it is what it is.
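The sample mean quoted above is easy to reproduce as a quick check (MATLAB/Octave):

heights = [154 192 145 101 155 167.23];   % the heights listed above, in cm
mean(heights)                              % 152.3717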

Rather, what we are saying is this: we have a procedure for building these things we call confidence intervals. The guarantee we make is precisely this: if you go out into the world and collect data, and then build confidence intervals, then 95% of the time your confidence interval will contain the true mean. That's all the guarantee they make. It isn't really the guarantee we would like to make. It is awkward. In real life, you do one experiment, and you want to know what the mean is. A confidence interval doesn't tell you what you want to know. We compute confidence intervals because they are the thing we are able to compute, not because they are the thing we want to compute.

The rest of these notes will concentrate on background material to get your statistical brain muscles warmed up.

1.3 Random Variables

Very informally, a random variable is a number that comes from a random event.

Example: Flip a coin 7 times, and let X be the number of heads that come up.

Example: Gather data on the heights of 15 people, and let X be the mean of the measured heights.

You will come to appreciate the purpose of random variables in time.

1.4 Distributions

A variable has a uniform distribution if its probability density is given, for some numbers $a < b$, by

$$p(x) = \begin{cases} \frac{1}{b-a} & a \le x \le b \\ 0 & \text{else.} \end{cases}$$

A variable has a Bernoulli distribution if its probability distribution is given, for some number $\theta \in [0, 1]$, by

$$p(x) = \begin{cases} \theta & x = 1 \\ 1 - \theta & x = 0. \end{cases}$$

A variable has a Normal or Gaussian distribution if its density is given, for some numbers $\mu$ and $\sigma > 0$, by

$$p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{1}{2\sigma^2}(x - \mu)^2\right).$$

The Normal is extremely important because (as you might imagine from the name) many phenomena tend to have a Normal or approximately Normal distribution.

Exercise: Draw some data of sizes 10, 100, 1000, and 10000 from each of these three distributions. Calculate a histogram in each case. Calculate the mean of your data. Do you notice anything funny?
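One possible starting point for this exercise, sketched in MATLAB/Octave with parameter choices of our own (Uniform(0,1), Bernoulli with θ = 0.3, Normal with µ = 5 and σ = 2; none of these values are specified in the notes):

for N = [10 100 1000 10000]
    u = rand(N, 1);                 % Uniform(0, 1)
    b = double(rand(N, 1) < 0.3);   % Bernoulli with theta = 0.3
    g = 5 + 2 * randn(N, 1);        % Normal with mu = 5, sigma = 2
    fprintf('N = %5d: means %.3f  %.3f  %.3f\n', N, mean(u), mean(b), mean(g));
    % hist(u), hist(b), hist(g)     % histograms, if desired
end

As N grows, the three sample means settle down near 0.5, 0.3, and 5; that stabilization is the behavior the exercise is hinting at.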

1.5 Expectations

Given a random variable $X$, its expected value is defined as

$$E[X] = \int x\, p(x)\, dx$$

if $X$ is continuous, and

$$E[X] = \sum_x x\, p(x)$$

if $X$ is discrete.

Exercise: Suppose $X$ is uniform. What is the expected value?

(Answer: $\int x\, p(x)\, dx = \int_a^b \frac{x}{b-a}\, dx = \frac{1}{b-a}\left[\frac{x^2}{2}\right]_a^b = \frac{b^2 - a^2}{2(b-a)} = \frac{(b+a)(b-a)}{2(b-a)} = \frac{a+b}{2}$.)

Exercise: Suppose $X$ is Bernoulli. What is the expected value?

(Answer: $0 \cdot p(0) + 1 \cdot p(1) = \theta$.)

Exercise: Suppose $X$ is Normal. What is the expected value?

(Answer: the calculus gets ugly. However, clearly by symmetry the answer is $\mu$.)
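As a concrete instance of the uniform case (the numbers a = 2 and b = 6 are our own, for illustration): the formula gives $(a+b)/2 = 4$, which a quick numerical integral confirms.

a = 2; b = 6;                    % an arbitrary uniform distribution on [2, 6]
x = linspace(a, b, 10001);
trapz(x, x / (b - a))            % approximately 4, matching (a + b)/2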

An important property of expectations is the following.

Theorem 1. The expected value of the sum of a finite number of random variables is the sum of the expected values, i.e.

$$E\!\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} E[X_i].$$

Note that this theorem does not assume anything about the random variables (other than that the expected values exist). In particular, we do not assume that they are independent.

Exercise: Prove this, for the case of two continuous random variables.

Answer:

$$\begin{aligned}
E[X_1 + X_2] &= \int_{x_1}\!\int_{x_2} (x_1 + x_2)\, p(x_1, x_2)\, dx_1\, dx_2 \\
&= \int_{x_1}\!\int_{x_2} x_1\, p(x_1, x_2)\, dx_1\, dx_2 + \int_{x_1}\!\int_{x_2} x_2\, p(x_1, x_2)\, dx_1\, dx_2 \\
&= \int_{x_1} x_1\, p(x_1)\, dx_1 + \int_{x_2} x_2\, p(x_2)\, dx_2 \\
&= E[X_1] + E[X_2]
\end{aligned}$$

A second, easy property of expectations is this:

Theorem 2. The expected value of a constant times a random variable is that constant times the expected value, i.e.

$$E[aX] = a\, E[X].$$

Exercise: Prove this.

Answer (for continuous variables):

$$E[aX] = \int_x a\, x\, p(x)\, dx = a \int_x x\, p(x)\, dx = a\, E[X]$$

Another important property, which is true only for independent random variables, is this:

Theorem 3. The expected value of the product of a finite number of independent random variables is the product of the expected values, i.e.

$$E\!\left[\prod_{i=1}^{n} X_i\right] = \prod_{i=1}^{n} E[X_i].$$

Exercise: Prove this for the case of two variables.

Answer:

$$\begin{aligned}
E[X_1 X_2] &= \int_{x_1}\!\int_{x_2} x_1 x_2\, p(x_1, x_2)\, dx_1\, dx_2 \\
&= \int_{x_1}\!\int_{x_2} x_1 x_2\, p(x_1)\, p(x_2)\, dx_1\, dx_2 \quad \text{(using independence)} \\
&= \int_{x_1} x_1\, p(x_1)\, dx_1 \int_{x_2} x_2\, p(x_2)\, dx_2 \\
&= E[X_1]\, E[X_2]
\end{aligned}$$
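Theorems 1 and 3 can also be sanity-checked by simulation. A rough sketch of our own, using two independent Uniform(0,1) variables:

N  = 1e6;
X1 = rand(N, 1);                          % Uniform(0,1), so E[X1] = 0.5
X2 = rand(N, 1);                          % independent Uniform(0,1), so E[X2] = 0.5
[mean(X1 + X2), mean(X1) + mean(X2)]      % both approximately 1.0  (Theorem 1)
[mean(X1 .* X2), mean(X1) * mean(X2)]     % both approximately 0.25 (Theorem 3)

For dependent variables the sum identity still holds, but the product identity generally does not; replacing X2 with X1 makes the left-hand product roughly 1/3 while the right-hand side stays near 0.25.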

1.6 Variance

Given some random variable $X$ with mean $\mu = E[X]$, the variance is defined to be

$$V[X] = E[(X - \mu)^2].$$

A standard and useful result is the following.

Theorem 4. $V[X] = E[X^2] - \mu^2$

Exercise: Prove this.

(Answer: $V[X] = E[(X - \mu)^2] = E[X^2 - 2X\mu + \mu^2] = E[X^2] - 2\mu E[X] + \mu^2 = E[X^2] - \mu^2$.)

1.7 Exercise

Exercise: Suppose we have a dataset of size $N$, generated from a Bernoulli distribution (i.e. a bent coin). Let the data be $X_1, X_2, X_3, \dots, X_N$. Suppose we want to estimate the parameter $\theta$ of this distribution. The obvious estimator would be

$$\hat\theta_N = \frac{1}{N} \sum_{i=1}^{N} X_i.$$

That is, we estimate the bias of the coin to be exactly the fraction of the data that resulted in a head.

Part 1: What is the expected value of $\hat\theta_N$?

Part 2: What is the variance of $\hat\theta_N$?

Part 3: Simulate this estimator and calculate its variance. Specifically, write a function that takes a value of $\theta$ and $N$, generates 10000 simulated datasets, and computes the estimator on each. Make sure that your simulation actually displays the variance you calculated.

Answer to part 1:

$$\begin{aligned}
E[\hat\theta_N] &= E\!\left[\frac{1}{N} \sum_{i=1}^{N} X_i\right] \\
&= \frac{1}{N}\, E\!\left[\sum_{i=1}^{N} X_i\right] \quad \text{(by Theorem 2)} \\
&= \frac{1}{N} \sum_{i=1}^{N} E[X_i] \quad \text{(by Theorem 1)} \\
&= \frac{1}{N}\, N\theta \\
&= \theta
\end{aligned}$$

Answer to part 2, in laborious detail:

$$\begin{aligned}
V[\hat\theta_N] &= E[(\hat\theta_N - \mu)^2] \\
&= E[\hat\theta_N^2] - \mu^2 \quad \text{(by Theorem 4)} \\
&= E[\hat\theta_N^2] - \theta^2
\end{aligned}$$

$$\begin{aligned}
E[\hat\theta_N^2] &= E\!\left[\Bigl(\frac{1}{N}\sum_i X_i\Bigr)^{\!2}\right] \\
&= \frac{1}{N^2}\, E\!\left[\sum_i \sum_j X_i X_j\right] \\
&= \frac{1}{N^2}\, E\!\left[\sum_i X_i^2 + \sum_i \sum_{j \ne i} X_i X_j\right] \quad \text{(split into two groups)} \\
&= \frac{1}{N^2}\, E\!\left[\sum_i X_i + \sum_i \sum_{j \ne i} X_i X_j\right] \quad \text{(since } 0^2 = 0 \text{ and } 1^2 = 1\text{)} \\
&= \frac{1}{N^2}\left(\sum_i E[X_i] + \sum_i \sum_{j \ne i} E[X_i X_j]\right) \quad \text{(by Theorem 1)} \\
&= \frac{1}{N^2}\left(\sum_i E[X_i] + \sum_i \sum_{j \ne i} E[X_i]\, E[X_j]\right) \quad \text{(by Theorem 3)} \\
&= \frac{1}{N^2}\left(\sum_i \theta + \sum_i \sum_{j \ne i} \theta^2\right) \\
&= \frac{N\theta + N(N-1)\theta^2}{N^2} \\
&= \frac{\theta}{N} + \frac{(N-1)\theta^2}{N}
\end{aligned}$$

Thus, finally, the variance is

$$\begin{aligned}
V[\hat\theta_N] &= \frac{\theta}{N} + \frac{(N-1)\theta^2}{N} - \theta^2 \\
&= \frac{\theta}{N} + \frac{N\theta^2 - \theta^2}{N} - \theta^2 \\
&= \frac{\theta}{N} - \frac{\theta^2}{N} \\
&= \frac{1}{N}\left(\theta - \theta^2\right).
\end{aligned}$$

In particular, over the $[0, 1]$ interval, $\theta - \theta^2$ is maximized at $\theta = \frac{1}{2}$, where $\theta - \theta^2 = \frac{1}{4}$, and so

$$V[\hat\theta_N] \le \frac{1}{4N}.$$

Answer to part 3:

function estimate_bernoulli_variance(theta, N)
% Simulate the estimator theta_hat = mean(X_1, ..., X_N) many times and
% compare its empirical mean and variance to the theoretical values.
% Example call: estimate_bernoulli_variance(0.3, 100)
maxrep = 10000;
theta_est = zeros(maxrep, 1);
for rep = 1:maxrep
    % Draw N Bernoulli(theta) samples (the sample_bernoulli helper from the
    % original notes, written out directly so the function is self-contained)
    X = double(rand(N, 1) < theta);
    theta_est(rep) = mean(X);
end
[mean(theta_est) theta]                    % empirical vs. true mean
[var(theta_est) (1/N)*(theta - theta^2)]   % empirical vs. theoretical variance

1.8 More Information

See Arian Maleki and Tom Do's review of probability theory.