Lecture 20: Random Samples

One of the most important concepts in statistics is that of a random sample. The definition of a random sample is rather abstract, but it is critical to understand the idea behind it, so we will spend an entire lecture motivating the definition. We will do this through three motivating examples: polling for elections, testing the lifetime of a Gateway computer, and picking a sequence of random numbers.

First Motivating Example

Recall that a random variable X is a Bernoulli random variable if X takes exactly two values, 0 and 1, with

P(X = 1) = p, P(X = 0) = q, where q = 1 - p.

In this case we write X ~ Bin(1, p) (the Bernoulli distribution is the special case of the binomial distribution with n = 1).

We define a Bernoulli random variable X_election as follows. Choose a random voter in the U.S. and ask whether he or she intends to vote for Trump in the next election. Record 1 if yes and 0 if no. So X_election takes the values 1 and 0 with definite (but unknown to us) probabilities p and q.
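As an aside (not part of the original slides), a Bernoulli draw is easy to simulate. The Python sketch below uses a hypothetical true p = 0.45 purely for illustration.

```python
import random

def bernoulli(p):
    """Draw one Bin(1, p) value: 1 ('yes') with probability p, else 0."""
    return 1 if random.random() < p else 0

random.seed(0)       # fixed seed so the sketch is reproducible
x = bernoulli(0.45)  # one simulated voter; 0.45 is a made-up value of p
```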

The $64,000 question: What is p? How do we answer this question? Take a poll; in the language of statistics we say one takes a sample from a Bin(1, p) distribution where p is unknown. If we poll n people we arrive at a sequence of 0's and 1's, which we can represent schematically by

x_1, x_2, ..., x_n

(lower case, because these are observed values, not random variables).

We think of x_1, x_2, ..., x_n as the results after the poll is taken. We now introduce random variables X_1, X_2, ..., X_n representing the potential outcomes before the poll is taken; we assume we have already decided how many people we will talk to and how we are going to choose them. Thus taking the poll assigns definite values x_1, x_2, ..., x_n to the random variables X_1, X_2, ..., X_n. We may represent the situation before the poll schematically by a dotted arrow, meaning that we have not yet performed the poll.

It is critical to observe that X_1, X_2, ..., X_n are random variables (while x_1, x_2, ..., x_n are ordinary, i.e. numerical, variables). Each X_i takes the values 0 and 1 with probabilities q and p respectively, so the X_i's have the same probability distribution as the underlying random variable X_election (i.e. the distribution we are sampling from). The random variables X_1, ..., X_n will be independent if the poll is constructed properly. Hence X_1, X_2, ..., X_n are independent and identically distributed, and we say X_1, X_2, ..., X_n is a random sample from a Bin(1, p) distribution.
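To make the sampling picture concrete, here is a small Python simulation (my illustration, not from the lecture): it draws one realization x_1, ..., x_n of the random sample and uses the fraction of 1's as the natural estimate of the unknown p. The value 0.45 again stands in for the true parameter.

```python
import random

def poll(n, p):
    """Simulate n independent Bin(1, p) responses: a realization x_1, ..., x_n."""
    return [1 if random.random() < p else 0 for _ in range(n)]

random.seed(0)
sample = poll(1000, 0.45)          # 0.45 plays the role of the unknown true p
p_hat = sum(sample) / len(sample)  # sample proportion: our estimate of p
```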

We conclude this example with a formal mathematical construction of X_1, X_2, ..., X_n. The sample space S of the above poll ("experiment") is the set of all n-tuples (x_1, x_2, ..., x_n) of 0's and 1's; it is the same as the sample space for n flips of a weighted coin. There is a probability measure P defined on S; for example, P(0, 0, ..., 0) = q^n. The random variables X_1, X_2, ..., X_n are defined as functions on S by

X_i(x_1, ..., x_n) = x_i.

So they are indeed random variables: a random variable is a function on a probability space, that is, a set S equipped with a probability measure P.
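Since the trials are independent, the probability of any particular n-tuple factors as p raised to the number of 1's times q raised to the number of 0's; the slide's P(0, 0, ..., 0) = q^n is the special case with no 1's. A quick Python check of the formula (my illustration; p = 0.5 is chosen so the arithmetic is exact):

```python
def tuple_prob(outcome, p):
    """P of one specific tuple of 0's and 1's under independent Bin(1, p) trials."""
    q = 1 - p
    ones = sum(outcome)
    return p ** ones * q ** (len(outcome) - ones)

# For p = q = 0.5, every 3-tuple has probability q^3 = 0.125.
assert tuple_prob((0, 0, 0), 0.5) == 0.125
```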

Second Motivating Example

Suppose now we wish to study the expected lifetime of a Gateway computer (Gateway was a computer company which, I think, is no longer in business). In this case we study the random variable X_Gateway, defined as follows: (X_Gateway = t) means that a randomly selected Gateway computer fails at time t. A good model for the distribution of X_Gateway is an exponential distribution with a definite but unknown mean µ = 1/λ.

The new $64,000 question: What is µ? To answer this question, we obtain a number of Gateway computers, run them until they break down, and record the results x_1, x_2, ..., x_n (lower case: the observed failure times). Once again we introduce random variables X_1, X_2, ..., X_n after we have decided how many computers we are going to test, but before we actually test them; schematically this is the "before" picture. Mathematically, testing the n computers amounts to assigning definite numerical values (the failure times) x_1, x_2, ..., x_n to the random variables X_1, X_2, ..., X_n. Hence X_1, X_2, ..., X_n are random variables with the same probability distribution as the underlying random variable X_Gateway.

Assuming that our test is correctly designed, X_1, X_2, ..., X_n will be independent, so they are independent, identically distributed random variables; this will later be the definition of a random sample. So we say X_1, X_2, ..., X_n is a random sample from an exponential distribution with parameter λ. Once again we have a formal mathematical construction. The sample space S of the experiment is now the set of all n-tuples (x_1, x_2, ..., x_n) of positive real numbers, the possible break-down times for the n computers to be tested. S is a probability space (but not a discrete one). We define the random variables X_1, X_2, ..., X_n as functions on S as before: X_i(x_1, ..., x_n) = x_i, 1 ≤ i ≤ n.
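The same before/after story can be simulated. In this Python sketch (mine, not the lecture's), the hypothetical true mean is µ = 3; the sample mean of the simulated failure times is the natural estimate of the unknown µ.

```python
import random

def lifetimes(n, mu):
    """Simulate n i.i.d. exponential failure times with mean mu (rate 1/mu)."""
    return [random.expovariate(1 / mu) for _ in range(n)]

random.seed(0)
times = lifetimes(500, 3.0)       # 3.0 stands in for the unknown true mean mu
mu_hat = sum(times) / len(times)  # sample mean: our estimate of mu
```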

Third Motivating Example

Our third motivating example is the experiment of choosing n random numbers from the interval [0, 1]. We have seen that a good model for choosing a random number from [0, 1] is the uniform distribution U(0, 1). Precisely, we make [0, 1] into a probability space by defining a probability measure P on [0, 1] by the formula

P(a ≤ X ≤ b) = b - a (assuming 0 ≤ a ≤ b ≤ 1).

We then define a random variable (function) X on [0, 1] by taking X to be the identity function I. So we think of evaluating I on an element of [0, 1] as selecting a random number. We may represent the probability space ([0, 1], P) schematically.

After we choose n random numbers, using some procedure for producing random numbers, we obtain n real numbers x_1, x_2, ..., x_n in [0, 1] (lower case: the observed choices). Before we make the choices we have random variables X_1, X_2, ..., X_n representing the first, second, ..., n-th choice. The sample space S of all possible choices of n random numbers is given by

S = {(x_1, x_2, ..., x_n) : x_i ∈ [0, 1]}.

We have n functions X_1, ..., X_n defined by X_i : S → [0, 1], where X_i(x_1, x_2, ..., x_n) = x_i, the i-th choice, so each X_i is a U(0, 1) random variable. We note that X_1, X_2, ..., X_n all have the U(0, 1) distribution and are all independent.
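A realization of this random sample is exactly what repeated calls to a uniform random-number generator produce. The Python sketch below (illustrative only) also checks the model's defining property P(a ≤ X ≤ b) = b - a empirically for one hypothetical choice of a and b.

```python
import random

random.seed(0)
xs = [random.random() for _ in range(100000)]  # many draws from U(0, 1)
assert all(0 <= x < 1 for x in xs)

a, b = 0.2, 0.5
frac = sum(a <= x <= b for x in xs) / len(xs)  # should be close to b - a = 0.3
```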

The definition of a random sample

Hopefully the three basic examples we have just discussed have motivated:

Definition. A random sample of size n is a sequence X_1, X_2, ..., X_n of random variables such that
(i) X_1, X_2, ..., X_n are independent, AND
(ii) X_1, X_2, ..., X_n all have the same probability distribution, i.e. are identically distributed.
These two conditions are often abbreviated to "iid". The probability distribution common to the X_i's is called the underlying distribution; it is the one we are sampling from.

Now we have a second fundamental definition.

Definition. A statistic is a random variable that is a function h(X_1, X_2, ..., X_n) of X_1, X_2, ..., X_n.

Three very important statistics

The following statistics will be very important to us.

(1) The sample total T_0, defined by T_0 = X_1 + X_2 + ... + X_n.

(2) The sample mean X̄ = (1/n) T_0 = (X_1 + X_2 + ... + X_n)/n.

(3) The sample variance S^2 = (1/(n - 1)) Σ_{i=1}^{n} (X_i - X̄)^2.
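In code, each statistic is evaluated on a realization x_1, ..., x_n. A minimal Python version of the three formulas (my sketch, with the n - 1 divisor in the sample variance as above):

```python
def sample_total(xs):
    """T_0 = x_1 + ... + x_n."""
    return sum(xs)

def sample_mean(xs):
    """x-bar = T_0 / n."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """S^2 = (1/(n-1)) * sum of (x_i - x-bar)^2."""
    n = len(xs)
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)
```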