Introduction and Overview STAT 421, SP Course Instructor

Similar documents
ELEG 3143 Probability & Stochastic Process Ch. 2 Discrete Random Variables

MATH Notebook 5 Fall 2018/2019

Statistics for Applications. Chapter 1: Introduction 1/43

Math 493 Final Exam December 01

Distributions of Functions of Random Variables. 5.1 Functions of One Random Variable

Notes for Math 324, Part 17

Question Points Score Total: 76

Chapter 1: Revie of Calculus and Probability

Lecture 1: August 28

Example continued. Math 425 Intro to Probability Lecture 37. Example continued. Example

Statistical Methods in Particle Physics

Relationship between probability set function and random variable - 2 -

Things to remember when learning probability distributions:

Limiting Distributions

Problem Points S C O R E Total: 120

Analysis of Engineering and Scientific Data. Semester

STAT2201. Analysis of Engineering & Scientific Data. Unit 3

1. If X has density. cx 3 e x ), 0 x < 0, otherwise. Find the value of c that makes f a probability density. f(x) =

Probability Distributions Columns (a) through (d)

Mathematical Statistics 1 Math A 6330

Fundamental Tools - Probability Theory IV

Definition: A random variable X is a real valued function that maps a sample space S into the space of real numbers R. X : S R

STAT 414: Introduction to Probability Theory

Statistics for Economists. Lectures 3 & 4

1 Basic continuous random variable problems

Chapter (4) Discrete Probability Distributions Examples

FINAL EXAM: 3:30-5:30pm

Course: ESO-209 Home Work: 1 Instructor: Debasis Kundu

MATH/STAT 3360, Probability

Some Statistics. V. Lindberg. May 16, 2007

UC Berkeley Department of Electrical Engineering and Computer Science. EE 126: Probablity and Random Processes. Solutions 5 Spring 2006

Find the value of n in order for the player to get an expected return of 9 counters per roll.

STAT Chapter 5 Continuous Distributions

STAT 418: Probability and Stochastic Processes

Probability and Distributions

Problem Points S C O R E Total: 120

Course information: Instructor: Tim Hanson, Leconte 219C, phone Office hours: Tuesday/Thursday 11-12, Wednesday 10-12, and by appointment.

CS 361: Probability & Statistics

DISCRETE VARIABLE PROBLEMS ONLY

Fundamentals of Probability Theory and Mathematical Statistics

Random Variables. Definition: A random variable (r.v.) X on the probability space (Ω, F, P) is a mapping

1 Basic continuous random variable problems

Plotting data is one method for selecting a probability distribution. The following

Limiting Distributions

LECTURE 1. Introduction to Econometrics

System Simulation Part II: Mathematical and Statistical Models Chapter 5: Statistical Models

Random variables, Expectation, Mean and Variance. Slides are adapted from STAT414 course at PennState

Random variables. DS GA 1002 Probability and Statistics for Data Science.

Chapter 2: Discrete Distributions. 2.1 Random Variables of the Discrete Type

Lecture 2: Repetition of probability theory and statistics

Continuous random variables

3 Continuous Random Variables

Statistics for Managers Using Microsoft Excel/SPSS Chapter 4 Basic Probability And Discrete Probability Distributions

Probability Distributions for Continuous Variables. Probability Distributions for Continuous Variables

Slides 8: Statistical Models in Simulation

HW1 (due 10/6/05): (from textbook) 1.2.3, 1.2.9, , , (extra credit) A fashionable country club has 100 members, 30 of whom are

CDA6530: Performance Models of Computers and Networks. Chapter 2: Review of Practical Random Variables

CS145: Probability & Computing

Lecture Notes 5 Convergence and Limit Theorems. Convergence with Probability 1. Convergence in Mean Square. Convergence in Probability, WLLN

Week 2. Review of Probability, Random Variables and Univariate Distributions

STAT 430/510 Probability Lecture 12: Central Limit Theorem and Exponential Distribution

EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY (formerly the Examinations of the Institute of Statisticians) HIGHER CERTIFICATE IN STATISTICS, 1996

ECE 302, Final 3:20-5:20pm Mon. May 1, WTHR 160 or WTHR 172.

p. 4-1 Random Variables

Math 407: Probability Theory 5/10/ Final exam (11am - 1pm)

ACM 116: Lecture 2. Agenda. Independence. Bayes rule. Discrete random variables Bernoulli distribution Binomial distribution

Continuous-Valued Probability Review

Page Max. Possible Points Total 100

IEOR 6711: Stochastic Models I SOLUTIONS to the First Midterm Exam, October 7, 2008

Discrete Distributions

Chapter 3 Single Random Variables and Probability Distributions (Part 1)

Notes on Mathematical Expectations and Classes of Distributions Introduction to Econometric Theory Econ. 770

Statistics for scientists and engineers

Discrete Distributions Chapter 6

Special distributions

3 Modeling Process Quality

Chapter 6 Continuous Probability Distributions

IEOR 3106: Introduction to Operations Research: Stochastic Models. Professor Whitt. SOLUTIONS to Homework Assignment 1

Class 26: review for final exam 18.05, Spring 2014

Chapter 4 : Discrete Random Variables

IV. The Normal Distribution

Unit 4 Probability. Dr Mahmoud Alhussami

Chapter 3, 4 Random Variables ENCS Probability and Stochastic Processes. Concordia University

Notes for Math 324, Part 19

Lecture 10: Probability distributions TUESDAY, FEBRUARY 19, 2019

Discrete probability distributions

II. The Binomial Distribution

Chapter 4. Chapter 4 sections

ACCESS TO SCIENCE, ENGINEERING AND AGRICULTURE: MATHEMATICS 2 MATH00040 SEMESTER / Probability

Introduction to Stochastic Processes

TOPIC 12 PROBABILITY SCHEMATIC DIAGRAM

Probability and Statistics Concepts

for valid PSD. PART B (Answer all five units, 5 X 10 = 50 Marks) UNIT I

Physics 403 Probability Distributions II: More Properties of PDFs and PMFs

GOV 2001/ 1002/ Stat E-200 Section 1 Probability Review

Twelfth Problem Assignment

IE 230 Probability & Statistics in Engineering I. Closed book and notes. 120 minutes.

Chapter 3.3 Continuous distributions

Brief Review of Probability

Math 151. Rumbos Fall Solutions to Review Problems for Exam 2. Pr(X = 1) = ) = Pr(X = 2) = Pr(X = 3) = p X. (k) =

Transcription:

Introduction and Overview STAT 421, SP 212 Prof. Prem K. Goel Mon, Wed, Fri 3:3PM 4:48PM Postle Hall 118 Course Instructor Prof. Goel, Prem E mail: goel.1@osu.edu Office: CH 24C (Cockins Hall) Phone: 614 292 811 Office Hours: Tuesday 2: PM 3: PM Wednesday 1:3 PM 2:3 PM Friday 1: AM 11: AM Other times by appointment Course Website: www.stat.osu.edu/~goel/ Relevant course material will be posted on this website by 8:PM day before the class 1

Recitation Instructors and Graders Ms. Siyoen Kil Office: 34C CockinsHall E Mail: kil.3@osu.edu Th 11:3AM 12:18PM Caldwell Lab 277 Th 12:3PM 1:18PM Dreese Lab 369 Office Hours: TBD Ms. Lira Pi Office: 34F Cockins Hall E mail: pi.5@osu.edu Th 1:3AM 11:18AM Dreese Lab 317 Th 12:3PM 1:18PM Univ. Hall 151 Office Hours: Tuesdays, 1: AM 12: Noon Student Information Needed Name: Last, First Signature For matching with attendance sign up sheet Major Math Courses Background 2

Probability and Statistics Why study Statistics Science of Decision Making Under Uncertainty Understanding Variability Explaining Variability o E.g., assigning most likely causes to breakdowns (Challenger Shuttle) Almost every discipline depends on quantitative evidence All of us need to understand and analyze this evidence Probability Formal Language for Statistical Reasoning Basic Rules of Probability Calculus How to assign probabilities to various outcomes (collection of outcomes EVENT) of interest How to interpret the probability of an event You Learned it in Stat 42 ( Critical Prerequisite) Key topics you need to review this week Various Distributions Chapter 5, 6, and Appendices B, C Sampling Variability and Sampling Distributions Chapter 8 Why Study Uncertainty Almost nothing in nature is deterministic Variability in Outcomes when an experiment is performed repeatedly Unit to unit Person to person DNA to DNA Natural objects Games of Chance Deterministic problem but measurement errors may lead to variability in repeated outcomes 3

Uncertainty and Variation: Simple Examples Does each insured client have an accident during a given year? If you kick a football several times, will the distance the ball traveled be the same? We can t necessarily predict who will have an accident in a given year or predict the distance the ball will travel for any particular kick. However, lots of the time, data will follow a general pattern. From this pattern, we can get an idea of the expected (most likely) number of claims or the distance the football will travel. Applications of Statistics and Probability Gambling what are the odds? Medicine & Biology drug development/ genomics, Manufacturing process/quality control Economics and Politics Predicting unemployment rates/opinion polls Engineering designing/testing products Business advertising/marketing Insurance Actuarial estimates Law DNA matching 4

Probability and Statistics The study of randomness and uncertainty Chances, odds, likelihood, expected, probably, on average,... PROBABILITY Population Sample INFERENTIAL STATISTICS Quick Review: Stat 42 Probability Concepts Text Book: Chapters 5, 6, and 8 5

Discrete Distributions r N r Hypergeometric x n x PX ( x) N n Negative Binomial x 1 x r PX ( x) (1 p) p r 1, x max{, n ( N r)},,min{ n, r} E( X) np, where p r/ N. Var( X ) fpc n p 1 p r, x = r, r+1, (1 ) ( ) r ( ) r EX VX p p 2 p Chapter 5 11 Before the Distribution Distribution Bell Curve N(, 2 ) Continuous distribution, defined on entire real line (allows positive density on negative numbers, even though it may be negligible) Symmetric We may want to study a phenomena that has a skewed distribution? [For example taking all values in (, ) Income and Consumption [Economics, Management] Time until the next hit on a web page [Web Data Mining] Response time to a stimulus [Psychology] Pay off for car insurance policy [Insurance] Time to Event [Insurance] Lifetime of a component of a device [reliability studies] Inter arrival times of events [Transportation, Reliability, Queuing] 6

Gamma, Exponential and 2 Distributions A flexible family of distributions used to model these phenomena The family can represent a large variety of shapes Gamma Distribution: X Gamma(, ) Flexible, two parameter family of distributions. Special Cases: Exponential Distribution: X Exp( ) Chi Squared Distribution: X 2 Section 6.3 Gamma Distribution Def: A continuous RV X is said to have a gamma distribution with parameters > and > if the pdf of X is 7

Exponential Distribution Def: A continuous RV X follows an exponential distribution with parameter > if X has pdf Chi Squared Distribution The Chi Squared distribution : A special case of the Gamma distribution when = / 2 (for some positive integer ) and = 2. The parameter is called the degrees of freedom. Used for statistical inference on sampling distribution of Sample Variance of observations from a Population 8

The (Gaussian) Distribution Notation: ~ short hand for (is distributed as) Probability density function (pdf): Section 6.5 17 The Standard Distribution Special case: = and 2 = 1 (Standardized Scores) cdf: Can not be expressed in closed functional form! USE Tables or Numerical Integration to evaluate this Integral 18 9

Standard Distribution CDF Table III provides values of area under the curve from to z, for z = (.1) 3.9, and z= 4., 5., 6. For, negative values of z, use the symmetry of the standard normal pdf. 19 Transforming (Standardizing) RV s Idea: Transform a N(, 2 ) RV into a N(, 1) RV... then: 2 1

Using the Transformation Say and we want to compute Idea: Transform to the standard normal distribution: 21 Sampling Distributions Select a random sample of n observations from a specified population Independent & identically distributed (i.i.d.) random vars These arise in a variety of experimental situations. Sampling distribution of a statistic: Given repeated samples, the value of the statistic (e.g. sample mean) varies from sample to sample. The Sampling Distribution describes the pattern of variability in the values over all possible samples of a fixed size. Chapter 8, Section 8.1 11

Sampling Distributions 2 If we know the underlying population distribution f, we can obtain the EXACT SAMPLING distribution of the sum (or, average) Sometimes, it is not easy to find this distribution analytically. Approximations via Monte Carlo simulations Large sample limiting distribution Sampling Distribution of the Sample Mean We will focus on the probability distribution of two situations: 1. X i N(, 2 ) No approximation needed. X n 1 n n i 1 2. X i s have an arbitrary (but same) underlying probability distribution For large sample size, use Limit distribution : n In statistical inference problems, f is usually unknown. For large n, we are able to approximate this sampling distribution without knowing f Modes of convergence of Sample Mean X i in 24 12

Situation 1: Sample from a population In this case: 25 Convergence for Large Sample Size Example: Toss a coin a large number of, say n, times. How does one formalize the phrase Proportion of heads is ~ ½? Suppose that is a sequence of independent Bernoulli trials, each with probability of success p. Then E(X i ) = p. The proportion of successes in n trials = This sequence is random, in that over repeated trials, the sequence will takes different values. However, as n gets large, it does converge in some well defined sense. 13

Modes of Convergence Law of Large Numbers (LLN) Convergence in Probability Used in studying Consistency of estimators Convergence in Distribution Central Limit Theorem and Poisson Approximations for Binomial distribution Law of Large Numbers Theorem: Let be a sequence of independent random variables, with Let Then for any We say that the sequence of random variables converges in probability to the number or Proof: Uses Chebyshev s Inequality Repeated Measurements (Random Sampling): More generally, for any function Y = g(x), such that mean and variance of Y exist, sample mean of Y i = g(x i ), i=1,2,,n, converges in probability to E[Y]. 14

Why do we need to look beyond LLN? LLN implies that one can estimate population mean and second moment quite well from a sample of n independent draws, if n is reasonably large. The chance that error in sample mean beyond any specified threshold get smaller and smaller as n increases. However, it does not allow us to assess the size of error, when one uses sample average to estimate the population mean. If we want to approximate probability of a given size of error, we need to zoom into these errors on a more and more micro scale c n that goes to zero as n gets large. Thus we consider a normalized quantity, whose distribution, f n, does not degenerate to zero. Convergence of means of a random sample of n observations: Central Limit Theorem Second Case: We want to understand how the error { } fluctuates around zero over repeated sampling. When n is sufficiently large, we say In what sense? Consider the standardized random variable Note that E[Z n ] =, Var[Z n ] = 1. 3 15

Insight on CLT The CLT asserts that the cdf of Z n converges to the cdf of a standard normal random variable Z for all real numbers, so we can approximate The proof in the book is under more restrictive conditions assuming that X i s have same distribution, and its mgf exists. But this is not necessary. CLT holds under very general conditions. Illustrations 32 16

5 4 3 2 1 6 5 4 3 2 1.6.64.68 Histogram of C15 C15.72 C12.76 1.8.84 Mean.736 StDev.4412 N 5 Mean.751 StDev.434 N 5 3 25 2 15 1 5 9 8 7 6 5 4 3 2 1..63.2.66.4.69 Histogram of C8.6 C8.72 C13.8.75 1. 1.2.78 1.4.81 Mean.74 StDev.3172 N 5 Mean.7481 StDev.261 N 5 16 14 12 1 8 6 4 2 8 7 6 5 4 3 2 1.3.72.4.5.735.6.7 C9.75 C14.8.765.9.78 1. Mean.7476 StDev.1321 N 5 Mean.7514 StDev.148 N 5 1 8 6 4 2 5 4 3 2 1.45.525.6.675.75.825.9.975 C1.72.73.74.75 C17.76.77.78 Mean.7515 StDev.8731 N 5 Mean.754 StDev.1126 N 5 8 7 6 5 4 3 2 1 5 4 3 2 1.6.66.72.78 C11.84.9.732.738.744.75.756.762.768.774 C19 Mean.7482 StDev.6112 N 5 Mean.752 StDev.8428 N 5 Summary Statistics for 5 Repeated Samples of Averages of Various Sample Sizes Underlying Population: Bernoulli (p=.75) n=1 n=2 n=1 n=25 n=5 Histogram of C9 Histogram of C1 Histogram of C11 n=1 n=25 n=1 n=16 n=25 Histogram of C12 Histogram of C13 Histogram of C14 Histogram of C17 Histogram of C19 33 Sum of n Throws for a six sided fair die 34 17

Sample Means from Gamma Distributed Random Variable X ~ Gamma 75,.5) The population pdf is sketched below: 35 Sampling distribution of the average 36 18

Central Limit Theorem for Totals Define the sum of these n random variables: When n is sufficiently large, 37 19