

STATISTICS 4 Summary Notes

1. Geometric and Exponential Distributions

GEOMETRIC - discrete

A discrete random variable X counts the number of trials needed until an event first occurs.

$P(X = x) = (1 - p)^{x-1} p$, $x = 1, 2, 3, \ldots$ where $0 < p < 1$ (GIVEN IN FORMULA BOOK)

Conditions/Assumptions:
- there is a sequence of independent trials
- only 2 outcomes, success and failure
- constant probability, p, of success at each trial

For $P(X \le k)$ or $P(X \ge k)$ use the sum of a GP to evaluate: $S_\infty = \dfrac{a}{1 - r}$, where a is the first term and r is the common ratio.

Example: X ~ Geo(0.25). Find P(X > 5).

$P(X > 5) = P(X = 6) + P(X = 7) + P(X = 8) + \ldots = 0.25 \times 0.75^5 + 0.25 \times 0.75^6 + 0.25 \times 0.75^7 + \ldots$

This is a GP with $a = 0.25 \times 0.75^5$ and $r = 0.75$, so

$P(X > 5) = \dfrac{0.25 \times 0.75^5}{1 - 0.75} = 0.237$

Mean of Geo(p) $= \dfrac{1}{p}$

Variance of Geo(p) $=$ mean $\times$ (mean $- 1$) $= \dfrac{1 - p}{p^2}$

MAKE SURE YOU CAN WRITE OUT FULLY THE PROOF FOR THESE!!!

EXPONENTIAL - continuous

- Models intervals of time between events occurring.
- Related to the Poisson: if the number of events occurring in a given period of time is Poisson, then the time between successive events is exponential.
- Constant probability of an event occurring per unit of time.

$f(x) = \begin{cases} \lambda e^{-\lambda x} & x \ge 0 \\ 0 & x < 0 \end{cases}$ (GIVEN IN FORMULA BOOK)

$\dfrac{1}{\lambda}$ = average time between events

MEAN $= E(X) = \dfrac{1}{\lambda}$, VARIANCE $= \dfrac{1}{\lambda^2}$ (both can be shown using integration by parts)

For $P(X \le b)$ or $P(X \ge b)$ use the distribution function:

$P(X \le b) = \int_0^b \lambda e^{-\lambda x}\,dx = \left[ -e^{-\lambda x} \right]_0^b = 1 - e^{-\lambda b}$
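The geometric tail and the exponential distribution function above can be checked numerically. The following is a minimal sketch, not part of the original notes: the function names are illustrative, and it assumes p = 0.25 for the geometric example (the value for which the series sums to 0.75^5, about 0.237).

```python
import math


def geometric_pmf(p, x):
    """P(X = x) for X ~ Geo(p), x = 1, 2, 3, ..."""
    return (1 - p) ** (x - 1) * p


def geometric_tail(p, k):
    """P(X > k) as the sum to infinity of a GP.

    First term: P(X = k + 1) = p(1 - p)^k, common ratio: (1 - p).
    """
    first_term = p * (1 - p) ** k
    ratio = 1 - p
    return first_term / (1 - ratio)


def exponential_cdf(lam, b):
    """P(X <= b) = 1 - exp(-lam * b) for an exponential with rate lam."""
    return 1 - math.exp(-lam * b)


# Assumed example: X ~ Geo(0.25), P(X > 5)
print(round(geometric_tail(0.25, 5), 3))
```

The closed form agrees with adding up the individual probabilities P(X = 6), P(X = 7), ... term by term, which is a quick way to confirm that the first term and ratio were chosen correctly.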

Important Feature: the exponential is a memoryless distribution, which is important for conditional probability. The probability that we need to wait more than a further t seconds for the first event to occur, given that it has not happened after waiting 30 seconds, is the same as the probability that we need to wait more than t seconds from the start:

$P(X > 30 + t \mid X > 30) = P(X > t)$

2. Estimation

MAKE SURE YOU LEARN THE PROOF for $E(S^2) = \sigma^2$ (Page 0)

A statistic used to estimate the value of a parameter of a population is called an estimator.

The Most Efficient estimator is the one which
- is unbiased: its expected value = the parameter it is estimating
- has the smallest variance.

Consistent Estimator: if U is an unbiased estimator for an unknown parameter θ, then U is a consistent estimator for θ if $\mathrm{Var}(U) \to 0$ as $n \to \infty$, where n is the size of the sample. You may need to use $\sum r$, $\sum r^2$, $\sum r^3$ (all given in the formula booklet).

Relative Efficiency of Estimator A to Estimator B $= \dfrac{1/\mathrm{Var}(\text{Estimator A})}{1/\mathrm{Var}(\text{Estimator B})} = \dfrac{\mathrm{Var}(\text{Estimator B})}{\mathrm{Var}(\text{Estimator A})}$

Example: A random variable X has mean $2\mu$ and variance 10. A random variable Y has mean $\mu$ and variance 5. X and Y are taken to be independent.

a) Given that $aX + bY$ is an unbiased estimator of $\mu$, show that $2a + b = 1$.

$E(aX + bY) = \mu$
$aE(X) + bE(Y) = \mu$
$2\mu a + \mu b = \mu$
$2a + b = 1$

b) The variance of $aX + bY$ is denoted by V. Express V in the form $pa^2 + qa + r$.

$\mathrm{Var}(aX + bY) = a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y) = 10a^2 + 5b^2 = 10a^2 + 5(1 - 2a)^2 = 30a^2 - 20a + 5$

c) Find the values of a and b such that V takes its minimum value.

Method 1, Differentiation:
$\dfrac{dV}{da} = 60a - 20$, and $60a - 20 = 0$ gives $a = \dfrac{1}{3}$, so $b = \dfrac{1}{3}$.

Method 2, Completing the square:
$V = 30\left(a^2 - \dfrac{2}{3}a\right) + 5 = 30\left(a - \dfrac{1}{3}\right)^2 - 30\left(\dfrac{1}{3}\right)^2 + 5$, which is smallest when $a = \dfrac{1}{3}$, so $b = \dfrac{1}{3}$.

d) A single observation is taken on each of X and Y. The values observed are 10 and 6 respectively. Use the results from c) to estimate $\mu$.

$\hat{\mu} = \dfrac{1}{3}(10) + \dfrac{1}{3}(6) = \dfrac{16}{3} = 5\tfrac{1}{3}$
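The minimum-variance estimator in the worked example can be verified with exact fractions. This is a sketch under stated assumptions rather than the notes' own working: it assumes X has mean 2µ and variance 10, Y has mean µ and variance 5, X and Y are independent, and the observed values are 10 and 6 (the reading under which V = 30a² - 20a + 5). All names are illustrative.

```python
from fractions import Fraction

VAR_X = 10  # assumed variance of X
VAR_Y = 5   # assumed variance of Y


def estimator_variance(a):
    """Var(aX + bY) with b fixed by the unbiasedness condition 2a + b = 1."""
    b = 1 - 2 * a
    return VAR_X * a ** 2 + VAR_Y * b ** 2


# V = 30a^2 - 20a + 5 is a quadratic p*a^2 + q*a + r; its minimum is at a = -q / (2p).
p, q, r = 30, -20, 5
best_a = Fraction(-q, 2 * p)
best_b = 1 - 2 * best_a
min_variance = estimator_variance(best_a)

# Estimate of mu from the single observations x = 10 and y = 6 (assumed values).
estimate = best_a * 10 + best_b * 6

print(best_a, best_b, min_variance, estimate)
```

Using `Fraction` keeps the thirds exact, so the result can be compared directly with the hand calculation instead of with a rounded decimal.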

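Estimator of a Population Proportion (Binomial)

From a binomial population in which p is the (unknown) proportion of successes, a random sample of size n is taken.
- X is the number of successes.
- $P_s$ is the proportion of successes in the sample: $P_s = \dfrac{X}{n}$

$P_s$ is an unbiased estimator for p, as

$E(P_s) = E\left(\dfrac{X}{n}\right) = \dfrac{1}{n}E(X) = \dfrac{1}{n}(np) = p$ (mean of a binomial: $E(X) = np$)

$\mathrm{Var}(P_s) = \mathrm{Var}\left(\dfrac{X}{n}\right) = \dfrac{1}{n^2}\mathrm{Var}(X) = \dfrac{1}{n^2}\bigl(np(1 - p)\bigr) = \dfrac{p(1 - p)}{n}$ (variance of a binomial: $\mathrm{Var}(X) = np(1 - p)$)

Pooled estimators of Population Proportions

Sample I: size $n_1$, proportion $P_{s1}$
Sample II: size $n_2$, proportion $P_{s2}$

Unbiased estimator of the population proportion: $\hat{p} = \dfrac{n_1 P_{s1} + n_2 P_{s2}}{n_1 + n_2}$

$E\left(\dfrac{n_1 P_{s1} + n_2 P_{s2}}{n_1 + n_2}\right) = \dfrac{1}{n_1 + n_2}\bigl[E(n_1 P_{s1}) + E(n_2 P_{s2})\bigr] = \dfrac{1}{n_1 + n_2}\bigl[n_1 E(P_{s1}) + n_2 E(P_{s2})\bigr] = \dfrac{1}{n_1 + n_2}(n_1 p + n_2 p) = p$

Pooled estimators: Mean and Variance (needed for CI and hypothesis testing)

Sample I: size $n_1$, mean $\bar{X}_1$, variance $s_1^2$
Sample II: size $n_2$, mean $\bar{X}_2$, variance $s_2^2$

Mean: $\hat{\mu} = \dfrac{n_1 \bar{X}_1 + n_2 \bar{X}_2}{n_1 + n_2}$

Variance (the pooled estimate $S_p^2$ is given in the formula booklet):
- Using sample variances ($\sigma_n$ on calculator): $\hat{\sigma}^2 = \dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}$
- Using unbiased estimators of the population variances: $\hat{\sigma}^2 = \dfrac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$
- Using summary values: $\hat{\sigma}^2 = \dfrac{\sum (x_i - \bar{x}_1)^2 + \sum (x_j - \bar{x}_2)^2}{n_1 + n_2 - 2}$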
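The pooled formulas above are easy to mix up, so a short numerical check may help. This is a minimal sketch using Python's standard `statistics` module; the function names and the two small samples are hypothetical and chosen only for illustration.

```python
import statistics


def pooled_proportion(n1, p1, n2, p2):
    """Pooled estimate of a common proportion from two sample proportions."""
    return (n1 * p1 + n2 * p2) / (n1 + n2)


def pooled_variance_unbiased(sample1, sample2):
    """Pooled variance from the unbiased estimates (divisor n - 1)."""
    n1, n2 = len(sample1), len(sample2)
    s1 = statistics.variance(sample1)  # divisor n1 - 1
    s2 = statistics.variance(sample2)  # divisor n2 - 1
    return ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)


def pooled_variance_divisor_n(sample1, sample2):
    """Pooled variance from the sample variances (divisor n)."""
    n1, n2 = len(sample1), len(sample2)
    v1 = statistics.pvariance(sample1)  # divisor n1
    v2 = statistics.pvariance(sample2)  # divisor n2
    return (n1 * v1 + n2 * v2) / (n1 + n2 - 2)


# Hypothetical data, for illustration only.
sample_one = [2.0, 4.0, 6.0]
sample_two = [1.0, 3.0, 5.0, 7.0]
print(pooled_variance_unbiased(sample_one, sample_two))
print(pooled_variance_divisor_n(sample_one, sample_two))
print(pooled_proportion(40, 0.25, 60, 0.5))
```

Both variance routes give the same pooled estimate, because each numerator equals the total sum of squared deviations about the two sample means.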

3. Confidence Intervals

Interpretation of a 95% CI: different samples of size n lead to different values of the estimator and hence to different 95% confidence intervals. On average 95% of these intervals will contain the true population value.

Difference between means

Assumptions:
- a Normal distribution is stated or can be assumed
- unknown population variance
- small samples are used

$(\bar{x}_1 - \bar{x}_2) \pm t_c\, s \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$

$t_c$ comes from t-tables with $n_1 + n_2 - 2$ degrees of freedom (for 95% look up 0.975), and

$s^2 = \dfrac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$

If the confidence interval includes 0, the data are consistent with there being no difference between the means of the two populations (no significant evidence of a difference at the 5% level).

Population Variance (or Standard Deviation)
- Uses $s^2$, the unbiased estimate of the population variance ($s_{n-1}$)
- Uses chi-squared: $\chi^2_L$ (lower) and $\chi^2_U$ (upper), so for 95% use 0.025 and 0.975
- $n - 1$ degrees of freedom

$\dfrac{(n - 1)s^2}{\chi^2_U} < \sigma^2 < \dfrac{(n - 1)s^2}{\chi^2_L}$

Standard Deviation Confidence Interval: work as for variance but square root the final answers.

Ratio of two normal population Variances
- uses $s^2$, the unbiased estimate of the population variance ($s_{n-1}$)
- uses the F-distribution: you must get the degrees of freedom in the correct order (numerator first, then denominator)

Sample X: sample size $n_x$, degrees of freedom $\nu_x = n_x - 1$
Sample Y: sample size $n_y$, degrees of freedom $\nu_y = n_y - 1$

If looking for a 90% confidence interval use p = 0.95 (5% at upper and lower, but use the upper limit to find the values of F).

With $F_1 = F_{\nu_x, \nu_y}$ and $F_2 = F_{\nu_y, \nu_x}$:

$\dfrac{s_x^2}{s_y^2} \cdot \dfrac{1}{F_1} < \dfrac{\sigma_x^2}{\sigma_y^2} < \dfrac{s_x^2}{s_y^2} \cdot F_2$

If the confidence interval includes 1 it is reasonable to conclude that the two population variances could be equal.
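The chi-squared interval for a population variance can be sketched as a small function. The critical values are passed in as arguments, as they would be read from tables; the sample size and variance estimate below are hypothetical, while 2.700 and 19.023 are the standard table values of chi-squared with 9 degrees of freedom at 0.025 and 0.975. The function name is illustrative.

```python
import math


def variance_confidence_interval(n, s_squared, chi2_lower, chi2_upper):
    """Confidence interval for a population variance.

    n          : sample size
    s_squared  : unbiased estimate of the population variance
    chi2_lower : lower-tail critical value (e.g. 0.025 point), n - 1 d.f.
    chi2_upper : upper-tail critical value (e.g. 0.975 point), n - 1 d.f.
    """
    lower = (n - 1) * s_squared / chi2_upper  # dividing by the larger value gives the lower limit
    upper = (n - 1) * s_squared / chi2_lower
    return lower, upper


# Hypothetical sample: n = 10, s^2 = 4.0, 95% interval with 9 degrees of freedom.
var_low, var_high = variance_confidence_interval(10, 4.0, 2.700, 19.023)

# For the standard deviation, square root the final answers.
sd_low, sd_high = math.sqrt(var_low), math.sqrt(var_high)
print(var_low, var_high, sd_low, sd_high)
```

Note that the upper critical value produces the lower limit of the interval, which is the step most often reversed by mistake.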

4. Hypothesis Testing

For each type of test:
- state the null hypothesis, e.g. $H_0: \mu_X = \mu_Y$
- state the alternative hypothesis, e.g. $H_1: \mu_X > \mu_Y$ (1-tail test)
- state the significance level and distribution
- determine the critical value/region (sketch a graph)
- calculate the appropriate test statistic
- conclude: accept $H_0$, or reject $H_0$ in favour of $H_1$

MEANS

Difference between means

Two small samples of sizes $n_1$ and $n_2$ have mean values $\bar{x}_1$ and $\bar{x}_2$.

Test statistic: $t = \dfrac{\bar{x}_1 - \bar{x}_2}{s\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$, where $s^2$ is the pooled sample variance.

Distribution: use t-tables with $n_1 + n_2 - 2$ degrees of freedom.

- $H_1: \mu_x \ne \mu_y$ is a 2-tailed test
- $H_1: \mu_x < \mu_y$ is a 1-tailed test
- $H_1: \mu_x > \mu_y$ is a 1-tailed test

Assumptions:
- the two populations are Normal
- the two populations have the same variance

For a 2-tailed test, remember to divide the significance level by 2 when finding the critical value or critical region (rejection region).

Difference between matched pairs (Paired Samples)

If samples can be paired exactly, the differences between the pairs of values can be tested to see if they form a distribution with zero mean, assumed to be normal. As we don't know the population variance of these differences, use the t-distribution.

Test statistic: $t = \dfrac{\bar{d} - 0}{s / \sqrt{n}}$

- $\bar{d}$ is the mean of the differences of the matched pairs
- $s^2$ is the unbiased estimate of the variance of the differences of the matched pairs

Distribution: t-distribution with $n - 1$ degrees of freedom (n = number of pairs used).
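The two t statistics above can be computed directly from raw data. The sketch below uses only the standard library; the function names and the small data sets are hypothetical, and the critical value would still be read from t-tables.

```python
import math
import statistics


def pooled_t_statistic(sample1, sample2):
    """t statistic for the difference between two means (equal variances assumed)."""
    n1, n2 = len(sample1), len(sample2)
    pooled_var = ((n1 - 1) * statistics.variance(sample1)
                  + (n2 - 1) * statistics.variance(sample2)) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(sample1) - statistics.mean(sample2)) / standard_error


def paired_t_statistic(first, second):
    """t statistic for matched pairs, testing that the mean difference is zero."""
    differences = [a - b for a, b in zip(first, second)]
    n = len(differences)
    d_bar = statistics.mean(differences)
    s = statistics.stdev(differences)  # square root of the unbiased variance estimate
    return (d_bar - 0) / (s / math.sqrt(n))


# Hypothetical data, for illustration only.
t_two_sample = pooled_t_statistic([3.0, 5.0, 7.0], [1.0, 3.0, 5.0, 7.0])  # 5 degrees of freedom
t_paired = paired_t_statistic([5.0, 7.0, 9.0, 8.0], [4.0, 5.0, 6.0, 6.0])  # 3 degrees of freedom
print(t_two_sample, t_paired)
```

The two-sample statistic has n1 + n2 - 2 degrees of freedom, while the paired statistic has (number of pairs) - 1, so the same data can lead to different critical values depending on the design.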

VARIANCES (for Standard Deviations, always work in terms of variance)

Tests about a SINGLE population variance

- 1-tail test: $H_0: \sigma^2 = \sigma_0^2$ against $H_1: \sigma^2 > \sigma_0^2$ or $H_1: \sigma^2 < \sigma_0^2$
- 2-tailed test: $H_0: \sigma^2 = \sigma_0^2$ against $H_1: \sigma^2 \ne \sigma_0^2$

Test statistic: $\chi^2 = \dfrac{(n - 1)s^2}{\sigma_0^2}$

Distribution: chi-squared with $n - 1$ degrees of freedom.

Assumption: the population is approximately normal.

Comparison of population variances
- can be used to check that the variances are roughly the same (one of the assumptions needed to use the t-distribution when comparing means)
- uses the ratio of the two unbiased estimates of the population variances and compares it to 1
- always have the larger variance as the numerator
- degrees of freedom: $n - 1$ for the NUMERATOR sample and $n - 1$ for the DENOMINATOR sample, in that order

ONE TAILED TESTS (rare in an exam)
- $H_1: \sigma_1^2 > \sigma_2^2$: test statistic $F = \dfrac{s_1^2}{s_2^2}$, rejection region $F > F_{\nu_1, \nu_2}$
- $H_1: \sigma_2^2 > \sigma_1^2$: test statistic $F = \dfrac{s_2^2}{s_1^2}$, rejection region $F > F_{\nu_2, \nu_1}$

TWO TAILED TESTS
- $H_1: \sigma_1^2 \ne \sigma_2^2$
- $F = \dfrac{s_1^2}{s_2^2}$ if $s_1^2 > s_2^2$, or $F = \dfrac{s_2^2}{s_1^2}$ if $s_2^2 > s_1^2$
- Rejection region: as above, but remember to divide the significance level by 2

Example: A scientist records lengths of worms in fields A and B.

Field A (cm): 11.9, 9.8, 10.5, 10.8, 9.5, 11.3
Field B (cm): 12.3, 13.4, 10.1, 13.6, 14.1

Assuming that these are random samples from normal populations, test at the 5% significance level that the population variances are equal.

$H_0: \sigma_A^2 = \sigma_B^2$, $H_1: \sigma_A^2 \ne \sigma_B^2$

F distribution, 2-tailed test, so look up 0.975 in tables.

$S_A^2 = 0.815$, $n_A = 6$, degrees of freedom = 5
$S_B^2 = 2.545$, $n_B = 5$, degrees of freedom = 4

Test statistic: $F = \dfrac{2.545}{0.815} = 3.12$

Critical value: $F_{4,5} = 7.39$

As 3.12 < 7.39 there is no significant evidence at the 5% level to indicate that the variances are not equal: accept $H_0$.
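The variance-ratio test in the worm example can be reproduced from the raw data. This is a sketch under an assumption: the field measurements are taken to be 11.9, 9.8, 10.5, 10.8, 9.5, 11.3 for A and 12.3, 13.4, 10.1, 13.6, 14.1 for B (the values for which the unbiased variance of B comes to 2.545). The critical value 7.39 is the tabulated 0.975 point of F with (4, 5) degrees of freedom; the function name is illustrative.

```python
import statistics

# Assumed readings of the worm-length data (cm).
field_a = [11.9, 9.8, 10.5, 10.8, 9.5, 11.3]
field_b = [12.3, 13.4, 10.1, 13.6, 14.1]


def variance_ratio(sample1, sample2):
    """Return (F, numerator d.f., denominator d.f.) with the larger variance on top."""
    v1 = statistics.variance(sample1)  # unbiased estimates (divisor n - 1)
    v2 = statistics.variance(sample2)
    if v1 >= v2:
        return v1 / v2, len(sample1) - 1, len(sample2) - 1
    return v2 / v1, len(sample2) - 1, len(sample1) - 1


f_stat, df_num, df_den = variance_ratio(field_a, field_b)

CRITICAL_VALUE = 7.39  # F(4, 5) at 0.975, read from tables
reject_h0 = f_stat > CRITICAL_VALUE
print(round(f_stat, 2), df_num, df_den, reject_h0)
```

Returning the degrees of freedom alongside F makes it harder to look up the table entry in the wrong order, since the numerator degrees of freedom always belong to the sample with the larger variance.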

5. Goodness of Fit: Chi-Squared

$X^2 = \sum_{i=1}^{n} \dfrac{(O_i - E_i)^2}{E_i}$

Expected frequencies must be at least 5.

Degrees of freedom: if there are k groups (in your $X^2$ calculation) and p parameters are estimated, then the number of degrees of freedom is $k - p - 1$.

The observed $X^2$ is compared with $\chi^2$ one-sided tables. If $X^2$ is too high, we reject the hypothesis that this is the correct model for the distribution.

The formula book contains the functions for the following (n is the number of groups):
- Binomial: use $n - 2$ degrees of freedom if you have estimated p from the data.
- Poisson: use $n - 2$ degrees of freedom if you have estimated λ from the data. You may need to make the last group "≥ k".
- Geometric: you may need to make the last group "≥ k".
- Uniform: also a test for independence, e.g. if the number of customers is independent of the day of the week then each day would have the same frequency.
- Normal: standardise and use tables to find the probabilities, $z = \dfrac{x - \mu}{\sigma}$. Use $n - 3$ degrees of freedom if the mean and variance are estimated from the data. Use $-\infty$ and $\infty$ at the lower and upper limits to ensure all values are covered.

Example: Analysis of the goals scored per match by a football team gave the following results.

Goals per match (x):  0   1   2   3   4   5   6   7
Matches (f):          14  18  29  18  10  7   3   1

Test at the 5% level whether the distribution can be modelled by a Poisson distribution.

ALWAYS start with a hypothesis.

$H_0$: The distribution is Poisson. Significance level 5%.

From the data, mean λ = 2.3, so $P(X = x) = \dfrac{e^{-2.3}(2.3)^x}{x!}$

x:         0     1     2     3     4     5    6    7    8
Observed:  14    18    29    18    10    7    3    1    0
Expected:  10.0  23.1  26.5  20.3  11.7  5.4  2.1  0.7  0.2

(The group x = 8 is an extra group added so that the expected frequencies cover the whole distribution.)

Pooling the groups for 5 or more goals, so that every expected frequency is at least 5, leaves 6 groups. One parameter (λ) was estimated, so there are 6 - 1 - 1 = 4 degrees of freedom.

$X^2 \approx 4.27$; compare to $\chi^2$ (5%) with 4 degrees of freedom: $\chi^2 = 9.49$.

As $X^2 < 9.49$ we do not reject $H_0$ and conclude that a Poisson distribution having the same mean is a suitable model.

www.mathsbox.org.uk
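The Poisson goodness-of-fit calculation can be sketched in a few lines. This example is an illustration under an assumption: the observed frequencies are taken to be 14, 18, 29, 18, 10, 7, 3, 1 for 0 to 7 goals (100 matches, mean 2.3). It uses unrounded expected frequencies, so the statistic comes out slightly lower than a hand calculation with values rounded to 1 d.p. The function names are illustrative, and 9.49 is the tabulated 5% point of chi-squared with 4 degrees of freedom.

```python
import math

# Assumed observed frequencies for x = 0, 1, ..., 7 goals per match.
observed = [14, 18, 29, 18, 10, 7, 3, 1]
total = sum(observed)
lam = sum(x * f for x, f in enumerate(observed)) / total  # estimated mean


def poisson_pmf(rate, x):
    """P(X = x) for a Poisson distribution with the given rate."""
    return math.exp(-rate) * rate ** x / math.factorial(x)


# Expected frequencies for x = 0..7, plus an extra group for x >= 8.
expected = [total * poisson_pmf(lam, x) for x in range(len(observed))]
expected.append(total - sum(expected))
observed_all = observed + [0]


def pool_small_tail(obs, exp_freq, minimum=5.0):
    """Merge groups from the right until the last expected frequency is >= minimum."""
    obs, exp_freq = list(obs), list(exp_freq)
    while len(exp_freq) > 1 and exp_freq[-1] < minimum:
        last_expected = exp_freq.pop()
        last_observed = obs.pop()
        exp_freq[-1] += last_expected
        obs[-1] += last_observed
    return obs, exp_freq


obs_pooled, exp_pooled = pool_small_tail(observed_all, expected)
x_squared = sum((o - e) ** 2 / e for o, e in zip(obs_pooled, exp_pooled))
degrees_of_freedom = len(obs_pooled) - 1 - 1  # k groups, one estimated parameter

CRITICAL_VALUE = 9.49  # chi-squared, 5% level, 4 degrees of freedom (tables)
print(round(x_squared, 2), degrees_of_freedom, x_squared < CRITICAL_VALUE)
```

Pooling is done before the degrees of freedom are counted, which is why the test uses 4 degrees of freedom and not the 7 that the original nine groups would suggest.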