Reading Material for Students

Arnab Adhikari, Indian Institute of Management Calcutta, Joka, Kolkata 700104, India, arnaba1@email.iimcal.ac.in
Indranil Biswas, Indian Institute of Management Lucknow, Prabandh Nagar, Lucknow 226013, India, indranil@iiml.ac.in
Arnab Bisi, Johns Hopkins Carey Business School, 100 International Drive, Baltimore, Maryland 21202, abisi1@jhu.edu

Probability Distributions

Binomial distribution. The binomial distribution describes the probability of exactly x successes in N trials, where the probability of success in a single trial is p and that of failure is 1 − p (also designated q). The probability mass function (pmf) of this distribution is

f(x; N, p) = C(N, x) p^x (1 − p)^(N − x),

where C(N, x) = N!/(x!(N − x)!) is the binomial coefficient, the variable x and the parameter N are integers satisfying 0 ≤ x ≤ N and N > 0, and the parameter p is a real quantity with p ∈ [0, 1]. The expected value and the variance of a random variable X having the binomial distribution are

E[X] = Np and Var(X) = Np(1 − p).

Hypergeometric distribution. The hypergeometric distribution describes the experiment where, out of N elements in total, M possess a certain attribute [and the remaining (N − M) do not]; if we then choose n elements at random without replacement, f(x; n, N, M) gives the probability that exactly x of the selected n elements come from the group of M elements possessing the attribute. Let X denote the number of selected elements with that attribute. The pmf of X under the hypergeometric distribution is

f(x; n, N, M) = C(M, x) C(N − M, n − x) / C(N, n),

where x is discrete with range x ∈ [max(0, n − N + M), min(n, M)]. The parameters n, N and M are all integers and satisfy the conditions 1 ≤ n ≤ N, N ≥ 1

and M ≥ 1. Let the probability of success be denoted by p = M/N. Then, the expected value and the variance of X under the hypergeometric distribution are

E[X] = np and Var(X) = np(1 − p)(N − n)/(N − 1).

In real life, when a marketing group is trying to understand its customer base by testing a set of known customers for over-representation of various demographic subgroups, it uses the hypergeometric test, which is based on the hypergeometric distribution.

Negative binomial distribution. The negative binomial distribution (also known as the Pascal distribution) gives the probability that exactly x trials are needed until the k-th success occurs. Let X denote the number of trials up to and including the k-th success. Here p and q (= 1 − p) designate the probability of a success and a failure in a single trial, respectively. The pmf of this distribution is

f(x; k, p) = C(x − 1, k − 1) p^k (1 − p)^(x − k),

where the variable x and the parameter k are integers satisfying x ≥ k > 0. The expected value and the variance of a random variable X under the negative binomial distribution are

E[X] = k/p and Var(X) = k(1 − p)/p².

The negative binomial distribution has applications in the insurance industry, where, for example, the rate at which people have accidents is affected by a random variable such as the weather.

Geometric distribution. The geometric distribution is the special case of the negative binomial distribution discussed above with k = 1. It expresses the probability that exactly x trials are needed until the first success occurs. Let X denote the number of trials up to and including the first success. The pmf of X under this distribution is

f(x; p) = p(1 − p)^(x − 1),

where p denotes the probability of success in each trial. The expected value and the variance of a random variable X under the geometric distribution can be expressed as follows:

E[X] = 1/p and Var(X) = (1 − p)/p².

In real life, if an NGO studying the sex ratio in a human population wants to model the number of male births observed before the first female birth, it can use this kind of distribution.

Poisson distribution. The Poisson distribution gives the probability of exactly x events occurring in a given length of time when the events are independent and happen at a constant rate λ. The pmf of this distribution is

f(x; λ) = λ^x e^(−λ) / x!,

where the variable x is a non-negative integer (x = 0, 1, 2, …) and the parameter λ is a positive real quantity. The expected value and the variance of a random variable X under the Poisson distribution are

E[X] = λ and Var(X) = λ.

When N is very large and p is very small, the binomial distribution described before can be approximated by a Poisson distribution with expected value λ = Np. The Poisson distribution is applied to determine the probability of rare events such as birth defects, genetic mutations, car accidents, etc.

Uniform distribution. If a continuous random variable X follows the uniform distribution, its probability density function (pdf) is

f(x; a, b) = 1/(b − a) for a ≤ x ≤ b.

The expected value and the variance of a random variable X under the uniform distribution are

E[X] = (a + b)/2 and Var(X) = (b − a)²/12.

In oil exploration, the position of the oil-water contact in a potential prospect is often considered to be uniformly distributed.

Exponential distribution. If a continuous random variable X follows the exponential distribution, then its pdf can be expressed as follows:

f(x; θ) = (1/θ) e^(−x/θ),

where θ represents the scale parameter. The expected value and the variance of a random variable X under the exponential distribution are

E[X] = θ and Var(X) = θ².

In real life, radioactive or particle decay is considered to follow an exponential distribution.

Normal distribution. The normal distribution (also called the Gaussian distribution) is one of the most important distributions in statistics. Its pdf is

f(x; μ, σ²) = (1/(σ√(2π))) e^(−(1/2)((x − μ)/σ)²),

where μ is the mean (expected value) and σ² is the variance of the distribution. For μ = 0 and σ² = 1, the distribution is called the standard normal distribution. It has widespread applications in the natural and social sciences, financial models, etc.

Beta distribution. The beta distribution has been applied to model the behavior of random variables limited to intervals of finite length in a wide variety of disciplines. Its pdf is

f(x; α, β) = x^(α − 1) (1 − x)^(β − 1) / B(α, β),

where the shape parameters α and β are positive real numbers and the variable x satisfies 0 ≤ x ≤ 1. B(α, β) designates the beta function, given by

B(α, β) = Γ(α)Γ(β) / Γ(α + β),

where for α ∈ R⁺ the gamma function Γ(α) is defined by the integral Γ(α) = ∫_0^∞ t^(α − 1) e^(−t) dt.

When α = β = 1, the beta distribution reduces to the uniform distribution on [0, 1]; when α = β = 2, the distribution takes a parabolic shape; when α = 2 and β = 1 (or vice versa), the distribution is triangular. The expected value and the variance of a random variable X under the beta distribution are

E[X] = α/(α + β) and Var(X) = αβ / ((α + β)²(α + β + 1)).
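The beta-distribution mean and variance formulas above can be checked numerically with nothing beyond the standard library. This is a minimal sketch (the function names are ours): it builds the pdf from the gamma-function identity for B(α, β) and cross-checks the closed-form moments against a crude midpoint-rule integration.

```python
import math

# Beta(α, β) pdf, using the identity B(α, β) = Γ(α)Γ(β)/Γ(α + β)
def beta_pdf(x, a, b):
    B = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return x ** (a - 1) * (1 - x) ** (b - 1) / B

# Closed-form mean and variance quoted above
def beta_mean(a, b):
    return a / (a + b)

def beta_var(a, b):
    return a * b / ((a + b) ** 2 * (a + b + 1))

# Cross-check the closed forms against a midpoint-rule integration of the pdf
a, b, n = 2.0, 5.0, 10_000
xs = [(i + 0.5) / n for i in range(n)]
ps = [beta_pdf(x, a, b) for x in xs]
mean_num = sum(p * x for p, x in zip(ps, xs)) / n
var_num = sum(p * (x - mean_num) ** 2 for p, x in zip(ps, xs)) / n

print(round(beta_mean(a, b), 4), round(mean_num, 4))  # 0.2857 0.2857
print(round(beta_var(a, b), 4), round(var_num, 4))    # 0.0255 0.0255
```

For α = 2, β = 5 the closed forms give E[X] = 2/7 and Var(X) = 10/392, and the numerical integration agrees to four decimal places.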

The beta distribution is usually applied to determine time allocation in project management/control systems, heterogeneity in the probability of HIV transmission, etc.

Gamma distribution. The gamma distribution is a two-parameter family of continuous probability distributions; the exponential distribution is a special case of it. The pdf of the gamma distribution is

f(x; k, θ) = x^(k − 1) e^(−x/θ) / (θ^k Γ(k)),

where the shape parameter k and the scale parameter θ are positive real numbers (k ∈ R⁺ and θ ∈ R⁺), and the variable x is also a positive real number (x ∈ R⁺). The expected value and the variance of a random variable X under the gamma distribution are

E[X] = kθ and Var(X) = kθ².

Sampling Distribution and Confidence Interval. If we take repeated samples from the same population, the sample mean x̄ varies from sample to sample, forming the sampling distribution of the sample mean; it describes the random behavior of x̄. The variability of x̄ about the population mean μ can be quantified by the variance of x̄; for a sample of size n,

Var(x̄) = σ²/n.

Next, a confidence interval is an interval intended to contain the true population parameter. A confidence interval comprises a point estimate, i.e., the best estimate of the population parameter from the sample statistic, and the margin of error or maximum sampling error (the maximum accepted difference between the true population parameter and a sample estimate of that parameter). The confidence interval within which μ lies can be determined by the following expression:

x̄ − z_(α/2) σ/√n ≤ μ ≤ x̄ + z_(α/2) σ/√n.

The confidence level is (1 − α)·100%. The margin of error, denoted by E, is given by

E = z_(α/2) σ/√n.
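The interval and margin-of-error formulas above can be turned into a few lines of code. This is a minimal sketch with hypothetical numbers of our choosing (x̄ = 50, σ = 10, n = 64, and z_(α/2) ≈ 1.96 for a 95% level):

```python
import math

# Confidence interval for μ with known σ: x̄ ± z_(α/2)·σ/√n
def confidence_interval(xbar, sigma, n, z):
    E = z * sigma / math.sqrt(n)  # margin of error
    return xbar - E, xbar + E, E

lo, hi, E = confidence_interval(50.0, 10.0, 64, 1.96)
print(round(lo, 2), round(hi, 2), round(E, 2))  # 47.55 52.45 2.45
```

Here E = 1.96 · 10/√64 = 2.45, so the 95% confidence interval is [47.55, 52.45].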

From the formula given above, the required minimum sample size is easily obtained:

n = (z_(α/2) σ / E)².

Hypothesis Testing. Hypothesis testing is a technique for checking, with the help of sample data, whether a claim or hypothesis about a population parameter is true. In hypothesis testing, the stated conjecture, defined as the null hypothesis, can be disproved but cannot be proved. However, by disproving the null hypothesis, one can establish that its contrary is true. The contrary of the null hypothesis is termed the alternative hypothesis. The test statistic is the value computed from the sample data. A test statistic for testing a hypothesis about the population mean is

z = (x̄ − μ₀) / (σ/√n),

where μ₀ denotes the hypothesized value of the population mean. Following are the null (H₀) and alternative (Hₐ) hypotheses for the three standard tests on the population mean.

The Two-Tailed Test.

H₀: μ = μ₀
Hₐ: μ ≠ μ₀
Reject H₀ if z > z_(α/2) or z < −z_(α/2).

The One-Tailed Test to the Right.

H₀: μ = μ₀
Hₐ: μ > μ₀
Reject H₀ if z > z_α.
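The two-tailed test above can be sketched in a few lines; the data values are hypothetical (claim μ₀ = 100, sample mean 103, σ = 12, n = 36, α = 0.05 so z_(α/2) ≈ 1.96):

```python
import math

# Two-tailed z-test for a population mean, following the recipe above:
# reject H0: μ = μ0 if z > z_(α/2) or z < -z_(α/2)
def two_tailed_z_test(xbar, mu0, sigma, n, z_crit=1.96):
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    return z, (z > z_crit or z < -z_crit)

z, reject = two_tailed_z_test(103.0, 100.0, 12.0, 36)
print(round(z, 2), reject)  # 1.5 False
```

Here z = 3/(12/6) = 1.5 < 1.96, so H₀ is not rejected at the 5% level; a sample mean of 110 with the same σ and n would give z = 5 and lead to rejection.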

The One-Tailed Test to the Left.

H₀: μ = μ₀
Hₐ: μ < μ₀
Reject H₀ if z < −z_α.

Regression Models

Simple linear regression

Here we present a simple linear regression model to determine the relationship between the dependent variable Y and the independent variable X, captured by the following equation:

E(Y | X) = α + βX.

Then the regression model can be designated as

Y = α + βX + ε,

where ε = Y − E(Y | X) is a random error term with E(ε) = 0 and Var(ε) = σ². If α̂ and β̂ denote the best estimates of the parameters α and β, respectively, then the estimated linear regression equation of Y on X is

Ŷ = α̂ + β̂X.

Multiple linear regression

The effect of the independent variables X1, X2 and X3 on the dependent variable Y can be captured by the following equation:

E(Y | X1, X2, X3) = α + β1X1 + β2X2 + β3X3,

where ε = Y − E(Y | X1, X2, X3) is a random error term with E(ε) = 0 and Var(ε) = σ². If α̂, β̂1, β̂2 and β̂3 denote the best estimates of the parameters α, β1, β2 and β3, respectively, then the estimated multiple linear regression equation of Y on X1, X2 and X3 is

Ŷ = α̂ + β̂1X1 + β̂2X2 + β̂3X3.

Multicollinearity check

A regression model is often affected by linear relationships among the independent variables, termed multicollinearity. The variance inflation factor (VIF) is one of the conventional techniques employed to check whether any multicollinearity exists. The VIF between two independent variables X1 and X2 can be determined by the following expression:

VIF(X1, X2) = 1 / (1 − R²(X1, X2)),

where R²(X1, X2) denotes the coefficient of determination between X1 and X2. A VIF value greater than 5 indicates multicollinearity, which affects the overall regression model.

Sources

Anderson, D., Sweeney, D., Williams, T., Camm, J., Cochran, J. 2011. Statistics for Business & Economics, 11th ed. Cengage Learning, Mason.
Berenson, M., Levine, D., Krehbiel, T. C. 2011. Basic Business Statistics: Concepts and Applications. Pearson Education, New Jersey.
Groebner, D. F., Shannon, P. W., Fry, P. C., Smith, K. D. 2013. Business Statistics: A Decision-Making Approach, 9th ed. Pearson Education, New Jersey.
Hildebrand, D. K. and O. Lyman. 1998. Statistical Thinking for Managers, 4th ed. Duxbury Press, California.
Levin, R. I. and D. S. Rubin. 1997. Statistics for Management, 7th ed. Prentice Hall International, New Jersey.
http://wps.aw.com/wps/media/objects/15/1551/formulas.pdf
http://www.nzqa.govt.nz/assets/qualifications-and-standards/qualifications/ncea/ncasubject-resources/mathematics/l3-stats-formulae-13.pdf