Review of Discrete Probability (contd.)


Stat 504, Lecture 2

Overview of probability and inference

[Figure: probability runs from the data generating process to the observed data; inference runs from the observed data back to the data generating process.]

The basic problem we study in probability: given a data generating process, what are the properties of the outcomes?

The basic problem of statistical inference: given the outcomes, what can we say about the process that generated the data? (ref: Wasserman (2004))

Bernoulli distribution

The most basic of all discrete random variables is the Bernoulli. $X$ is said to have a Bernoulli distribution if $X = 1$ occurs with probability $p$ and $X = 0$ occurs with probability $1 - p$,

$$f(x) = \begin{cases} p & x = 1, \\ 1 - p & x = 0, \\ 0 & \text{otherwise.} \end{cases}$$

Another common way to write it is $f(x) = p^x (1-p)^{1-x}$ for $x = 0, 1$.

Suppose an experiment has only two possible outcomes, success and failure, and let $p$ be the probability of a success. If we let $X$ denote the number of successes (either zero or one), then $X$ will be Bernoulli. The mean of a Bernoulli is

$$E(X) = 1(p) + 0(1-p) = p,$$

and the variance of a Bernoulli is

$$V(X) = E(X^2) - \big(E(X)\big)^2 = 1^2 p + 0^2 (1-p) - p^2 = p(1-p).$$
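As a quick numerical check of the mean and variance formulas, the sketch below sums over the support $\{0, 1\}$ using exact fractions, so the results can be compared with $p$ and $p(1-p)$ without rounding error. The helper names are hypothetical, chosen only for this illustration.

```python
from fractions import Fraction

def bernoulli_pmf(x, p):
    """f(x) = p^x (1 - p)^(1 - x) for x in {0, 1}; 0 otherwise."""
    if x not in (0, 1):
        return 0
    return p**x * (1 - p)**(1 - x)

def bernoulli_moments(p):
    """Return (E(X), V(X)) by summing over the support {0, 1}."""
    mean = sum(x * bernoulli_pmf(x, p) for x in (0, 1))
    second_moment = sum(x**2 * bernoulli_pmf(x, p) for x in (0, 1))
    return mean, second_moment - mean**2  # V(X) = E(X^2) - (E(X))^2

p = Fraction(3, 10)
mean, var = bernoulli_moments(p)
print(mean, var)  # 3/10 21/100
```

Here $21/100 = (3/10)(7/10)$, matching $p(1-p)$.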

Binomial distribution

Suppose that $X_1, X_2, \ldots, X_n$ are independent and identically distributed (iid) Bernoulli random variables, each having the distribution

$$f(x_i) = p^{x_i}(1-p)^{1-x_i}, \quad x_i = 0, 1.$$

Let $X = \sum_{i=1}^n X_i$. Then $X$ is said to have a binomial distribution with parameters $n$ and $p$, written $X \sim \text{Bin}(n, p)$.

Suppose that an experiment consists of $n$ repeated Bernoulli-type trials, each trial resulting in a success with probability $p$ and a failure with probability $1 - p$. If all the trials are independent (that is, if the probability of success on any trial is unaffected by the outcome of any other trial), then the total number of successes in the experiment will have a binomial distribution. The binomial distribution can be written as

$$f(x) = \frac{n!}{x!\,(n-x)!}\, p^x (1-p)^{n-x}, \quad x = 0, 1, 2, \ldots, n.$$
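To see that the closed-form binomial probability agrees with the definition $X = \sum_{i=1}^n X_i$, the sketch below computes $f(x)$ from the formula and, separately, by enumerating every sequence of $n$ independent Bernoulli outcomes. The enumeration grows as $2^n$, so it is an illustration only; the function names are hypothetical. Python's `math.comb(n, x)` computes $n!/(x!\,(n-x)!)$.

```python
from fractions import Fraction
from itertools import product
from math import comb

def binom_pmf(x, n, p):
    """Closed form: n!/(x!(n-x)!) * p^x * (1-p)^(n-x) for x = 0, 1, ..., n."""
    if not 0 <= x <= n:
        return 0
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binom_pmf_by_enumeration(x, n, p):
    """Add up the probabilities of all 0/1 sequences of length n with x successes."""
    total = 0
    for seq in product((0, 1), repeat=n):
        if sum(seq) == x:
            prob = 1
            for xi in seq:  # independent trials: multiply Bernoulli probabilities
                prob *= p**xi * (1 - p)**(1 - xi)
            total += prob
    return total

n, p = 4, Fraction(1, 4)
print(binom_pmf(2, n, p))  # 27/128
print(binom_pmf(2, n, p) == binom_pmf_by_enumeration(2, n, p))  # True
```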

The Bernoulli distribution is a special case of the binomial with $n = 1$. That is, $X \sim \text{Bin}(1, p)$ means that $X$ has a Bernoulli distribution with success probability $p$.

One can show algebraically that if $X \sim \text{Bin}(n, p)$ then $E(X) = np$ and $V(X) = np(1-p)$. An easier way to arrive at these results is to note that $X = X_1 + X_2 + \cdots + X_n$, where $X_1, X_2, \ldots, X_n$ are iid Bernoulli random variables. Then, by the additive properties of mean and variance (the variances add because the $X_i$ are independent),

$$E(X) = E(X_1) + E(X_2) + \cdots + E(X_n) = np,$$

$$V(X) = V(X_1) + V(X_2) + \cdots + V(X_n) = np(1-p).$$
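The moment formulas can also be checked directly from the distribution. The sketch below (hypothetical helper names, exact fractions) sums $x f(x)$ and $x^2 f(x)$ over $x = 0, \ldots, n$ and compares the results with $np$ and $np(1-p)$.

```python
from fractions import Fraction
from math import comb

def binom_pmf(x, n, p):
    """f(x) = n!/(x!(n-x)!) p^x (1-p)^(n-x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binom_moments(n, p):
    """Return (E(X), V(X)) by summing over x = 0, ..., n."""
    support = range(n + 1)
    mean = sum(x * binom_pmf(x, n, p) for x in support)
    second_moment = sum(x**2 * binom_pmf(x, n, p) for x in support)
    return mean, second_moment - mean**2

n, p = 7, Fraction(2, 5)
mean, var = binom_moments(n, p)
print(mean, var)  # 14/5 42/25
```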

Note that $X$ will not have a binomial distribution if the probability of success $p$ is not constant from trial to trial, or if the trials are not entirely independent (i.e., a success or failure on one trial alters the probability of success on another trial).

If $X_1 \sim \text{Bin}(n_1, p)$ and $X_2 \sim \text{Bin}(n_2, p)$ are independent, then

$$X_1 + X_2 \sim \text{Bin}(n_1 + n_2, p).$$

As $n$ increases, for fixed $p$, the binomial distribution approaches the normal distribution $N\big(np,\, np(1-p)\big)$.
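The additivity property can be verified exactly: the distribution of a sum of two independent discrete random variables is the convolution of their distributions. The sketch below convolves $\text{Bin}(2, p)$ with $\text{Bin}(3, p)$ and compares the result with $\text{Bin}(5, p)$. It assumes independence, as the statement above requires; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def binom_pmf_list(n, p):
    """[f(0), f(1), ..., f(n)] for Bin(n, p)."""
    return [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

def convolve(pmf_a, pmf_b):
    """Distribution of A + B for independent A, B taking values 0, 1, 2, ..."""
    out = [0] * (len(pmf_a) + len(pmf_b) - 1)
    for i, a in enumerate(pmf_a):
        for j, b in enumerate(pmf_b):
            out[i + j] += a * b  # P(A = i) P(B = j) contributes to A + B = i + j
    return out

p = Fraction(1, 3)
sum_dist = convolve(binom_pmf_list(2, p), binom_pmf_list(3, p))
print(sum_dist == binom_pmf_list(5, p))  # True
```

If the two success probabilities differed, the convolution would no longer match any single binomial, which is the point of the caveat about a constant $p$.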

Poisson distribution

The Poisson is a limiting case of the binomial. Suppose that $X \sim \text{Bin}(n, p)$ and let $n \to \infty$ and $p \to 0$ in such a way that $np \to \lambda$, where $\lambda$ is a constant. Then, in the limit, $X$ will have a Poisson distribution with parameter $\lambda$. The notation $X \sim P(\lambda)$ will mean $X$ has a Poisson distribution with parameter $\lambda$. The Poisson probability distribution is

$$f(x) = \frac{\lambda^x e^{-\lambda}}{x!}, \quad x = 0, 1, 2, \ldots$$

The mean and the variance of the Poisson are both $\lambda$; that is, $E(X) = V(X) = \lambda$. Note that the parameter $\lambda$ must always be positive; negative values are not allowed.

Because the Poisson is the limit of $\text{Bin}(n, p)$, it is useful as an approximation to the binomial when $n$ is large and $p$ is small. That is, if $n$ is large and $p$ is small, then

$$\frac{n!}{x!\,(n-x)!}\, p^x (1-p)^{n-x} \approx \frac{\lambda^x e^{-\lambda}}{x!}, \qquad (1)$$

where $\lambda = np$. The right-hand side of (1) is typically less tedious and easier to calculate than the left-hand side.
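A small side-by-side comparison makes approximation (1) concrete. The values $n = 1000$ and $p = 0.002$ (so $\lambda = 2$) are arbitrary illustrative choices, and the function names are hypothetical; the loop prints both sides of (1) for $x = 0, \ldots, 5$.

```python
from math import comb, exp, factorial

def binom_pmf(x, n, p):
    """Left-hand side of (1)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x, lam):
    """Right-hand side of (1): lambda^x e^(-lambda) / x!."""
    return lam**x * exp(-lam) / factorial(x)

n, p = 1000, 0.002  # large n, small p (illustrative values)
lam = n * p
for x in range(6):
    print(x, f"{binom_pmf(x, n, p):.5f}", f"{poisson_pmf(x, lam):.5f}")
```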

Aside from its use as an approximation to the binomial, the Poisson distribution is also an important probability model in its own right. It is often used to model discrete events occurring in time or in space. For example, suppose that $X$ is the number of telephone calls arriving at a switchboard in one hour. Suppose that in the long run, the average number of telephone calls per hour is $\lambda$. Then it may be reasonable to assume $X \sim P(\lambda)$. For the Poisson model to hold, however, the average arrival rate $\lambda$ must be fairly constant over time; that is, there should be no systematic or predictable changes in the arrival rate. Moreover, the arrivals should be independent of one another; that is, the arrival of one call should not make the arrival of another call more or less likely.
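One way to see the two assumptions at work is a small simulation. The sketch below relies on a standard fact not stated in the notes: if calls arrive at a constant rate $\lambda$ with independent, exponentially distributed gaps between them, the number of calls in one hour is $P(\lambda)$. The rate of 4 calls per hour, the seed, and the function name are hypothetical; the sample mean and sample variance of the hourly counts should both come out near $\lambda$.

```python
import random
import statistics

def calls_in_one_hour(rate, rng):
    """Count arrivals in [0, 1) hour when gaps between calls are iid Exponential(rate)."""
    count = 0
    t = rng.expovariate(rate)  # time of the first call
    while t < 1.0:
        count += 1
        t += rng.expovariate(rate)  # constant rate, independent gaps
    return count

rng = random.Random(504)  # fixed seed so the run is reproducible
lam = 4.0  # hypothetical long-run average number of calls per hour
counts = [calls_in_one_hour(lam, rng) for _ in range(20_000)]
# Sample mean and sample variance of the hourly counts; each should be near lam.
print(statistics.mean(counts), statistics.variance(counts))
```

Making the rate drift over the simulated day, or making one call trigger others, would break the equality of mean and variance, which is a common informal check of the Poisson model.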

Likelihood function

One of the most fundamental concepts of modern statistics is that of likelihood. In each of the discrete random variables we have considered thus far, the distribution depends on one or more parameters that are, in most statistical applications, unknown. In the Poisson distribution, the parameter is $\lambda$. In the binomial, the parameter of interest is $p$ (since $n$ is typically fixed and known).

Likelihood is a tool for summarizing the data's evidence about parameters. Let us denote the unknown parameter(s) of a distribution generically by $\theta$. Since the probability distribution depends on $\theta$, we can make this dependence explicit by writing $f(x)$ as $f(x; \theta)$. For example, in the Bernoulli distribution the parameter is $\theta = p$, and the distribution is

$$f(x; p) = p^x (1-p)^{1-x}, \quad x = 0, 1. \qquad (2)$$

Once a value of $X$ has been observed, we can plug this observed value $x$ into $f(x; p)$ and obtain a function of $p$ only. For example, if we observe $X = 1$, then plugging $x = 1$ into (2) gives the function $p$. If we observe $X = 0$, the function becomes $1 - p$.
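The "plug in the observed value" step can be expressed as fixing one argument of $f(x; p)$ and returning a function of $p$ alone. The sketch below does this with a closure; the names `bernoulli_f` and `likelihood_from_observation` are hypothetical.

```python
def bernoulli_f(x, p):
    """Equation (2): f(x; p) = p^x (1 - p)^(1 - x), x in {0, 1}."""
    return p**x * (1 - p)**(1 - x)

def likelihood_from_observation(x):
    """Fix the observed x and return the resulting function of p alone."""
    return lambda p: bernoulli_f(x, p)

lik_x1 = likelihood_from_observation(1)  # behaves like p
lik_x0 = likelihood_from_observation(0)  # behaves like 1 - p
print(lik_x1(0.25), lik_x0(0.25))  # 0.25 0.75
```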

Whatever function of the parameter results when we plug the observed data $x$ into $f(x; \theta)$ is called the likelihood function. For an iid sample $x = (x_1, \ldots, x_n)$ we will write the likelihood function as

$$L(\theta; x) = \prod_{i=1}^n f(x_i; \theta),$$

or sometimes just $L(\theta)$. Algebraically, the likelihood $L(\theta; x)$ is just the same as the distribution $f(x; \theta)$, but its meaning is quite different because it is regarded as a function of $\theta$ rather than a function of $x$. Consequently, a graph of the likelihood usually looks very different from a graph of the probability distribution.

For example, suppose that $X$ has a Bernoulli distribution with unknown parameter $p$. We can graph the probability distribution for any fixed value of $p$. For example, if $p = .5$ we get this:

[Figure: the Bernoulli distribution $f(x)$ for $p = .5$, with spikes of height .50 at $x = 0$ and $x = 1$.]

Now suppose that we observe a value of $X$, say $X = 1$. Plugging $x = 1$ into the distribution $p^x (1-p)^{1-x}$ gives the likelihood function $L(p; x) = p$, which looks like this:

[Figure: the likelihood $L(p; x) = p$ plotted for $p$ from 0 to 1, a straight line rising from 0 to 1.0.]

For discrete random variables, a graph of the probability distribution $f(x; \theta)$ has spikes at specific values of $x$, whereas a graph of the likelihood $L(\theta; x)$ is a continuous curve (e.g., a line) over the parameter space, the domain of possible values for $\theta$.

$L(\theta; x)$ summarizes the evidence about $\theta$ contained in the event $X = x$. $L(\theta; x)$ is high for values of $\theta$ that make $X = x$ likely, and small for values of $\theta$ that make $X = x$ unlikely. In the Bernoulli example, observing $X = 1$ gives some (albeit weak) evidence that $p$ is nearer to 1 than to 0, so the likelihood for $x = 1$ rises as $p$ moves from 0 to 1.
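Evaluating the Bernoulli likelihood on a grid of parameter values shows the behaviour described above: for $x = 1$ it rises across the parameter space, and for $x = 0$ it falls. The grid spacing and function names are illustrative assumptions.

```python
def bernoulli_likelihood(p, x):
    """L(p; x) = p^x (1 - p)^(1 - x) for one observed Bernoulli outcome x."""
    return p**x * (1 - p)**(1 - x)

grid = [i / 10 for i in range(11)]  # points in the parameter space [0, 1]
lik_given_1 = [bernoulli_likelihood(p, 1) for p in grid]
lik_given_0 = [bernoulli_likelihood(p, 0) for p in grid]

def strictly_increasing(values):
    return all(a < b for a, b in zip(values, values[1:]))

print(strictly_increasing(lik_given_1))        # True: X = 1 favours larger p
print(strictly_increasing(lik_given_0[::-1]))  # True: X = 0 favours smaller p
```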

Maximum-likelihood (ML) estimation

Suppose that an experiment consists of $n = 5$ independent Bernoulli trials, each having probability of success $p$. Let $X$ be the total number of successes in the trials, so that $X \sim \text{Bin}(5, p)$. If the outcome is $X = 3$, the likelihood is

$$L(p; x) = \frac{n!}{x!\,(n-x)!}\, p^x (1-p)^{n-x} = \frac{5!}{3!\,(5-3)!}\, p^3 (1-p)^{5-3} \propto p^3 (1-p)^2,$$

where the constant at the beginning is ignored. A graph of $L(p; x) = p^3 (1-p)^2$ over the unit interval $p \in (0, 1)$ looks like this:

[Figure: graph of $L(p; x) = p^3 (1-p)^2$ over $0 < p < 1$.]

It's interesting that this function reaches its maximum value at $p = .6$. An intelligent person would have said that if we observe 3 successes in 5 trials, a reasonable estimate of the long-run proportion of successes $p$ would be $3/5 = .6$.

This example suggests that it may be reasonable to estimate an unknown parameter $\theta$ by the value for which the likelihood function $L(\theta; x)$ is largest. This approach is called maximum-likelihood (ML) estimation. We will denote the value of $\theta$ that maximizes the likelihood function by $\hat\theta$, read "theta hat." $\hat\theta$ is called the maximum-likelihood estimate (MLE) of $\theta$.
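The claim that $p^3(1-p)^2$ peaks at $.6$ can be checked by brute force: evaluate the likelihood on a fine grid inside $(0, 1)$ and keep the grid point with the largest value. A grid search is used here only because it is transparent; the grid of 999 points and the function name are illustrative choices.

```python
def likelihood(p):
    """L(p; x) for X = 3 successes in n = 5 trials, with the constant 5!/(3! 2!) dropped."""
    return p**3 * (1 - p)**2

grid = [i / 1000 for i in range(1, 1000)]  # points strictly inside (0, 1)
p_hat = max(grid, key=likelihood)
print(p_hat)  # 0.6
```

Dropping the constant does not change where the maximum is, which is why it can be ignored.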

Finding MLEs usually involves techniques of differential calculus. To maximize $L(\theta; x)$ with respect to $\theta$, first calculate the derivative of $L(\theta; x)$ with respect to $\theta$, set the derivative equal to zero, and solve the resulting equation for $\theta$ (checking, for example with the second derivative, that the solution is a maximum rather than a minimum). These computations can often be simplified by maximizing the loglikelihood function,

$$l(\theta; x) = \log L(\theta; x),$$

where "log" means natural log (logarithm to the base $e$). Because the natural log is an increasing function, maximizing the loglikelihood is the same as maximizing the likelihood. The loglikelihood often has a much simpler form than the likelihood and is usually easier to differentiate.
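Applying these steps to the earlier example ($n = 5$ trials, $X = 3$ successes), with the constant $5!/(3!\,2!)$ dropped because it does not depend on $p$:

$$
\begin{aligned}
l(p; x) &= 3\log p + 2\log(1-p), \\
\frac{d\,l(p; x)}{dp} &= \frac{3}{p} - \frac{2}{1-p} = 0
\;\Longrightarrow\; 3(1-p) = 2p
\;\Longrightarrow\; \hat p = \frac{3}{5} = .6, \\
\frac{d^2 l(p; x)}{dp^2} &= -\frac{3}{p^2} - \frac{2}{(1-p)^2} < 0 \quad \text{for } 0 < p < 1,
\end{aligned}
$$

so the stationary point $\hat p = .6$ is indeed the maximum, in agreement with the graph.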

In Stat 504 you will not be asked to derive MLEs by yourself. In most of the probability models that we will use later in the course (logistic regression, loglinear models, etc.) no explicit formulas for MLEs are available, and we will have to rely on computer packages to calculate the MLEs for us. For the simple probability models we have seen thus far, however, explicit formulas for MLEs are available and are given next.
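When no explicit formula exists, software maximizes the loglikelihood numerically. Statistical packages typically use more sophisticated, often derivative-based, optimizers; the sketch below swaps in golden-section search only because it fits in a few lines and needs no derivatives. It assumes the loglikelihood is unimodal on the search interval, and it is checked on the binomial example where the answer, $.6$, is known. The function names are hypothetical.

```python
import math

def golden_section_max(f, lo, hi, tol=1e-10):
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2  # about 0.618
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) > f(d):  # maximum lies in [a, d]
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:            # maximum lies in [c, b]
            a, c = c, d
            d = a + inv_phi * (b - a)
    return (a + b) / 2

def binom_loglik(p, x=3, n=5):
    """l(p; x) = x log p + (n - x) log(1 - p), with the constant term dropped."""
    return x * math.log(p) + (n - x) * math.log(1 - p)

p_hat = golden_section_max(binom_loglik, 1e-9, 1 - 1e-9)
print(round(p_hat, 6))  # 0.6
```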

ML for Bernoulli trials. If our experiment is a single Bernoulli trial and we observe $X = 1$ (success), then the likelihood function is $L(p; x) = p$. This function reaches its maximum at $\hat p = 1$. If we observe $X = 0$ (failure), then the likelihood is $L(p; x) = 1 - p$, which reaches its maximum at $\hat p = 0$. Of course, it is somewhat silly for us to try to make formal inferences about $p$ on the basis of a single Bernoulli trial; usually multiple trials are available.

Suppose that $X = (X_1, X_2, \ldots, X_n)$ represents the outcomes of $n$ independent Bernoulli trials, each with success probability $p$. The likelihood for $p$ based on $X$ is defined as the joint probability distribution of $X_1, X_2, \ldots, X_n$. Since $X_1, X_2, \ldots, X_n$ are iid random variables, the joint distribution is

$$L(p; x) = f(x; p) = \prod_{i=1}^n f(x_i; p) = \prod_{i=1}^n p^{x_i}(1-p)^{1-x_i} = p^{\sum_{i=1}^n x_i}\,(1-p)^{\,n - \sum_{i=1}^n x_i}.$$
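The last equality says the product of the individual Bernoulli terms depends on the data only through the number of successes. The sketch below checks this exactly for a hypothetical sequence of 8 trial outcomes; the outcomes and function names are made up for illustration.

```python
from fractions import Fraction
from math import prod

def joint_likelihood(p, xs):
    """Product of p^x_i (1 - p)^(1 - x_i) over the observed trials."""
    return prod(p**x * (1 - p)**(1 - x) for x in xs)

def closed_form_likelihood(p, xs):
    """p^(sum x_i) (1 - p)^(n - sum x_i)."""
    s, n = sum(xs), len(xs)
    return p**s * (1 - p)**(n - s)

xs = [1, 0, 1, 1, 0, 1, 0, 1]  # hypothetical outcomes of n = 8 trials
p = Fraction(2, 3)
print(joint_likelihood(p, xs) == closed_form_likelihood(p, xs))  # True
```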

Differentiating the log of $L(p; x)$ with respect to $p$ and setting the derivative to zero shows that this function achieves a maximum at $\hat p = \sum_{i=1}^n x_i / n$. Since $\sum_{i=1}^n x_i$ is the total number of successes observed in the $n$ trials, $\hat p$ is the observed proportion of successes in the $n$ trials. We often call $\hat p$ the sample proportion to distinguish it from $p$, the "true" or "population" proportion. For repeated Bernoulli trials, the MLE $\hat p$ is the sample proportion of successes.
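Written out, assuming $0 < \sum_{i=1}^n x_i < n$ (at least one success and one failure), the differentiation step is:

$$
\begin{aligned}
l(p; x) &= \Big(\sum_{i=1}^n x_i\Big)\log p + \Big(n - \sum_{i=1}^n x_i\Big)\log(1-p), \\
\frac{\partial l}{\partial p} &= \frac{\sum_{i=1}^n x_i}{p} - \frac{n - \sum_{i=1}^n x_i}{1-p} = 0
\;\Longrightarrow\; (1-p)\sum_{i=1}^n x_i = p\Big(n - \sum_{i=1}^n x_i\Big)
\;\Longrightarrow\; \hat p = \frac{1}{n}\sum_{i=1}^n x_i, \\
\frac{\partial^2 l}{\partial p^2} &= -\frac{\sum_{i=1}^n x_i}{p^2} - \frac{n - \sum_{i=1}^n x_i}{(1-p)^2} < 0 .
\end{aligned}
$$

If every trial is a failure (or every trial a success), the likelihood is monotone in $p$ and is maximized at the endpoint $0$ (or $1$), which still equals $\sum_{i=1}^n x_i / n$.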

ML for Binomial. Suppose that $X$ is an observation from a binomial distribution, $X \sim \text{Bin}(n, p)$, where $n$ is known and $p$ is to be estimated. The likelihood function is

$$L(p; x) = \frac{n!}{x!\,(n-x)!}\, p^x (1-p)^{n-x},$$

which, except for the factor $n!/(x!\,(n-x)!)$, is identical to the likelihood from $n$ independent Bernoulli trials with $x = \sum_{i=1}^n x_i$. But since the likelihood function is regarded as a function only of the parameter $p$, the factor $n!/(x!\,(n-x)!)$ is a fixed constant and does not affect the MLE. Thus the MLE is again $\hat p = x/n$, the sample proportion of successes.
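The sketch below illustrates that the binomial coefficient is only a constant multiplier: for a hypothetical observation of $x = 4$ successes in $n = 6$ trials, the ratio of the binomial likelihood to the Bernoulli-sequence likelihood is the same at every $p$ tried, namely $\binom{6}{4} = 15$, so both functions peak at the same place.

```python
from fractions import Fraction
from math import comb

n, x = 6, 4  # hypothetical: 4 successes observed in 6 trials

def binomial_likelihood(p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

def bernoulli_sequence_likelihood(p):
    return p**x * (1 - p)**(n - x)

ratios = {binomial_likelihood(p) / bernoulli_sequence_likelihood(p)
          for p in (Fraction(1, 5), Fraction(1, 2), Fraction(9, 10))}
print(ratios)  # {Fraction(15, 1)}
```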

The fact that the MLE based on $n$ independent Bernoulli random variables and the MLE based on a single binomial random variable are the same is not surprising, since the binomial is the result of $n$ independent Bernoulli trials anyway. In general, whenever we have repeated, independent Bernoulli trials with the same probability of success $p$ for each trial, the MLE will always be the sample proportion of successes. This is true regardless of whether we know the outcomes of the individual trials $X_1, X_2, \ldots, X_n$, or just the total number of successes for all trials $X = \sum_{i=1}^n X_i$.

Suppose now that we have a sample of iid binomial random variables. For example, suppose that $X_1, X_2, \ldots, X_{10}$ are an iid sample from a binomial distribution with $n = 5$ and $p$ unknown. Since each $X_i$ is actually the total number of successes in 5 independent Bernoulli trials, and since the $X_i$'s are independent of one another, their sum $X = \sum_{i=1}^{10} X_i$ is actually the total number of successes in 50 independent Bernoulli trials. Thus $X \sim \text{Bin}(50, p)$ and the MLE is $\hat p = x/50$, the observed proportion of successes across all 50 trials.

Whenever we have independent binomial random variables with a common $p$, we can always add them together to get a single binomial random variable. Adding the binomial random variables together produces no loss of information about $p$ if the model is true. But collapsing the data in this way may limit our ability to diagnose model failure, i.e., to check whether the binomial model is really appropriate.
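The sketch below pools a hypothetical sample of ten $\text{Bin}(5, p)$ counts (the values are invented for illustration) and confirms that maximizing the product of the ten binomial likelihoods over a grid gives the same answer as the pooled proportion $x/50$.

```python
from fractions import Fraction
from math import comb, prod

xs = [3, 2, 4, 1, 3, 2, 5, 3, 2, 4]  # hypothetical X_1, ..., X_10, each out of 5 trials
n_each = 5

def iid_binomial_likelihood(p):
    """Product of the ten Bin(5, p) probabilities for the observed counts."""
    return prod(comb(n_each, x) * p**x * (1 - p)**(n_each - x) for x in xs)

p_hat = Fraction(sum(xs), n_each * len(xs))  # pooled proportion x / 50
grid = [Fraction(i, 100) for i in range(1, 100)]
best_on_grid = max(grid, key=iid_binomial_likelihood)
print(p_hat, best_on_grid)  # 29/50 29/50
```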

ML for Poisson. Suppose that $X = (X_1, X_2, \ldots, X_n)$ are iid observations from a Poisson distribution with unknown parameter $\lambda$. The likelihood function is

$$L(\lambda; x) = \prod_{i=1}^n f(x_i; \lambda) = \prod_{i=1}^n \frac{\lambda^{x_i} e^{-\lambda}}{x_i!} = \frac{\lambda^{\sum_{i=1}^n x_i}\, e^{-n\lambda}}{x_1!\, x_2! \cdots x_n!}.$$

By differentiating the log of this function with respect to $\lambda$, one can show that the maximum is achieved at $\hat\lambda = \sum_{i=1}^n x_i / n$. Thus, for a Poisson sample, the MLE for $\lambda$ is just the sample mean.

Next: What happens to the loglikelihood as $n$ gets large?


More information

Lecture 6. Probability events. Definition 1. The sample space, S, of a. probability experiment is the collection of all

Lecture 6. Probability events. Definition 1. The sample space, S, of a. probability experiment is the collection of all Lecture 6 1 Lecture 6 Probability events Definition 1. The sample space, S, of a probability experiment is the collection of all possible outcomes of an experiment. One such outcome is called a simple

More information

Link lecture - Lagrange Multipliers

Link lecture - Lagrange Multipliers Link lecture - Lagrange Multipliers Lagrange multipliers provide a method for finding a stationary point of a function, say f(x, y) when the variables are subject to constraints, say of the form g(x, y)

More information

success and failure independent from one trial to the next?

success and failure independent from one trial to the next? , section 8.4 The Binomial Distribution Notes by Tim Pilachowski Definition of Bernoulli trials which make up a binomial experiment: The number of trials in an experiment is fixed. There are exactly two

More information

Naïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability

Naïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability Probability theory Naïve Bayes classification Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s height, the outcome of a coin toss Distinguish

More information

IE 303 Discrete-Event Simulation

IE 303 Discrete-Event Simulation IE 303 Discrete-Event Simulation 1 L E C T U R E 5 : P R O B A B I L I T Y R E V I E W Review of the Last Lecture Random Variables Probability Density (Mass) Functions Cumulative Density Function Discrete

More information

Lecture 21: October 19

Lecture 21: October 19 36-705: Intermediate Statistics Fall 2017 Lecturer: Siva Balakrishnan Lecture 21: October 19 21.1 Likelihood Ratio Test (LRT) To test composite versus composite hypotheses the general method is to use

More information

CSC321 Lecture 18: Learning Probabilistic Models

CSC321 Lecture 18: Learning Probabilistic Models CSC321 Lecture 18: Learning Probabilistic Models Roger Grosse Roger Grosse CSC321 Lecture 18: Learning Probabilistic Models 1 / 25 Overview So far in this course: mainly supervised learning Language modeling

More information

ECO220Y Continuous Probability Distributions: Uniform and Triangle Readings: Chapter 9, sections

ECO220Y Continuous Probability Distributions: Uniform and Triangle Readings: Chapter 9, sections ECO220Y Continuous Probability Distributions: Uniform and Triangle Readings: Chapter 9, sections 9.8-9.9 Fall 2011 Lecture 8 Part 1 (Fall 2011) Probability Distributions Lecture 8 Part 1 1 / 19 Probability

More information

STAT/MATH 395 A - PROBABILITY II UW Winter Quarter Moment functions. x r p X (x) (1) E[X r ] = x r f X (x) dx (2) (x E[X]) r p X (x) (3)

STAT/MATH 395 A - PROBABILITY II UW Winter Quarter Moment functions. x r p X (x) (1) E[X r ] = x r f X (x) dx (2) (x E[X]) r p X (x) (3) STAT/MATH 395 A - PROBABILITY II UW Winter Quarter 07 Néhémy Lim Moment functions Moments of a random variable Definition.. Let X be a rrv on probability space (Ω, A, P). For a given r N, E[X r ], if it

More information

DS-GA 1003: Machine Learning and Computational Statistics Homework 7: Bayesian Modeling

DS-GA 1003: Machine Learning and Computational Statistics Homework 7: Bayesian Modeling DS-GA 1003: Machine Learning and Computational Statistics Homework 7: Bayesian Modeling Due: Tuesday, May 10, 2016, at 6pm (Submit via NYU Classes) Instructions: Your answers to the questions below, including

More information

Foundations of Statistical Inference

Foundations of Statistical Inference Foundations of Statistical Inference Julien Berestycki Department of Statistics University of Oxford MT 2016 Julien Berestycki (University of Oxford) SB2a MT 2016 1 / 20 Lecture 6 : Bayesian Inference

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models David Rosenberg New York University April 12, 2015 David Rosenberg (New York University) DS-GA 1003 April 12, 2015 1 / 20 Conditional Gaussian Regression Gaussian Regression Input

More information

Statistical Estimation

Statistical Estimation Statistical Estimation Use data and a model. The plug-in estimators are based on the simple principle of applying the defining functional to the ECDF. Other methods of estimation: minimize residuals from

More information

Introduction to Maximum Likelihood Estimation

Introduction to Maximum Likelihood Estimation Introduction to Maximum Likelihood Estimation Eric Zivot July 26, 2012 The Likelihood Function Let 1 be an iid sample with pdf ( ; ) where is a ( 1) vector of parameters that characterize ( ; ) Example:

More information

Ridge regression. Patrick Breheny. February 8. Penalized regression Ridge regression Bayesian interpretation

Ridge regression. Patrick Breheny. February 8. Penalized regression Ridge regression Bayesian interpretation Patrick Breheny February 8 Patrick Breheny High-Dimensional Data Analysis (BIOS 7600) 1/27 Introduction Basic idea Standardization Large-scale testing is, of course, a big area and we could keep talking

More information

Chapter 3: Maximum Likelihood Theory

Chapter 3: Maximum Likelihood Theory Chapter 3: Maximum Likelihood Theory Florian Pelgrin HEC September-December, 2010 Florian Pelgrin (HEC) Maximum Likelihood Theory September-December, 2010 1 / 40 1 Introduction Example 2 Maximum likelihood

More information

Data Analysis and Uncertainty Part 2: Estimation

Data Analysis and Uncertainty Part 2: Estimation Data Analysis and Uncertainty Part 2: Estimation Instructor: Sargur N. University at Buffalo The State University of New York srihari@cedar.buffalo.edu 1 Topics in Estimation 1. Estimation 2. Desirable

More information

Theory of Statistics.

Theory of Statistics. Theory of Statistics. Homework V February 5, 00. MT 8.7.c When σ is known, ˆµ = X is an unbiased estimator for µ. If you can show that its variance attains the Cramer-Rao lower bound, then no other unbiased

More information

TUTORIAL 8 SOLUTIONS #

TUTORIAL 8 SOLUTIONS # TUTORIAL 8 SOLUTIONS #9.11.21 Suppose that a single observation X is taken from a uniform density on [0,θ], and consider testing H 0 : θ = 1 versus H 1 : θ =2. (a) Find a test that has significance level

More information

Pump failure data. Pump Failures Time

Pump failure data. Pump Failures Time Outline 1. Poisson distribution 2. Tests of hypothesis for a single Poisson mean 3. Comparing multiple Poisson means 4. Likelihood equivalence with exponential model Pump failure data Pump 1 2 3 4 5 Failures

More information

One-Way Tables and Goodness of Fit

One-Way Tables and Goodness of Fit Stat 504, Lecture 5 1 One-Way Tables and Goodness of Fit Key concepts: One-way Frequency Table Pearson goodness-of-fit statistic Deviance statistic Pearson residuals Objectives: Learn how to compute the

More information

9 Generalized Linear Models

9 Generalized Linear Models 9 Generalized Linear Models The Generalized Linear Model (GLM) is a model which has been built to include a wide range of different models you already know, e.g. ANOVA and multiple linear regression models

More information

Math 494: Mathematical Statistics

Math 494: Mathematical Statistics Math 494: Mathematical Statistics Instructor: Jimin Ding jmding@wustl.edu Department of Mathematics Washington University in St. Louis Class materials are available on course website (www.math.wustl.edu/

More information

Quick review on Discrete Random Variables

Quick review on Discrete Random Variables STAT/MATH 395 A - PROBABILITY II UW Winter Quarter 2017 Néhémy Lim Quick review on Discrete Random Variables Notations. Z = {..., 2, 1, 0, 1, 2,...}, set of all integers; N = {0, 1, 2,...}, set of natural

More information

Poisson Processes and Poisson Distributions. Poisson Process - Deals with the number of occurrences per interval.

Poisson Processes and Poisson Distributions. Poisson Process - Deals with the number of occurrences per interval. Poisson Processes and Poisson Distributions Poisson Process - Deals with the number of occurrences per interval. Eamples Number of phone calls per minute Number of cars arriving at a toll both per hour

More information

Likelihoods. P (Y = y) = f(y). For example, suppose Y has a geometric distribution on 1, 2,... with parameter p. Then the pmf is

Likelihoods. P (Y = y) = f(y). For example, suppose Y has a geometric distribution on 1, 2,... with parameter p. Then the pmf is Likelihoods The distribution of a random variable Y with a discrete sample space (e.g. a finite sample space or the integers) can be characterized by its probability mass function (pmf): P (Y = y) = f(y).

More information

STAT 430/510 Probability Lecture 12: Central Limit Theorem and Exponential Distribution

STAT 430/510 Probability Lecture 12: Central Limit Theorem and Exponential Distribution STAT 430/510 Probability Lecture 12: Central Limit Theorem and Exponential Distribution Pengyuan (Penelope) Wang June 15, 2011 Review Discussed Uniform Distribution and Normal Distribution Normal Approximation

More information

Practice Problems Section Problems

Practice Problems Section Problems Practice Problems Section 4-4-3 4-4 4-5 4-6 4-7 4-8 4-10 Supplemental Problems 4-1 to 4-9 4-13, 14, 15, 17, 19, 0 4-3, 34, 36, 38 4-47, 49, 5, 54, 55 4-59, 60, 63 4-66, 68, 69, 70, 74 4-79, 81, 84 4-85,

More information

Hypothesis testing: theory and methods

Hypothesis testing: theory and methods Statistical Methods Warsaw School of Economics November 3, 2017 Statistical hypothesis is the name of any conjecture about unknown parameters of a population distribution. The hypothesis should be verifiable

More information

Copyright c 2006 Jason Underdown Some rights reserved. choose notation. n distinct items divided into r distinct groups.

Copyright c 2006 Jason Underdown Some rights reserved. choose notation. n distinct items divided into r distinct groups. Copyright & License Copyright c 2006 Jason Underdown Some rights reserved. choose notation binomial theorem n distinct items divided into r distinct groups Axioms Proposition axioms of probability probability

More information

MAS223 Statistical Inference and Modelling Exercises

MAS223 Statistical Inference and Modelling Exercises MAS223 Statistical Inference and Modelling Exercises The exercises are grouped into sections, corresponding to chapters of the lecture notes Within each section exercises are divided into warm-up questions,

More information

2 Random Variable Generation

2 Random Variable Generation 2 Random Variable Generation Most Monte Carlo computations require, as a starting point, a sequence of i.i.d. random variables with given marginal distribution. We describe here some of the basic methods

More information

Topic 10: Hypothesis Testing

Topic 10: Hypothesis Testing Topic 10: Hypothesis Testing Course 003, 2017 Page 0 The Problem of Hypothesis Testing A statistical hypothesis is an assertion or conjecture about the probability distribution of one or more random variables.

More information

Math 3215 Intro. Probability & Statistics Summer 14. Homework 5: Due 7/3/14

Math 3215 Intro. Probability & Statistics Summer 14. Homework 5: Due 7/3/14 Math 325 Intro. Probability & Statistics Summer Homework 5: Due 7/3/. Let X and Y be continuous random variables with joint/marginal p.d.f. s f(x, y) 2, x y, f (x) 2( x), x, f 2 (y) 2y, y. Find the conditional

More information

1 A simple example. A short introduction to Bayesian statistics, part I Math 217 Probability and Statistics Prof. D.

1 A simple example. A short introduction to Bayesian statistics, part I Math 217 Probability and Statistics Prof. D. probabilities, we ll use Bayes formula. We can easily compute the reverse probabilities A short introduction to Bayesian statistics, part I Math 17 Probability and Statistics Prof. D. Joyce, Fall 014 I

More information

Part 2: One-parameter models

Part 2: One-parameter models Part 2: One-parameter models 1 Bernoulli/binomial models Return to iid Y 1,...,Y n Bin(1, ). The sampling model/likelihood is p(y 1,...,y n ) = P y i (1 ) n P y i When combined with a prior p( ), Bayes

More information

PROBABILITY DISTRIBUTION

PROBABILITY DISTRIBUTION PROBABILITY DISTRIBUTION DEFINITION: If S is a sample space with a probability measure and x is a real valued function defined over the elements of S, then x is called a random variable. Types of Random

More information

This does not cover everything on the final. Look at the posted practice problems for other topics.

This does not cover everything on the final. Look at the posted practice problems for other topics. Class 7: Review Problems for Final Exam 8.5 Spring 7 This does not cover everything on the final. Look at the posted practice problems for other topics. To save time in class: set up, but do not carry

More information