This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this site. Copyright 2006, The Johns Hopkins University and Brian Caffo. All rights reserved. Use of these materials permitted only in accordance with license rights granted. Materials provided AS IS ; no representations or warranties provided. User assumes all responsibility for use, and all liability related thereto, and must independently review all materials for accuracy and efficacy. May contain materials owned by others. User is responsible for obtaining permissions for use from third parties as needed.

Outline 1. Define likelihood 2. Interpretations of likelihoods 3. Likelihood plots 4. Maximum likelihood 5. Likelihood ratio benchmarks

Likelihood A common and fruitful approach to statistics is to assume that the data arise from a family of distributions indexed by a parameter that represents a useful summary of the distribution. The likelihood of a collection of data is the joint density evaluated as a function of the parameters with the data fixed. Likelihood analysis of data uses the likelihood to perform inference regarding the unknown parameter.

Likelihood Given a statistical probability mass function or density, say f(x, θ), where θ is an unknown parameter, the likelihood is f viewed as a function of θ for a fixed, observed value of x.

Interpretations of likelihoods The likelihood has the following properties: 1. Ratios of likelihood values measure the relative evidence of one value of the unknown parameter to another. 2. Given a statistical model and observed data, all of the relevant information contained in the data regarding the unknown parameter is contained in the likelihood. 3. If {X_i} are independent events, then their likelihoods multiply. That is, the likelihood of the parameters given all of the X_i is simply the product of the individual likelihoods.

Example Suppose that we flip a coin with success probability θ. Recall that the mass function for x is f(x, θ) = θ^x (1 − θ)^(1 − x) for θ ∈ [0, 1], where x is either 0 (Tails) or 1 (Heads). Suppose that the result is a head. The likelihood is L(θ, 1) = θ^1 (1 − θ)^(1 − 1) = θ for θ ∈ [0, 1]. Therefore L(.5, 1)/L(.25, 1) = 2; there is twice as much evidence supporting the hypothesis that θ = .5 as the hypothesis that θ = .25.

Example cont'd Suppose now that we flip our coin from the previous example 4 times and get the sequence 1, 0, 1, 1. The likelihood is L(θ, 1, 0, 1, 1) = θ^1 (1 − θ)^(1 − 1) × θ^0 (1 − θ)^(1 − 0) × θ^1 (1 − θ)^(1 − 1) × θ^1 (1 − θ)^(1 − 1) = θ^3 (1 − θ)^1. This likelihood depends only on the total number of heads and the total number of tails; we might write L(θ, 1, 3) for shorthand. Now consider L(.5, 1, 3)/L(.25, 1, 3) ≈ 5.33; there is over five times as much evidence supporting the hypothesis that θ = .5 over the hypothesis that θ = .25.
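The two likelihood ratios above can be checked with a few lines of code. This is a sketch of our own; the function name `likelihood` is not from the lecture.

```python
# Bernoulli likelihood of a sequence of independent coin flips,
# L(theta) = product of theta^x * (1 - theta)^(1 - x) over the flips.

def likelihood(theta, flips):
    L = 1.0
    for x in flips:
        L *= theta**x * (1 - theta)**(1 - x)
    return L

# Single head: L(.5, 1)/L(.25, 1) = 2
print(likelihood(0.5, [1]) / likelihood(0.25, [1]))  # 2.0

# Sequence 1, 0, 1, 1: L(theta) = theta^3 (1 - theta), ratio is 16/3
print(likelihood(0.5, [1, 0, 1, 1]) / likelihood(0.25, [1, 0, 1, 1]))
```

The ratio 16/3 ≈ 5.33 matches the value quoted in the slide.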

Plotting likelihoods Generally, we want to consider all the values of θ between 0 and 1 A likelihood plot displays θ by L(θ, x) Usually, it is divided by its maximum value so that its height is 1 Because the likelihood measures relative evidence, dividing the curve by its maximum value (or any other value for that matter) does not change its interpretation
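The curve in such a plot can be computed by evaluating the likelihood on a grid and dividing by its maximum. The sketch below does this for the 3-heads, 1-tail example; the slides do not name a plotting tool, so the matplotlib call in the comment is an assumption.

```python
# Scaled likelihood theta^x (1 - theta)^(n - x) on a grid, divided by
# its maximum so that the curve's height is 1.

def scaled_likelihood(thetas, x, n):
    L = [t**x * (1 - t)**(n - x) for t in thetas]
    m = max(L)
    return [v / m for v in L]

thetas = [i / 1000 for i in range(1001)]
curve = scaled_likelihood(thetas, x=3, n=4)

# To draw the plot (assumed tooling, not from the lecture):
#   import matplotlib.pyplot as plt
#   plt.plot(thetas, curve); plt.xlabel("theta"); plt.ylabel("Likelihood")
```

Dividing by the maximum leaves every likelihood ratio, and hence the interpretation, unchanged.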

[Figure: plot of the scaled likelihood against θ, with θ on the horizontal axis from 0.0 to 1.0 and Likelihood on the vertical axis from 0.0 to 1.0.]

Maximum likelihood The value of θ where the curve reaches its maximum has a special meaning: it is the value of θ that is best supported by the data. This point is called the maximum likelihood estimate (or MLE) of θ: MLE = argmax_θ L(θ, x). Another interpretation of the MLE is that it is the value of θ that would make the data that we observed most probable.

Maximum likelihood, coin example The maximum likelihood estimate for θ is always the proportion of heads. Proof: Let x be the number of heads and n be the number of trials. Recall L(θ, x) = θ^x (1 − θ)^(n − x). It's easier to maximize the log-likelihood ℓ(θ, x) = x log(θ) + (n − x) log(1 − θ).

Cont'd Taking the derivative we get d/dθ ℓ(θ, x) = x/θ − (n − x)/(1 − θ). Setting this equal to zero implies (1 − x/n)θ = (1 − θ)(x/n), which is clearly solved at θ = x/n. Notice that the second derivative d²/dθ² ℓ(θ, x) = −x/θ² − (n − x)/(1 − θ)² < 0, provided that x is not 0 or n, so θ = x/n is indeed a maximum.
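The calculus result can be sanity-checked numerically: maximizing the log-likelihood over a fine grid should recover θ = x/n. This is our own sketch, not part of the proof.

```python
# Grid-search check that l(theta) = x*log(theta) + (n - x)*log(1 - theta)
# peaks at theta = x/n (here x = 3 heads out of n = 4 trials).

import math

def log_likelihood(theta, x, n):
    return x * math.log(theta) + (n - x) * math.log(1 - theta)

x, n = 3, 4
grid = [i / 10000 for i in range(1, 10000)]  # open interval (0, 1)
mle = max(grid, key=lambda t: log_likelihood(t, x, n))

print(mle)  # 0.75, the proportion of heads x/n
```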

What constitutes strong evidence? Again imagine an experiment where a person repeatedly flips a coin. Consider the possibility that we are entertaining three hypotheses: H1: θ = 0, H2: θ = .5, and H3: θ = 1.

Outcome X   P(X|H1)   P(X|H2)   P(X|H3)   L(H1)/L(H2)   L(H3)/L(H2)
H           0         .5        1         0             2
T           1         .5        0         2             0
HH          0         .25       1         0             4
HT          0         .25       0         0             0
TH          0         .25       0         0             0
TT          1         .25       0         4             0
HHH         0         .125     1         0             8
HHT         0         .125     0         0             0
HTH         0         .125     0         0             0
THH         0         .125     0         0             0
HTT         0         .125     0         0             0
THT         0         .125     0         0             0
TTH         0         .125     0         0             0
TTT         1         .125     0         8             0
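The rows of the table above can be generated mechanically, since each entry is just a product of per-flip probabilities under one of the three hypotheses. A sketch:

```python
# Reproduce the table of sequence probabilities and likelihood ratios
# under H1: theta = 0, H2: theta = .5, H3: theta = 1.

from itertools import product

def prob(seq, theta):
    """P(sequence | theta) for independent flips; 'H' is a success."""
    p = 1.0
    for c in seq:
        p *= theta if c == "H" else (1 - theta)
    return p

for n in (1, 2, 3):
    for flips in product("HT", repeat=n):
        s = "".join(flips)
        p1, p2, p3 = prob(s, 0.0), prob(s, 0.5), prob(s, 1.0)
        print(s, p1, p2, p3, p1 / p2, p3 / p2)
```

For example, the all-tails sequence TTT has L(H1)/L(H2) = 1/.125 = 8, the largest ratio in the table.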

Benchmarks Using this example as a guide, researchers tend to think of a likelihood ratio of 8 as moderate evidence, of 16 as moderately strong evidence, and of 32 as strong evidence of one hypothesis over another. Because of this, it is common to draw reference lines at these values on likelihood plots. Parameter values above the 1/8 reference line, for example, are such that no other point is more than 8 times better supported given the data.
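For the 3-heads, 1-tail example, the set of θ values above the 1/8 reference line can be found by grid search. This sketch is our own illustration of the benchmark, not from the lecture.

```python
# Which theta values lie above the 1/8 reference line for the
# likelihood theta^3 (1 - theta)? (Grid search on [0, 1].)

def scaled(theta, x=3, n=4):
    """Likelihood divided by its value at the MLE theta = x/n."""
    peak = (x / n)**x * (1 - x / n)**(n - x)
    return theta**x * (1 - theta)**(n - x) / peak

grid = [i / 1000 for i in range(1001)]
supported = [t for t in grid if scaled(t) > 1 / 8]

print(min(supported), max(supported))  # rough endpoints of the 1/8 region
```

Every θ printed here is such that no other value is more than 8 times better supported by the data.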