Notes on Decision Theory and Prediction

Ronald Christensen
Professor of Statistics
Department of Mathematics and Statistics
University of New Mexico
October 7

1. Decision Theory

Decision theory is a very general theory that allows one to examine Bayesian estimation and hypothesis testing as well as Neyman-Pearson hypothesis testing and many aspects of frequentist estimation. I am not aware that it has anything to say about Fisherian significance testing.

In decision theory we start with states of nature $\theta \in \Theta$, potential actions $a \in A$, and a loss function $L(\theta, a)$ that takes real values. We are interested in taking actions that will reduce our losses. Some formulations of decision theory incorporate a utility function $U(\theta, a)$ and seek actions that increase utility. The formulations are interchangeable by simply taking $U(\theta, a) \equiv -L(\theta, a)$.

Eventually, we will want to incorporate data in the form of a random vector $X$ taking values in $\mathcal{X}$ and having density $f(x \mid \theta)$. The distribution of $X$ is called the sampling distribution.

We will focus on three special cases.

Estimation of a scalar state of nature involves scalar actions with $\Theta = A = \mathbb{R}$. Three commonly used loss functions are

Squared error, $L(\theta, a) = (\theta - a)^2$;
Weighted squared error, $L(\theta, a) = w(\theta)(\theta - a)^2$, wherein $w(\theta)$ is a known weighting function taking positive values;
Absolute error, $L(\theta, a) = |\theta - a|$.

Estimation of a vector involves $\Theta = A = \mathbb{R}^r$. Three commonly used loss functions are

$$L(\theta, a) = (\theta - a)'(\theta - a) \equiv \|\theta - a\|^2$$
$$L(\theta, a) = w(\theta)\,\|\theta - a\|^2, \qquad w(\theta) > 0$$
$$L(\theta, a) = \sum_{j=1}^{r} |\theta_j - a_j|$$

Hypothesis testing involves two hypotheses, say $\Theta = \{\theta_0, \theta_1\}$, and two corresponding actions $A = \{a_0, a_1\}$. What is key in this problem is that there are only two states of nature in $\Theta$ that we can think of as the null and alternative hypotheses, and two corresponding actions in $A$ that we can think of as accepting the null (rejecting the alternative) and accepting the alternative (rejecting the null). The standard loss function is

$$\begin{array}{c|cc} L(\theta, a) & a_0 & a_1 \\ \hline \theta_0 & 0 & 1 \\ \theta_1 & 1 & 0 \end{array}$$

A more general loss function is

$$\begin{array}{c|cc} L(\theta, a) & a_0 & a_1 \\ \hline \theta_0 & c_{00} & c_{01} \\ \theta_1 & c_{10} & c_{11} \end{array}$$

wherein, presumably, $c_{00} \le c_{01}$ and $c_{10} \ge c_{11}$.

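2. Optimal Prior Actions

If $\theta$ is random, i.e., if $\theta$ has a prior distribution, then the optimal action is defined to be the action that minimizes the expected loss,

$$E[L(\theta, a)] \equiv E_\theta[L(\theta, a)].$$

Proposition 1: For $\Theta = A = \mathbb{R}$ and $L(\theta, a) = (\theta - a)^2$, if $\theta$ is random, the optimal action is $\hat{a} \equiv E(\theta)$.

Proof: It is enough to show that

$$E[(\theta - a)^2] = E[(\theta - \hat{a})^2] + (\hat{a} - a)^2$$

because then the minimizing value of $a$ occurs when $a = \hat{a}$. As is so often the case, the proof proceeds by subtracting and adding the correct answer.

$$\begin{aligned}
E[(\theta - a)^2] &= E[(\{\theta - \hat{a}\} + \{\hat{a} - a\})^2] \\
&= E[(\theta - \hat{a})^2] + 2E[(\theta - \hat{a})(\hat{a} - a)] + E[(\hat{a} - a)^2] \\
&= E[(\theta - \hat{a})^2] + 2(\hat{a} - a)E[(\theta - \hat{a})] + (\hat{a} - a)^2 \\
&= E[(\theta - \hat{a})^2] + (\hat{a} - a)^2
\end{aligned}$$

The third equality holds because $(\hat{a} - a)$ is a constant and the fourth holds because $E[\theta - E(\theta)] = 0$.

Proposition 2: For $\Theta = A = \mathbb{R}$ and $L(\theta, a) = w(\theta)(\theta - a)^2$, if $\theta$ is random, the optimal action is $\hat{a} \equiv E[w(\theta)\theta]/E[w(\theta)]$.

Proof: The proof is an exercise. Write

$$E[w(\theta)(\theta - a)^2] = E[w(\theta)(\theta - \hat{a} + \hat{a} - a)^2].$$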
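Propositions 1 and 2 can be checked numerically. The following is a minimal sketch, not part of the original notes: the discrete prior, the weight function `w`, and the grid of candidate actions are all assumptions chosen only for illustration. It compares the closed-form optimal actions with a brute-force search and also verifies the decomposition used in the proof of Proposition 1.

```python
# Hypothetical discrete prior on theta (values and probabilities are illustrative only).
thetas = [0.0, 1.0, 2.0, 5.0]
probs = [0.1, 0.4, 0.3, 0.2]


def expect(g):
    """Return E[g(theta)] under the discrete prior."""
    return sum(p * g(t) for t, p in zip(thetas, probs))


def w(t):
    """An arbitrary positive weighting function, assumed for this sketch."""
    return 1.0 + t


def sq_loss(a):
    """Expected squared error loss E[(theta - a)^2]."""
    return expect(lambda t: (t - a) ** 2)


def wsq_loss(a):
    """Expected weighted squared error loss E[w(theta)(theta - a)^2]."""
    return expect(lambda t: w(t) * (t - a) ** 2)


a_hat = expect(lambda t: t)                        # Proposition 1: E(theta)
a_hat_w = expect(lambda t: w(t) * t) / expect(w)   # Proposition 2: E[w*theta]/E[w]

# Brute-force search over a grid of candidate actions with spacing 0.01.
grid = [i / 100 for i in range(0, 501)]
best_sq = min(grid, key=sq_loss)
best_wsq = min(grid, key=wsq_loss)

print(a_hat, best_sq)
print(a_hat_w, best_wsq)
```

The grid minimizers should agree with the closed forms up to the grid spacing; a finer grid tightens the agreement.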

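Proposition 3: For $\Theta = A = \mathbb{R}$ and $L(\theta, a) = |\theta - a|$, if $\theta$ is random, the optimal action is $\hat{a} \equiv m \equiv \mathrm{Median}(\theta)$.

Proof: I changed some notation in this proof but did not really look at it; I had generated it some time ago. Without loss of generality assume $a$ is greater than the median $m$ of $\theta$, so that $p_a \equiv \Pr[\theta > a] \le 0.5$. Split the range of $\theta$ at $m$ and $a$, and on each piece write $|\theta - a|$ as $|\theta - m|$ plus a remainder: $\theta - a = (\theta - m) + (m - a)$ for $\theta > a$, $a - \theta = (\theta - m) + (m + a - 2\theta)$ for $m < \theta \le a$, and $a - \theta = (m - \theta) + (a - m)$ for $\theta \le m$. Using $\Pr[\theta \le m] = 0.5$,

$$\begin{aligned}
E|\theta - a| &= \int_{a}^{\infty} (\theta - a)\,dP + \int_{m}^{a} (a - \theta)\,dP + \int_{-\infty}^{m} (a - \theta)\,dP \\
&= \int |\theta - m|\,dP + (m - a)p_a + \int_{m}^{a} (m + a - 2\theta)\,dP + 0.5(a - m) \\
&= \int |\theta - m|\,dP + (0.5 - p_a)(a - m) + \int_{m}^{a} (m + a - 2\theta)\,dP \\
&\ge \int |\theta - m|\,dP + (0.5 - p_a)(a - m) + \int_{m}^{a} (m - a)\,dP \\
&= \int |\theta - m|\,dP + (0.5 - p_a)(a - m) + (0.5 - p_a)(m - a) \\
&= \int |\theta - m|\,dP = E|\theta - m|.
\end{aligned}$$

The inequality holds because $m + a - 2\theta \ge m - a$ whenever $\theta \le a$.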
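Proposition 3 can also be illustrated numerically. This is a small sketch under assumed inputs, not the author's construction: the five equally likely values of $\theta$ are hypothetical, chosen so that the median is unique. It searches a grid of actions for the minimizer of $E|\theta - a|$ and compares the expected loss at the median with the expected loss at the mean.

```python
import statistics

# Hypothetical prior: five equally likely values of theta (illustrative only).
thetas = [0.0, 1.0, 2.0, 5.0, 9.0]


def abs_loss(a):
    """Expected absolute error loss E|theta - a| under the uniform discrete prior."""
    return sum(abs(t - a) for t in thetas) / len(thetas)


m = statistics.median(thetas)    # Proposition 3: the median
mean = statistics.mean(thetas)   # optimal for squared error, not for absolute error

# Brute-force search over candidate actions with spacing 0.01.
grid = [i / 100 for i in range(0, 901)]
best = min(grid, key=abs_loss)

print(m, best, abs_loss(m), abs_loss(mean))
```

With an even number of support points the minimizer is generally not unique; any value between the two middle points does equally well.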

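Proposition 4: For $\Theta = \{\theta_0, \theta_1\}$, $A = \{a_0, a_1\}$, $L(\theta, a) = I(\theta \ne a)$ (identifying action $a_i$ with state $\theta_i$), the optimal action is

$$\hat{a} = \begin{cases} a_0 & \text{if } \Pr(\theta = \theta_0) > 0.5 \\ a_1 & \text{if } \Pr(\theta = \theta_0) < 0.5. \end{cases}$$

Proof: Note that

$$E[L(\theta, a_0)] = L(\theta_0, a_0)\Pr(\theta = \theta_0) + L(\theta_1, a_0)\Pr(\theta = \theta_1) = \Pr(\theta = \theta_1)$$

and

$$E[L(\theta, a_1)] = L(\theta_0, a_1)\Pr(\theta = \theta_0) + L(\theta_1, a_1)\Pr(\theta = \theta_1) = \Pr(\theta = \theta_0).$$

If $\Pr(\theta = \theta_1) < \Pr(\theta = \theta_0)$ the optimal action is $a_0$ and if $\Pr(\theta = \theta_1) > \Pr(\theta = \theta_0)$ the optimal action is $a_1$. However, $\Pr(\theta = \theta_0) + \Pr(\theta = \theta_1) = 1$, so $\Pr(\theta = \theta_1) < \Pr(\theta = \theta_0)$ if and only if $\Pr(\theta = \theta_0) > 0.5$.

3. Optimal Posterior Actions

Suppose we have a data vector $X$ with density $f(x \mid \theta)$. If $\theta$ is random, i.e., if $\theta$ has a prior density $p(\theta)$, a Bayesian updates the distribution of $\theta$ using the data and Bayes Theorem to get the posterior density

$$p(\theta \mid X) = \frac{f(X \mid \theta)\,p(\theta)}{\int f(X \mid \theta)\,p(\theta)\,d\mu(\theta)}.$$

CLASS: think of $d\mu(\theta) = d\theta$.

The Bayes action is defined to be the action that minimizes the expected loss,

$$E[L(\theta, a) \mid X] \equiv E_{\theta \mid X}[L(\theta, a)].$$

The Bayes action is just the optimal action when the distribution on $\theta$ is the posterior distribution given $X$. Recognizing this fact, the previous section provides a number of results immediately.

Proposition 1a: For $\Theta = A = \mathbb{R}$, data $X = x$, and $L(\theta, a) = (\theta - a)^2$, if $\theta$ is random, the Bayes action is $\hat{a} \equiv E_{\theta \mid X}(\theta) \equiv E(\theta \mid X = x)$.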
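The posterior update and the two-state decision of Proposition 4 can be sketched together for a concrete sampling model. Everything specific here is an assumption made for illustration and does not come from the notes: a binomial sampling distribution, the values `theta0` and `theta1`, the prior weight `prior0`, and the action labels `"a0"` and `"a1"`.

```python
from math import comb

# Assumed setup for illustration: X ~ Bin(n, theta) with two candidate states.
n = 10
theta0, theta1 = 0.5, 0.7
prior0 = 0.5  # Pr(theta = theta0); Pr(theta = theta1) = 1 - prior0


def binom_pmf(x, theta):
    """Sampling density f(x | theta) for a binomial count."""
    return comb(n, x) * theta**x * (1 - theta) ** (n - x)


def posterior0(x):
    """Pr(theta = theta0 | X = x) from Bayes Theorem."""
    num = binom_pmf(x, theta0) * prior0
    den = num + binom_pmf(x, theta1) * (1 - prior0)
    return num / den


def bayes_action(x):
    """Action under 0-1 loss: a0 if the posterior favors theta0, else a1.

    A posterior probability of exactly 0.5 is sent to a1; either action is optimal there.
    """
    return "a0" if posterior0(x) > 0.5 else "a1"


for x in range(n + 1):
    print(x, round(posterior0(x), 3), bayes_action(x))
```

Small counts favor the smaller success probability and large counts favor the larger one, so the chosen action switches from `a0` to `a1` as `x` increases.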

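Proposition 2a: For $\Theta = A = \mathbb{R}$, data $X = x$, and $L(\theta, a) = w(\theta)(\theta - a)^2$, if $\theta$ is random, the Bayes action is $\hat{a} \equiv E[w(\theta)\theta \mid X = x]/E[w(\theta) \mid X = x]$.

Proposition 3a: For $\Theta = A = \mathbb{R}$, data $X = x$, and $L(\theta, a) = |\theta - a|$, if $\theta$ is random, the Bayes action is $\hat{a} \equiv m \equiv \mathrm{Median}(\theta \mid X = x)$.

Proposition 4a: For $\Theta = \{\theta_0, \theta_1\}$, data $X = x$, $A = \{a_0, a_1\}$, $L(\theta, a) = I(\theta \ne a)$, the Bayes action is

$$\hat{a} = \begin{cases} a_0 & \text{if } \Pr(\theta = \theta_0 \mid X = x) > 0.5 \\ a_1 & \text{if } \Pr(\theta = \theta_0 \mid X = x) < 0.5. \end{cases}$$

4. Traditional Decision Theory

With states of nature $\theta \in \Theta$, potential actions $a \in A$, and a data vector $X$ taking values in $\mathcal{X}$ and having density $f(x \mid \theta)$, a decision function is defined as a mapping of the data into the action space, i.e.,

$$\delta : \mathcal{X} \to A.$$

With a loss function $L(\theta, a)$, the risk function is defined as

$$R(\theta, \delta) \equiv E_{X \mid \theta}\{L[\theta, \delta(X)]\}.$$

To frequentists, the risk function is the soul of decision theory.

The Bayes risk is a frequentist idea of what a Bayesian should worry about. With a prior distribution, call it $p$, on $\theta$, the Bayes risk is defined as

$$r(p, \delta) \equiv E_\theta[R(\theta, \delta)].$$

Frequentists think that Bayesians should be concerned about finding the Bayes decision rule that minimizes the Bayes risk.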
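The risk function and Bayes risk just defined can be computed exhaustively when both the sample space and the action space are finite. The sketch below rests on assumptions that are not in the notes: a binomial model with `n = 3`, two illustrative states, equal prior weights, and 0-1 loss. It enumerates all sixteen decision functions, finds the one with the smallest Bayes risk, and compares it with the rule that applies Proposition 4a at each value of `x`.

```python
from itertools import product
from math import comb

# Assumed setup: X ~ Bin(3, theta), two states, 0-1 loss, equal prior weights.
n = 3
states = [0.3, 0.7]   # theta_0, theta_1 (illustrative values)
prior = [0.5, 0.5]


def pmf(x, theta):
    """Sampling density f(x | theta)."""
    return comb(n, x) * theta**x * (1 - theta) ** (n - x)


def risk(i, rule):
    """R(theta_i, delta): expected 0-1 loss over X given theta_i.

    rule[x] is the index j of the action a_j taken when X = x.
    """
    return sum(pmf(x, states[i]) * (rule[x] != i) for x in range(n + 1))


def bayes_risk(rule):
    """r(p, delta): the risk averaged over the prior."""
    return sum(prior[i] * risk(i, rule) for i in range(2))


# Every decision function from {0, 1, 2, 3} to {a_0, a_1}: 2**4 = 16 rules.
all_rules = list(product([0, 1], repeat=n + 1))
best_rule = min(all_rules, key=bayes_risk)

# Rule built from Proposition 4a: pick the state with larger posterior probability.
posterior_rule = tuple(
    0 if prior[0] * pmf(x, states[0]) > prior[1] * pmf(x, states[1]) else 1
    for x in range(n + 1)
)

print(best_rule, bayes_risk(best_rule))
print(posterior_rule, bayes_risk(posterior_rule))
```

In this example the exhaustive search and the pointwise posterior rule select the same decision function, which is the sense in which minimizing the Bayes risk and choosing the Bayes action for each data value coincide.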

Formally, for a prior $p$, the Bayes rule is a decision function $\delta_p$ with

$$r(p, \delta_p) = \inf_\delta r(p, \delta).$$

Bayesians think that they should be concerned with finding the Bayes action given the data, as discussed in the previous section. Fortunately, these amount to the same thing. To minimize the Bayes risk, you pick $\delta(x)$ to minimize

$$r(p, \delta) = E_\theta[R(\theta, \delta)] = E_\theta\left(E_{X \mid \theta}\{L[\theta, \delta(X)]\}\right) = E_X\left(E_{\theta \mid X}\{L[\theta, \delta(X)]\}\right).$$

This can be minimized by picking $\delta(x)$ to be the Bayes action that minimizes $E_{\theta \mid X = x}\{L[\theta, \delta(x)]\}$ for every value of $x$.

One exception to Bayesians being concerned about the Bayes action rather than the Bayes decision rule is when a Bayesian is trying to design an experiment, hence is concerned with possible data rather than already observed data.

5. Prediction Theory

In prediction theory one wishes to predict an unobserved random vector $y$ based on an observed random vector $x$. Let's say that $y$ has $q$ dimensions and that $x$ has $p - 1$ dimensions. We assume that the joint distribution of $x$ and $y$ is known. Any predictor of $y$ is some function of $x$, say $\tilde{y}(x)$. We define a predictive loss function, $L[y, \tilde{y}(x)]$, and seek to find a predictor $\hat{y}(x)$ that minimizes the expected prediction loss, $E\{L[y, \tilde{y}(x)]\}$, where the expectation is over both $y$ and $x$. Note that

$$E_{x,y}\{L[y, \tilde{y}(x)]\} = E_x\left(E_{y \mid x}\{L[y, \tilde{y}(x)]\}\right)$$

or in alternative notation

$$E\{L[y, \tilde{y}(x)]\} = E\left(E\{L[y, \tilde{y}(x)] \mid x\}\right).$$

In particular, there is a one to one correspondence between prediction theory and the approach of traditional decision theory to Bayesian analysis. We associate $y$ with $\theta$ and $x$ with $X$. In prediction we assume a joint distribution for $x$ and $y$ whereas in Bayesian analysis we specify the sampling distribution and the prior that together determine the joint distribution of $\theta$ and $X$. A predictor $\tilde{y}(x)$ is analogous to a decision rule. The expected prediction error $E_{x,y}\{L[y, \tilde{y}(x)]\}$ is analogous to the Bayes risk. Just like in Bayesian analysis, the way to find the best predictor is, for each value of $x$, to find the value of $\tilde{y}(x)$ that minimizes $E\{L[y, \tilde{y}(x)] \mid x\}$.

The most common prediction problem is similar to linear regression in which $y$ takes values in $\mathbb{R}$ and uses squared error loss, $L[y, \tilde{y}(x)] = [y - \tilde{y}(x)]^2$. We want to minimize the expected prediction error

$$E\{L[y, \tilde{y}(x)]\} = E\{[y - \tilde{y}(x)]^2\}$$

where the expectation is over both $y$ and $x$. Identifying prediction with decision and conditioning on $x$, we see that Proposition 1a implies

Proposition 1b: For data $(x, y)$, $y \in \mathbb{R}$, and $L(y, \tilde{y}(x)) = [y - \tilde{y}(x)]^2$, the best predictor is $\hat{y}(x) = E(y \mid x)$.

Regression, both linear and nonparametric, is about estimating the optimal predictor $E(y \mid x)$.

Note that this result holds even when $y$ is Bernoulli, in which case the best predictor under squared error loss is $E(y \mid x) = \Pr[y = 1 \mid x]$. Using squared error loss with a Bernoulli variable $y$ is essentially using Brier Scores.
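Proposition 1b can be illustrated with a small known joint distribution. The joint probabilities below are hypothetical, chosen only so the arithmetic is easy to follow. The sketch computes $E(y \mid x)$ for each $x$ and compares its expected squared prediction error with that of a predictor that ignores $x$ and always returns the marginal mean of $y$.

```python
# Hypothetical joint pmf Pr(x, y) on a few (x, y) pairs (illustrative only).
joint = {(0, 0.0): 0.2, (0, 1.0): 0.2, (1, 1.0): 0.1, (1, 3.0): 0.5}


def conditional_mean(x):
    """E(y | x) computed from the joint pmf."""
    px = sum(p for (xx, _), p in joint.items() if xx == x)
    return sum(p * y for (xx, y), p in joint.items() if xx == x) / px


def expected_prediction_error(predictor):
    """E{[y - predictor(x)]^2}, with the expectation over both x and y."""
    return sum(p * (y - predictor(x)) ** 2 for (x, y), p in joint.items())


marginal_mean = sum(p * y for (_, y), p in joint.items())

best = expected_prediction_error(conditional_mean)
naive = expected_prediction_error(lambda x: marginal_mean)

print(conditional_mean(0), conditional_mean(1))
print(best, naive)
```

The conditional mean should give the smaller expected prediction error, since it minimizes the conditional expected loss separately at each value of `x`.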

Similarly we can get other best predictors.

Proposition 2b: For data $(x, y)$, $y \in \mathbb{R}$, and $L(y, \tilde{y}(x)) = w(y)[y - \tilde{y}(x)]^2$, the best predictor is $\hat{y}(x) = E[w(y)y \mid x]/E[w(y) \mid x]$.

Proposition 3b: For data $(x, y)$, $y \in \mathbb{R}$, and $L(y, \tilde{y}(x)) = |y - \tilde{y}(x)|$, the best predictor is $\hat{y}(x) \equiv m \equiv \mathrm{Median}(y \mid x)$.

When $y$ takes values in $\{0, 1\}$, an alternative loss function is the so called Hamming loss,

$$L[y, \tilde{y}(x)] = I[y \ne \tilde{y}(x)],$$

wherein a predictor $\tilde{y}(x)$ also needs to take values in $\{0, 1\}$. We want to minimize the expected prediction error

$$E\{L[y, \tilde{y}(x)]\} = E\{I[y \ne \tilde{y}(x)]\}$$

where the expectation is over both $y$ and $x$. We see that Proposition 4a implies

Proposition 4b: For data $(x, y)$, $y \in \{0, 1\}$ and $L(y, \tilde{y}(x)) = I(y \ne \tilde{y}(x))$, the best predictor is

$$\hat{y}(x) = \begin{cases} 0 & \text{if } \Pr(y = 0 \mid x) > 0.5 \\ 1 & \text{if } \Pr(y = 0 \mid x) < 0.5. \end{cases}$$

In binary regression people tend to focus on the probability of getting a 1, rather than getting a 0 (which is analogous to a null hypothesis), so it is more common to think of the optimal predictor as

$$\hat{y}(x) = \begin{cases} 0 & \text{if } \Pr(y = 1 \mid x) < 0.5 \\ 1 & \text{if } \Pr(y = 1 \mid x) > 0.5. \end{cases}$$

Binomial (logistic/probit) regression is about estimating the probability $\Pr(y = 1 \mid x)$. For squared error loss, this gives the estimated optimal predictor. For Hamming loss, the estimated optimal predictor is 0 or 1 depending on whether the estimated value of $\Pr(y = 1 \mid x)$ is less than 0.5.
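For a binary response the two loss functions lead to different kinds of predictor from the same probability. The sketch below is illustrative only: the function names and the value `p1 = 0.7` standing in for $\Pr(y = 1 \mid x)$ at some fixed $x$ are assumptions. It computes the conditional expected loss of a candidate prediction under squared error and under Hamming loss.

```python
def expected_squared_loss(p1, pred):
    """E[(y - pred)^2 | x] when Pr(y = 1 | x) = p1 and y is 0 or 1."""
    return p1 * (1 - pred) ** 2 + (1 - p1) * pred**2


def expected_hamming_loss(p1, pred):
    """E[I(y != pred) | x] for a prediction pred in {0, 1}."""
    return p1 * (pred != 1) + (1 - p1) * (pred != 0)


def hamming_predictor(p1):
    """Proposition 4b stated via Pr(y = 1 | x); a tie at 0.5 is broken toward 0."""
    return 1 if p1 > 0.5 else 0


p1 = 0.7  # hypothetical value of Pr(y = 1 | x) at some x

# Under squared error, search real-valued predictions in [0, 1] on a grid.
grid = [i / 100 for i in range(101)]
best_sq = min(grid, key=lambda c: expected_squared_loss(p1, c))

print(best_sq, hamming_predictor(p1))
print(expected_hamming_loss(p1, 0), expected_hamming_loss(p1, 1))
```

Under squared error the search returns the probability itself, while under Hamming loss the prediction is the more probable class.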

Fisher argued (similarly to Bayesians) that prediction problems should be considered entirely as conditional on the predictor vector $x$. However, there are some predictive measures, such as the coefficient of determination, that are defined with respect to the distribution on $x$. Measures that depend on the distribution of $x$ are inappropriate to compare when the distribution of $x$ changes. Thus it is common to argue that $R^2$ values for the same model on different data are not comparable. In fact, that is only true if the $x$ data have been sampled from a different population, which is usually the case.

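6. Minimax Rules

Definition 1: A decision rule $\delta_0$ is a minimax rule if

$$\sup_\theta R(\theta, \delta_0) = \inf_\delta \sup_\theta R(\theta, \delta).$$

Definition 2: A prior distribution on $\theta$, say $g_*$, is a least favorable distribution if

$$\inf_\delta r(g_*, \delta) = \sup_g \inf_\delta r(g, \delta).$$

If $\delta_*$ is a Bayes rule with respect to $g_*$ then

$$r(g_*, \delta_*) = \inf_\delta r(g_*, \delta) = \sup_g \inf_\delta r(g, \delta).$$

We present without proof the Minimax Theorem.

Theorem 3:

$$\inf_\delta \sup_g r(g, \delta) = \sup_g \inf_\delta r(g, \delta).$$

Corollary 4: For any $\delta$,

$$\sup_\theta R(\theta, \delta) = \sup_g r(g, \delta).$$

Proof: Observe that

$$r(g, \delta) = E_\theta[R(\theta, \delta)] \le E_\theta\left[\sup_\theta R(\theta, \delta)\right] = \sup_\theta R(\theta, \delta),$$

so

$$\sup_g r(g, \delta) \le \sup_\theta R(\theta, \delta).$$

Conversely, by considering the subset of priors that take on the value $\theta$ with probability one, say $g_\theta$, note that $r(g_\theta, \delta) = R(\theta, \delta)$ and

$$\sup_g r(g, \delta) \ge \sup_{g_\theta : \theta \in \Theta} r(g_\theta, \delta) = \sup_\theta R(\theta, \delta).$$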

Proposition 5: If the Minimax Theorem holds, $\delta_0$ is a minimax rule, and $g_*$ is a least favorable distribution with corresponding Bayes rule $\delta_*$, then $\delta_0$ is also a Bayes rule with respect to the least favorable distribution. (If the Bayes rule happens to be unique, we must have $\delta_0 = \delta_*$.)

Proof: Using Corollary 4, Definition 1, Corollary 4, the Minimax Theorem 3, and Definition 2,

$$\begin{aligned}
r(g_*, \delta_0) &\le \sup_g r(g, \delta_0) \\
&= \sup_\theta R(\theta, \delta_0) \\
&= \inf_\delta \sup_\theta R(\theta, \delta) \\
&= \inf_\delta \sup_g r(g, \delta) \\
&= \sup_g \inf_\delta r(g, \delta) \\
&= r(g_*, \delta_*).
\end{aligned}$$

This must be an equality since we know by definition of the Bayes rule that

$$r(g_*, \delta_0) \ge r(g_*, \delta_*).$$

Since $\delta_0$ and $\delta_*$ have the same Bayes risk, they must both be Bayes rules.

The point is that a Bayes rule for a least favorable distribution isn't necessarily a minimax rule, but a minimax rule, if it exists, is necessarily a Bayes rule for a least favorable distribution.

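Definition 6: $\delta_0$ is an equalizer rule if for some constant $K$, $R(\theta, \delta_0) = K$ for all $\theta$.

Proposition 7: If the Minimax Theorem holds and if $\delta_0$ is both an equalizer rule and the Bayes rule for some prior distribution $g_0$, then $\delta_0$ is minimax.

Proof:

$$\inf_\delta \sup_\theta R(\theta, \delta) \le \sup_\theta R(\theta, \delta_0) = K = r(g_0, \delta_0) = \inf_\delta r(g_0, \delta) \le \sup_g \inf_\delta r(g, \delta).$$

By the Minimax Theorem, all of these are equal, so in particular

$$\inf_\delta \sup_\theta R(\theta, \delta) = \sup_\theta R(\theta, \delta_0).$$

Exercise: Let $X \sim \mathrm{Bin}(n, \theta)$ and $\theta \sim \mathrm{Beta}(\alpha, \beta)$. Assume that the Minimax Theorem holds! For squared error loss, find the Bayes rule, say, $\delta_{\alpha\beta}$. Find $R(\theta, \delta_{\alpha\beta})$. Pick $\alpha$ and $\beta$ so that $\delta_{\alpha\beta}$ is an equalizer rule. Establish that $\delta_{\alpha\beta}$ is a minimax rule.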
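When working on the exercise, it can help to evaluate a risk function numerically before or after deriving it by hand. The helper below is a hypothetical aid rather than a solution: it computes the squared error risk of any estimator for binomial data by summing over the sample space, and it checks on a grid of $\theta$ values whether the risk appears constant as Definition 6 requires. It is demonstrated on the sample proportion $x/n$, which is used here only as a test case and is not claimed to be the Bayes rule of the exercise.

```python
from math import comb


def risk(theta, rule, n):
    """R(theta, delta) under squared error loss for X ~ Bin(n, theta)."""
    return sum(
        comb(n, x) * theta**x * (1 - theta) ** (n - x) * (theta - rule(x, n)) ** 2
        for x in range(n + 1)
    )


def looks_like_equalizer(rule, n, tol=1e-10):
    """Check numerically whether the risk is constant over a grid of theta values.

    This is only a grid check, not a proof that the rule is an equalizer.
    """
    values = [risk(i / 50, rule, n) for i in range(51)]
    return max(values) - min(values) < tol


def sample_proportion(x, n):
    """The usual estimator x/n, used here only as a test case."""
    return x / n


n = 10
print(risk(0.5, sample_proportion, n), risk(0.1, sample_proportion, n))
print(looks_like_equalizer(sample_proportion, n))
```

Any candidate rule with the signature `rule(x, n)` can be passed in the same way, so a rule derived in the exercise can be checked by substituting it for `sample_proportion`.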


More information

Testing Statistical Hypotheses

Testing Statistical Hypotheses E.L. Lehmann Joseph P. Romano Testing Statistical Hypotheses Third Edition 4y Springer Preface vii I Small-Sample Theory 1 1 The General Decision Problem 3 1.1 Statistical Inference and Statistical Decisions

More information

Bayesian Model Diagnostics and Checking

Bayesian Model Diagnostics and Checking Earvin Balderama Quantitative Ecology Lab Department of Forestry and Environmental Resources North Carolina State University April 12, 2013 1 / 34 Introduction MCMCMC 2 / 34 Introduction MCMCMC Steps in

More information

ST5215: Advanced Statistical Theory

ST5215: Advanced Statistical Theory Department of Statistics & Applied Probability Wednesday, October 5, 2011 Lecture 13: Basic elements and notions in decision theory Basic elements X : a sample from a population P P Decision: an action

More information

Discrete Mathematics and Probability Theory Spring 2016 Rao and Walrand Note 14

Discrete Mathematics and Probability Theory Spring 2016 Rao and Walrand Note 14 CS 70 Discrete Mathematics and Probability Theory Spring 2016 Rao and Walrand Note 14 Introduction One of the key properties of coin flips is independence: if you flip a fair coin ten times and get ten

More information

TUTORIAL 8 SOLUTIONS #

TUTORIAL 8 SOLUTIONS # TUTORIAL 8 SOLUTIONS #9.11.21 Suppose that a single observation X is taken from a uniform density on [0,θ], and consider testing H 0 : θ = 1 versus H 1 : θ =2. (a) Find a test that has significance level

More information

Bayesian RL Seminar. Chris Mansley September 9, 2008

Bayesian RL Seminar. Chris Mansley September 9, 2008 Bayesian RL Seminar Chris Mansley September 9, 2008 Bayes Basic Probability One of the basic principles of probability theory, the chain rule, will allow us to derive most of the background material in

More information

CSE 312 Final Review: Section AA

CSE 312 Final Review: Section AA CSE 312 TAs December 8, 2011 General Information General Information Comprehensive Midterm General Information Comprehensive Midterm Heavily weighted toward material after the midterm Pre-Midterm Material

More information

Class 26: review for final exam 18.05, Spring 2014

Class 26: review for final exam 18.05, Spring 2014 Probability Class 26: review for final eam 8.05, Spring 204 Counting Sets Inclusion-eclusion principle Rule of product (multiplication rule) Permutation and combinations Basics Outcome, sample space, event

More information

STA 732: Inference. Notes 2. Neyman-Pearsonian Classical Hypothesis Testing B&D 4

STA 732: Inference. Notes 2. Neyman-Pearsonian Classical Hypothesis Testing B&D 4 STA 73: Inference Notes. Neyman-Pearsonian Classical Hypothesis Testing B&D 4 1 Testing as a rule Fisher s quantification of extremeness of observed evidence clearly lacked rigorous mathematical interpretation.

More information

STAT 830 Hypothesis Testing

STAT 830 Hypothesis Testing STAT 830 Hypothesis Testing Hypothesis testing is a statistical problem where you must choose, on the basis of data X, between two alternatives. We formalize this as the problem of choosing between two

More information

Mathematical Statistics

Mathematical Statistics Mathematical Statistics MAS 713 Chapter 8 Previous lecture: 1 Bayesian Inference 2 Decision theory 3 Bayesian Vs. Frequentist 4 Loss functions 5 Conjugate priors Any questions? Mathematical Statistics

More information

Bayes rule and Bayes error. Donglin Zeng, Department of Biostatistics, University of North Carolina

Bayes rule and Bayes error. Donglin Zeng, Department of Biostatistics, University of North Carolina Bayes rule and Bayes error Definition If f minimizes E[L(Y, f (X))], then f is called a Bayes rule (associated with the loss function L(y, f )) and the resulting prediction error rate, E[L(Y, f (X))],

More information

The Delta Method and Applications

The Delta Method and Applications Chapter 5 The Delta Method and Applications 5.1 Local linear approximations Suppose that a particular random sequence converges in distribution to a particular constant. The idea of using a first-order

More information

Machine Learning 4771

Machine Learning 4771 Machine Learning 4771 Instructor: Tony Jebara Topic 11 Maximum Likelihood as Bayesian Inference Maximum A Posteriori Bayesian Gaussian Estimation Why Maximum Likelihood? So far, assumed max (log) likelihood

More information

Statistical Inference

Statistical Inference Statistical Inference Classical and Bayesian Methods Class 7 AMS-UCSC Tue 31, 2012 Winter 2012. Session 1 (Class 7) AMS-132/206 Tue 31, 2012 1 / 13 Topics Topics We will talk about... 1 Hypothesis testing

More information

Lecture 13 and 14: Bayesian estimation theory

Lecture 13 and 14: Bayesian estimation theory 1 Lecture 13 and 14: Bayesian estimation theory Spring 2012 - EE 194 Networked estimation and control (Prof. Khan) March 26 2012 I. BAYESIAN ESTIMATORS Mother Nature conducts a random experiment that generates

More information

Quiz 2 Date: Monday, November 21, 2016

Quiz 2 Date: Monday, November 21, 2016 10-704 Information Processing and Learning Fall 2016 Quiz 2 Date: Monday, November 21, 2016 Name: Andrew ID: Department: Guidelines: 1. PLEASE DO NOT TURN THIS PAGE UNTIL INSTRUCTED. 2. Write your name,

More information

Making peace with p s: Bayesian tests with straightforward frequentist properties. Ken Rice, Department of Biostatistics April 6, 2011

Making peace with p s: Bayesian tests with straightforward frequentist properties. Ken Rice, Department of Biostatistics April 6, 2011 Making peace with p s: Bayesian tests with straightforward frequentist properties Ken Rice, Department of Biostatistics April 6, 2011 Biowhat? Biostatistics is the application of statistics to topics in

More information

DR.RUPNATHJI( DR.RUPAK NATH )

DR.RUPNATHJI( DR.RUPAK NATH ) Contents 1 Sets 1 2 The Real Numbers 9 3 Sequences 29 4 Series 59 5 Functions 81 6 Power Series 105 7 The elementary functions 111 Chapter 1 Sets It is very convenient to introduce some notation and terminology

More information

Regression, part II. I. What does it all mean? A) Notice that so far all we ve done is math.

Regression, part II. I. What does it all mean? A) Notice that so far all we ve done is math. Regression, part II I. What does it all mean? A) Notice that so far all we ve done is math. 1) One can calculate the Least Squares Regression Line for anything, regardless of any assumptions. 2) But, if

More information

LECTURE 10: NEYMAN-PEARSON LEMMA AND ASYMPTOTIC TESTING. The last equality is provided so this can look like a more familiar parametric test.

LECTURE 10: NEYMAN-PEARSON LEMMA AND ASYMPTOTIC TESTING. The last equality is provided so this can look like a more familiar parametric test. Economics 52 Econometrics Professor N.M. Kiefer LECTURE 1: NEYMAN-PEARSON LEMMA AND ASYMPTOTIC TESTING NEYMAN-PEARSON LEMMA: Lesson: Good tests are based on the likelihood ratio. The proof is easy in the

More information

8: Hypothesis Testing

8: Hypothesis Testing Some definitions 8: Hypothesis Testing. Simple, compound, null and alternative hypotheses In test theory one distinguishes between simple hypotheses and compound hypotheses. A simple hypothesis Examples:

More information

Lecture 21: Minimax Theory

Lecture 21: Minimax Theory Lecture : Minimax Theory Akshay Krishnamurthy akshay@cs.umass.edu November 8, 07 Recap In the first part of the course, we spent the majority of our time studying risk minimization. We found many ways

More information

Direction: This test is worth 250 points and each problem worth points. DO ANY SIX

Direction: This test is worth 250 points and each problem worth points. DO ANY SIX Term Test 3 December 5, 2003 Name Math 52 Student Number Direction: This test is worth 250 points and each problem worth 4 points DO ANY SIX PROBLEMS You are required to complete this test within 50 minutes

More information

Terminology for Statistical Data

Terminology for Statistical Data Terminology for Statistical Data variables - features - attributes observations - cases (consist of multiple values) In a standard data matrix, variables or features correspond to columns observations

More information

Chapters 10. Hypothesis Testing

Chapters 10. Hypothesis Testing Chapters 10. Hypothesis Testing Some examples of hypothesis testing 1. Toss a coin 100 times and get 62 heads. Is this coin a fair coin? 2. Is the new treatment more effective than the old one? 3. Quality

More information

Lecture 4. f X T, (x t, ) = f X,T (x, t ) f T (t )

Lecture 4. f X T, (x t, ) = f X,T (x, t ) f T (t ) LECURE NOES 21 Lecture 4 7. Sufficient statistics Consider the usual statistical setup: the data is X and the paramter is. o gain information about the parameter we study various functions of the data

More information

Bayesian statistics: Inference and decision theory

Bayesian statistics: Inference and decision theory Bayesian statistics: Inference and decision theory Patric Müller und Francesco Antognini Seminar über Statistik FS 28 3.3.28 Contents 1 Introduction and basic definitions 2 2 Bayes Method 4 3 Two optimalities:

More information

ECE531 Lecture 13: Sequential Detection of Discrete-Time Signals

ECE531 Lecture 13: Sequential Detection of Discrete-Time Signals ECE531 Lecture 13: Sequential Detection of Discrete-Time Signals D. Richard Brown III Worcester Polytechnic Institute 30-Apr-2009 Worcester Polytechnic Institute D. Richard Brown III 30-Apr-2009 1 / 32

More information

Statistical Inference

Statistical Inference Statistical Inference Robert L. Wolpert Institute of Statistics and Decision Sciences Duke University, Durham, NC, USA Week 12. Testing and Kullback-Leibler Divergence 1. Likelihood Ratios Let 1, 2, 2,...

More information

Testing Statistical Hypotheses

Testing Statistical Hypotheses E.L. Lehmann Joseph P. Romano, 02LEu1 ttd ~Lt~S Testing Statistical Hypotheses Third Edition With 6 Illustrations ~Springer 2 The Probability Background 28 2.1 Probability and Measure 28 2.2 Integration.........

More information

Statistics of Small Signals

Statistics of Small Signals Statistics of Small Signals Gary Feldman Harvard University NEPPSR August 17, 2005 Statistics of Small Signals In 1998, Bob Cousins and I were working on the NOMAD neutrino oscillation experiment and we

More information

Spring 2012 Math 541B Exam 1

Spring 2012 Math 541B Exam 1 Spring 2012 Math 541B Exam 1 1. A sample of size n is drawn without replacement from an urn containing N balls, m of which are red and N m are black; the balls are otherwise indistinguishable. Let X denote

More information

A CONDITION TO OBTAIN THE SAME DECISION IN THE HOMOGENEITY TEST- ING PROBLEM FROM THE FREQUENTIST AND BAYESIAN POINT OF VIEW

A CONDITION TO OBTAIN THE SAME DECISION IN THE HOMOGENEITY TEST- ING PROBLEM FROM THE FREQUENTIST AND BAYESIAN POINT OF VIEW A CONDITION TO OBTAIN THE SAME DECISION IN THE HOMOGENEITY TEST- ING PROBLEM FROM THE FREQUENTIST AND BAYESIAN POINT OF VIEW Miguel A Gómez-Villegas and Beatriz González-Pérez Departamento de Estadística

More information

Stat 535 C - Statistical Computing & Monte Carlo Methods. Arnaud Doucet.

Stat 535 C - Statistical Computing & Monte Carlo Methods. Arnaud Doucet. Stat 535 C - Statistical Computing & Monte Carlo Methods Arnaud Doucet Email: arnaud@cs.ubc.ca 1 CS students: don t forget to re-register in CS-535D. Even if you just audit this course, please do register.

More information

Data Analysis and Statistical Methods Statistics 651

Data Analysis and Statistical Methods Statistics 651 Data Analysis and Statistical Methods Statistics 651 http://www.stat.tamu.edu/~suhasini/teaching.html Lecture 31 (MWF) Review of test for independence and starting with linear regression Suhasini Subba

More information

Chapters 10. Hypothesis Testing

Chapters 10. Hypothesis Testing Chapters 10. Hypothesis Testing Some examples of hypothesis testing 1. Toss a coin 100 times and get 62 heads. Is this coin a fair coin? 2. Is the new treatment on blood pressure more effective than the

More information

exp{ (x i) 2 i=1 n i=1 (x i a) 2 (x i ) 2 = exp{ i=1 n i=1 n 2ax i a 2 i=1

exp{ (x i) 2 i=1 n i=1 (x i a) 2 (x i ) 2 = exp{ i=1 n i=1 n 2ax i a 2 i=1 4 Hypothesis testing 4. Simple hypotheses A computer tries to distinguish between two sources of signals. Both sources emit independent signals with normally distributed intensity, the signals of the first

More information