
Bayesian Statistics

1. In the new era of big data, machine learning and artificial intelligence, it is important for students to know the vocabulary of Bayesian statistics, which has been competing with the classical school (the frequentists) throughout the history of statistics.

2. I will provide two examples to contrast the two schools. Hopefully, in the end you will fall in love with the Bayesian approach.

Example I: Estimating Proportion

1. Imagine you are a millionaire planning to buy a lake. You love eating walleye, so you want to buy a lake with a lot of walleyes in it. The parameter of interest is the proportion of walleyes in the fish population of a lake.

2. Suppose you have no prior information about the proportion of walleyes. That means you believe the proportion could be 0, or 10%, or 20%, ..., or 100% with equal probabilities. The walleye density is low if the proportion is 0.1, while the density is high if the proportion is 0.9.

3. Using the jargon of Bayesian statistics, you think the proportion of walleyes is a random variable, and the prior distribution is flat (kind of like a uniform distribution).

4. Here comes the first difference between Frequentist and Bayes: the Bayesian school treats any unknown parameter (here, the proportion of walleyes) as a random variable, while the Frequentist treats a parameter as an unknown constant. As a result, the Bayesian school implies implicitly that we will never know the unknown parameter for sure (since it is a random variable).

5. The prior distribution is subjective. One person's prior distribution can differ from another person's. For the same person, the prior distribution can evolve over time as more information becomes available.

6. For example, suppose a friend tells you that there used to be a lot of walleyes in the lake, and you trust him. Then you may assign higher probabilities to those big proportions. Then the new prior distribution can be P(θ = 0.1) = 0.2, P(θ = 0.9) = 0.8.

7. Here comes the second difference between Frequentist and Bayes: Bayes thinks probability measures the degree of belief, while the Frequentist treats probability as the long-run frequency. A probability of 0.8 indicates that you have strong faith in what your friend tells you. In that regard, probability is also subjective in the Bayesian world.

8. For the Bayesian school, statistical inference amounts to using information (data) to update the belief. More explicitly, the Bayes theorem states that

P(θ | data) = P(θ) P(data | θ) / P(data)    (Bayes theorem) (1)

where

(a) P(θ) is called the prior distribution of the unknown parameter θ: the belief you have about θ before seeing the data;

(b) P(data | θ) is called the likelihood: the probability that you observe the given sample of data conditional on the parameter;

(c) P(data) is the unconditional or marginal probability of observing the given sample;

(d) most importantly, P(θ | data) is the posterior distribution: the updated belief about θ after the information has been digested.

Simply put, the Bayesian method is concerned with moving from P(θ) to P(θ | data), or moving from the prior distribution to the posterior distribution, using Bayes theorem (1). You can find a discussion of Bayes theorem in any statistics book or on the Internet.

9. We can show that P(data) is free of θ, since θ has been integrated out:

P(data) = Σ_θ P(data, θ) = Σ_θ P(θ) P(data | θ)    (2)

where P(data, θ) is the joint distribution. That means, for the purpose of understanding θ, we can ignore the denominator in (1) and write

P(θ | data) ∝ P(θ) P(data | θ)    (3)

where ∝ means "is proportional to". In short, to obtain the updated belief (the posterior), we may only need to figure out the prior and the likelihood.
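To see the mechanics of (3) on the smallest possible example, here is a sketch (my addition, with an assumed data set, not from the original notes) that updates the two-point prior from item 6 after a single cast that lands one walleye, so the likelihood is simply P(one walleye | θ) = θ:

* Sketch: Bayes theorem (3) on the two-point prior from item 6.
* Assumed data (not from the notes): one cast, one walleye caught,
* so the likelihood equals theta itself.
clear
set obs 2
gen theta = cond(_n==1, 0.1, 0.9)
gen prior = cond(_n==1, 0.2, 0.8)
gen post  = prior*theta          // prior times likelihood, as in (3)
qui sum post
qui replace post = post/r(sum)   // divide by P(data), as in (1)-(2)
list theta prior post            // belief in theta = 0.9 rises above 0.97

One cast with a walleye pushes the belief in θ = 0.9 from 0.8 to about 0.973: exactly the prior-to-posterior move described above, in miniature.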

10. A technical note: when the information set is big (as the sample size goes to infinity), the central limit theorem implies that the likelihood typically converges to a normal distribution. So in the limit, the likelihood (a normal distribution) dominates the prior, and in general the posterior is a bell-shaped curve.

11. For the Bayesian school, the most important result is delivered by plotting the posterior distribution. Next I will show you how to do so for Example I.

12. First we need data (a sample). Suppose you spend a whole day catching 10 fishes in that lake, and 3 of them are walleyes. For the Frequentist, the estimate for the population proportion is just the sample proportion 3/10 = 30%. That's almost it! The Frequentist now believes we almost know the proportion of walleyes, and it is 30%. But the Frequentist admits that there can be sampling uncertainty (another Frequentist possibly catches 4 or even 5 walleyes out of 10 fishes). So they compute the standard error se, and report a confidence interval (0.3 - 1.96 se, 0.3 + 1.96 se). They tell you that with 95% probability the true proportion is inside that interval. That is it! Then the Frequentist goes to the party.

13. The Bayesian school is unhappy with just an interval: they want to know the whole (posterior) distribution of θ (the possible values and their corresponding probabilities).

(a) For simplicity, assume a flat prior distribution P(θ = k) = 1/11, k = 0, 0.1, 0.2, ..., 1. In words, you believe in equal probabilities for low and high densities of the walleye population.

(b) The likelihood, or the probability of getting 3 walleyes out of 10 fishes for a given θ, is given by a binomial distribution:

Likelihood = P(3 successes out of 10 trials) = C_10^3 θ^3 (1 - θ)^(10-3)    (4)

where C denotes the number of combinations. Basically, if the probability of success (catching a walleye) is θ, then the probability of m successes out of n trials is C_n^m θ^m (1 - θ)^(n-m). Please google "binomial distribution" to learn more.

(c) According to (1) and (2), next we need to multiply the prior by the likelihood and divide by the sum of that product. The prior distribution for θ and its posterior distribution after we catch 3 walleyes out of 10 fishes are plotted below.

[Figure: prior (red) and posterior (blue) distributions plotted against theta]

where a green line is drawn to highlight the value θ = 0.3 that occurs with the highest probability.

(d) You can think of that number as the Bayesian point estimate of the proportion, θ̂_Bayesian = 0.3, which in this case is identical to the Frequentist point estimate θ̂_Frequentist = 0.3.

(e) The posterior distribution clearly shows that other values are possible. For instance, either θ = 0.2 or θ = 0.4 can be true with substantial probability. Nevertheless, θ ≤ 0.1 or θ ≥ 0.6 are unlikely given this sample.

(f) So the information (catching 3 walleyes out of 10) is used to update our belief about the proportion: we move from the flat prior distribution (the red one) to the bell-shaped posterior distribution (the blue one). The bell shape confirms the dominance of the likelihood. After showing this graph, the Bayesian person finally can join the party!

14. The Stata code is as follows:

clear
set obs 11                       // 11 grid points for theta
sca n = 10                       // number of fishes caught
sca m = 3                        // number of walleyes among them
gen pv = (_n-1)*0.1              // theta grid: 0, 0.1, ..., 1
gen pr = 1/11                    // flat prior
gen lv = binomial(n, m, pv) - binomial(n, m-1, pv)   // likelihood: P(exactly m)
gen po = lv*pr                   // prior times likelihood
qui sum po
qui replace po = po/r(sum)       // normalize so the posterior sums to 1
twoway (connected po pv, ms(th)) (connected pr pv, ms(oh)), ytitle("distribution")

where the Stata function binomial(n, m, pv) reports P(m or fewer successes out of n trials), so the difference of the two binomial() calls gives P(exactly m successes).
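A side note on the likelihood line (my addition; verify availability in your Stata version's documentation): Stata also provides binomialp(), which returns the exact-count probability directly, so the difference of two cumulative calls can be avoided:

* Equivalent likelihood line, assuming binomialp() is available
* in your Stata version:
gen lv2 = binomialp(n, m, pv)    // P(exactly m successes in n trials)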

15. What if we catch 6 walleyes out of 10 fishes? You only need to change "sca m = 6" in my code and get

[Figure: posterior distribution plotted against theta]

Now the most likely proportion is 0.6! If I am that walleye-loving millionaire, I may decide to buy the lake.

16. Of course, to be safe, the millionaire can try to get a bigger sample (catch 20 fishes and count how many are walleyes). He can also use the current bell-shaped posterior distribution (rather than the naive flat one) as the new prior distribution, and try to get the second-round posterior distribution after catching a bigger sample of fishes. The point is, the Bayes method typically is used in an iterative fashion (a sketch of this second-round updating appears after item 17).

17. To summarize, the Bayesian method uses information to keep updating the belief. After more information arrives, we can update the belief again and again (by plotting the posterior distribution again and again). Then an informed decision can be made based on the posterior distribution. Bayesian statistics is on-going statistics.
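Here is that second-round sketch (my addition; the second sample of 4 walleyes out of 10 is an assumed number, not from the original notes):

* Sketch of second-round updating: yesterday's posterior becomes
* today's prior. Assumed second sample: 4 walleyes out of 10.
clear
set obs 11
sca n = 10
gen pv = (_n-1)*0.1                                   // theta grid
* round 1: flat prior, 3 walleyes out of 10
gen po1 = (1/11)*(binomial(n,3,pv) - binomial(n,2,pv))
qui sum po1
qui replace po1 = po1/r(sum)
* round 2: po1 is the new prior, 4 walleyes out of 10
gen po2 = po1*(binomial(n,4,pv) - binomial(n,3,pv))
qui sum po2
qui replace po2 = po2/r(sum)
twoway (connected po2 pv, ms(th)) (connected po1 pv, ms(oh)), ytitle("distribution")

The second-round posterior po2 is tighter than po1 and centered between 0.3 and 0.4, which is the whole point: each new sample sharpens the belief.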

Example II: Mission Impossible for the Frequentist (played not by Tom Cruise)

1. There are some problems with which the Frequentist simply cannot help. Suppose that, instead of the proportion of walleyes, the millionaire wants to know the total number of all fishes (not just walleyes). He will only buy the lake if that number is greater than, say, 50. I don't think you have learned any classical statistical model to estimate the size of a population. So this is a mission almost impossible for a Frequentist. (I say "almost" because the maximum likelihood method can be used by a Frequentist.)

2. Don't forget those Bayesian guys. This is how the Bayesian method works for this tricky problem. Let's first catch 10 fishes, put red paint on them (or tattoo them if you can) and send them back into the lake. After one day, let's catch another 10 fishes, and count how many have red paint.

3. Suppose we have 3 red ones out of the 10 fishes. Intuitively we can do the math:

10 red fishes / population = 3 / 10  =>  population ≈ 33

This calculation assumes the red fishes spread evenly in the lake. What if they do not (a tattooed guy may like to hang out with another tattooed guy)? So there should be uncertainty associated with the estimate 33. In this case, there is no way a Frequentist can give you something like a standard error. Only the Bayes method can be used to account for that inherent uncertainty.

4. Let the unknown parameter be the population size θ = n. The key insight is that the probability of catching red fishes depends on n: P(success) = 10/n. Hence we can still use the binomial distribution to solve this problem:

P(3 successes out of 10 trials) = C_10^3 (10/n)^3 (1 - 10/n)^(10-3)    (5)

5. For the prior, again let's use a flat one, P(n = k) = 0.1, k = 10, 20, ..., 100, as a starting point. The posterior distribution for the population size after catching 3 red fishes out of 10 is plotted below.

[Figure: prior (red) and posterior (blue) distributions plotted against the population size n]

where the green line marks the most likely population size n = 30. The Stata code is

clear
set obs 10                       // grid of 10 candidate population sizes
sca m = 3                        // number of red fishes recaptured
gen nv = _n*10                   // n grid: 10, 20, ..., 100
gen pv = 10/nv                   // P(success) = 10/n
gen pr = 1/10                    // flat prior
gen lv = binomial(10, m, pv) - binomial(10, m-1, pv)   // likelihood: P(exactly m)
gen po = lv*pr                   // prior times likelihood
qui sum po
qui replace po = po/r(sum)       // normalize the posterior
twoway (connected po nv, ms(th)) (connected pr nv, ms(oh)), ytitle("distribution")

6. Exercise: please modify my code to do a finer search over population sizes n = 10, 11, 12, ..., 100.

7. The possible population sizes and their corresponding probabilities can be listed with

. list nv po

[Table: output of "list nv po" omitted]
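Item 1 of this example said the millionaire buys only if the population exceeds 50 fishes. The posterior turns that decision rule into a one-line summary; here is a sketch (my addition), reusing the variables nv and po left in memory by the code above:

* Sketch: posterior probability that the lake passes the
* millionaire's threshold of 50 fishes.
qui sum po if nv > 50
display "P(population > 50 | data) = " r(sum)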

Bayes is lovable

1. First of all, economists adore Bayes. For example, they use a utility function to measure how happy the millionaire is after buying that lake (and eating all those poor fishes). Because the fish population is random, economists compute something called expected utility:

E(u(c)) = Σ_j u(c_j) P(c_j) = √10 (0) + √20 P(n = 20 | data) + ... + √100 P(n = 100 | data)

Here we assume the consumption is fish, and we use the square root function because it satisfies diminishing marginal utility. Because the probabilities are readily available from the posterior distribution, the Bayes result can be incorporated seamlessly into consumer theory (a sketch of this computation appears at the end of these notes).

2. In fact, any theory or problem that involves uncertainty (probability) can use the help of Bayesian statistics. Alan M. Turing used Bayesian methods to crack the German encrypted military code during WWII; the British navy used Bayesian methods to narrow down the sea area to search when hunting German U-boats; Google engineers use Bayesian methods to guess whether a picture is of a dog or a cat; Dr. Li uses Bayesian methods to show off in front of his young kids... If you believe the world is full of uncertainty, so that we can never know the truth (which lies somewhere unknown in the middle), please give Bayes a serious thought.

3. To learn more about Bayesian theory, I recommend the book Doing Bayesian Data Analysis: A Tutorial with R and BUGS by John K. Kruschke. That book gives a good introduction to R as well.
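As promised in item 1 above, here is a sketch of the expected-utility computation (my addition, not from the original notes), to be run right after the Example II code so that nv and po are still in memory:

* Sketch: expected utility with u(c) = sqrt(c), weighting each
* candidate population size by its posterior probability.
* Assumes nv and po from the Example II code are in memory.
gen eu = sqrt(nv)*po             // u(c_j) * P(c_j | data)
qui sum eu
display "E(u(c)) = " r(sum)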
