Bayes Rule: A Tutorial Introduction

JV Stone
1 Introduction

All decisions are based on data, but the best decisions are also based on previous experience. In essence, Bayes rule provides a method for making use of previous experience in order to arrive at the best decision when interpreting data. The example given below is based on speech, but Bayes rule can be applied to any situation where there is uncertainty regarding the value of a measured quantity (e.g. how much light is hitting each part of your retina, or the identity of the person you are speaking to). As all measured quantities inevitably include some uncertainty, Bayes rule has a wide range of applicability.

If you walked into a hardware store and asked, "Have you got four candles?" then you would probably be surprised to be asked "How many fork handles do you want?" (see the comedy clip on YouTube by The Two Ronnies)*. Even though the two phrases: 1) "Have you got four candles?", 2) "Have you got fork handles?", are acoustically almost identical, the shop assistant knows that he sells many more candles than fork handles. This, in turn, means that he probably does not even hear the words "fork handles", but instead hears "four candles". What has this got to do with Bayes rule?

Figure 1: The Reverend Thomas Bayes trying to make sense of a London accent.

The acoustic data corresponding to the sounds spoken by you, the customer, are equally consistent with two interpretations, but the assistant's brain assigns a much higher weighting to one of these interpretations. This weighting is based on prior experience, so the assistant knows that customers are much more likely to ask for candles than handles. In other words, the experience of the assistant allows him to hear what was most probably said by the customer, even though the acoustic signal uttered by the customer was quite ambiguous.
Without knowing it, the assistant has applied Bayes rule (or something that approximates Bayes rule) in order to hear what the customer most probably said. Here's how.

2 Conditional Probability, Likelihood, and Asking the Wrong Question

If we define the two possible phrases as

phrase1 = "fork handles"
phrase2 = "four candles",

then we can formalise this scenario by considering the probability of the acoustic data given each of the two possible phrases. As both phrases are equally consistent with the acoustic data, the probability of the data is the same in both cases. That is, the probability of the data given that "four candles" was spoken is the same as the probability of the data given that "fork handles" was spoken. In both cases, the probability of the acoustic data depends on the words spoken, and this dependence is made explicit as two probabilities: the probability of the acoustic data given that "four candles" was spoken, and the probability of the acoustic data given that "fork handles" was spoken. A short-hand way of writing these is:

p(data | "four candles"), p(data | "fork handles"), (1)

where p stands for probability, and the vertical bar | stands for "given that". These are known as conditional probabilities because each probability is conditional on (depends on) something else (in this case, which phrase was spoken). More specifically, these particular conditional probabilities are known as likelihoods. For reasons that will become clear later, the expression p(data | "four candles") is interpreted as the likelihood that the phrase spoken was "four candles". We have already established that, to all intents and purposes, the two likelihoods are equal, that is:

p(data | "four candles") = p(data | "fork handles") (2)

As the data is consistent with both phrases, let's assume that the likelihoods are both 0.8:

p(data | "four candles") = 0.8
p(data | "fork handles") = 0.8 (3)

This gives a clear indication of the intrinsic ambiguity of many acoustic signals. Knowing these two likelihoods is insufficient to decide what the customer said, not only because (in this instance) these likelihoods are equal, but for a more general reason. These likelihoods do provide an answer, but it is an answer to the wrong question. The likelihoods above answer the question:

"What is the probability of the observed acoustic data given that each of the two possible phrases was spoken?"
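To make this concrete, here is a minimal Python sketch (not part of the original tutorial; the variable names are illustrative) that stores the two likelihoods of equation (3):

```python
# A minimal sketch: the two equal likelihoods from equation (3),
# p(data | phrase), stored in a dictionary keyed by phrase.
likelihood = {
    "four candles": 0.8,
    "fork handles": 0.8,
}

# Both phrases are equally consistent with the acoustic data, so the
# likelihoods alone give no basis for preferring one phrase over the other.
print(likelihood["four candles"] == likelihood["fork handles"])  # True
```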
3 Asking the right question: Posterior probability

The right question, the question to which we (and our brains) would really like an answer, is:

"What is the probability that each of the two possible phrases was spoken given the observed acoustic data?"

The answer to this, the right question, is implicit in two new conditional probabilities, the posterior probabilities:

p("four candles" | data), p("fork handles" | data) (4)

Notice the subtle, but important, difference between the pairs of equations (1) and (4). Equations (1) give the likelihoods, the probability of the data given each of the two possible phrases, which turns out to be the same for both phrases in this example. Equations (4) give the posterior probabilities, the probability of each phrase given the acoustic data. Crucially, each likelihood tells us the probability of the data given a particular phrase, but takes no account of how often that phrase has been encountered in the past. In contrast, the posterior depends not only on the data (in the form of the likelihood), but also on how frequently each phrase has been encountered in the past; that is, on prior experience. As the likelihood depends only on the data, it is easier to evaluate than the posterior, which depends on the data and on previous experience. For the sake of brevity, let's assume that we, and the assistant's brain, can evaluate the likelihood. So, what we want is the posterior, but what we have is the likelihood. Fortunately, Bayes rule provides a means of getting from the likelihood to the posterior, by making use of extra knowledge in the form of prior experience.

4 Prior probability

Let's suppose that the assistant has been asked for four candles a total of 90 times in the past, whereas he has been asked for fork handles only 10 times. To keep matters simple, let's also assume that the next customer will ask either for four candles or fork handles (we'll revisit this simplification later). Thus, before the customer has uttered a single word, the assistant estimates that the probability that he will say each of the two phrases is

p("four candles") = 90/100 = 0.9
p("fork handles") = 10/100 = 0.1 (5)
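In code, these priors are simply relative frequencies. A minimal sketch (not from the original text), using the counts assumed above:

```python
# A minimal sketch: the priors of equation (5), estimated as relative
# frequencies of past requests (90 vs 10).
counts = {"four candles": 90, "fork handles": 10}
total = sum(counts.values())  # 100

prior = {phrase: n / total for phrase, n in counts.items()}
print(prior)  # {'four candles': 0.9, 'fork handles': 0.1}
```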
These two prior probabilities represent the prior knowledge of the assistant, based on his previous experience of what customers buy. When confronted with an acoustic signal that is consistent with either of the two interpretations, the assistant naturally interprets it as "four candles", because, according to his past experience, this is what such an ambiguous acoustic signal usually means in practice. To put it another way, his brain takes the two equally probable likelihood values and assigns a weighting to each one, a weighting that depends on past experience.

5 Bayesian inference

Figure 2: A schematic account of Bayes rule.

One obvious way to implement this weighting is to simply multiply each likelihood by its corresponding prior probability, to estimate the posterior probability:

p("four candles" | data) = p(data | "four candles") × p("four candles")
p("fork handles" | data) = p(data | "fork handles") × p("fork handles") (6)

If we put the likelihood and prior probability values defined in equations (3) and (5) into (6) then we obtain

p("four candles" | data) = p(data | "four candles") × p("four candles") = 0.8 × 0.9 = 0.72 (7)
p("fork handles" | data) = p(data | "fork handles") × p("fork handles") = 0.8 × 0.1 = 0.08 (8)

As these two posterior probabilities represent the answer to the right question, we can now see that the probability that the customer said "four candles" is 0.72, whereas the probability that the customer said "fork handles" is 0.08. (Strictly, these products are unnormalised posteriors, because we have ignored the probability of the data; see Section 6.) As "four candles" is associated with the highest value of the posterior probability, it is known as the maximum a posteriori (MAP) estimate of the phrase that was spoken.
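A minimal sketch of this computation (not from the original text), combining the likelihood and prior values above:

```python
# A minimal sketch: equation (6), posterior weight = likelihood * prior,
# followed by the MAP estimate (the phrase with the largest weight).
likelihood = {"four candles": 0.8, "fork handles": 0.8}
prior = {"four candles": 0.9, "fork handles": 0.1}

posterior = {h: likelihood[h] * prior[h] for h in prior}
for h, p in posterior.items():
    print(f"p({h} | data) = {p:.2f}")  # 0.72 and 0.08

map_estimate = max(posterior, key=posterior.get)
print("MAP estimate:", map_estimate)  # four candles
```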
We have just applied Bayes rule. The result is our best guess at which phrase was spoken given an ambiguous acoustic signal, and this process is known as Bayesian inference. The line of reasoning given above tells us that Bayes rule seems to implement a plausible strategy, but plausibility is not mathematical proof. That proof is not given here (even though it is not especially complicated), but it was first published posthumously in a paper by the Reverend Thomas Bayes in 1763 (in fact, the modern form of Bayes rule was derived by Laplace).

6 Bayes rule in full

If we define h as a specific hypothesis (e.g. the phrase h = "four candles") and d as some data (e.g. d = acoustic data) then we can write the full form of Bayes rule as

p(hypothesis | data) = p(data | hypothesis) × p(hypothesis) / p(data),

or (more succinctly)

p(h | d) = p(d | h) × p(h) / p(d) (9)

where p(h | d) is the posterior probability, p(d | h) is the likelihood, p(h) is the prior probability of the hypothesis under consideration, and p(d) is the probability of the data, also known as the evidence. This term is a constant because it refers to the probability of observing the data in the first place. As there is only one observed set of data, its probability does not affect which particular h yields the highest value of the posterior, and so its value is usually set to unity (one) for convenience. Its value can be obtained by marginalising over the likelihood, but this is a detail too far for this tutorial.

7 Maximum likelihood estimate (MLE)

At this point, we will back up a little to consider maximum likelihood estimation. If the likelihoods are not equal then the best guess at the value of h ("four candles" or "fork handles") can be based on those likelihood values alone. For example, let's assume that the likelihoods are

p(d | h = "four candles") = 0.6
p(d | h = "fork handles") = 0.8 (10)

That is, the acoustic data is slightly more consistent with the phrase h = "fork handles" than with the phrase h = "four candles". If we have no prior experience then we effectively do not have a prior to help us decide which h is more probable. In such cases, we are forced to rely on the data alone, and to use the likelihood values to choose one of the possible hypotheses (phrases). In this example, we would choose h = "fork handles". As this is the value of h associated with the maximum value of the likelihood, it is known as the maximum likelihood estimate (MLE) of the phrase that was spoken. Notice that a decision based on the MLE can be over-ruled if we have access to prior probabilities. For example, if the prior probabilities in equations (5) and the unequal likelihoods in (10) were substituted into (6), then the decision (fork handles) based on the MLE would be over-ruled by the decision (four candles) based on the MAP. The decision based on the MAP rests both on the data and on prior experience, so it seems intuitively obvious that this decision should be better than one (the MLE) based on the data alone, and indeed this is provably true.
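The following minimal sketch (not from the original text) contrasts the two estimates using the unequal likelihoods of equation (10) and the priors of equation (5); for completeness it also normalises by the evidence p(d), the marginalisation step the tutorial deliberately defers:

```python
# A minimal sketch: with unequal likelihoods, the MLE and MAP can disagree.
likelihood = {"four candles": 0.6, "fork handles": 0.8}  # equation (10)
prior = {"four candles": 0.9, "fork handles": 0.1}       # equation (5)

mle = max(likelihood, key=likelihood.get)
print("MLE:", mle)  # fork handles (based on the data alone)

# Evidence p(d): sum over hypotheses h of p(d | h) * p(h).
evidence = sum(likelihood[h] * prior[h] for h in prior)  # 0.54 + 0.08 = 0.62

posterior = {h: likelihood[h] * prior[h] / evidence for h in prior}
map_estimate = max(posterior, key=posterior.get)
print("MAP:", map_estimate)  # four candles (data plus prior experience)
for h, p in posterior.items():
    print(f"p({h} | d) = {p:.2f}")  # 0.87 vs 0.13
```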
Notes

Note 1: The term "prior" seems to suggest previous experience, but it can be interpreted to mean any information that is not based on the current data. So the prior can be based on instinct, or just a guess at what the underlying probabilities of various hypotheses are.

Note 2: The likelihood, prior and posterior are each assumed to take one of a finite number of values in the above example. More generally, each of these can be derived from a probability density function (pdf). However, the logic that underpins Bayes rule is the same whether we are dealing with probabilities or probability densities.

Note 3: Bayes rule is also called Bayes theorem. A theorem is a mathematical statement which has been proven to be true (i.e. Bayes theorem is, by definition, true).

Note 4: A London accent effectively removes the "h" sound from words like "handle", so "fork handles" sounds just like "four candles" (think of a Michael Caine accent rather than a Hugh Grant accent).

Note 5: Two lectures on Bayesian perception, which are part of a third year lecture course in psychology at Sheffield University, can be downloaded here: L07_bayes_v2SmallJVStone.pdf, L08_bayes_v2SmallJVStone.pdf.

*See the YouTube clip of "four candles", a comedy sketch by The Two Ronnies (Barker and Corbett), which inspired the example used here.

JV Stone, 11th December. j.v.stone [at] sheffield.ac.uk

Please cite this document as: Stone, JV, Bayes Rule: A Tutorial Introduction, University of Sheffield, Psychology Technical Report Number 31417, January. Based on: Stone JV, Bayes Rule: A Tutorial Introduction to Bayesian Analysis, 2012.
More information