Introduction to Bayesian Statistics


1 School of Computing & Communication, UTS. January 2017.

2 Random variables

Pre-university: a number is just a fixed value. When we talk about probabilities:
- When X is a continuous random variable, it has a probability density function (pdf) p(x).
- When X is a discrete random variable, it has a probability mass function (pmf) p(x).

p(x), shorthand for p(X = x), means: the probability that the random variable X is equal to a fixed number x, e.g., the probability that the number of machine learning participants equals 20.

3 Mean or Expectation

Discrete case: $\mu = E(X) = \frac{1}{N}\sum_{i=1}^{N} x_i$

Continuous case: $\mu = E(X) = \int_{x \in S} x\, p(x)\, dx$

We can also measure the expectation of a function:

$$E(f(X)) = \int_{x \in S} f(x)\, p(x)\, dx$$

For example, $E(\cos(X)) = \int_{x \in S} \cos(x)\, p(x)\, dx$ and $E(X^2) = \int_{x \in S} x^2\, p(x)\, dx$.

What about $f(E(X))$? We discuss this later, with Jensen's inequality, in Expectation-Maximization.
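As a quick check of these formulas, here is a minimal sketch (Python with NumPy; the support and pmf are the same ones used in the variance example below) computing $E(X)$, $E(\cos X)$ and $E(X^2)$ by the discrete sum:

```python
import numpy as np

# E[f(X)] = sum_x f(x) p(x) for a discrete random variable.
x = np.array([1.0, 2.0, 3.0, 4.0])      # support of X
p = np.array([1/6, 2/6, 2/6, 1/6])      # pmf (same as the variance example below)

E_X = np.sum(x * p)                     # E(X) = 2.5
E_cos = np.sum(np.cos(x) * p)           # E(cos(X))
E_X2 = np.sum(x**2 * p)                 # E(X^2)
print(E_X, E_cos, E_X2)
```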

4 Variances: an intuitive explanation

You have data $X = \{2, 1, 3, 2, 3, 4\}$, i.e., $x_1 = 2, x_2 = 1, \ldots, x_6 = 4$.

You have the mean: $\mu = 2.5$.

The variance is then:

$$\mathrm{VAR}(\text{data}) = \sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$$

Division by N is intuitive: otherwise, more data would imply more variance.

Also think about what kind of values VAR and $\sigma$ can take; we will look at what kind of distribution is required for them.

5 Two alternative expressions

People sometimes use the data form. You have data $X = \{2, 1, 3, 2, 3, 4\}$, i.e., $x_1 = 2, x_2 = 1, \ldots, x_6 = 4$:

$$\mathrm{VAR}(X) = \sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2 = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - \frac{1}{N}\sum_{i=1}^{N} 2 x_i \mu + \frac{1}{N}\sum_{i=1}^{N} \mu^2 = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - 2\mu \underbrace{\frac{1}{N}\sum_{i=1}^{N} x_i}_{\mu} + \mu^2 = \left(\frac{1}{N}\sum_{i=1}^{N} x_i^2\right) - \mu^2$$

Other times, people use the distribution form. You have values $X = \{1, 2, 3, 4\}$ with $P(X=1) = \frac{1}{6}$, $P(X=2) = \frac{2}{6}$, $P(X=3) = \frac{2}{6}$ and $P(X=4) = \frac{1}{6}$:

Discrete: $\mathrm{VAR}(X) = \sigma^2 = \sum_{x \in X} (x - \mu)^2 p(x)$

Continuous: $\mathrm{VAR}(X) = \sigma^2 = \int_{x \in X} (x - \mu)^2 p(x)\, dx$

$$\mathrm{VAR}(X) = \sum_{x \in X} (x^2 - 2\mu x + \mu^2)\, p(x) = \sum_{x \in X} x^2 p(x) - 2\mu \underbrace{\sum_{x \in X} x\, p(x)}_{\mu} + \mu^2 \underbrace{\sum_{x \in X} p(x)}_{1} = \sum_{x \in X} x^2 p(x) - \mu^2$$

It is easy to verify that both forms give the same result.

6 Numerical example

First version: $X = \{2, 1, 3, 2, 3, 4\}$, i.e., $x_1 = 2, x_2 = 1, \ldots, x_6 = 4$:

$$\mathrm{VAR}(X) = \frac{(2-2.5)^2 + (1-2.5)^2 + (3-2.5)^2 + (2-2.5)^2 + (3-2.5)^2 + (4-2.5)^2}{6} \approx 0.9167$$

Both expressions give the same value: the moment form gives $\frac{1}{N}\sum_i x_i^2 - \mu^2 = \frac{43}{6} - 6.25 \approx 0.9167$.

Second version: $X = \{1, 2, 3, 4\}$ with $P(X=1) = \frac{1}{6}$, $P(X=2) = \frac{2}{6}$, $P(X=3) = \frac{2}{6}$ and $P(X=4) = \frac{1}{6}$:

Discrete: $\mathrm{VAR}(X) = \sigma^2 = \sum_{x \in X} \underbrace{(x - \mu)^2}_{f(x)}\, p(x)$

$$\mathrm{VAR}(X) = (1-2.5)^2 \cdot \tfrac{1}{6} + (2-2.5)^2 \cdot \tfrac{2}{6} + (3-2.5)^2 \cdot \tfrac{2}{6} + (4-2.5)^2 \cdot \tfrac{1}{6} \approx 0.9167$$
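A minimal sketch verifying, on the data above, that the direct definition, the moment form, and the pmf form all give the same variance:

```python
import numpy as np

data = np.array([2, 1, 3, 2, 3, 4], dtype=float)
mu = data.mean()                                # 2.5

var_direct = np.mean((data - mu) ** 2)          # (1/N) sum_i (x_i - mu)^2
var_moment = np.mean(data ** 2) - mu ** 2       # (1/N) sum_i x_i^2 - mu^2

# The pmf form over the distinct values gives the same number.
x = np.array([1.0, 2.0, 3.0, 4.0])
p = np.array([1/6, 2/6, 2/6, 1/6])
var_pmf = np.sum((x - mu) ** 2 * p)

print(var_direct, var_moment, var_pmf)          # all approximately 0.9167
```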

7 An important fact about the variance

$$\mathrm{VAR}(X) = E[(X - E(X))^2] = \int_{x \in S} (x - \mu)^2 p(x)\, dx = \int_{x \in S} x^2 p(x)\, dx - 2\mu \int_{x \in S} x\, p(x)\, dx + \mu^2 \int_{x \in S} p(x)\, dx = E(X^2) - (E(X))^2$$

Think of VAR(X) as the mean-subtracted second-order moment of the random variable X.

8 Joint distributions

The following is a table form of the joint density Pr(X, Y):

[Joint probability table: rows X = 0, 1, 2; columns Y = 0, 1, 2; entries Pr(X = x, Y = y), with row and column totals]

This table shows Pr(X, Y), i.e., Pr(X = x, Y = y); each probability is read off from the corresponding cell.

Exercise: what is the probability that X = 2, Y = 1?
Exercise: what is the probability that X = 1, Y = 2?
Exercise: what is the value of $\sum_{i=0}^{2} \sum_{j=0}^{2} \Pr(X = i, Y = j)$?
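A joint pmf is conveniently stored as a 2-D array. The sketch below uses made-up cell values (not the ones from the slide's table); reading off Pr(X = x, Y = y) is an index lookup, and the final exercise is the check that the whole table sums to one:

```python
import numpy as np

# Hypothetical joint table Pr(X, Y); rows index X = 0, 1, 2, columns Y = 0, 1, 2.
P = np.array([[0.10, 0.05, 0.05],
              [0.10, 0.20, 0.10],
              [0.05, 0.15, 0.20]])

print(P[2, 1])                    # Pr(X = 2, Y = 1)
print(P[1, 2])                    # Pr(X = 1, Y = 2)
print(np.isclose(P.sum(), 1.0))   # sum_i sum_j Pr(X = i, Y = j) = 1
```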

9 Marginal distributions

[Same joint probability table as above]

Using the sum rule, the marginal distribution tells us that:

$$\Pr(X) = \sum_{y \in S_y} \Pr(X, y) \quad \text{or} \quad p(x) = \int_{y \in S_y} p(x, y)\, dy$$

For example: $\Pr(Y = 1) = \sum_{i=0}^{2} p(X = i, Y = 1)$.

Exercise: what are Pr(X = 2) and Pr(X = 1)?
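In array form, the sum rule is a sum along one axis of the joint (continuing the made-up table from the previous sketch):

```python
import numpy as np

P = np.array([[0.10, 0.05, 0.05],     # same made-up joint as above
              [0.10, 0.20, 0.10],
              [0.05, 0.15, 0.20]])

P_X = P.sum(axis=1)    # Pr(X = i) = sum_j Pr(X = i, Y = j)
P_Y = P.sum(axis=0)    # Pr(Y = j) = sum_i Pr(X = i, Y = j)
print(P_X[2], P_X[1])  # Pr(X = 2) and Pr(X = 1)
```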

10 Conditional distributions

[Same joint probability table as above]

Conditional density:

$$p(X \mid Y) = \frac{p(X, Y)}{p(Y)} = \frac{p(Y \mid X)\, p(X)}{p(Y)} = \frac{p(Y \mid X)\, p(X)}{\sum_X p(Y \mid X)\, p(X)}$$

What about $p(X \mid Y = y)$? Pick an example:

$$p(X \mid Y = 1) = \frac{p(X, Y = 1)}{p(Y = 1)}$$
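Numerically, conditioning on Y = 1 just picks the Y = 1 column of the joint and renormalizes it (again with the made-up table):

```python
import numpy as np

P = np.array([[0.10, 0.05, 0.05],     # same made-up joint as above
              [0.10, 0.20, 0.10],
              [0.05, 0.15, 0.20]])

# p(X | Y = 1) = p(X, Y = 1) / p(Y = 1)
p_X_given_Y1 = P[:, 1] / P[:, 1].sum()
print(p_X_given_Y1, p_X_given_Y1.sum())   # a valid pmf: sums to one
```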

11 Conditional distributions: exercise

[Same joint probability table as above]

The formula for the conditional density:

$$p(X \mid Y) = \frac{p(X, Y)}{p(Y)} = \frac{p(Y \mid X)\, p(X)}{p(Y)} = \frac{p(Y \mid X)\, p(X)}{\sum_X p(Y \mid X)\, p(X)}$$

Exercise: what is $p(X = 2 \mid Y = 1)$?
Exercise: what is $p(X = 1 \mid Y = 2)$?

12 Independence

If X and Y are independent:

$$p(X \mid Y) = p(X) \qquad p(X, Y) = P(X)\, P(Y)$$

The two facts are related. When X and Y are independent:

$$p(X \mid Y) = \frac{p(X, Y)}{p(Y)} = \frac{p(X)\, p(Y)}{p(Y)} = p(X)$$

[Table 1: a joint Pr(X, Y) in which X and Y are NOT independent. Table 2: a joint in which every cell equals the product of its marginals, so X and Y are independent.]
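Independence can be tested numerically: the joint must equal the outer product of its marginals. A sketch, reusing the made-up table (which is dependent) and then constructing an independent one:

```python
import numpy as np

P = np.array([[0.10, 0.05, 0.05],     # same made-up joint as above
              [0.10, 0.20, 0.10],
              [0.05, 0.15, 0.20]])
P_X, P_Y = P.sum(axis=1), P.sum(axis=0)

print(np.allclose(P, np.outer(P_X, P_Y)))   # False: X and Y are dependent

Q = np.outer(P_X, P_Y)                      # independent by construction
print(np.allclose(Q, np.outer(Q.sum(axis=1), Q.sum(axis=0))))   # True
```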

13 Conditional Independence

Imagine we have three random variables: X, Y and Z. Once we know Z, knowing Y does NOT give us any additional information about X. Therefore:

$$\Pr(X \mid Y, Z) = \Pr(X \mid Z)$$

This means that X is conditionally independent of Y given Z.

If $\Pr(X \mid Y, Z) = \Pr(X \mid Z)$, then what about $\Pr(X, Y \mid Z)$?

$$\Pr(X, Y \mid Z) = \frac{\Pr(X, Y, Z)}{\Pr(Z)} = \Pr(X \mid Y, Z)\, \frac{\Pr(Y, Z)}{\Pr(Z)} = \Pr(X \mid Y, Z)\, \Pr(Y \mid Z) = \Pr(X \mid Z)\, \Pr(Y \mid Z)$$
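The factorization Pr(X, Y | Z) = Pr(X | Z) Pr(Y | Z) can be verified mechanically. A sketch that builds a joint p(x, y, z) = p(z) p(x | z) p(y | z) from made-up conditionals and checks the identity:

```python
import numpy as np

rng = np.random.default_rng(0)
p_z = np.array([0.3, 0.7])
p_x_given_z = rng.dirichlet(np.ones(3), size=2)   # row k is p(x | z = k)
p_y_given_z = rng.dirichlet(np.ones(4), size=2)   # row k is p(y | z = k)

# p(x, y, z) = p(z) p(x | z) p(y | z): conditionally independent by construction.
p_xyz = np.einsum('k,ki,kj->ijk', p_z, p_x_given_z, p_y_given_z)

for k in range(len(p_z)):
    p_xy_given_z = p_xyz[:, :, k] / p_z[k]        # Pr(X, Y | Z = k)
    assert np.allclose(p_xy_given_z, np.outer(p_x_given_z[k], p_y_given_z[k]))
print("Pr(X, Y | Z) = Pr(X | Z) Pr(Y | Z) holds")
```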

14 An example of Conditional Independence

We will study dynamic models later.

[Graphical model: a chain of hidden states $x_{t-1} \to x_t \to x_{t+1}$, each state $x_t$ emitting an observation $y_t$]

From this model, we can see:

$$p(x_t \mid x_1, \ldots, x_{t-1}, y_1, \ldots, y_{t-1}) = p(x_t \mid x_{t-1})$$

$$p(y_t \mid x_1, \ldots, x_{t-1}, x_t, y_1, \ldots, y_{t-1}) = p(y_t \mid x_t)$$

For now, ask whether a given variable is the only item that blocks the path between two (or more) variables.

15 Another example: Bayesian Linear Regression

We have data pairs: inputs $X = x_1, \ldots, x_N$ and outputs $Y = y_1, \ldots, y_N$. Each pair $x_i$ and $y_i$ is related through the model equation:

$$y_i = f(x_i; w) + \epsilon, \qquad \epsilon \sim \mathcal{N}(0, \sigma^2)$$

The input alone isn't going to tell you the model parameter: $p(w \mid X) = p(w)$.
The output alone isn't going to tell you the model parameter: $p(w \mid Y) = p(w)$.
Obviously: $p(w \mid X, Y) \neq p(w)$.

Posterior over the parameter w:

$$p(w \mid x, y) = \frac{p(y \mid w, x)\, p(w \mid x)\, p(x)}{p(y \mid x)\, p(x)} = \frac{p(y \mid w, x)\, p(w)}{p(y \mid x)} = \frac{p(y \mid w, x)\, p(w)}{\int_w p(y \mid x, w)\, p(w)\, dw}$$
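For the linear-Gaussian case $f(x; w) = \Phi(x) w$, this posterior is available in closed form. A minimal sketch, assuming a zero-mean Gaussian prior $w \sim \mathcal{N}(0, \alpha^{-1} I)$ and known noise variance $\sigma^2$ (both are assumptions for illustration, not values from the slide):

```python
import numpy as np

def posterior_w(Phi, y, alpha=1.0, sigma2=0.1):
    """Gaussian posterior p(w | X, Y) for y = Phi @ w + N(0, sigma2),
    with prior w ~ N(0, (1/alpha) I)."""
    d = Phi.shape[1]
    S_inv = alpha * np.eye(d) + Phi.T @ Phi / sigma2   # posterior precision
    S = np.linalg.inv(S_inv)                           # posterior covariance
    m = S @ Phi.T @ y / sigma2                         # posterior mean
    return m, S

# Toy data from a known slope; the posterior mean should land near 2.0.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=50)
y = 2.0 * x + rng.normal(0.0, np.sqrt(0.1), size=50)
m, S = posterior_w(x[:, None], y)
print(m, S)
```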

16 Expectation of joint probabilities

Given that (X, Y) is a two-dimensional random variable:

Continuous case: $E[f(X, Y)] = \int_{y \in S_y} \int_{x \in S_x} f(x, y)\, p(x, y)\, dx\, dy$

Discrete case: $E[f(X, Y)] = \sum_{i=1}^{N_i} \sum_{j=1}^{N_j} f(X_i, Y_j)\, p(X_i, Y_j)$

17 Numerical example

[Two 3 × 3 tables over X ∈ {1, 2, 3} and Y ∈ {1, 2, 3}: one of joint probabilities p(X, Y), one of function values f(X, Y)]

$$E[f(X, Y)] = \sum_{i=1}^{N_i} \sum_{j=1}^{N_j} f(X_i, Y_j)\, p(X_i, Y_j)$$
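The double sum is an elementwise product followed by a total sum. A sketch with made-up 2 × 2 tables for p(X, Y) and f(X, Y):

```python
import numpy as np

p_xy = np.array([[0.2, 0.1],     # made-up joint probabilities
                 [0.3, 0.4]])
f_xy = np.array([[1.0, 7.0],     # made-up function values f(x_i, y_j)
                 [2.0, 8.0]])

E_f = np.sum(f_xy * p_xy)        # sum_i sum_j f(x_i, y_j) p(x_i, y_j)
print(E_f)                       # 0.2 + 0.7 + 0.6 + 3.2 = 4.7
```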

18 Conditional Expectation

A useful property for later:

$$E(Y) = \int_X E(Y \mid X)\, p(X)\, dX$$

Proof:

$$\int_X E(Y \mid X)\, p(X)\, dX = \int_X \underbrace{\left(\int_Y y\, p(y \mid X)\, dy\right)}_{E(Y \mid X)} p(X)\, dX = \int_X \int_Y y\, p(y, X)\, dy\, dX = \int_Y y \left(\int_X p(y, X)\, dX\right) dy = \int_Y y\, p(y)\, dy = E(Y)$$
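A numerical check of this tower rule on a small made-up discrete joint, computing E(Y) both directly and via E(Y | X):

```python
import numpy as np

p_xy = np.array([[0.2, 0.1],     # rows: x = 0, 1; columns: y = 0, 1
                 [0.3, 0.4]])
y_vals = np.array([0.0, 1.0])

p_x = p_xy.sum(axis=1)
E_Y_given_x = (p_xy * y_vals).sum(axis=1) / p_x   # E(Y | X = x)

E_Y_direct = (p_xy.sum(axis=0) * y_vals).sum()    # E(Y) from the marginal of Y
E_Y_tower = (E_Y_given_x * p_x).sum()             # E(Y) = sum_x E(Y | X = x) p(x)
print(E_Y_direct, E_Y_tower)                      # both 0.5
```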

19 Bayesian predictive distribution

Let's put marginal distributions and conditional independence to the test. Very often in machine learning, you want to compute the probability of new data y given training data Y, i.e., p(y | Y). You assume there is some model that explains both Y and y; the model parameter is θ:

$$p(y \mid Y) = \int_\theta p(y \mid \theta)\, p(\theta \mid Y)\, d\theta$$

Exercise: explain why the above works.
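The integral has a direct Monte Carlo reading: average the likelihood of the new point over posterior samples of θ. A sketch for an assumed toy model where θ is the unknown mean of a unit-variance Gaussian and the prior is flat (so the posterior is $\mathcal{N}(\bar{y}, 1/n)$):

```python
import numpy as np
from scipy import stats

Y = np.array([0.8, 1.2, 0.9, 1.5])        # training data
n = len(Y)

# Posterior p(theta | Y) = N(mean(Y), 1/n) for this toy model.
rng = np.random.default_rng(2)
theta_samples = rng.normal(Y.mean(), 1.0 / np.sqrt(n), size=10_000)

y_new = 1.0
# p(y_new | Y) = integral of p(y_new | theta) p(theta | Y) ~ average over samples
p_pred = stats.norm.pdf(y_new, loc=theta_samples, scale=1.0).mean()
print(p_pred)   # close to the exact N(y_new; mean(Y), 1 + 1/n) density
```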

20 Revisiting Bayes' Theorem

Instead of using arbitrary random-variable symbols, we now use θ for the model parameter and $X = x_1, \ldots, x_n$ for the dataset:

$$\underbrace{p(\theta \mid X)}_{\text{posterior}} = \frac{\overbrace{p(X \mid \theta)}^{\text{likelihood}}\; \overbrace{p(\theta)}^{\text{prior}}}{\underbrace{p(X)}_{\text{normalization constant}}} = \frac{p(X \mid \theta)\, p(\theta)}{\int_\theta p(X \mid \theta)\, p(\theta)\, d\theta}$$

21 An Intrusion Detection System (IDS) example

The setting: imagine that, out of all TCP connections (say millions), 1% are intrusions.
- When there is an intrusion, the probability that the system sends an alarm is 87%.
- When there is no intrusion, the probability that the system sends an alarm is 6%.

Prior probability (1% of connections are intrusions):

$p(\theta = \text{intrusion}) = 0.01$, $\quad p(\theta = \text{no intrusion}) = 0.99$

Likelihood: given that an intrusion occurs, the probability that the system sends an alarm is 87%:

$p(X = \text{alarm} \mid \theta = \text{intrusion}) = 0.87$, $\quad p(X = \text{no alarm} \mid \theta = \text{intrusion}) = 0.13$

Given that there is no intrusion, the probability that the system sends an alarm is 6%:

$p(X = \text{alarm} \mid \theta = \text{no intrusion}) = 0.06$, $\quad p(X = \text{no alarm} \mid \theta = \text{no intrusion}) = 0.94$

22 Posterior

We are interested in the posterior probability Pr(θ | X). There are two possible values for the parameter θ and two possible observations X, so there are four rates we need to compute:

True positive: when the system sends an alarm, the probability that an intrusion occurred: $\Pr(\theta = \text{intrusion} \mid X = \text{alarm})$
False positive: when the system sends an alarm, the probability that there is no intrusion: $\Pr(\theta = \text{no intrusion} \mid X = \text{alarm})$
True negative: when the system sends no alarm, the probability that there is no intrusion: $\Pr(\theta = \text{no intrusion} \mid X = \text{no alarm})$
False negative: when the system sends no alarm, the probability that an intrusion occurred: $\Pr(\theta = \text{intrusion} \mid X = \text{no alarm})$

Question: which are the two probabilities you would like to maximise?

23 Applying Bayes' theorem in this setting

$$\Pr(\theta \mid X) = \frac{\Pr(X \mid \theta)\, \Pr(\theta)}{\sum_\theta \Pr(X \mid \theta)\, \Pr(\theta)} = \frac{\Pr(X \mid \theta)\, \Pr(\theta)}{\Pr(X \mid \theta = \text{intrusion})\, \Pr(\theta = \text{intrusion}) + \Pr(X \mid \theta = \text{no intrusion})\, \Pr(\theta = \text{no intrusion})}$$

24 Applying Bayes' theorem in this setting

True positive rate: when the system sends an alarm, what is the probability that an intrusion occurred?

$$\Pr(\theta = \text{intrusion} \mid X = \text{alarm}) = \frac{\Pr(X = \text{alarm} \mid \theta = \text{intrusion})\, \Pr(\theta = \text{intrusion})}{\Pr(X = \text{alarm} \mid \theta = \text{intrusion})\, \Pr(\theta = \text{intrusion}) + \Pr(X = \text{alarm} \mid \theta = \text{no intrusion})\, \Pr(\theta = \text{no intrusion})}$$

False positive rate: when the system sends an alarm, what is the probability that there is no intrusion?

$$\Pr(\theta = \text{no intrusion} \mid X = \text{alarm}) = \frac{\Pr(X = \text{alarm} \mid \theta = \text{no intrusion})\, \Pr(\theta = \text{no intrusion})}{\Pr(X = \text{alarm} \mid \theta = \text{intrusion})\, \Pr(\theta = \text{intrusion}) + \Pr(X = \text{alarm} \mid \theta = \text{no intrusion})\, \Pr(\theta = \text{no intrusion})}$$

25 Applying Bayes' theorem in this setting

False negative: when the system sends no alarm, what is the probability that an intrusion occurred?

$$\Pr(\theta = \text{intrusion} \mid X = \text{no alarm}) = \frac{\Pr(X = \text{no alarm} \mid \theta = \text{intrusion})\, \Pr(\theta = \text{intrusion})}{\Pr(X = \text{no alarm} \mid \theta = \text{intrusion})\, \Pr(\theta = \text{intrusion}) + \Pr(X = \text{no alarm} \mid \theta = \text{no intrusion})\, \Pr(\theta = \text{no intrusion})}$$

True negative: when the system sends no alarm, what is the probability that there is no intrusion?

$$\Pr(\theta = \text{no intrusion} \mid X = \text{no alarm}) = \frac{\Pr(X = \text{no alarm} \mid \theta = \text{no intrusion})\, \Pr(\theta = \text{no intrusion})}{\Pr(X = \text{no alarm} \mid \theta = \text{intrusion})\, \Pr(\theta = \text{intrusion}) + \Pr(X = \text{no alarm} \mid \theta = \text{no intrusion})\, \Pr(\theta = \text{no intrusion})}$$
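Plugging the numbers from the IDS setting into these four formulas (a small sketch; the printed values follow directly from the priors and likelihoods above):

```python
p_intrusion = 0.01                  # prior
p_alarm_intr = 0.87                 # p(alarm | intrusion)
p_alarm_no_intr = 0.06              # p(alarm | no intrusion)

p_alarm = p_alarm_intr * p_intrusion + p_alarm_no_intr * (1 - p_intrusion)

tp = p_alarm_intr * p_intrusion / p_alarm                 # ~ 0.128
fp = 1 - tp                                               # ~ 0.872
fn = (1 - p_alarm_intr) * p_intrusion / (1 - p_alarm)     # ~ 0.0014
tn = 1 - fn                                               # ~ 0.9986
print(tp, fp, fn, tn)
```

Note the base-rate effect: even with an 87% detection rate, only about 13% of alarms correspond to real intrusions, because intrusions are rare.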

26 A statistical way to think about posterior inference

Posterior inference finds the best $q(\theta) \in Q$ to approximate $p(\theta \mid X)$, in the sense that:

$$\inf_{q(\theta) \in Q} \left\{ \mathrm{KL}(q(\theta) \,\|\, p(\theta)) - E_{\theta \sim q(\theta)}[\ln p(X \mid \theta)] \right\}$$
$$= \inf_{q(\theta) \in Q} \left\{ \int_\theta q(\theta) \ln \frac{q(\theta)}{p(\theta)}\, d\theta - \int_\theta q(\theta) \ln p(X \mid \theta)\, d\theta \right\}$$
$$= \inf_{q(\theta) \in Q} \left\{ \int_\theta q(\theta) \left[ \ln q(\theta) - (\ln p(\theta) + \ln p(X \mid \theta)) \right] d\theta \right\}$$
$$= \inf_{q(\theta) \in Q} \left\{ \int_\theta q(\theta) \ln \frac{q(\theta)}{p(\theta)\, p(X \mid \theta)}\, d\theta \right\}$$
$$= \inf_{q(\theta) \in Q} \left\{ -\ln p(X) + \int_\theta q(\theta) \ln \frac{q(\theta)}{p(\theta \mid X)}\, d\theta \right\}$$
$$= \inf_{q(\theta) \in Q} \left\{ \mathrm{KL}(q(\theta) \,\|\, p(\theta \mid X)) \right\}$$

(The second-to-last step uses $p(\theta)\, p(X \mid \theta) = p(\theta \mid X)\, p(X)$, and the last step uses the fact that $\ln p(X)$ does not depend on $q$.)
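Because the objective equals $\mathrm{KL}(q \,\|\, p(\theta \mid X)) - \ln p(X)$, it is minimized by the exact posterior. A brute-force sketch over a three-point θ grid with made-up prior and likelihood values:

```python
import numpy as np

prior = np.array([0.5, 0.3, 0.2])       # p(theta), made up
lik = np.array([0.10, 0.40, 0.25])      # p(X | theta) for the observed X, made up
posterior = prior * lik / np.sum(prior * lik)

def objective(q):
    # KL(q || prior) - E_q[ln p(X | theta)]
    return np.sum(q * (np.log(q) - np.log(prior))) - np.sum(q * np.log(lik))

rng = np.random.default_rng(3)
candidates = rng.dirichlet(np.ones(3), size=10_000)
best = candidates[np.argmin([objective(q) for q in candidates])]
print(best)        # close to the exact posterior
print(posterior)
```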
