Statistical Machine Learning Lecture 1: Motivation


Melih Kandemir, Özyeğin University, İstanbul, Turkey

What is this course about?
Using the science of statistics to build machine learning models
Training such models
Performing inference
Applications will be mainly on deep neural nets. What a surprise =)
CS 458/558 Statistical Machine Learning. Statistical? Probabilistic? Bayesian?

What is it NOT about?
Introduction to machine learning
Introduction to deep learning
Introduction to probability and statistics
Advanced probability and statistics
Advanced Bayesian theory

Textbook
None. The field is evolving at light speed! A brand-new textbook is already outdated; a rather new one is already archaic!
Read scientific articles, book sections, and the course slides.

Primadonnas

Grading protocol for 458
Four programming assignments (10% each): implement a model, test it on data, write a one-page report. Python, TensorFlow.
Midterm exam (20%) - Open-book!
Final exam (40%) - Open-book!
Open-book means you can keep during the exam:
Lecture slides - Yes!
Textbooks - Yes!
Any written text of your own - Yes!
Article printouts - Yes!
Electronic devices - No!
No free lunch: no free points for memorizing material.
Cheap lunch granted: learn from examples, apply to similar cases.

Grading protocol for 558
Three programming assignments (10% each): implement a toy model, test it on data, write a one-page report. Python, TensorFlow.
Midterm exam (10%) - Open-book!
Final exam (30%) - Open-book!
Project (30%): implement the main idea of a scientific paper (simplifications might be allowed), test it on data, write a four-page report. Python, TensorFlow.

A data scientist is like a medic
HAS TO learn new tools every couple of years
HAS TO be one step ahead of the crowd
HAS TO understand the sources of diseases, not only the prescriptions of the tools (Med vs. Pharmacy)
Hence, NO MATTER IF THE POSITION IS ACADEMIC OR NOT, a data scientist HAS TO FOLLOW THE LITERATURE VERY CLOSELY!
predictive-analytics/

The grand slams of machine learning
1 - International Conference on Machine Learning (ICML)
2 - Neural Information Processing Systems (NIPS)
3 - Uncertainty in Artificial Intelligence (UAI)
4 - Artificial Intelligence and Statistics (AISTATS)
Quality: ICML ≈ NIPS ≈ UAI ≈ AISTATS >> others
Scale: ICML ≈ NIPS > UAI ≈ AISTATS

Why probabilistic machine learning?
Learn a lot from few cases, just like the human brain!

Why probabilistic machine learning?
Charming results! This is the paper that made neural networks shake the world: A. Krizhevsky et al., NIPS 2012.

Why probabilistic machine learning?
...with unbelievably good results on the very challenging ILSVRC data set...

Why probabilistic machine learning?
...using a method called Dropout, introduced by yet another paper: N. Srivastava et al., Journal of Machine Learning Research, 2014.

Why probabilistic machine learning?
...which builds on the uncertainty of a neuron being active or inactive.
Mystery: why such a simple trick works so well.
NOT a mystery: why accounting for uncertainty yields more robust predictions. Probability theory!

Why probabilistic machine learning?
The more advanced the account of uncertainty, the better the predictions: D. Kingma et al., NIPS 2015.

Definitions
Sample space ($\Omega$): the collection of all possible outcomes of a random experiment.
Event ($E$): a question about the experiment with a yes/no answer; a subset of the sample space.
Probability measure: a function that assigns a number $P(A)$ to each event $A$.

Axioms of probability
Axiom 1: The probability of an event is a non-negative real number: $P(E) \in \mathbb{R}$, $P(E) \geq 0$, $\forall E \subseteq \Omega$.
Axiom 2: The probability of the entire sample space is 1: $P(\Omega) = 1$.
Axiom 3: $P(E_1 \cup E_2) = P(E_1) + P(E_2)$, where $E_1 \cap E_2 = \emptyset$.

Consequences
Sum rule: $P(E_1 \cup E_2) = P(E_1) + P(E_2) - P(E_1 \cap E_2)$
$P(\emptyset) = 0$
All of set theory is applicable. Most of Boolean algebra is applicable.

Conditional probability
Kolmogorov's definition: $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$, a.k.a. the product rule.
De Finetti introduces this formulation as an axiom.

Definitions (2)
Probability density function (PDF): $\Pr[a \leq x \leq b] = \int_a^b p(x)\, dx$
Cumulative distribution function (CDF): $F_x(x) = \int_{-\infty}^{x} p(t)\, dt$
PDF-CDF relationship: $\Pr[a \leq x \leq b] = F_x(b) - F_x(a)$
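The PDF-CDF relationship is easy to check numerically. A minimal sketch (not part of the original slides), assuming NumPy/SciPy are available; it integrates a standard normal PDF over $[a, b]$ and compares against the CDF difference:

from scipy import integrate, stats

a, b = -1.0, 2.0
dist = stats.norm(loc=0.0, scale=1.0)

# Pr[a <= x <= b] via the CDF difference F(b) - F(a)
p_cdf = dist.cdf(b) - dist.cdf(a)

# The same probability by numerically integrating the PDF
p_pdf, _ = integrate.quad(dist.pdf, a, b)

print(p_cdf, p_pdf)  # the two values agree up to numerical precision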

Definitions (3)
Expected value: $\mathbb{E}_{p(x)}[x] = \int x\, p(x)\, dx$
Variance: $\mathrm{Var}_{p(x)}[x] = \mathbb{E}_{p(x)}\left[(x - \mathbb{E}_{p(x)}[x])^2\right] = \mathbb{E}_{p(x)}[x^2] - (\mathbb{E}_{p(x)}[x])^2$
Standard deviation: $\sigma(x) = \sqrt{\mathrm{Var}_{p(x)}[x]}$
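The variance identity $\mathbb{E}[x^2] - (\mathbb{E}[x])^2$ can be verified with a Monte Carlo estimate. A small sketch (assuming NumPy; the distribution parameters are chosen arbitrarily for illustration):

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=100_000)  # samples from N(3, 4)

mean = x.mean()
var_definition = ((x - mean) ** 2).mean()   # E[(x - E[x])^2]
var_shortcut = (x ** 2).mean() - mean ** 2  # E[x^2] - (E[x])^2

print(mean, var_definition, var_shortcut)  # roughly 3.0, 4.0, 4.0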

Definitions (4)
Joint PDF: $\Pr[a \leq x \leq b,\ c \leq y \leq d] = \int_a^b \int_c^d p_{xy}(x, y)\, dy\, dx$
Covariance: $\mathrm{cov}(x, y) = \mathbb{E}[(x - \mathbb{E}[x])(y - \mathbb{E}[y])]$
Marginal probability (sum rule): $p(x) = \int p(x, y)\, dy$

Normal distribution
PDF: $\mathcal{N}(x \mid \mu, \sigma^2) = \dfrac{1}{\sigma \sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$
CDF: $\dfrac{1}{2}\left[1 + \mathrm{erf}\left(\dfrac{x - \mu}{\sigma \sqrt{2}}\right)\right]$, where $\mathrm{erf}(x) = \dfrac{1}{\sqrt{\pi}} \int_{-x}^{x} e^{-t^2}\, dt$.
Mean: $\mu$
Variance: $\sigma^2$
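The CDF expression above is directly computable through the error function. A small sketch (assuming Python's math module and SciPy for cross-checking):

import math

from scipy import stats

def normal_cdf(x, mu=0.0, sigma=1.0):
    # CDF of N(mu, sigma^2) written in terms of erf
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

print(normal_cdf(1.0), stats.norm.cdf(1.0))  # both approximately 0.8413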

Normal distribution (2)
[Figure: plots of the normal PDF and CDF, with standard-deviation markings.]

Multivariate normal distribution
PDF: $\mathcal{N}(x \mid \mu, \Sigma) = (2\pi)^{-\frac{D}{2}}\, |\Sigma|^{-\frac{1}{2}}\, e^{-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)}$
CDF: N/A (no closed form).
Mean: $\mu$
Covariance: $\Sigma$
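The PDF formula translates line by line into code. A sketch (assuming NumPy/SciPy; the covariance below is a randomly built positive-definite matrix, purely illustrative):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
D = 3
mu = rng.normal(size=D)
A = rng.normal(size=(D, D))
Sigma = A @ A.T + D * np.eye(D)  # random positive-definite covariance
x = rng.normal(size=D)

# Density computed from the slide's formula
norm_const = (2 * np.pi) ** (-D / 2) * np.linalg.det(Sigma) ** (-0.5)
quad = (x - mu) @ np.linalg.solve(Sigma, x - mu)
pdf_manual = norm_const * np.exp(-0.5 * quad)

print(pdf_manual, stats.multivariate_normal(mu, Sigma).pdf(x))  # equal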

Multivariate normal distribution (2)
[Figure: illustration of a multivariate normal density.]

Why normal distribution? The central limit theorem
Let $x_1, x_2, \ldots, x_N$ be $N$ i.i.d. random variables with $\mathbb{E}[x_n] = \mu$ and $\mathrm{Var}[x_n] = \sigma^2 < \infty$. Then as $N$ approaches infinity, the random variables $\sqrt{n}\,(\hat{\mu}_n - \mu)$ converge in distribution to $\mathcal{N}(0, \sigma^2)$, where $\hat{\mu}_n = (x_1 + x_2 + \cdots + x_n)/n$ is the sample mean of the first $n$ random variables.
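The theorem is easy to see in simulation: even for a strongly skewed base distribution, the rescaled sample means look Gaussian. A sketch (assuming NumPy; sample sizes are arbitrary):

import numpy as np

rng = np.random.default_rng(0)
n, trials = 1_000, 5_000

# Exponential with mean 1 and variance 1: far from normal
samples = rng.exponential(scale=1.0, size=(trials, n))
z = np.sqrt(n) * (samples.mean(axis=1) - 1.0)  # sqrt(n) * (sample mean - mu)

# Mean close to 0, variance close to sigma^2 = 1; a histogram of z is bell-shaped
print(z.mean(), z.var())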

Independence and conditional independence
Independence: $P(E_1 \cap E_2) = P(E_1)\, P(E_2)$
Conditional independence: $P(E_1 \cap E_2 \mid E_3) = P(E_1 \mid E_3)\, P(E_2 \mid E_3)$

Independent and identically distributed (i.i.d.) random variables
Let $X = \{x_1, x_2, \ldots, x_N\}$ be a set of $N$ random variables corresponding to $N$ observations of an experiment. They are defined to be independent and identically distributed (i.i.d.) if:
All random variables $x_i$ have the same probability distribution.
All pairs of observation events are independent.
Hence, the likelihood of an i.i.d. data set factorizes: $P(X \mid \theta) = \prod_{n=1}^{N} p(x_n \mid \theta)$.
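In practice one works with the log-likelihood, where the i.i.d. product becomes a sum. A minimal sketch for a Gaussian likelihood (assuming NumPy/SciPy; data and parameters are illustrative):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.normal(loc=1.5, scale=0.5, size=50)

# log P(X | theta) = sum_n log p(x_n | theta) under the i.i.d. assumption
theta = {"loc": 1.5, "scale": 0.5}
log_lik = stats.norm.logpdf(data, **theta).sum()
print(log_lik)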

Exchangeability
The random variables $(x_1, x_2, \ldots, x_N)$ are exchangeable if for any permutation $\pi$, the following equality holds: $p(x_1, x_2, \ldots, x_N) = p(x_{\pi_1}, x_{\pi_2}, \ldots, x_{\pi_N})$.

What is probability?
Is probability an objective or a subjective measure?

What if probability is objective?
$p(E) = \lim_{n \to +\infty} \dfrac{n_E}{n}$
$n_E$: number of times the event of interest occurs
$n$: number of trials
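This limiting-frequency view can be mimicked in simulation: the relative frequency $n_E / n$ wanders toward the true probability as $n$ grows. A sketch (assuming NumPy):

import numpy as np

rng = np.random.default_rng(0)
flips = rng.random(100_000) < 0.5  # event E: a fair coin lands heads

for n in (10, 1_000, 100_000):
    print(n, flips[:n].mean())  # n_E / n approaches 0.5 as n grows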

Can probability really be purely objective?
How shall we handle $+\infty$? The sample set is limited.
How do we know that our sample set is not biased? How do we know that the die is fair?
Is not making fairness or biasedness assumptions itself a subjective guess? Then why not quantify subjectivity? asks a Bayesian, like de Finetti:
"The classical view, based on physical considerations of symmetry, in which one should be obliged to give the same probability to such symmetric cases. But which symmetry? And, in any case, why? The original sentence becomes meaningful if reversed: the symmetry is probabilistically significant, in someone's opinion, if it leads him to assign equal probabilities to such events." (de Finetti, 1970/74, Preface, xi-xii)

Thomas Bayes, the legend (1702-1761)
$p(H \mid X) = \dfrac{p(X \mid H)\, p(H)}{p(X)}$
$H$: hypothesis
$X$: measurement

Bayes' theorem
$p(\theta \mid x) = \dfrac{p(x \mid \theta)\, p(\theta)}{p(x)}$
$x \in \mathcal{X}$ is an observable in the sample space $\mathcal{X}$. $\theta$ is a set of model parameters: an index for a frequentist, a random variable for a Bayesian.
$p(x \mid \theta)$: likelihood (how do the model parameters describe the data?)
$p(\theta)$: prior (what is our prior belief about the model parameters?)
$p(x)$: evidence (what is the likelihood of the data regardless of the model parameters?)
$p(\theta \mid x)$: posterior (how are the model parameters distributed after the observations are taken into account?)
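For a parameter with finitely many values, the theorem is one line of arithmetic. A toy sketch in plain Python (all numbers are illustrative assumptions):

# Posterior over two candidate parameter values after one observation x
prior = {"theta_a": 0.5, "theta_b": 0.5}
likelihood = {"theta_a": 0.8, "theta_b": 0.3}  # p(x | theta)

evidence = sum(likelihood[t] * prior[t] for t in prior)  # p(x)
posterior = {t: likelihood[t] * prior[t] / evidence for t in prior}
print(posterior)  # {'theta_a': ~0.727, 'theta_b': ~0.273}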

Prior? What does it really mean?
Who do you expect to win the tennis game, and why?

What does it mean to be Bayesian in machine learning?

Motivation 1: De Finetti's theorem
A sequence of random variables $(x_1, x_2, \ldots, x_N)$ is infinitely exchangeable iff, for any $N$,
$p(x_1, x_2, \ldots, x_N) = \int \prod_{i=1}^{N} p(x_i \mid \theta)\, P(d\theta)$
Here, $P(d\theta) = p(\theta)\, d\theta$ if $\theta$ has a density.
Implications:
Exchangeability can be checked from the right-hand side.
There must exist a parameter $\theta$!
There must exist a likelihood $p(x \mid \theta)$!
There must exist a distribution $P$ on $\theta$!

Motivation 2: Statistical decision theory
Loss function: $l(\theta, \delta(x))$, where $\delta(x)$ is a decision based on data $x$. It determines the penalty for deciding $\delta(x)$ when $\theta$ is the true parameter.
e.g. squared loss: $l(\theta, \delta(x)) = (\theta - \delta(x))^2$.
However, $\delta(x)$ does not have to be an estimate of $\theta$.

Frequentist risk
$R(\theta, \delta) = \mathbb{E}_X[l(\theta, \delta(x))] = \int_{x \in \mathcal{X}} l(\theta, \delta(x))\, p(x \mid \theta)\, dx$
for a fixed $\theta$ and varying $x \in \mathcal{X}$.

How to decide which loss function is best
1. Admissibility: never dominated everywhere by another decision. Not practical; a decision rarely dominates another in real cases.
courses/260-spring10/lectures/lecture2.pdf

How to decide which loss function is best
2. Restricted classes of procedures: for instance, we can restrict ourselves to unbiased procedures (i.e. $\mathbb{E}_\theta[\hat{\theta}] = \theta$). But many good procedures are biased, and some unbiased procedures are inadmissible.

How to decide which loss function is best
3. Minimax: choose the one with the lowest worst-case (maximum) risk.
courses/260-spring10/lectures/lecture2.pdf

Bayesian decision theory
Posterior risk: $\rho(\pi, \delta(x)) = \int l(\theta, \delta(x))\, p(\theta \mid x)\, d\theta$, where $p(\theta \mid x) \propto p(x \mid \theta)\, \pi(\theta)$.
The Bayes action $\delta^*(x)$ for any fixed $x$ is the decision $\delta(x)$ that minimizes the posterior risk.

Bayesian decision theory (2)
For example, let us calculate the posterior risk for the squared loss $l(\theta, \delta(x)) = (\theta - \delta(x))^2$:
$\rho = \int (\theta - \delta(x))^2\, p(\theta \mid x)\, d\theta = \delta(x)^2 - 2\,\delta(x) \int \theta\, p(\theta \mid x)\, d\theta + \int \theta^2\, p(\theta \mid x)\, d\theta$
Setting $\dfrac{\partial \rho}{\partial \delta(x)} = 2\,\delta(x) - 2 \int \theta\, p(\theta \mid x)\, d\theta = 0$ gives the Bayes action
$\delta^*(x) = \int \theta\, p(\theta \mid x)\, d\theta$,
which turns out to be the posterior mean! For $l(\theta, \delta(x)) = |\theta - \delta(x)|$, the optimal decision is the posterior median.
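Both results can be checked numerically by scanning candidate decisions against posterior samples. A sketch (assuming NumPy; the gamma "posterior" is a stand-in for illustration):

import numpy as np

rng = np.random.default_rng(0)
theta = rng.gamma(shape=2.0, scale=1.0, size=10_000)  # stand-in posterior draws

deltas = np.linspace(0.0, 6.0, 301)
sq_risk = ((theta[:, None] - deltas) ** 2).mean(axis=0)
abs_risk = np.abs(theta[:, None] - deltas).mean(axis=0)

print(deltas[sq_risk.argmin()], theta.mean())       # squared loss -> posterior mean
print(deltas[abs_risk.argmin()], np.median(theta))  # absolute loss -> posterior median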

Wrap up
Frequentist risk: $R(\theta, \delta) = \mathbb{E}_X[l(\theta, \delta(x))] = \int_{x \in \mathcal{X}} l(\theta, \delta(x))\, p(x \mid \theta)\, dx$
Bayesian posterior risk: $\rho(\pi, \delta(x)) = \mathbb{E}_\theta[l(\theta, \delta(x))] = \int l(\theta, \delta(x))\, p(\theta \mid x)\, d\theta$

Motivation 3: Posterior predictive distribution
Given a posterior $p(\theta \mid x)$ and a new observation $x^*$, the posterior predictive distribution is
$p(x^* \mid x) = \int p(x^* \mid \theta)\, p(\theta \mid x)\, d\theta = \mathbb{E}_{p(\theta \mid x)}[p(x^* \mid \theta)]$
This distribution takes into account all possible values of $\theta$, each weighted by its posterior probability. This virtue is called model averaging and exists only in Bayesian models!
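With posterior samples, the integral becomes an average of likelihood terms. A minimal Monte Carlo sketch (assuming NumPy/SciPy; the posterior over a Gaussian mean is assumed purely for illustration):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
theta_samples = rng.normal(loc=1.0, scale=0.2, size=5_000)  # toy posterior draws

# p(x* | x) ~= (1/S) * sum_s p(x* | theta_s)
x_star = 1.5
pred_density = stats.norm.pdf(x_star, loc=theta_samples, scale=1.0).mean()
print(pred_density)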

The model selection problem
We are given two hypotheses that claim to explain a certain data set. Both give similar prediction quality. Which one should we choose? Which one explains the data better?

Motivation 4: Bayes quantifies model selection
Hypothesis 1 ($H_1$): likelihood $p_{H_1}(x \mid \theta_1)$, prior $p_{H_1}(\theta_1)$
Hypothesis 2 ($H_2$): likelihood $p_{H_2}(x \mid \theta_2)$, prior $p_{H_2}(\theta_2)$
We can alternatively treat the hypothesis as a random variable $H \in \{1, 2\}$ that determines the type of the distribution $p(\cdot)$:
$p_{H_1}(x \mid \theta_1) = p(x \mid \theta_1, H = 1)$
$p_{H_2}(x \mid \theta_2) = p(x \mid \theta_2, H = 2)$
Let us also place a prior on the hypothesis variable. Unless we have a good reason to do otherwise, we are agnostic between the two hypotheses: $P(H = 1) = P(H = 2)$.

Motivation 4: Bayes quantifies model selection (2)
Now let us take into account all possible model parameter realizations for both hypotheses (i.e. calculate the evidence):
$p(x \mid H = 1) = \int p(x \mid \theta_1, H = 1)\, p(\theta_1 \mid H = 1)\, d\theta_1$
$p(x \mid H = 2) = \int p(x \mid \theta_2, H = 2)\, p(\theta_2 \mid H = 2)\, d\theta_2$
This operation is called MARGINALIZING OUT!
Nuisance variable: a variable that is not of interest for the current analysis.
Rule of thumb: marginalize out nuisance variables whenever you can!

Motivation 4: Bayes quantifies model selection (3)
Now apply Bayes' theorem to calculate the posterior over hypotheses:
$P(H \mid x) = \dfrac{p(x \mid H)\, P(H)}{p(x)}$
Choose the hypothesis with the higher posterior probability, i.e. compare $P(H = 1 \mid x)$ and $P(H = 2 \mid x)$.
Since $p(x)$ does not depend on $H$, it has no effect on the comparison. Since we chose a uniform prior on the hypotheses ($P(H = 1) = P(H = 2)$), the magnitude of $P(H)$ also has no effect.
Hence, it suffices to calculate $p(x \mid H = 1) / p(x \mid H = 2)$. This ratio is called the Bayes factor [Kass and Raftery, 1995]. Choose $H_1$ if the Bayes factor is greater than 1, and $H_2$ otherwise.
The model evidence serves as a quantitative metric for model selection in the Bayesian setting.
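For conjugate models the two evidences, and hence the Bayes factor, are available in closed form. A sketch for coin flipping (assuming NumPy/SciPy; $H_1$ puts a uniform Beta(1, 1) prior on the heads probability, $H_2$ fixes it at 0.5; the data are illustrative):

import numpy as np
from scipy.special import betaln

n, k = 100, 63  # k heads observed in n flips

# Evidences of the observed sequence; the binomial coefficient is the same
# under both hypotheses and cancels in the ratio, so it is omitted
log_ev_h1 = betaln(k + 1, n - k + 1) - betaln(1, 1)  # integrate over Beta(1, 1)
log_ev_h2 = n * np.log(0.5)                          # fair coin, theta = 0.5

bayes_factor = np.exp(log_ev_h1 - log_ev_h2)
print(bayes_factor)  # > 1 here, favoring H1 (a biased coin) over H2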

Supervised learning
Given a set of observations $x_1, x_2, \ldots, x_N$ and the corresponding outcomes (labels) $y_1, y_2, \ldots, y_N$, learn a function $y = f(x)$.
A naive solution is linear regression: $y = w^T x$.
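The naive solution has a one-line implementation via ordinary least squares. A sketch on synthetic data (assuming NumPy; the true weights are made up for illustration):

import numpy as np

rng = np.random.default_rng(0)
N, D = 200, 3
X = rng.normal(size=(N, D))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=N)

# w = argmin_w ||y - Xw||^2
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w_hat)  # close to w_true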

Types of supervised learning
Classification: $y \in \{a, b, c, \ldots, k\}$
Regression: $y \in \mathbb{R}$
Semi-supervised learning: a (large) subset of the training set does not have labels.
Active learning: the model asks for the labels of the most important observations.
Structured output learning: $y$ is a structure (e.g. a graph).

Unsupervised learning
Given a set of observations $x_1, x_2, \ldots, x_N$, learn a model that does X. A commonplace X is to infer data chunks, called clusters. This problem is called clustering.

Discriminative versus generative models
Joint model: $p(x, y)$.
Generative model: $p(y \mid x) = \dfrac{p(y)\, p(x \mid y)}{p(x)}$.
A discriminative model deals directly with $p(y \mid x)$.

Parametric and nonparametric models
Parametric model: the structure of the training data is stored in a predetermined set of parameters. These parameters are sufficient for prediction; there is no need to store the training data.
Nonparametric model: the number of parameters in the model grows with the training data size. The training data also has to be stored for prediction.

Take home 1: The Bayesian data analysis pipeline
1. You are given i.i.d. data $X = \{x_1, \ldots, x_N\}$.
2. How do you parameterize its generation process? Design a likelihood density $p(x_n \mid \theta)$.
3. What is your prior belief about the model parameters? Design a prior density $p(\theta)$.
4. Do learning. Infer the posterior: $p(\theta \mid X) = \prod_{n=1}^{N} p(x_n \mid \theta)\, p(\theta) / p(X)$.
5. Do prediction. The likelihood of a new observation $x^*$ is $p(x^* \mid X) = \int p(x^* \mid \theta)\, p(\theta \mid X)\, d\theta$.
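For a Gaussian likelihood with known variance and a Gaussian prior, every step of the pipeline is available in closed form thanks to conjugacy. A minimal sketch (assuming NumPy/SciPy; all numbers are illustrative):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Step 1: i.i.d. data (synthetic here)
data = rng.normal(loc=2.0, scale=1.0, size=30)

# Step 2: likelihood x_n ~ N(theta, 1); Step 3: prior theta ~ N(0, 10^2)
sigma2, prior_mu, prior_var = 1.0, 0.0, 100.0

# Step 4: learning; conjugacy gives the posterior in closed form
post_var = 1.0 / (1.0 / prior_var + len(data) / sigma2)
post_mu = post_var * (prior_mu / prior_var + data.sum() / sigma2)

# Step 5: prediction; the posterior predictive is again Gaussian
pred = stats.norm(loc=post_mu, scale=np.sqrt(post_var + sigma2))
print(post_mu, post_var, pred.pdf(2.0))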

Wait... all is well, but...
How can we calculate the posterior $p(\theta \mid X) = \prod_{n=1}^{N} p(x_n \mid \theta)\, p(\theta) / p(X)$, and especially the evidence $p(X)$? In many cases you cannot. You can only approximate it. And that is what this course is all about!

Take home 2
Read Bishop, Sections 1.2.3 and 1.2.4, and Michael I. Jordan's lecture notes: courses/260-spring10/lectures/lecture2.pdf

Take home 3: Assignment 1 =)
Implement the variational inference scheme for Bayesian linear regression using the TensorFlow API under Python and run it on the UCI Boston Housing data set.
Deadline: , 23:59:59 İstanbul time.
Submit a Python script that trains the model on a randomly chosen 90% of the data set and predicts on the remaining 10%, repeats this procedure 10 times, and reports the root mean square error (RMSE) averaged across the 10 trials.
Submit a half-page, single-column report with your comments on the outcome (e.g. how far did we get in solving the problem? What kind of end products could we develop based on this model?).
Hint: see Bishop, Section 10.3.
