Statistical Machine Learning Lecture 1: Motivation
- Eugenia Austin
- 5 years ago
Statistical Machine Learning Lecture 1: Motivation. Melih Kandemir, Özyeğin University, İstanbul, Turkey
What is this course about? Using the science of statistics to build machine learning models, train such models, and perform inference. Applications will be mainly on deep neural nets. What a surprise =) CS 458/558 Statistical Machine Learning: Statistical, Probabilistic, Bayesian. Probabilistic? Bayesian?
What is it NOT about? Introduction to machine learning; introduction to deep learning; introduction to probability and statistics; advanced probability and statistics; advanced Bayesian theory.
Textbook: None. The field is evolving at lightspeed! A brand-new textbook is already outdated; a rather new one is already archaic! Read scientific articles, book sections, and course slides.
Primadonnas
Grading protocol for 458: Four programming assignments (10% each): implement a model, test it on data, and write a one-page report (Python, TensorFlow). Midterm exam (20%), open-book! Final exam (40%), open-book! Open-book means you may keep the following with you during the exam: lecture slides (yes!), textbooks (yes!), any written text of your own (yes!), article printouts (yes!), electronic devices (no!). No free lunch: no free points for memorizing material. Cheap lunch granted: learn from examples, apply them to similar cases.
Grading protocol for 558: Three programming assignments (10% each): implement a toy model, test it on data, and write a one-page report (Python, TensorFlow). Midterm exam (10%), open-book! Final exam (30%), open-book! Project (30%): implement the main idea of a scientific paper (simplifications might be allowed), test it on data, and write a four-page report (Python, TensorFlow).
A data scientist is like a medic: HAS TO learn new tools every couple of years; HAS TO be one step ahead of the crowd; HAS TO understand the sources of diseases, not only the prescriptions of the tools (Med vs. Pharmacy). Hence, NO MATTER WHETHER THE POSITION IS ACADEMIC OR NOT, a data scientist HAS TO FOLLOW THE LITERATURE VERY CLOSELY! predictive-analytics/
The grand slams of machine learning 1: International Conference on Machine Learning (ICML)
The grand slams of machine learning 2: Neural Information Processing Systems (NIPS)
The grand slams of machine learning 3: Uncertainty in Artificial Intelligence (UAI)
The grand slams of machine learning 4: Artificial Intelligence and Statistics (AISTATS)
The grand slams of machine learning. Quality: ICML, NIPS, UAI, AISTATS >> others. Scale: ICML, NIPS > UAI, AISTATS.
Why probabilistic machine learning? Learn a lot from few cases, just like the human brain!
Why probabilistic machine learning? Charming results! This is the paper that made neural networks shake the world: A. Krizhevsky et al., NIPS.
Why probabilistic machine learning? ...with unbelievably good results on the very challenging ILSVRC data set...
Why probabilistic machine learning? ...using a method called Dropout, introduced by yet another paper: N. Srivastava, Journal of Machine Learning Research...
Why probabilistic machine learning? ...which builds on the uncertainty of a neuron being active or inactive. Mystery: why such a simple trick works so well. NOT a mystery: why accounting for uncertainty leads to more robust predictions. Probability theory!
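The Bernoulli "active or inactive" uncertainty behind Dropout can be sketched in a few lines. This is a minimal sketch, not the implementation from the cited paper: it assumes the common "inverted dropout" variant, where surviving activations are rescaled by 1/(1 - p_drop) during training so that nothing needs rescaling at test time; the function name `dropout` and its parameters are illustrative.

```python
import random

def dropout(activations, p_drop, rng, training=True):
    """Inverted dropout: zero each unit with probability p_drop, rescale survivors by 1/(1 - p_drop)."""
    if not training or p_drop == 0.0:
        # At test time every neuron is active; inverted dropout needs no rescaling here.
        return list(activations)
    keep = 1.0 - p_drop
    # Each neuron is independently active (Bernoulli with success probability `keep`) or silenced.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)
hidden = [1.0] * 10_000
noisy = dropout(hidden, p_drop=0.5, rng=rng)
print(sum(noisy) / len(noisy))  # close to 1.0: the expected activation is preserved
```

The rescaling keeps the expected value of each unit unchanged, which is why the average above stays near the original activation of 1.0.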
Why probabilistic machine learning? The more advanced the account of uncertainty, the better the predictions. D. Kingma et al., NIPS, 2015
Definitions. Sample space ($\Omega$): the collection of all possible outcomes of a random experiment. Event ($E$): a question about the experiment with a yes/no answer; formally, a subset of the sample space. Probability measure: a function that assigns a number $P(A)$ to each event $A$.
Axioms of probability. Axiom 1: The probability of an event is a non-negative real number: $P(E) \in \mathbb{R}$, $P(E) \geq 0$, $\forall E \subseteq \Omega$. Axiom 2: The probability of the entire sample space is 1: $P(\Omega) = 1$. Axiom 3: $P(E_1 \cup E_2) = P(E_1) + P(E_2)$, where $E_1 \cap E_2 = \emptyset$.
Consequences. Sum rule: $P(E_1 \cup E_2) = P(E_1) + P(E_2) - P(E_1 \cap E_2)$. $P(\emptyset) = 0$. All of set theory is applicable. Most of Boolean algebra is applicable.
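The axioms and the sum rule can be checked exactly on a small finite sample space. A minimal sketch, assuming a single roll of a fair die with the uniform (counting) measure; the events `even` and `high` are illustrative choices, and `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

omega = frozenset(range(1, 7))  # sample space of one roll of a fair die

def P(event):
    """Uniform (counting) probability measure on omega; an event is a subset of omega."""
    return Fraction(len(set(event) & omega), len(omega))

even = {2, 4, 6}
high = {4, 5, 6}

assert P(omega) == 1   # Axiom 2
assert P(set()) == 0   # consequence: P(empty set) = 0
# Sum rule: P(E1 or E2) = P(E1) + P(E2) - P(E1 and E2)
lhs = P(even | high)
rhs = P(even) + P(high) - P(even & high)
print(lhs, rhs)  # 2/3 2/3
```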
Conditional probability. Kolmogorov's definition: $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$ for $P(B) > 0$, a.k.a. the product rule. De Finetti introduces this formulation as an axiom. Consider the following example:
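The slide's own example is not reproduced in this transcription, so here is a hypothetical one instead: two fair dice, where we condition "the sum is 8" on "the first die shows 6". A minimal sketch that applies Kolmogorov's definition literally and checks the product rule.

```python
from fractions import Fraction
from itertools import product

# Sample space of two fair dice; every ordered pair has probability 1/36.
omega = list(product(range(1, 7), repeat=2))

def P(pred):
    """Probability of the event {w in omega : pred(w)} under the uniform measure."""
    return Fraction(sum(1 for w in omega if pred(w)), len(omega))

def P_given(pred_a, pred_b):
    """Kolmogorov's definition: P(A | B) = P(A and B) / P(B), defined only for P(B) > 0."""
    p_b = P(pred_b)
    if p_b == 0:
        raise ValueError("P(B) must be positive")
    return P(lambda w: pred_a(w) and pred_b(w)) / p_b

sum_is_8 = lambda w: w[0] + w[1] == 8
first_is_6 = lambda w: w[0] == 6
print(P(sum_is_8), P_given(sum_is_8, first_is_6))  # 5/36 1/6
```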
Definitions (2). Probability density function (PDF): $\Pr[a \leq x \leq b] = \int_a^b p(x)\,dx$. Cumulative distribution function (CDF): $F_x(x) = \int_{-\infty}^{x} p(x')\,dx'$. PDF-CDF relationship: $\Pr[a \leq x \leq b] = F_x(b) - F_x(a)$.
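The PDF-CDF relationship can be checked numerically: integrating the density over $[a, b]$ should match $F_x(b) - F_x(a)$. A minimal sketch, assuming an exponential distribution with an illustrative rate `lam = 2.0` (chosen because both its PDF and CDF have simple closed forms) and a trapezoidal rule for the integral.

```python
import math

lam = 2.0  # rate of an exponential distribution (illustrative choice)

def pdf(x):
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def cdf(x):
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0

def integrate(f, a, b, n=10_000):
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * (0.5 * f(a) + sum(f(a + i * h) for i in range(1, n)) + 0.5 * f(b))

a, b = 0.5, 1.5
print(integrate(pdf, a, b), cdf(b) - cdf(a))  # both approximately 0.318
```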
Definitions (3). Expected value: $\mathbb{E}_{p(x)}[x] = \int x\,p(x)\,dx$. Variance: $\mathrm{Var}_{p(x)}[x] = \mathbb{E}_{p(x)}[(x - \mathbb{E}_{p(x)}[x])^2] = \mathbb{E}_{p(x)}[x^2] - (\mathbb{E}_{p(x)}[x])^2$. Standard deviation: $\sigma(x) = \sqrt{\mathrm{Var}_{p(x)}[x]}$.
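Both forms of the variance give the same number, which is easy to confirm on a discrete distribution (where the integral becomes a sum). A minimal sketch, assuming a hypothetical loaded die that shows 6 with probability 1/2.

```python
from fractions import Fraction

# A discrete distribution: value -> probability (hypothetical loaded die).
p = {1: Fraction(1, 10), 2: Fraction(1, 10), 3: Fraction(1, 10),
     4: Fraction(1, 10), 5: Fraction(1, 10), 6: Fraction(1, 2)}

def E(f):
    """Expectation of f(x) under p (a sum replaces the integral in the discrete case)."""
    return sum(prob * f(x) for x, prob in p.items())

mean = E(lambda x: x)
var_definition = E(lambda x: (x - mean) ** 2)
var_shortcut = E(lambda x: x ** 2) - mean ** 2
std = float(var_definition) ** 0.5
print(mean, var_definition, var_shortcut)  # 9/2 13/4 13/4
```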
Definitions (4). Joint PDF: $\Pr[a \leq x \leq b \wedge c \leq y \leq d] = \int_a^b \int_c^d p_{xy}(x, y)\,dy\,dx$. Covariance: $\mathrm{cov}(x, y) = \mathbb{E}[(x - \mathbb{E}[x])(y - \mathbb{E}[y])]$. Marginal probability (sum rule): $p(x) = \int p(x, y)\,dy$.
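Marginalization and covariance can be illustrated on a small discrete joint distribution, where the integrals become sums. A minimal sketch; the joint table over two binary variables is hypothetical.

```python
from fractions import Fraction

# Hypothetical joint distribution p(x, y) over x, y in {0, 1}.
joint = {(0, 0): Fraction(3, 10), (0, 1): Fraction(1, 10),
         (1, 0): Fraction(2, 10), (1, 1): Fraction(4, 10)}

# Marginalization (sum rule): p(x) = sum_y p(x, y); the sum plays the role of the integral.
p_x = {x: sum(pr for (xx, _), pr in joint.items() if xx == x) for x in (0, 1)}
p_y = {y: sum(pr for (_, yy), pr in joint.items() if yy == y) for y in (0, 1)}

E_x = sum(x * pr for x, pr in p_x.items())
E_y = sum(y * pr for y, pr in p_y.items())
# cov(x, y) = E[(x - E[x])(y - E[y])], with the expectation taken under the joint.
cov = sum((x - E_x) * (y - E_y) * pr for (x, y), pr in joint.items())
print(p_x, p_y, cov)
```

Here the covariance comes out as 1/10: large values of x tend to co-occur with large values of y.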
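Normal distribution. PDF: $\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}$. CDF: $\frac{1}{2}\left[1 + \mathrm{erf}\left(\frac{x - \mu}{\sqrt{2\sigma^2}}\right)\right]$, where $\mathrm{erf}(x) = \frac{1}{\sqrt{\pi}} \int_{-x}^{x} e^{-t^2}\,dt$. Mean: $\mu$. Variance: $\sigma^2$.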
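The PDF and the erf-based CDF translate directly into code. A minimal sketch that implements both formulas and compares them with the Python standard library's `statistics.NormalDist` as a reference; note that `NormalDist` takes the standard deviation $\sigma$, not the variance.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma2):
    """N(x | mu, sigma^2) as written on the slide."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

def normal_cdf(x, mu, sigma2):
    """CDF via the error function: 0.5 * (1 + erf((x - mu) / sqrt(2 sigma^2)))."""
    return 0.5 * (1.0 + math.erf((x - mu) / math.sqrt(2 * sigma2)))

mu, sigma2 = 1.0, 4.0
ref = NormalDist(mu, math.sqrt(sigma2))  # standard-library reference (takes sigma, not sigma^2)
for x in (-2.0, 1.0, 3.5):
    print(x, normal_pdf(x, mu, sigma2), ref.pdf(x), normal_cdf(x, mu, sigma2), ref.cdf(x))
```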
Normal distribution (2) [figure panels: PDF, CDF, Std. Dev.]
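Multivariate normal distribution. PDF: $\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \Sigma) = (2\pi)^{-\frac{D}{2}} |\Sigma|^{-\frac{1}{2}} e^{-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})}$. CDF: N/A (no closed form). Mean: $\boldsymbol{\mu}$. Covariance: $\Sigma$.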
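For $D = 2$ the determinant and inverse of $\Sigma$ have closed forms, so the multivariate density can be written without a linear-algebra library. A minimal sketch; as a sanity check it uses the fact that with a diagonal covariance the density factorizes into a product of univariate normals. The point and parameters are illustrative.

```python
import math

def mvn_pdf_2d(x, mu, cov):
    """N(x | mu, Sigma) for D = 2, using the closed-form 2x2 determinant and inverse."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance determinant must be positive")
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx = (x[0] - mu[0], x[1] - mu[1])
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu)
    q = sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))
    D = 2
    return (2 * math.pi) ** (-D / 2) * det ** (-0.5) * math.exp(-0.5 * q)

def normal_pdf(x, mu, sigma2):
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

# With a diagonal covariance the two coordinates are independent, so the density factorizes.
x, mu = (0.3, -1.2), (0.0, -1.0)
diag = ((2.0, 0.0), (0.0, 0.5))
print(mvn_pdf_2d(x, mu, diag), normal_pdf(0.3, 0.0, 2.0) * normal_pdf(-1.2, -1.0, 0.5))
```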
Multivariate normal distribution (2) [figure]
Why the normal distribution? Central limit theorem: Let $x_1, x_2, \ldots, x_N$ be i.i.d. random variables with $\mathbb{E}[x_n] = \mu$ and $\mathrm{Var}[x_n] = \sigma^2 < \infty$. Then, as $n$ approaches infinity, the random variables $\sqrt{n}(\hat{\mu}_n - \mu)$ converge in distribution to $\mathcal{N}(0, \sigma^2)$, where $\hat{\mu}_n = (x_1 + x_2 + \cdots + x_n)/n$ is the sample mean of the first $n$ random variables.
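A quick simulation makes the theorem concrete: even though each $x_n$ is uniform (far from normal), $\sqrt{n}(\hat{\mu}_n - \mu)$ has mean near 0, variance near $\sigma^2$, and about 95% of its mass within $\pm 1.96\sigma$, as a normal variable would. A minimal sketch, assuming Uniform(0, 1) data ($\mu = 1/2$, $\sigma^2 = 1/12$) and illustrative values for `n` and `trials`.

```python
import random
import statistics

rng = random.Random(42)
n, trials = 100, 2_000
mu, sigma2 = 0.5, 1.0 / 12.0  # mean and variance of Uniform(0, 1)

# z = sqrt(n) * (sample mean - mu) for many independent data sets of size n.
z = []
for _ in range(trials):
    sample_mean = sum(rng.random() for _ in range(n)) / n
    z.append(n ** 0.5 * (sample_mean - mu))

bound = 1.96 * sigma2 ** 0.5
frac_inside = sum(1 for v in z if abs(v) <= bound) / trials
print(statistics.fmean(z), statistics.variance(z), frac_inside)  # about 0, about 0.083, about 0.95
```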
Independence and conditional independence. Independence: $P(E_1 \cap E_2) = P(E_1)P(E_2)$. Conditional independence: $P(E_1 \cap E_2 \mid E_3) = P(E_1 \mid E_3)P(E_2 \mid E_3)$.
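The independence definition can be tested exactly by enumeration. A minimal sketch on a hypothetical two-dice experiment: an event about the first die is independent of an event about the second die, but not of an event about their sum.

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))  # two fair dice

def P(pred):
    return Fraction(sum(1 for w in omega if pred(w)), len(omega))

def independent(e1, e2):
    """True iff P(E1 and E2) == P(E1) P(E2)."""
    return P(lambda w: e1(w) and e2(w)) == P(e1) * P(e2)

first_even = lambda w: w[0] % 2 == 0
second_high = lambda w: w[1] > 4
sum_is_10 = lambda w: w[0] + w[1] == 10

print(independent(first_even, second_high))  # True: the two dice do not influence each other
print(independent(first_even, sum_is_10))    # False: 1/18 versus 1/2 * 1/12 = 1/24
```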
Independent and identically distributed (i.i.d.) random variables. Let $X = \{x_1, x_2, \ldots, x_N\}$ be a set of $N$ random variables corresponding to $N$ observations of an experiment. They are independent and identically distributed (i.i.d.) if: all random variables $x_i$ have the same probability distribution, and the observation events are mutually independent. Hence, the likelihood of an i.i.d. data set can be written as $p(X \mid \theta) = \prod_{n=1}^{N} p(x_n \mid \theta)$.
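The factorized likelihood is what makes i.i.d. models computable: the joint density is a product of per-observation terms, usually evaluated as a sum of logs to avoid underflow. A minimal sketch, assuming a Bernoulli (coin-flip) model and hypothetical data.

```python
import math

def bernoulli_pmf(x, theta):
    return theta if x == 1 else 1.0 - theta

def iid_likelihood(data, theta):
    """p(X | theta) = prod_n p(x_n | theta), valid because the x_n are i.i.d."""
    return math.prod(bernoulli_pmf(x, theta) for x in data)

def iid_log_likelihood(data, theta):
    """Same quantity on the log scale; sums avoid numerical underflow for large N."""
    return sum(math.log(bernoulli_pmf(x, theta)) for x in data)

data = [1, 0, 1, 1, 0, 1, 1, 1]  # hypothetical coin flips
theta = 0.7
print(iid_likelihood(data, theta), math.exp(iid_log_likelihood(data, theta)))
```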
Exchangeability. The random variables $(x_1, x_2, \ldots, x_N)$ are exchangeable if, for any permutation $\pi$, the following equality holds: $p(x_1, x_2, \ldots, x_N) = p(x_{\pi(1)}, x_{\pi(2)}, \ldots, x_{\pi(N)})$.
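Exchangeable is weaker than i.i.d. A standard illustration, not taken from the slides, is the Pólya urn: draw a ball, return it together with one extra ball of the same colour. The draws are dependent, yet every ordering of the same colours has the same probability. A minimal sketch with illustrative starting counts.

```python
from fractions import Fraction
from itertools import permutations

def polya_sequence_prob(seq, red=1, blue=1):
    """Probability of a colour sequence (1 = red, 0 = blue) drawn from a Polya urn.

    After each draw the ball is returned together with one extra ball of the same colour,
    so the draws are dependent (not i.i.d.).
    """
    prob = Fraction(1)
    for x in seq:
        total = red + blue
        if x == 1:
            prob *= Fraction(red, total)
            red += 1
        else:
            prob *= Fraction(blue, total)
            blue += 1
    return prob

seq = (1, 1, 0, 1, 0)
probs = {polya_sequence_prob(p, red=2, blue=1) for p in set(permutations(seq))}
print(probs)  # a single value: every ordering is equally probable
```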
What is probability?
What is probability? Is probability an objective or a subjective measure?
What if probability is objective? $p(e) = \lim_{n \to +\infty} \frac{n_e}{n}$, where $n_e$ is the number of times the event of interest occurs and $n$ is the number of trials.
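The frequentist definition can be imitated with a simulation in which the relative frequency $n_e / n$ settles down as $n$ grows. A minimal sketch; note that the simulation has to assume a fair die (`randint(1, 6)`), which is exactly the kind of assumption questioned on the next slide.

```python
import random

rng = random.Random(7)
event = {6}  # "the die shows a six"; the simulation itself assumes a fair die
freqs = {}
for n in (10, 1_000, 100_000):
    n_e = sum(1 for _ in range(n) if rng.randint(1, 6) in event)  # occurrences of the event
    freqs[n] = n_e / n
    print(n, freqs[n])  # the relative frequency drifts toward 1/6 (about 0.1667)
```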
Can probability really be purely objective? How shall we handle $+\infty$? The sample set is limited. How do we know that our sample set is not biased? How do we know that the die is fair? Isn't making fairness or biasedness assumptions a subjective guess? "Then why not quantify subjectivity?" asks a Bayesian, like de Finetti: "The classical view, based on physical considerations of symmetry, in which one should be obliged to give the same probability to such symmetric cases. But which symmetry? And, in any case, why? The original sentence becomes meaningful if reversed: the symmetry is probabilistically significant, in someone's opinion, if it leads him to assign the probabilities to such events." (de Finetti, 1970/74, Preface, xi-xii)
Thomas Bayes, the legend. $p(H \mid X) = \frac{p(X \mid H)\,p(H)}{p(X)}$, where $H$: hypothesis, $X$: measurement.
Bayes' theorem: $p(\theta \mid x) = \frac{p(x \mid \theta)\,p(\theta)}{p(x)}$. $x \in \mathcal{X}$ is an observable in the sample space $\mathcal{X}$. $\theta$ is a set of model parameters: an index to a frequentist, and a random variable to a Bayesian. $p(x \mid \theta)$: likelihood (how well do the model parameters describe the data?). $p(\theta)$: prior (what is our prior belief about the model parameters?). $p(x)$: evidence (what is the likelihood of the data regardless of the model parameters?). $p(\theta \mid x)$: posterior (how are the model parameters distributed after the observations are taken into account?).
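Each term of Bayes' theorem can be computed explicitly when $\theta$ is discretized on a grid, so the evidence integral becomes a sum. A minimal sketch, assuming a coin-bias parameter, a uniform prior, and hypothetical data of 7 heads and 3 tails; the grid resolution is an arbitrary choice.

```python
# Discretize theta (probability of heads) on a grid; the integral over theta becomes a sum.
grid = [i / 100 for i in range(101)]
prior = [1.0 / len(grid)] * len(grid)      # p(theta): uniform prior belief
heads, tails = 7, 3                        # hypothetical observations x

likelihood = [t ** heads * (1 - t) ** tails for t in grid]          # p(x | theta)
evidence = sum(l * p for l, p in zip(likelihood, prior))            # p(x)
posterior = [l * p / evidence for l, p in zip(likelihood, prior)]   # p(theta | x)

post_mode = grid[max(range(len(grid)), key=posterior.__getitem__)]
post_mean = sum(t * p for t, p in zip(grid, posterior))
print(post_mode, round(post_mean, 3))  # mode at 0.7; mean near 8/12 (about 0.667)
```

With a uniform prior the posterior mode coincides with the maximum-likelihood value, while the posterior mean is pulled slightly toward 1/2.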
Prior? What does it really mean? Who do you expect to win the tennis game, and why?
What does it mean to be Bayesian in machine learning?
Motivation 1: De Finetti's theorem. A sequence of random variables $(x_1, x_2, \ldots)$ is infinitely exchangeable iff, for any $N$, $p(x_1, x_2, \ldots, x_N) = \int \prod_{i=1}^{N} p(x_i \mid \theta)\, P(d\theta)$. Here, $P(d\theta) = p(\theta)\,d\theta$ if $\theta$ has a density. Implications: exchangeability can be checked from the right-hand side. There must exist a parameter $\theta$! There must exist a likelihood $p(x \mid \theta)$! There must exist a distribution $P$ on $\theta$!
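The right-hand side of de Finetti's representation is easy to evaluate for binary data: mix i.i.d. Bernoulli likelihoods over a prior on $\theta$. A minimal sketch, assuming a uniform prior on $\theta$; the integral is approximated with the midpoint rule and checked against the Beta-function identity. Because the integrand depends only on the number of ones, any reordering of the sequence gives the same probability.

```python
import math

def mixture_prob(seq, n_grid=20_000):
    """p(x_1..x_N) = integral of prod_i p(x_i | theta) p(theta) dtheta, uniform p(theta) on [0, 1].

    The integral is approximated with the midpoint rule on n_grid cells.
    """
    k, n = sum(seq), len(seq)
    h = 1.0 / n_grid
    return h * sum(((j + 0.5) * h) ** k * (1 - (j + 0.5) * h) ** (n - k) for j in range(n_grid))

def closed_form(seq):
    """Beta-function identity: integral of theta^k (1-theta)^(N-k) dtheta = k! (N-k)! / (N+1)!."""
    k, n = sum(seq), len(seq)
    return math.factorial(k) * math.factorial(n - k) / math.factorial(n + 1)

a, b = (1, 1, 0, 0, 1), (0, 1, 1, 1, 0)  # two orderings of the same counts
print(mixture_prob(a), mixture_prob(b), closed_form(a))  # all about 1/60
```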
Motivation 2: Statistical decision theory. Loss function: $l(\theta, \delta(x))$, where $\delta(x)$ is a decision based on data $x$. It determines the penalty for deciding $\delta(x)$ if $\theta$ is the true parameter. E.g., squared loss: $l(\theta, \delta(x)) = (\theta - \delta(x))^2$. However, $\delta(x)$ does not have to be an estimate of $\theta$.
Frequentist risk: $R(\theta, \delta) = \mathbb{E}_X[l(\theta, \delta(x))] = \int_{\mathcal{X}} l(\theta, \delta(x))\, p(x \mid \theta)\,dx$, for a fixed $\theta$ and different $x \in \mathcal{X}$.
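The frequentist risk fixes $\theta$ and averages the loss over data sets drawn from $p(x \mid \theta)$. A minimal Monte Carlo sketch, assuming i.i.d. Gaussian data with known $\sigma$, the sample mean as the decision rule, and squared loss; in this case the risk is known to be $\sigma^2 / n$ for every $\theta$, which gives a check.

```python
import random

def frequentist_risk(theta, delta, n, sigma=1.0, reps=10_000, seed=0):
    """Monte Carlo estimate of R(theta, delta) = E_X[(theta - delta(X))^2] for a fixed theta,
    where X = (x_1, ..., x_n) with x_i ~ N(theta, sigma^2) i.i.d."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        x = [rng.gauss(theta, sigma) for _ in range(n)]
        total += (theta - delta(x)) ** 2
    return total / reps

sample_mean = lambda x: sum(x) / len(x)
for theta in (-1.0, 0.0, 2.0):
    print(theta, frequentist_risk(theta, sample_mean, n=10))  # each close to sigma^2 / n = 0.1
```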
How to decide which decision rule is best? 1. Admissibility: never dominated everywhere by another decision rule. Not practical: a decision rule rarely dominates another in real cases. courses/260-spring10/lectures/lecture2.pdf
How to decide which decision rule is best? 2. Restricted classes of procedures: for instance, we can restrict ourselves to unbiased procedures (i.e., $\mathbb{E}_\theta[\hat{\theta}] = \theta$). However, many good procedures are biased. Moreover, some unbiased procedures are inadmissible.
How to decide which decision rule is best? 3. Minimax: choose the procedure with the lowest worst-case risk. courses/260-spring10/lectures/lecture2.pdf
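The minimax criterion can be made concrete by comparing two estimators of a coin's bias under squared loss. A minimal sketch using a standard textbook example that is not from the slides: the maximum-likelihood estimator $k/n$ versus the shrinkage estimator $(k + \sqrt{n}/2)/(n + \sqrt{n})$, whose risk is constant in $p$. The risk functions are computed exactly by summing over binomial outcomes; `n = 10` and the grid over $p$ are illustrative.

```python
from math import comb, sqrt

n = 10  # hypothetical number of coin flips; X = number of heads ~ Binomial(n, p)

def risk(delta, p):
    """Exact frequentist risk under squared loss: sum_k P(X = k | p) (delta(k) - p)^2."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) * (delta(k) - p) ** 2 for k in range(n + 1))

mle = lambda k: k / n
shrunk = lambda k: (k + sqrt(n) / 2) / (n + sqrt(n))  # shrinks toward 1/2

grid = [i / 200 for i in range(201)]
worst = {name: max(risk(d, p) for p in grid) for name, d in [("mle", mle), ("shrunk", shrunk)]}
print(worst)                      # the shrunk estimator has the lower worst-case risk
print(min(worst, key=worst.get))  # minimax choice among these two: 'shrunk'
```

Note that neither rule dominates the other everywhere: the MLE has lower risk near $p = 0$ or $p = 1$, which is why admissibility alone cannot pick between them.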
Bayesian decision theory. Posterior risk: $\rho(\pi, \delta(x)) = \int l(\theta, \delta(x))\, p(\theta \mid x)\,d\theta$, where $p(\theta \mid x) \propto p(x \mid \theta)\,\pi(\theta)$. The Bayes action $\delta^*(x)$ for any fixed $x$ is the decision $\delta(x)$ that minimizes the posterior risk.
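Bayesian decision theory (2). For example, let us calculate the posterior risk for $l(\theta, \delta(x)) = (\theta - \delta(x))^2$: $\rho = \int (\theta - \delta(x))^2\, p(\theta \mid x)\,d\theta = \delta(x)^2 - 2\delta(x) \int \theta\, p(\theta \mid x)\,d\theta + \int \theta^2\, p(\theta \mid x)\,d\theta$, and the Bayes action follows from $\frac{\partial \rho}{\partial \delta(x)} = 2\delta(x) - 2\int \theta\, p(\theta \mid x)\,d\theta = 0 \;\Rightarrow\; \delta^*(x) = \int \theta\, p(\theta \mid x)\,d\theta$, which turns out to be the posterior mean! For $l(\theta, \delta(x)) = |\theta - \delta(x)|$, the optimal decision is the posterior median.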
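The two results can be checked by brute force: approximate the posterior risk with posterior samples and search over candidate decisions. A minimal sketch; the five equally weighted "posterior samples" are hypothetical and deliberately skewed so that the mean (0.5) and the median (0.2) differ.

```python
# Hypothetical equally weighted posterior samples of theta (a skewed posterior).
samples = [0.1, 0.2, 0.2, 0.5, 1.5]

def posterior_risk(delta, loss):
    """Monte Carlo posterior risk: average loss of decision delta over posterior samples."""
    return sum(loss(theta, delta) for theta in samples) / len(samples)

squared = lambda theta, d: (theta - d) ** 2
absolute = lambda theta, d: abs(theta - d)

candidates = [i / 100 for i in range(201)]  # decisions delta in [0, 2]
best_sq = min(candidates, key=lambda d: posterior_risk(d, squared))
best_abs = min(candidates, key=lambda d: posterior_risk(d, absolute))
print(best_sq, best_abs)  # 0.5 (posterior mean) and 0.2 (posterior median)
```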
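Wrap up. Frequentist risk: $R(\theta, \delta) = \mathbb{E}_X[l(\theta, \delta(x))] = \int_{\mathcal{X}} l(\theta, \delta(x))\, p(x \mid \theta)\,dx$. Bayesian risk: $\rho(\pi, \delta(x)) = \mathbb{E}_{\theta}[l(\theta, \delta(x))] = \int l(\theta, \delta(x))\, p(\theta \mid x)\,d\theta$.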
Motivation 3: Posterior predictive distribution. Given a posterior $p(\theta \mid x)$ and a new observation $x^*$, the posterior predictive distribution is $p(x^* \mid x) = \int p(x^* \mid \theta)\, p(\theta \mid x)\,d\theta = \mathbb{E}_{p(\theta \mid x)}[p(x^* \mid \theta)]$. This distribution takes into account all possible values of $\theta$, weighting each by its posterior probability. This virtue is called model averaging and exists only in Bayesian models!
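Model averaging can be seen in a conjugate coin model: the predictive probability of heads is $\mathbb{E}_{p(\theta \mid x)}[\theta]$, which differs from plugging in a single point estimate. A minimal sketch, assuming a Beta(1, 1) prior and hypothetical counts; the expectation is estimated by sampling from the Beta posterior with `random.betavariate` and compared with its closed form.

```python
import random

a0, b0 = 1.0, 1.0                        # Beta(a0, b0) prior on theta (uniform); illustrative choice
heads, tails = 7, 3                      # hypothetical observed coin flips x
a_post, b_post = a0 + heads, b0 + tails  # conjugate update: p(theta | x) = Beta(a_post, b_post)

# p(x* = heads | x) = E_{p(theta|x)}[p(x* = heads | theta)] = E[theta], estimated by sampling.
rng = random.Random(1)
S = 50_000
mc = sum(rng.betavariate(a_post, b_post) for _ in range(S)) / S
exact = a_post / (a_post + b_post)       # closed form: (a0 + heads) / (a0 + b0 + N)
plug_in = heads / (heads + tails)        # single point estimate, no averaging over theta
print(round(mc, 3), round(exact, 3), plug_in)
```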
The model selection problem. We are given two hypotheses that claim to explain a certain data set. Both give similar prediction quality. Which one should we choose? Which one explains the data better?
Motivation 4: Bayes quantifies model selection. Hypothesis 1 ($H_1$): likelihood $p_{H_1}(x \mid \theta_1)$, prior $p_{H_1}(\theta_1)$. Hypothesis 2 ($H_2$): likelihood $p_{H_2}(x \mid \theta_2)$, prior $p_{H_2}(\theta_2)$. We can alternatively treat the hypothesis as a random variable $H \in \{1, 2\}$ that determines the type of the distribution $p(\cdot)$: $p_{H_1}(x \mid \theta_1) = p(x \mid \theta_1, H = 1)$ and $p_{H_2}(x \mid \theta_2) = p(x \mid \theta_2, H = 2)$. Let us also place a prior on the hypothesis variable. Unless we have a good reason otherwise, we are agnostic between the two hypotheses: $P(H = 1) = P(H = 2)$.
Motivation 4: Bayes quantifies model selection (cont.). Now let us take into account all possible model parameter realizations for both hypotheses (i.e., calculate the evidence): $p(x \mid H = 1) = \int p(x \mid \theta_1, H = 1)\, p(\theta_1 \mid H = 1)\,d\theta_1$ and $p(x \mid H = 2) = \int p(x \mid \theta_2, H = 2)\, p(\theta_2 \mid H = 2)\,d\theta_2$. This operation is called MARGINALIZING OUT! Nuisance variable: a variable that we are not interested in for our current analysis. Rule of thumb: marginalize out nuisance variables as much as you can!
Motivation 4: Bayes quantifies model selection (cont.). Now apply Bayes' theorem to calculate the posterior on hypotheses: $P(H \mid x) = \frac{p(x \mid H)\,P(H)}{p(x)}$. Choose the hypothesis with the higher posterior probability, i.e., compare $P(H = 1 \mid x)$ and $P(H = 2 \mid x)$. Since $p(x)$ does not depend on $H$, its magnitude has no effect on the comparison. Since we chose a uniform prior on the hypotheses ($P(H = 1) = P(H = 2)$), the magnitude of $P(H)$ also has no effect. Hence, it suffices to calculate $p(x \mid H = 1)/p(x \mid H = 2)$. This ratio is called the Bayes factor [Kass and Raftery, 1995]. Choose $H_1$ if the Bayes factor is greater than 1, and $H_2$ otherwise. The model evidence serves as a quantitative metric for model selection in the Bayesian setting.
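The whole recipe (marginalize out the parameters, then compare evidences) fits in a few lines for coin flips. A minimal sketch with hypothetical hypotheses: $H_1$ says the coin is exactly fair ($\theta = 0.5$, nothing to marginalize), while $H_2$ puts a uniform prior on $\theta$, which is integrated out numerically and checked against its closed form. With equal hypothesis priors the Bayes factor alone decides.

```python
from math import comb

def evidence_fair(k, n):
    """H1: theta = 0.5 exactly (no free parameter), so p(x | H=1) = 0.5^n; k is unused."""
    return 0.5 ** n

def evidence_uniform(k, n, n_grid=20_000):
    """H2: theta ~ Uniform(0, 1); theta is marginalized out numerically (midpoint rule)."""
    h = 1.0 / n_grid
    return h * sum(((j + 0.5) * h) ** k * (1 - (j + 0.5) * h) ** (n - k) for j in range(n_grid))

def evidence_uniform_exact(k, n):
    """Closed form of the same integral: 1 / ((n + 1) * C(n, k))."""
    return 1 / ((n + 1) * comb(n, k))

for k in (5, 9):  # hypothetical data: k heads in n = 10 flips
    bf = evidence_fair(k, 10) / evidence_uniform(k, 10)  # Bayes factor p(x|H=1) / p(x|H=2)
    print(k, round(bf, 3), "choose H1" if bf > 1 else "choose H2")
```

Balanced data (5 of 10) favours the simpler fair-coin hypothesis; lopsided data (9 of 10) favours the flexible one.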
Supervised learning. Given a set of observations $x_1, x_2, \ldots, x_N$ and the corresponding outcomes (labels) $y_1, y_2, \ldots, y_N$, learn a function $y = f(x)$. A naive solution is linear regression: $y = \mathbf{w}^T \mathbf{x}$.
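For scalar inputs, least-squares linear regression reduces to a 2x2 system of normal equations. A minimal sketch, assuming a constant 1 is appended to $x$ so that $\mathbf{w}$ contains an intercept (the slide's $y = \mathbf{w}^T \mathbf{x}$ leaves this implicit); the data points are hypothetical.

```python
def fit_linear_regression(xs, ys):
    """Least-squares fit of y = w^T [x, 1] for scalar x via the 2x2 normal equations."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("need at least two distinct x values")
    slope = (n * sxy - sx * sy) / det
    intercept = (sxx * sy - sx * sxy) / det
    return slope, intercept

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]  # hypothetical, roughly y = 2x + 1
w, b = fit_linear_regression(xs, ys)
print(round(w, 3), round(b, 3))  # 1.99 1.04
```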
Types of supervised learning. Classification: $y \in \{a, b, c, \ldots, k\}$. Regression: $y \in \mathbb{R}$. Semi-supervised learning: a (large) subset of the training set does not have labels. Active learning: the model asks for the labels of the most important observations. Structured output learning: $y$ is a structure (e.g., a graph).
Unsupervised learning. Given a set of observations $x_1, x_2, \ldots, x_N$, learn a model that does X. A commonplace X is to infer data chunks, called clusters. This problem is called clustering.
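One common clustering algorithm, not named on the slide, is k-means (Lloyd's algorithm), which alternates between assigning points to the nearest center and moving each center to the mean of its points. A minimal one-dimensional sketch on hypothetical unlabelled data with deliberately poor initial centers.

```python
def kmeans_1d(points, centers, iters=20):
    """Lloyd's k-means in one dimension: alternate assignment and mean-update steps."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            # Assignment step: nearest center by absolute distance.
            j = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[j].append(p)
        # Update step: each center moves to the mean of its points (kept if the cluster is empty).
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]  # hypothetical unlabelled observations
centers, clusters = kmeans_1d(points, centers=[0.0, 1.0])
print(centers, clusters)  # centers near 1.0 and 5.0
```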
Discriminative versus generative models. Joint model: $p(x, y)$. Generative model: $p(y \mid x) = \frac{p(y)\,p(x \mid y)}{p(x)}$. A discriminative model deals directly with $p(y \mid x)$.
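A generative classifier specifies $p(y)$ and $p(x \mid y)$ and obtains $p(y \mid x)$ through Bayes' theorem. A minimal sketch, assuming two hypothetical classes with Gaussian class-conditional densities (via the standard library's `statistics.NormalDist`) and equal class priors.

```python
from statistics import NormalDist

# Hypothetical generative model: class prior p(y) and class-conditional densities p(x | y).
prior = {"a": 0.5, "b": 0.5}
likelihood = {"a": NormalDist(mu=0.0, sigma=1.0), "b": NormalDist(mu=2.0, sigma=1.0)}

def posterior(x):
    """p(y | x) = p(y) p(x | y) / p(x), with p(x) = sum_y p(y) p(x | y)."""
    joint = {y: prior[y] * likelihood[y].pdf(x) for y in prior}
    p_x = sum(joint.values())
    return {y: v / p_x for y, v in joint.items()}

print(posterior(1.0))   # equal priors and equal variances: 0.5 / 0.5 at the midpoint
print(posterior(-1.0))  # strongly favours class "a"
```

A discriminative model would instead parameterize `posterior(x)` directly, without modelling how $x$ is generated.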
Parametric and nonparametric models. Parametric model: the structure of the training data is stored in a predetermined set of parameters. These parameters are sufficient for prediction; there is no need to store the training data. Nonparametric model: the number of parameters in the model grows with the training data size. The training data also has to be stored for prediction.
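The contrast shows up in what each model keeps after training. A minimal sketch comparing a parametric Gaussian density fit (two numbers, whatever the data size) with a Gaussian kernel density estimate, a nonparametric model that must keep every training point. The data and the bandwidth of 0.5 are illustrative assumptions.

```python
from statistics import NormalDist, fmean, pstdev

data = [0.9, 1.4, 2.2, 2.5, 3.1, 3.3, 4.0]  # hypothetical training data

# Parametric: a Gaussian summarised by two numbers; the data can be discarded afterwards.
params = (fmean(data), pstdev(data))
parametric = NormalDist(*params)

# Nonparametric: a Gaussian kernel density estimate keeps every training point.
def kde(x, points=tuple(data), bandwidth=0.5):
    kernel = NormalDist(0.0, bandwidth)
    return sum(kernel.pdf(x - p) for p in points) / len(points)

print(len(params), parametric.pdf(2.5))  # 2 stored numbers, regardless of N
print(len(data), kde(2.5))               # N stored points, grows with the data set
```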
Take home 1: The Bayesian data analysis pipeline
1. Given i.i.d. data $X = \{x_1, \ldots, x_N\}$.
2. How do you parameterize its generation process? Design a likelihood density $p(x_n \mid \theta)$.
3. What is your prior belief about the model parameters? Design a prior density $p(\theta)$.
4. Do learning. Infer the posterior: $p(\theta \mid X) = \prod_{n=1}^{N} p(x_n \mid \theta)\, p(\theta) / p(X)$.
5. Do prediction. The likelihood of a new observation $x^*$ is $p(x^* \mid X) = \int p(x^* \mid \theta)\, p(\theta \mid X)\,d\theta$.
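The five steps can be run end to end when likelihood and prior are conjugate, because both the posterior and the predictive then have closed forms. A minimal sketch, assuming a Gaussian likelihood with known noise variance and a Gaussian prior on the mean $\theta$; all numbers are hypothetical.

```python
import math
from statistics import NormalDist

# Step 1: hypothetical i.i.d. data.
X = [1.0, 1.0, 1.0, 1.0]
# Step 2: likelihood p(x_n | theta) = N(x_n | theta, sigma2) with known noise variance.
sigma2 = 1.0
# Step 3: prior p(theta) = N(theta | m0, s0_2).
m0, s0_2 = 0.0, 1.0

# Step 4: learning. This conjugate pair gives a Gaussian posterior in closed form.
post_prec = 1.0 / s0_2 + len(X) / sigma2
post_var = 1.0 / post_prec
post_mean = post_var * (m0 / s0_2 + sum(X) / sigma2)

# Step 5: prediction. p(x* | X) = integral of N(x* | theta, sigma2) N(theta | post_mean, post_var)
predictive = NormalDist(post_mean, math.sqrt(post_var + sigma2))
print(post_mean, post_var, predictive.variance)  # 0.8, 0.2, 1.2 (up to rounding)
```

The predictive variance adds the remaining parameter uncertainty (`post_var`) to the observation noise (`sigma2`), which a plug-in estimate would ignore.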
Wait... all is well, but... how can we calculate the posterior $p(\theta \mid X) = \prod_{n=1}^{N} p(x_n \mid \theta)\, p(\theta) / p(X)$, especially $p(X)$? In many cases you cannot. You can only approximate it. And this is what this course is all about!
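To see what "approximate" can mean, here is the crudest possible approach, not the variational method this course builds toward: simple Monte Carlo, writing $p(X) = \mathbb{E}_{p(\theta)}[p(X \mid \theta)]$ and averaging the likelihood over samples from the prior. A minimal sketch, assuming a coin model with a uniform prior so that the exact evidence is available for comparison; in high-dimensional models this estimator's variance becomes impractically large.

```python
import random
from math import comb

data = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]  # hypothetical coin flips: k = 3 heads, N = 10
k, N = sum(data), len(data)

def likelihood(theta):
    return theta ** k * (1 - theta) ** (N - k)

# Naive Monte Carlo: p(X) = E_{p(theta)}[p(X | theta)], approximated by an average over prior samples.
rng = random.Random(3)
S = 50_000
evidence_mc = sum(likelihood(rng.random()) for _ in range(S)) / S  # theta ~ Uniform(0, 1)

evidence_exact = 1 / ((N + 1) * comb(N, k))  # available here only because the model is conjugate
print(evidence_mc, evidence_exact)
```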
Take home 2: Read Bishop, Sections 1.2.3 and 1.2.4, and Michael I. Jordan's lecture notes: courses/260-spring10/lectures/lecture2.pdf
Take home 3: Assignment 1 =) Implement the variational inference scheme for Bayesian linear regression using the TensorFlow API in Python, and run it on the UCI Boston Housing data set. Deadline: , 23:59:59 İstanbul Time. Submit a Python script that trains the model on a randomly chosen 90% of the data set and predicts on the rest, repeats this procedure 10 times, and reports the root mean square error (RMSE) averaged across the 10 trials. Submit a half-page, single-column report with your comments on the outcome (e.g., How far did we go in solving the problem? What kind of end products can we develop based on this model?). Hint: See Bishop, Section 10.3.
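The evaluation protocol (random 90/10 split, 10 repetitions, average RMSE) is independent of the model, so it can be written once as a harness. A minimal sketch in plain Python; it does not implement variational inference or load the data set. The `fit`/`predict` callables, the mean-predictor placeholder, and the synthetic `X`, `y` are hypothetical stand-ins to be replaced by the real model and data.

```python
import math
import random

def repeated_holdout_rmse(X, y, fit, predict, trials=10, train_frac=0.9, seed=0):
    """Average test RMSE over `trials` random train/test splits (train_frac used for training)."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    rmses = []
    for _ in range(trials):
        rng.shuffle(idx)
        cut = int(round(train_frac * len(idx)))
        train, test = idx[:cut], idx[cut:]
        model = fit([X[i] for i in train], [y[i] for i in train])
        errors = [(predict(model, X[i]) - y[i]) ** 2 for i in test]
        rmses.append(math.sqrt(sum(errors) / len(errors)))
    return sum(rmses) / len(rmses)

# Placeholder model (hypothetical): predicts the training mean. Replace with the real model.
fit_mean = lambda X_train, y_train: sum(y_train) / len(y_train)
predict_mean = lambda model, x: model

X = [[float(i)] for i in range(50)]  # synthetic stand-in for the real features
y = [3.0] * 50                       # constant target, so the mean predictor is exact
print(repeated_holdout_rmse(X, y, fit_mean, predict_mean))  # 0.0
```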
More informationSingle Maths B: Introduction to Probability
Single Maths B: Introduction to Probability Overview Lecturer Email Office Homework Webpage Dr Jonathan Cumming j.a.cumming@durham.ac.uk CM233 None! http://maths.dur.ac.uk/stats/people/jac/singleb/ 1 Introduction
More informationProbability and Estimation. Alan Moses
Probability and Estimation Alan Moses Random variables and probability A random variable is like a variable in algebra (e.g., y=e x ), but where at least part of the variability is taken to be stochastic.
More informationStatistical learning. Chapter 20, Sections 1 4 1
Statistical learning Chapter 20, Sections 1 4 Chapter 20, Sections 1 4 1 Outline Bayesian learning Maximum a posteriori and maximum likelihood learning Bayes net learning ML parameter learning with complete
More informationLecture 11. Probability Theory: an Overveiw
Math 408 - Mathematical Statistics Lecture 11. Probability Theory: an Overveiw February 11, 2013 Konstantin Zuev (USC) Math 408, Lecture 11 February 11, 2013 1 / 24 The starting point in developing the
More informationMachine Learning 4771
Machine Learning 4771 Instructor: Tony Jebara Topic 11 Maximum Likelihood as Bayesian Inference Maximum A Posteriori Bayesian Gaussian Estimation Why Maximum Likelihood? So far, assumed max (log) likelihood
More informationHST.582J / 6.555J / J Biomedical Signal and Image Processing Spring 2007
MIT OpenCourseWare http://ocw.mit.edu HST.582J / 6.555J / 16.456J Biomedical Signal and Image Processing Spring 2007 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
More informationToday. Probability and Statistics. Linear Algebra. Calculus. Naïve Bayes Classification. Matrix Multiplication Matrix Inversion
Today Probability and Statistics Naïve Bayes Classification Linear Algebra Matrix Multiplication Matrix Inversion Calculus Vector Calculus Optimization Lagrange Multipliers 1 Classical Artificial Intelligence
More informationEcon 2140, spring 2018, Part IIa Statistical Decision Theory
Econ 2140, spring 2018, Part IIa Maximilian Kasy Department of Economics, Harvard University 1 / 35 Examples of decision problems Decide whether or not the hypothesis of no racial discrimination in job
More informationDefinition 3.1 A statistical hypothesis is a statement about the unknown values of the parameters of the population distribution.
Hypothesis Testing Definition 3.1 A statistical hypothesis is a statement about the unknown values of the parameters of the population distribution. Suppose the family of population distributions is indexed
More informationCS-E3210 Machine Learning: Basic Principles
CS-E3210 Machine Learning: Basic Principles Lecture 4: Regression II slides by Markus Heinonen Department of Computer Science Aalto University, School of Science Autumn (Period I) 2017 1 / 61 Today s introduction
More informationSTATS 200: Introduction to Statistical Inference. Lecture 29: Course review
STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Introduction. Basic Probability and Bayes Volkan Cevher, Matthias Seeger Ecole Polytechnique Fédérale de Lausanne 26/9/2011 (EPFL) Graphical Models 26/9/2011 1 / 28 Outline
More informationNearest Neighbor Pattern Classification
Nearest Neighbor Pattern Classification T. M. Cover and P. E. Hart May 15, 2018 1 The Intro The nearest neighbor algorithm/rule (NN) is the simplest nonparametric decisions procedure, that assigns to unclassified
More informationLecture 13 and 14: Bayesian estimation theory
1 Lecture 13 and 14: Bayesian estimation theory Spring 2012 - EE 194 Networked estimation and control (Prof. Khan) March 26 2012 I. BAYESIAN ESTIMATORS Mother Nature conducts a random experiment that generates
More informationIntroduction to Statistical Methods for High Energy Physics
Introduction to Statistical Methods for High Energy Physics 2011 CERN Summer Student Lectures Glen Cowan Physics Department Royal Holloway, University of London g.cowan@rhul.ac.uk www.pp.rhul.ac.uk/~cowan
More informationIntroduction to Bayesian Inference
University of Pennsylvania EABCN Training School May 10, 2016 Bayesian Inference Ingredients of Bayesian Analysis: Likelihood function p(y φ) Prior density p(φ) Marginal data density p(y ) = p(y φ)p(φ)dφ
More informationMachine Learning. Theory of Classification and Nonparametric Classifier. Lecture 2, January 16, What is theoretically the best classifier
Machine Learning 10-701/15 701/15-781, 781, Spring 2008 Theory of Classification and Nonparametric Classifier Eric Xing Lecture 2, January 16, 2006 Reading: Chap. 2,5 CB and handouts Outline What is theoretically
More informationLecture 2: Repetition of probability theory and statistics
Algorithms for Uncertainty Quantification SS8, IN2345 Tobias Neckel Scientific Computing in Computer Science TUM Lecture 2: Repetition of probability theory and statistics Concept of Building Block: Prerequisites:
More informationRepresentation. Stefano Ermon, Aditya Grover. Stanford University. Lecture 2
Representation Stefano Ermon, Aditya Grover Stanford University Lecture 2 Stefano Ermon, Aditya Grover (AI Lab) Deep Generative Models Lecture 2 1 / 32 Learning a generative model We are given a training
More informationHuman-Oriented Robotics. Probability Refresher. Kai Arras Social Robotics Lab, University of Freiburg Winter term 2014/2015
Probability Refresher Kai Arras, University of Freiburg Winter term 2014/2015 Probability Refresher Introduction to Probability Random variables Joint distribution Marginalization Conditional probability
More informationNaïve Bayes classification
Naïve Bayes classification 1 Probability theory Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. Examples: A person s height, the outcome of a coin toss
More informationIntroduction to Machine Learning
Outline Introduction to Machine Learning Bayesian Classification Varun Chandola March 8, 017 1. {circular,large,light,smooth,thick}, malignant. {circular,large,light,irregular,thick}, malignant 3. {oval,large,dark,smooth,thin},
More information15-780: Grad AI Lecture 17: Probability. Geoff Gordon (this lecture) Tuomas Sandholm TAs Erik Zawadzki, Abe Othman
15-780: Grad AI Lecture 17: Probability Geoff Gordon (this lecture) Tuomas Sandholm TAs Erik Zawadzki, Abe Othman Review: probability RVs, events, sample space Ω Measures, distributions disjoint union
More informationMachine Learning, Midterm Exam: Spring 2009 SOLUTION
10-601 Machine Learning, Midterm Exam: Spring 2009 SOLUTION March 4, 2009 Please put your name at the top of the table below. If you need more room to work out your answer to a question, use the back of
More informationMath Review Sheet, Fall 2008
1 Descriptive Statistics Math 3070-5 Review Sheet, Fall 2008 First we need to know about the relationship among Population Samples Objects The distribution of the population can be given in one of the
More informationBusiness Statistics 41000: Homework # 2 Solutions
Business Statistics 4000: Homework # 2 Solutions Drew Creal February 9, 204 Question #. Discrete Random Variables and Their Distributions (a) The probabilities have to sum to, which means that 0. + 0.3
More informationMath 494: Mathematical Statistics
Math 494: Mathematical Statistics Instructor: Jimin Ding jmding@wustl.edu Department of Mathematics Washington University in St. Louis Class materials are available on course website (www.math.wustl.edu/
More informationBayesian RL Seminar. Chris Mansley September 9, 2008
Bayesian RL Seminar Chris Mansley September 9, 2008 Bayes Basic Probability One of the basic principles of probability theory, the chain rule, will allow us to derive most of the background material in
More informationIntroduction to Machine Learning
Introduction to Machine Learning Introduction to Probabilistic Methods Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB
More information18.05 Practice Final Exam
No calculators. 18.05 Practice Final Exam Number of problems 16 concept questions, 16 problems. Simplifying expressions Unless asked to explicitly, you don t need to simplify complicated expressions. For
More informationGAUSSIAN PROCESS REGRESSION
GAUSSIAN PROCESS REGRESSION CSE 515T Spring 2015 1. BACKGROUND The kernel trick again... The Kernel Trick Consider again the linear regression model: y(x) = φ(x) w + ε, with prior p(w) = N (w; 0, Σ). The
More informationThe Naïve Bayes Classifier. Machine Learning Fall 2017
The Naïve Bayes Classifier Machine Learning Fall 2017 1 Today s lecture The naïve Bayes Classifier Learning the naïve Bayes Classifier Practical concerns 2 Today s lecture The naïve Bayes Classifier Learning
More informationReview. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda
Review DS GA 1002 Statistical and Mathematical Models http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall16 Carlos Fernandez-Granda Probability and statistics Probability: Framework for dealing with
More informationTutorial on Gaussian Processes and the Gaussian Process Latent Variable Model
Tutorial on Gaussian Processes and the Gaussian Process Latent Variable Model (& discussion on the GPLVM tech. report by Prof. N. Lawrence, 06) Andreas Damianou Department of Neuro- and Computer Science,
More information