Lecture 1 October 9, 2013


Probabilistic Graphical Models, Fall 2013

Lecture 1, October 9, 2013
Lecturer: Guillaume Obozinski        Scribes: Huu Dien Khue Le, Robin Bénesse

The web page of the course:

1.1 Introduction

Problem

To model complex data, one is confronted with two main questions:

- How to manage the complexity of the data to be processed?
- How to infer global properties from local models?

These questions lead to three types of problems: the representation of the data (how to obtain a global model from local models), the inference of the distributions (how to use the model), and the learning of the models (what are the parameters of the models?).

Examples

Image: consider a monochromatic image of n x n pixels. If each pixel is modelled by a discrete random variable (so that there are n^2 of them), then the image can be modelled using a grid-shaped graph over these variables.

Bioinformatics: consider a long sequence of DNA bases. If each base of this sequence is modelled by a discrete random variable that, in general, can take values in {A, C, G, T}, then the sequence can be modelled by a Markov chain.

Finance: consider the evolution of stock prices in discrete time, where we observe the prices at each time n. It is reasonable to postulate that the change of price of a stock at time n only depends on its own price and the prices of other stocks at time n-1. For only two stocks, a possible simplified model is a dependency graph in which the prices at time n depend on the prices at time n-1.

Speech processing: consider the syllables of a word and the way they are interpreted by a human ear or by a computer. Each syllable can be represented by a random sound. The objective is then to retrieve the word from the sequence of sounds heard or recorded. In this case, we can use a hidden Markov model.

Text: consider a text and a fixed set of keywords. The text is modelled by a vector such that each of its components equals the number of times the corresponding keyword appears. This is usually called the bag-of-words model. This model may seem weak, as it does not take the order of the words into account; however, it works quite well in practice. A so-called naive Bayes classifier can be used for classification, for example spam vs. non-spam (a small code sketch is given below).

It is clear that models which ignore the dependence among variables are too simple for real-world problems. On the other hand, models in which every random variable depends on all (or too many) other ones are doomed to fail, both for statistical reasons (lack of data) and for computational reasons. Therefore, in practice, one has to make suitable assumptions to design models with the right level of complexity, so that the models obtained are able to generalize well from a statistical point of view and lead to tractable computations from an algorithmic perspective.
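To make the bag-of-words representation above concrete, here is a minimal sketch in Python (the vocabulary and the document are made up purely for illustration; only the standard library is used):

```python
from collections import Counter

# Hypothetical keyword vocabulary; in practice it would be chosen from a corpus.
vocabulary = ["money", "free", "meeting", "project", "winner"]

def bag_of_words(text, vocabulary):
    """Map a text to its vector of keyword counts; word order is ignored."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

doc = "Free money money today you are a winner"
print(bag_of_words(doc, vocabulary))   # [2, 1, 0, 0, 1]
```

A classifier such as naive Bayes then works directly on these count vectors, ignoring word order exactly as described above.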

1.2 Basic notations and properties

In this section we recall some basic notations and properties of random variables.

Convention. Mathematically, the probability that a random variable X takes the value x is denoted P(X = x). In this document, we simply write p(X) to denote a distribution over the random variable X, or p(x) to denote the distribution evaluated at the particular value x. The same convention is used for several variables.

Fundamental rules. For two random variables X, Y we have

    Sum rule:      p(X) = \sum_Y p(X, Y).
    Product rule:  p(X, Y) = p(Y | X) p(X).

Independence. Two random variables X and Y are said to be independent if and only if

    P(X, Y) = P(X) P(Y).

Conditional independence. Let X, Y, Z be random variables. X and Y are said to be conditionally independent given Z if and only if

    P(X, Y | Z) = P(X | Z) P(Y | Z).

Property: if X and Y are conditionally independent given Z, then P(X | Y, Z) = P(X | Z).

Independent and identically distributed. A set of random variables is independent and identically distributed (i.i.d.) if each random variable has the same probability distribution as the others and all are mutually independent.

Bayes formula. For two random variables X, Y we have

    P(X | Y) = P(Y | X) P(X) / P(Y).

Note that Bayes formula is not a Bayesian formula in the sense of Bayesian statistics.
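As a small sanity check of these rules, the following sketch works through a made-up joint distribution of two binary variables and verifies Bayes formula numerically (the numbers are arbitrary, chosen only for illustration):

```python
# Joint distribution p(X, Y) of two binary variables (arbitrary illustrative numbers).
p_xy = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

# Sum rule: p(X) = sum_Y p(X, Y), and similarly for p(Y).
p_x = {x: sum(p for (xx, y), p in p_xy.items() if xx == x) for x in (0, 1)}
p_y = {y: sum(p for (x, yy), p in p_xy.items() if yy == y) for y in (0, 1)}

# Product rule: p(Y | X) = p(X, Y) / p(X).
p_y_given_x = {(y, x): p_xy[(x, y)] / p_x[x] for x in (0, 1) for y in (0, 1)}

# Bayes formula: p(X | Y) = p(Y | X) p(X) / p(Y), compared with the direct definition.
x, y = 1, 0
bayes = p_y_given_x[(y, x)] * p_x[x] / p_y[y]
direct = p_xy[(x, y)] / p_y[y]
print(round(bayes, 10), round(direct, 10))   # both 0.25
```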

1.3 Statistical models

Definition 1.1 (Statistical model). A parametric statistical model P_Θ is a collection of probability distributions (or a collection of probability density functions, in which case they are all defined with respect to the same base measure, such as the Lebesgue measure), defined on the same space and parameterized by a parameter θ belonging to a set Θ ⊂ R^p. Formally:

    P_Θ = { p_θ : θ ∈ Θ }.

Bernoulli model

Consider a binary random variable X that can take the value 0 or 1. If P(X = 1) is parametrized by θ ∈ [0, 1],

    P(X = 1) = θ,    P(X = 0) = 1 - θ,

then a probability distribution of the Bernoulli model can be written as

    p(x; θ) = θ^x (1 - θ)^{1-x}                              (1.1)

and we can write

    X ~ Ber(θ).                                              (1.2)

The Bernoulli model is the collection of these distributions for θ ∈ Θ = [0, 1].

Binomial model

A binomial random variable N ~ Bin(n, θ) is defined as the sum of n i.i.d. Bernoulli random variables with parameter θ. Its distribution is

    P(N = k) = \binom{n}{k} θ^k (1 - θ)^{n-k}.

The set Θ is the same as for the Bernoulli model.
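As a quick illustration, the sketch below evaluates the Bernoulli probability (1.1) and the corresponding binomial probability for small made-up values (only the Python standard library is used; θ and the counts are arbitrary):

```python
from math import comb

def bernoulli_pmf(x, theta):
    """p(x; theta) = theta^x (1 - theta)^(1 - x) for x in {0, 1}, as in (1.1)."""
    return theta**x * (1 - theta)**(1 - x)

def binomial_pmf(k, n, theta):
    """P(N = k) = C(n, k) theta^k (1 - theta)^(n - k), N = sum of n i.i.d. Bernoulli(theta)."""
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

theta = 0.3                                   # arbitrary parameter in [0, 1]
print(bernoulli_pmf(1, theta))                # 0.3
print(binomial_pmf(2, 5, theta))              # 10 * 0.3^2 * 0.7^3 ≈ 0.3087
print(sum(binomial_pmf(k, 5, theta) for k in range(6)))   # sums to 1 (up to rounding)
```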

Multinomial model

Consider a discrete random variable C that can take one of K possible values {1, 2, ..., K}. The random variable C can be represented by a K-dimensional random vector X = (X_1, X_2, ..., X_K)^T for which the event {C = k} corresponds to the event {X_k = 1 and X_l = 0 for all l ≠ k}.

If we parametrize P(C = k) by a parameter π_k ∈ [0, 1], then by definition we also have P(X_k = 1) = π_k for k = 1, 2, ..., K, with \sum_{k=1}^K π_k = 1. The probability distribution over x = (x_1, ..., x_K) can be written as

    p(x; π) = \prod_{k=1}^K π_k^{x_k}                        (1.3)

where π = (π_1, π_2, ..., π_K)^T. We will denote by M(1, π_1, ..., π_K) such a discrete distribution. The corresponding set of parameters is Θ = { π ∈ R_+^K : 1^T π = 1 }.

Now if we consider n independent observations of a M(1, π) multinomial random variable X, and we denote by N_k the number of observations for which x_k = 1, then the joint distribution of (N_1, N_2, ..., N_K) is called a multinomial M(n, π) distribution. It takes the form

    p(n_1, n_2, ..., n_K; π, n) = n! / (n_1! n_2! ... n_K!)  \prod_{k=1}^K π_k^{n_k}        (1.4)

and we can write

    (N_1, ..., N_K) ~ M(n, π_1, π_2, ..., π_K).              (1.5)

The multinomial M(n, π) is to the M(1, π) distribution as the binomial distribution is to the Bernoulli distribution. In the rest of this course, when we talk about multinomial distributions, we will always refer to a M(1, π) distribution.

Gaussian models

The Gaussian distribution is also known as the normal distribution. In the case of a scalar variable X, the Gaussian density can be written in the form

    N(x; μ, σ^2) = 1 / (2π σ^2)^{1/2}  exp( - (x - μ)^2 / (2 σ^2) )                          (1.6)

where μ is the mean and σ^2 is the variance. For a d-dimensional vector x, the multivariate Gaussian density takes the form

    N(x | μ, Σ) = 1 / ( (2π)^{d/2} |Σ|^{1/2} )  exp( - (1/2) (x - μ)^T Σ^{-1} (x - μ) )      (1.7)

where μ is a d-dimensional mean vector, Σ is a d x d symmetric positive definite matrix, and |Σ| denotes the determinant of Σ. It is a well-known property that the parameter μ is equal to the expectation of X and that the matrix Σ is the covariance matrix of X, which means that Σ_ij = E[(X_i - μ_i)(X_j - μ_j)].
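The multivariate density (1.7) can be transcribed directly into code. The sketch below is a minimal version assuming NumPy is available; it inverts Σ explicitly, makes no attempt at numerical robustness, and the chosen parameters are arbitrary:

```python
import numpy as np

def gaussian_density(x, mu, sigma):
    """Evaluate the multivariate Gaussian density (1.7) at the point x."""
    d = mu.shape[0]
    diff = x - mu
    norm_const = 1.0 / ((2 * np.pi) ** (d / 2) * np.linalg.det(sigma) ** 0.5)
    return norm_const * np.exp(-0.5 * diff @ np.linalg.inv(sigma) @ diff)

mu = np.array([0.0, 1.0])                       # illustrative mean vector
sigma = np.array([[2.0, 0.5],                   # symmetric positive definite covariance
                  [0.5, 1.0]])
x = np.array([0.3, 0.8])
print(gaussian_density(x, mu, sigma))           # the density evaluated at x
```

In practice one would typically use a library routine (e.g. scipy.stats.multivariate_normal) rather than this direct transcription.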

1.4 Parameter estimation by maximum likelihood

Definition

Maximum likelihood estimation (MLE) is a method of estimating the parameters of a statistical model. Suppose we have a sample (x_1, x_2, ..., x_n) of n independent and identically distributed observations, coming from a distribution p(x_1, x_2, ..., x_n; θ) where θ is an unknown parameter (both x_i and θ can be vectors). As the name suggests, the MLE finds the parameter θ̂ under which the data x_1, x_2, ..., x_n are most likely:

    θ̂ = argmax_θ  p(x_1, x_2, ..., x_n; θ)                  (1.8)

The probability on the right-hand side of the above equation can be seen as a function of θ, denoted by L(θ):

    L(θ) = p(x_1, x_2, ..., x_n; θ)                          (1.9)

This function is called the likelihood. As x_1, x_2, ..., x_n are independent and identically distributed, we have

    L(θ) = \prod_{i=1}^n p(x_i; θ)                           (1.10)

In practice it is often more convenient to work with the logarithm of the likelihood function, called the log-likelihood:

    l(θ) = log L(θ) = log \prod_{i=1}^n p(x_i; θ)            (1.11)
         = \sum_{i=1}^n log p(x_i; θ)                        (1.12)

Next, we will apply this method to the models presented previously. We assume that all the observations are independent and identically distributed in all of the remainder of this lecture.

MLE for the Bernoulli model

Consider n observations x_1, x_2, ..., x_n of a binary random variable X following a Bernoulli distribution Ber(θ). From (1.12) and (1.1) we have

    l(θ) = \sum_{i=1}^n log p(x_i; θ) = \sum_{i=1}^n log( θ^{x_i} (1 - θ)^{1 - x_i} ) = N log θ + (n - N) log(1 - θ),

where N = \sum_{i=1}^n x_i. As l(θ) is strictly concave, it has a unique maximizer, and since the function is in addition differentiable, its maximizer θ̂ is the zero of its gradient ∇l(θ):

    ∇l(θ) = dl(θ)/dθ = N/θ - (n - N)/(1 - θ).

It is easy to show that ∇l(θ) = 0 if and only if θ = N/n. Therefore we have

    θ̂ = N/n = (x_1 + x_2 + ... + x_n)/n.                    (1.13)
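The closed form (1.13) is easy to confirm numerically: the sketch below simulates Bernoulli data, evaluates the log-likelihood on a grid of θ values, and compares the grid maximizer with N/n (a sketch assuming NumPy; the sample size and true parameter are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
theta_true = 0.7                        # arbitrary true parameter
x = rng.binomial(1, theta_true, size=1000)
n, N = x.size, x.sum()

def log_likelihood(theta):
    # l(theta) = N log(theta) + (n - N) log(1 - theta)
    return N * np.log(theta) + (n - N) * np.log(1 - theta)

grid = np.linspace(1e-3, 1 - 1e-3, 10_000)
theta_grid = grid[np.argmax(log_likelihood(grid))]
theta_closed_form = N / n

print(theta_closed_form, theta_grid)    # both close to theta_true, equal up to the grid resolution
```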

MLE for the multinomial model

Consider N observations X_1, X_2, ..., X_N of a discrete random variable X following a multinomial distribution M(1, π), where π = (π_1, π_2, ..., π_K)^T. We denote by x_i (i = 1, 2, ..., N) the K-dimensional vectors of 0s and 1s representing X_i, as presented above for the multinomial model. From (1.12) and (1.3) we have

    l(π) = \sum_{i=1}^N log p(x_i; π) = \sum_{i=1}^N log \prod_{k=1}^K π_k^{x_{ik}} = \sum_{i=1}^N \sum_{k=1}^K x_{ik} log π_k = \sum_{k=1}^K n_k log π_k

where n_k = \sum_{i=1}^N x_{ik}; n_k is therefore the number of observations for which x_k = 1. We need to maximize this quantity subject to the constraint \sum_{k=1}^K π_k = 1.

Brief review of Lagrange duality

Lagrangian. Consider the following convex optimization problem:

    min_{x ∈ X} f(x)   subject to   Ax = b                   (1.14)

where f is a convex function, X ⊂ R^p is a convex set included in the domain of f (the domain of a function is the set on which the function is finite), A ∈ R^{n x p} and b ∈ R^n. The Lagrangian associated with this optimization problem is defined as

    L(x, λ) = f(x) + λ^T (Ax - b).                           (1.15)

The vector λ ∈ R^n is called the Lagrange multiplier vector.

Lagrange dual function. The Lagrange dual function is defined as

    g(λ) = min_{x ∈ X} L(x, λ).                              (1.16)

The problem of maximizing g(λ) with respect to λ is known as the Lagrange dual problem.

Max-min inequality. Let f : R^n x R^m → R, W ⊂ R^n and Z ⊂ R^m. For any w ∈ W and z ∈ Z we have f(w, z) ≤ max_{z' ∈ Z} f(w, z'), hence

    min_{w ∈ W} f(w, z) ≤ min_{w ∈ W} max_{z' ∈ Z} f(w, z')                  (1.17)

and, taking the maximum over z ∈ Z on the left-hand side,

    max_{z ∈ Z} min_{w ∈ W} f(w, z) ≤ min_{w ∈ W} max_{z ∈ Z} f(w, z).       (1.18)

The last inequality is known as the max-min inequality.

Duality. It is easy to show that

    max_λ L(x, λ) = f(x) if Ax = b, and +∞ otherwise,

which gives us

    min_{x ∈ X : Ax = b} f(x) = min_{x ∈ X} max_λ L(x, λ).                   (1.19)

Now from (1.16), (1.18) and (1.19) we have

    max_λ g(λ) = max_λ min_{x ∈ X} L(x, λ) ≤ min_{x ∈ X} max_λ L(x, λ) = min_{x ∈ X : Ax = b} f(x).      (1.20)

This inequality says that the optimal value d* of the Lagrange dual problem always lower-bounds the optimal value p* of the original problem. This property is called weak duality. If the equality d* = p* holds, then we say that strong duality holds. Strong duality means that the order of the minimization over x and the maximization over λ can be switched without affecting the result.

Slater's constraint qualification lemma. If there exists an x in the relative interior of X ∩ {Ax = b}, then strong duality holds. Note that by definition X is included in the domain of f, so that if x ∈ X then f(x) < +∞.

Note that all the above notions and results are stated for problem (1.14) only. For a more general problem and more details about Lagrange duality, please refer to the chapter on duality in [9].
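As a concrete instance of problem (1.14) and of strong duality, take f(x) = (1/2)||x||^2 subject to Ax = b. Minimizing the Lagrangian over x gives x = -A^T λ, so the dual function is g(λ) = -(1/2) λ^T A A^T λ - λ^T b, and at the optimum the primal and dual values coincide. The sketch below verifies this numerically (assuming NumPy; A and b are random and purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 5))          # 2 equality constraints on 5 variables
b = rng.standard_normal(2)

# Maximizing g(lambda) gives lambda* = -(A A^T)^{-1} b; the primal solution is x* = -A^T lambda*.
lam_star = -np.linalg.solve(A @ A.T, b)
x_star = -A.T @ lam_star

primal_value = 0.5 * x_star @ x_star                                # f(x*), with A x* = b
dual_value = -0.5 * lam_star @ A @ A.T @ lam_star - lam_star @ b    # g(lambda*)

print(np.allclose(A @ x_star, b))        # the constraint is satisfied: True
print(primal_value, dual_value)          # equal values: strong duality holds for this problem
```

Here strong duality could also be predicted from Slater's condition, since the constraints are affine and f is finite everywhere.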

Back to our problem

We need to minimize

    f(π) = -l(π) = - \sum_{k=1}^K n_k log π_k                (1.21)

subject to the constraint 1^T π = 1. The Lagrangian of this problem is

    L(π, λ) = - \sum_{k=1}^K n_k log π_k + λ ( \sum_{k=1}^K π_k - 1 ).       (1.22)

Clearly, as n_k ≥ 0 for k = 1, 2, ..., K, f is convex and this problem is a convex optimization problem. Moreover, it is trivial that there exist π_1, π_2, ..., π_K such that π_k > 0 for all k and \sum_{k=1}^K π_k = 1, so by Slater's constraint qualification the problem has the strong duality property. Therefore, we have

    min_π f(π) = max_λ min_π L(π, λ).                        (1.23)

As L(π, λ) is convex with respect to π, to find min_π L(π, λ) it suffices to set the partial derivatives with respect to π_k to zero. This yields

    ∂L/∂π_k = - n_k/π_k + λ = 0,   k = 1, 2, ..., K,

or

    π_k = n_k/λ,   k = 1, 2, ..., K.                         (1.24)

Substituting these into the constraint \sum_{k=1}^K π_k = 1 we get \sum_{k=1}^K n_k = λ, yielding λ = N. From this and (1.24) we finally get

    π̂_k = n_k/N,   k = 1, 2, ..., K.                        (1.25)

Remark: π̂_k is the fraction of the N observations for which x_k = 1.
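The closed form (1.25) can be cross-checked against a generic constrained optimizer: the sketch below simulates draws from an M(1, π) distribution and compares the empirical frequencies n_k/N with the numerical minimizer of (1.21) under the constraint (a sketch assuming NumPy and SciPy; the true π is arbitrary):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
pi_true = np.array([0.5, 0.3, 0.2])           # arbitrary true parameter
counts = rng.multinomial(1000, pi_true)       # n_k for k = 1..K, with N = 1000 observations

def neg_log_likelihood(pi):
    # f(pi) = -l(pi) = -sum_k n_k log pi_k, as in (1.21)
    return -np.sum(counts * np.log(pi))

K = len(counts)
res = minimize(
    neg_log_likelihood,
    x0=np.full(K, 1.0 / K),
    bounds=[(1e-9, 1.0)] * K,
    constraints=[{"type": "eq", "fun": lambda pi: pi.sum() - 1.0}],
    method="SLSQP",
)

print(counts / counts.sum())    # closed form: pi_k = n_k / N
print(res.x)                    # numerical solution, essentially identical
```

This is only a consistency check: the closed form requires no optimizer at all.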

MLE for the univariate Gaussian model

Consider n observations x_1, x_2, ..., x_n of a random variable X following a Gaussian distribution N(μ, σ^2). From (1.12) and (1.6) we have

    l(μ, σ^2) = \sum_{i=1}^n log p(x_i; μ, σ^2)
              = \sum_{i=1}^n log [ 1/(2π σ^2)^{1/2} exp( - (x_i - μ)^2 / (2 σ^2) ) ]
              = - (n/2) log(2π) - (n/2) log σ^2 - 1/(2 σ^2) \sum_{i=1}^n (x_i - μ)^2.

We need to maximize this quantity with respect to μ and σ^2. By taking the derivative with respect to μ and then with respect to σ^2, it is easy to obtain that the pair (μ̂, σ̂^2) defined by

    μ̂ = (1/n) \sum_{i=1}^n x_i,    σ̂^2 = (1/n) \sum_{i=1}^n (x_i - μ̂)^2    (1.26)

is the only stationary point of the likelihood. One can actually check (for example by computing the Hessian with respect to (μ, σ^2)) that this is indeed a maximum. We will have a confirmation of this in the lecture on exponential families.

MLE for the multivariate Gaussian model

Let X ∈ R^d be a Gaussian random vector, with mean vector μ ∈ R^d and covariance matrix Σ ∈ R^{d x d} positive definite:

    p(x | μ, Σ) = 1 / ( (2π)^{d/2} (det Σ)^{1/2} )  exp( - (1/2) (x - μ)^T Σ^{-1} (x - μ) ).

Let (x_1, ..., x_n) be an i.i.d. sample. The log-likelihood is given by

    l(μ, Σ) = log p(x_1, ..., x_n; μ, Σ) = \sum_{i=1}^n log p(x_i | μ, Σ)
            = - (nd/2) log(2π) - (n/2) log det Σ - (1/2) \sum_{i=1}^n (x_i - μ)^T Σ^{-1} (x_i - μ).

In this case, one should be careful that this log-likelihood is not concave with respect to the pair of parameters (μ, Σ). It is concave with respect to μ when Σ is fixed, but it is not even concave with respect to Σ when μ is fixed.

Digression: review on differentials

Differentiable function. A function f is differentiable at x ∈ R^d if there exists a linear form df_x such that

    f(x + h) - f(x) = df_x(h) + o(||h||).

Since R^d is a Hilbert space, we know that there exists g ∈ R^d such that df_x(h) = ⟨g, h⟩. We call g the gradient of f: g = ∇f(x).

Example 1: if f(x) = a^T x + b, then we have

    f(x + h) - f(x) = a^T h,

and thus ∇f(x) = a.

Example 2: if f(x) = (1/2) x^T A x, then we have

    f(x + h) - f(x) = (1/2) (x + h)^T A (x + h) - (1/2) x^T A x = (1/2) (x^T A h + h^T A x) + o(||h||).

The gradient is then

    ∇f(x) = (1/2) (A + A^T) x.
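Both example gradients are easy to confirm with a finite-difference check (a sketch assuming NumPy; the vectors are random and A is deliberately non-symmetric so that the (1/2)(A + A^T) factor matters):

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-6):
    """Central finite-difference approximation of the gradient of f at x."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        grad[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
d = 4
a, b = rng.standard_normal(d), rng.standard_normal()
A = rng.standard_normal((d, d))                  # non-symmetric on purpose
x = rng.standard_normal(d)

# Example 1: f(x) = a^T x + b has gradient a.
print(np.allclose(numerical_gradient(lambda v: a @ v + b, x), a))
# Example 2: f(x) = (1/2) x^T A x has gradient (1/2)(A + A^T) x.
print(np.allclose(numerical_gradient(lambda v: 0.5 * v @ A @ v, x), 0.5 * (A + A.T) @ x))
```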

Let us first differentiate l(μ, Σ) with respect to μ. We need to differentiate

    μ ↦ \sum_{i=1}^n (1/2) (x_i - μ)^T Σ^{-1} (x_i - μ),

which is a sum of compositions f ∘ g_i, where

    f : R^d → R,  y ↦ (1/2) y^T Σ^{-1} y     and     g_i : R^d → R^d,  μ ↦ x_i - μ.

Reminder (composition of differentials): d(f ∘ g)_x(h) = df_{g(x)}( dg_x(h) ).

The differential of f is

    df_y(h) = ⟨∇f(y), h⟩ = (1/2) ⟨ Σ^{-1} y + Σ^{-T} y, h ⟩ = ⟨ Σ^{-1} y, h ⟩,

as Σ^{-1} is symmetric. Since dg_{i,μ}(h) = -h, the composition rule gives, for φ : μ ↦ \sum_{i=1}^n (1/2) (x_i - μ)^T Σ^{-1} (x_i - μ),

    dφ_μ(h) = \sum_{i=1}^n ⟨ Σ^{-1} (μ - x_i), h ⟩,

and we deduce the gradient

    ∇φ(μ) = Σ^{-1} \sum_{i=1}^n (μ - x_i).

Back to the MLE for the multivariate Gaussian

Remember that the function we want to differentiate is

    l(μ, Σ) = - (nd/2) log(2π) - (n/2) log det Σ - (1/2) \sum_{i=1}^n (x_i - μ)^T Σ^{-1} (x_i - μ).

According to the results above, we have

    ∇_μ l(μ, Σ) = - ∇φ(μ) = Σ^{-1} \sum_{i=1}^n (x_i - μ) = Σ^{-1} ( n x̄ - n μ ),   where x̄ = (1/n) \sum_{i=1}^n x_i.

The gradient is equal to 0 if and only if

    μ = x̄ = (1/n) \sum_{i=1}^n x_i.
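The expression ∇_μ l(μ, Σ) = Σ^{-1} \sum_i (x_i - μ) can itself be checked by finite differences (a sketch assuming NumPy; the data, μ and Σ are all arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 3
X = rng.standard_normal((n, d))                  # arbitrary data, one observation per row
mu = rng.standard_normal(d)
B = rng.standard_normal((d, d))
Sigma = B @ B.T + d * np.eye(d)                  # symmetric positive definite

def log_likelihood(m):
    diff = X - m
    quad = np.sum(diff @ np.linalg.inv(Sigma) * diff)   # sum_i (x_i - m)^T Sigma^{-1} (x_i - m)
    return (-0.5 * n * d * np.log(2 * np.pi)
            - 0.5 * n * np.log(np.linalg.det(Sigma))
            - 0.5 * quad)

grad_formula = np.linalg.inv(Sigma) @ (X - mu).sum(axis=0)   # Sigma^{-1} sum_i (x_i - mu)

eps = 1e-6
grad_numeric = np.array([
    (log_likelihood(mu + eps * e) - log_likelihood(mu - eps * e)) / (2 * eps)
    for e in np.eye(d)
])

print(np.allclose(grad_formula, grad_numeric, atol=1e-4))    # True
```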

Let us now differentiate l with respect to Σ^{-1}. Let A = Σ^{-1}. We have

    l(μ, Σ) = - (nd/2) log(2π) + (n/2) log det A - (1/2) \sum_{i=1}^n (x_i - μ)^T A (x_i - μ).

The last term is a real number, so it is equal to its trace. Thus

    (1/2) \sum_{i=1}^n (x_i - μ)^T A (x_i - μ) = (1/2) \sum_{i=1}^n Trace( A (x_i - μ)(x_i - μ)^T ) = (n/2) Trace( A Σ̃ ),

where

    Σ̃ = (1/n) \sum_{i=1}^n (x_i - μ)(x_i - μ)^T

is the empirical covariance matrix. Hence

    l(μ, Σ) = - (nd/2) log(2π) + (n/2) log det A - (n/2) Trace( A Σ̃ ).

Let f : A ↦ (n/2) Trace( A Σ̃ ). We have

    f(A + H) - f(A) = (n/2) Trace( Σ̃ H ),

so, writing df_A(H) = ⟨∇f(A), H⟩ = Trace( ∇f(A)^T H ), the gradient of this term is

    ∇f(A) = (n/2) Σ̃.

Let us now focus on the log det term. For symmetric positive definite A and a symmetric perturbation H, we can write

    log det(A + H) = log det( A^{1/2} ( I + A^{-1/2} H A^{-1/2} ) A^{1/2} ) = log det A + log det( I + H̃ ),

where H̃ = A^{-1/2} H A^{-1/2}.

Since H̃ is symmetric, it can be written as H̃ = U Λ U^T, where U is an orthogonal matrix and Λ = diag(λ_1, ..., λ_d). Therefore

    log det( I + H̃ ) - log det( I ) = \sum_{j=1}^d log(1 + λ_j) = \sum_{j=1}^d λ_j + o(||H̃||) = Trace( H̃ ) + o(||H̃||).

We deduce that

    d(log det)_A(H) = Trace( A^{-1/2} H A^{-1/2} ) = Trace( H A^{-1} ),

and therefore the gradient of log det A is

    ∇_A log det A = A^{-1}.

The gradient of l with respect to A is then

    ∇_A l = (n/2) A^{-1} - (n/2) Σ̃.

It is equal to zero if and only if

    Σ = A^{-1} = Σ̃,

when Σ̃ is invertible. Finally, we have shown that the pair

    μ̂ = x̄ = (1/n) \sum_{i=1}^n x_i    and    Σ̂ = (1/n) \sum_{i=1}^n (x_i - x̄)(x_i - x̄)^T

is the only stationary point of the likelihood. One can actually check (for example by computing the Hessian with respect to (μ, Σ)) that this is indeed a maximum. We will have a confirmation of this in the lecture on exponential families.
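Both the identity ∇_A log det A = A^{-1} and the final result can be checked numerically. The sketch below verifies the log det gradient by finite differences and then checks that, at μ = x̄ and Σ equal to the empirical covariance, the gradient with respect to μ vanishes and Σ̂ coincides with the usual biased covariance estimator (a sketch assuming NumPy; the matrix and data are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3
B = rng.standard_normal((d, d))
A = B @ B.T + d * np.eye(d)                     # symmetric positive definite

# Finite-difference check of d(log det)_A(H) = Trace(H A^{-1}); for a general perturbation
# the gradient is (A^{-1})^T, which equals A^{-1} here since A is symmetric.
eps = 1e-6
grad_numeric = np.zeros((d, d))
for i in range(d):
    for j in range(d):
        H = np.zeros((d, d))
        H[i, j] = eps                           # single-entry perturbation
        grad_numeric[i, j] = (np.linalg.slogdet(A + H)[1]
                              - np.linalg.slogdet(A - H)[1]) / (2 * eps)
print(np.allclose(grad_numeric, np.linalg.inv(A), atol=1e-6))    # True

# At mu = xbar and Sigma = empirical covariance, the gradient with respect to mu vanishes.
n = 500
X = rng.multivariate_normal(np.zeros(d), A, size=n)              # arbitrary Gaussian sample
mu_hat = X.mean(axis=0)
Sigma_hat = (X - mu_hat).T @ (X - mu_hat) / n
print(np.allclose(np.linalg.inv(Sigma_hat) @ (X - mu_hat).sum(axis=0), 0))   # True
print(np.allclose(Sigma_hat, np.cov(X, rowvar=False, bias=True)))            # matches np.cov with bias=True
```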

Bibliography

[1] J. M. Amigó and M. B. Kennel. Variance estimators for the Lempel-Ziv entropy rate estimator. Chaos, 16:043102.
[2] Dominique Foata and Aimé Fuchs. Calcul des probabilités, 2ème édition. Dunod.
[3] Gilbert Saporta. Probabilités, analyse des données et statistique. Technip.
[4] Frédéric Bonnans. Optimisation continue: cours et problèmes corrigés. Dunod.
[5] Michael Jordan. An Introduction to Graphical Models. In preparation.
[6] Multiplicateurs de Lagrange.
[7] S. J. D. Prince. Computer Vision: Models, Learning, and Inference. Cambridge University Press.
[8] Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer-Verlag New York, Inc., Secaucus, NJ, USA.
[9] Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, New York, NY, USA.
