Part 4: Conditional Random Fields

1 Part 4: Conditional Random Fields. Sebastian Nowozin and Christoph H. Lampert. Colorado Springs, 25th June. 1 / 39

2 Problem (Probabilistic Learning)
Let d(y|x) be the (unknown) true conditional distribution.
Let D = {(x^1, y^1), ..., (x^N, y^N)} be i.i.d. samples from d(x, y).
Task: find a distribution p(y|x) that we can use as a proxy for d(y|x), or,
given a parametrized family of distributions p(y|x, w), find the parameter w making p(y|x, w) closest to d(y|x).
Open questions:
What do we mean by "closest"?
What's a good candidate for p(y|x, w)?
How to actually find w? (conceptually, and numerically)
2 / 39

3 Principle of Parsimony (Parsimony, aka Occam's razor)
"Pluralitas non est ponenda sine necessitate." (Plurality is not to be posited without necessity.) (William of Ockham)
"We are to admit no more causes of natural things than such as are both true and sufficient to explain their appearances." (Isaac Newton)
"Make everything as simple as possible, but not simpler." (Albert Einstein)
Use the simplest explanation that covers all the facts. (This is what we'll use.)
3 / 39

4 1) Define what aspects we consider relevant facts about the data.
2) Pick the simplest distribution reflecting that.
Definition (Simplicity ≙ Entropy). The simplicity of a distribution p is given by its entropy:
H(p) = −∑_{z∈Z} p(z) log p(z)
Definition (Relevant Facts ≙ Feature Functions). By φ_i : Z → R for i = 1, ..., D we denote a set of feature functions that express everything we want to be able to model about our data. For example: the grayvalue of a pixel, a bag-of-words histogram of an image, the time of day an image was taken, a flag if a pixel is darker than half of its neighbors.
4 / 39
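
To make these two definitions concrete, here is a minimal sketch in Python (NumPy assumed; the toy sample space and feature functions are illustrative, not from the slides) that computes the entropy of a discrete distribution and the empirical feature means µ_i:

```python
import numpy as np

def entropy(p):
    """H(p) = -sum_z p(z) log p(z) for a discrete distribution p."""
    p = np.asarray(p, dtype=float)
    nz = p > 0                          # treat 0 * log 0 as 0
    return -np.sum(p[nz] * np.log(p[nz]))

def feature_means(samples, feature_fns):
    """Empirical means mu_i = (1/N) sum_n phi_i(z_n)."""
    return np.array([np.mean([phi(z) for z in samples]) for phi in feature_fns])

# toy example: samples from Z = {0, 1, 2} and two feature functions
samples = [0, 1, 1, 2, 2, 2]
phis = [lambda z: float(z),             # phi_1(z) = z
        lambda z: float(z == 2)]        # phi_2(z) = [z = 2]
print(entropy([0.5, 0.25, 0.25]))       # ~1.04 nats
print(feature_means(samples, phis))     # [1.333..., 0.5]
```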

5 Principle (Maximum Entropy Principle)
Let z^1, ..., z^N be samples from a distribution d(z). Let φ_1, ..., φ_D be feature functions, and denote by µ_i := (1/N) ∑_n φ_i(z^n) their means over the sample set. The maximum entropy distribution, p, is the solution to
max_{p is a prob. distr.} H(p)   (be as simple as possible)
subject to E_{z∼p(z)}[φ_i(z)] = µ_i   (be faithful to what we know)
Theorem (Exponential Family Distribution)
Under some very reasonable conditions, the maximum entropy distribution has the form
p(z) = (1/Z) exp( ∑_i w_i φ_i(z) )
for some parameter vector w = (w_1, ..., w_D) and constant Z.
5 / 39

6 Example: Let Z = R, φ_1(z) = z, φ_2(z) = z². The exponential family distribution is
p(z) = (1/Z(w)) exp( w_1 z + w_2 z² ) = (e^{a b²}/Z(a, b)) exp( −a (z − b)² )
for a = −w_2, b = −w_1/(2 w_2). It's a Gaussian!
Given examples z^1, ..., z^N, we can compute a and b, and derive w.
6 / 39
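
A minimal sketch of this example, assuming NumPy and that the samples do come from a Gaussian: match the first two moments to the sample means and convert mean/variance to the natural parameters w_1, w_2 (equivalently a and b). All names are illustrative.

```python
import numpy as np

def gaussian_natural_params(samples):
    """Match E[z] and E[z^2] to the sample means, then convert the resulting
    Gaussian (mean b, variance 1/(2a)) to the natural parameters of
    p(z) proportional to exp(w1*z + w2*z^2)."""
    z = np.asarray(samples, dtype=float)
    mean = z.mean()                      # mu_1 constraint
    var = (z**2).mean() - mean**2        # from the mu_2 constraint
    a, b = 1.0 / (2.0 * var), mean       # p(z) ~ exp(-a (z - b)^2)
    w1, w2 = 2.0 * a * b, -a             # expand -a(z-b)^2 and read off w
    return w1, w2

rng = np.random.default_rng(0)
z = rng.normal(loc=2.0, scale=1.5, size=100_000)
print(gaussian_natural_params(z))        # ~ (0.89, -0.22) for mean 2, std 1.5
```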

7 Example: Let Z = {1, ..., K}, φ_k(z) = [z = k] for k = 1, ..., K. The exponential family distribution is
p(z) = (1/Z(w)) exp( ∑_k w_k φ_k(z) )
= exp(w_1)/Z for z = 1, exp(w_2)/Z for z = 2, ..., exp(w_K)/Z for z = K,
with Z = exp(w_1) + ... + exp(w_K). It's a Multinomial!
7 / 39

8 Example: Let Y = {0, 1}^{N×M} be the labelings of an N×M image grid, φ_i(y) := y_i for each pixel i, and φ_{NM+1}(y) := ∑_{i∼j} y_i y_j (summing over all 4-neighbor pairs). The exponential family distribution is
p(y) = (1/Z(w)) exp( ⟨w, φ(y)⟩ ) = (1/Z(w)) exp( ∑_i w_i y_i + w_{NM+1} ∑_{i∼j} y_i y_j )
It's a (binary) Markov Random Field!
8 / 39

9 Conditional Random Field Learning
Assume: a set of i.i.d. samples D = {(x^n, y^n)}_{n=1,...,N}, (x^n, y^n) ∼ d(x, y),
feature functions φ(x, y) = ( φ_1(x, y), ..., φ_D(x, y) ),
a parametrized family p(y|x, w) = (1/Z(x, w)) exp( ⟨w, φ(x, y)⟩ ).
Task: adjust w of p(y|x, w) based on D.
Many possible techniques to do so: Expectation Matching, Maximum Likelihood, Best Approximation, MAP estimation of w.
Punchline: they all turn out to be (almost) the same!
9 / 39

10 Maximum Likelihood Parameter Estimation
Idea: maximize the conditional likelihood of observing outputs y^1, ..., y^N for inputs x^1, ..., x^N:
w* = argmax_{w∈R^D} p(y^1, ..., y^N | x^1, ..., x^N, w)
  (i.i.d.) = argmax_{w∈R^D} ∏_{n=1}^{N} p(y^n | x^n, w)
  (log(·)) = argmin_{w∈R^D} −∑_{n=1}^{N} log p(y^n | x^n, w)   (the negative conditional log-likelihood of D)
10 / 39

11 Best Approximation
Idea: find p(y|x, w) that is closest to d(y|x).
Definition (Similarity between conditional distributions). For fixed x ∈ X, the KL-divergence measures the similarity:
KL_cond(p‖d)(x) := ∑_{y∈Y} d(y|x) log [ d(y|x) / p(y|x, w) ]
For x ∼ d(x), compute the expectation:
KL_tot(p‖d) := E_{x∼d(x)}[ KL_cond(p‖d)(x) ] = ∑_{x∈X} ∑_{y∈Y} d(x, y) log [ d(y|x) / p(y|x, w) ]
11 / 39

12 Best Approximation
Idea: find p(y|x, w) of minimal KL_tot-distance to d(y|x):
w* = argmin_{w∈R^D} ∑_{x∈X} ∑_{y∈Y} d(x, y) log [ d(y|x) / p(y|x, w) ]
  (drop const.) = argmin_{w∈R^D} −∑_{(x,y)∈X×Y} d(x, y) log p(y|x, w)
  ((x^n, y^n) ∼ d(x, y)) ≈ argmin_{w∈R^D} −∑_{n=1}^{N} log p(y^n | x^n, w)   (the negative conditional log-likelihood of D)
12 / 39

13 MAP Estimation of w
Idea: treat w as a random variable and maximize the posterior probability p(w|D).
p(w|D) =(Bayes) p(x^1, y^1, ..., x^N, y^N | w) p(w) / p(D) =(i.i.d.) p(w) ∏_{n=1}^{N} [ p(y^n|x^n, w) / p(y^n|x^n) ]
p(w): prior belief on w (cannot be estimated from data).
w* = argmax_{w∈R^D} p(w|D) = argmin_{w∈R^D} [ −log p(w|D) ]
  = argmin_{w∈R^D} [ −log p(w) − ∑_{n=1}^{N} log p(y^n|x^n, w) + ∑_{n=1}^{N} log p(y^n|x^n) ]   (last term indep. of w)
  = argmin_{w∈R^D} [ −log p(w) − ∑_{n=1}^{N} log p(y^n|x^n, w) ]
13 / 39

14 Choices for p(w):
p(w) :∝ const. (uniform; on R^D not really a distribution)
w* = argmin_{w∈R^D} [ −log p(w) − ∑_{n=1}^{N} log p(y^n|x^n, w) ] = argmin_{w∈R^D} [ −∑_{n=1}^{N} log p(y^n|x^n, w) + const. ]   (negative conditional log-likelihood)
p(w) := const. · e^{−‖w‖²/(2σ²)} (Gaussian)
w* = argmin_{w∈R^D} [ (1/(2σ²)) ‖w‖² − ∑_{n=1}^{N} log p(y^n|x^n, w) + const. ]   (regularized negative conditional log-likelihood)
14 / 39

15 Probabilistic Models for Structured Prediction: Summary
Negative (Regularized) Conditional Log-Likelihood (of D):
L(w) = (1/(2σ²)) ‖w‖² − ∑_{n=1}^{N} [ ⟨w, φ(x^n, y^n)⟩ − log ∑_{y∈Y} e^{⟨w, φ(x^n, y)⟩} ]
(σ² → ∞ makes it unregularized.)
Probabilistic parameter estimation or training means solving
w* = argmin_{w∈R^D} L(w).
Same optimization problem as for multi-class logistic regression.
15 / 39
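
Since L(w) has the same structure as multi-class logistic regression, the following sketch computes it for exactly that special case, where φ(x, y) places the input vector x in the block belonging to class y, so ⟨w, φ(x, y)⟩ = ⟨w_y, x⟩. NumPy/SciPy are assumed; the function name and shapes are illustrative, not from the slides.

```python
import numpy as np
from scipy.special import logsumexp

def neg_cond_log_likelihood(w, X, Y, K, sigma2=1.0):
    """L(w) for the multi-class logistic regression special case:
    phi(x, y) places x in the block of class y, so <w, phi(x, y)> = <w_y, x>.
    w is the flattened (K, d) weight matrix, X is (N, d), Y holds class ids."""
    W = w.reshape(K, -1)                       # one weight vector per class
    scores = X @ W.T                           # scores[n, y] = <w_y, x_n>
    log_Z = logsumexp(scores, axis=1)          # log sum_y exp(<w, phi(x_n, y)>)
    ll = scores[np.arange(len(Y)), Y] - log_Z  # log p(y_n | x_n, w)
    return np.dot(w, w) / (2.0 * sigma2) - ll.sum()

X = np.random.randn(5, 3)
Y = np.array([0, 2, 1, 0, 2])
print(neg_cond_log_likelihood(np.zeros(3 * 3), X, Y, K=3))  # = 5 * log(3)
```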

16 Negative Conditional Log-Likelihood (Toy Example)
(Figure: plots of the negative conditional log-likelihood for different regularization strengths σ², e.g. σ² = 0.01.)
16 / 39

17 Steepest Descent Minimization: minimize L(w)
input: tolerance ɛ > 0
1: w_cur ← 0
2: repeat
3:   v ← ∇_w L(w_cur)
4:   η ← argmin_{η∈R} L(w_cur − η v)
5:   w_cur ← w_cur − η v
6: until ‖v‖ < ɛ
output: w_cur
Alternatives: L-BFGS (second-order descent without explicit Hessian), Conjugate Gradient.
We always need (at least) the gradient of L.
17 / 39
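
A minimal sketch of the loop above (Python/NumPy assumed). The exact line search argmin_η is replaced by a backtracking (Armijo) search, which is one common stand-in, not necessarily what the authors used.

```python
import numpy as np

def steepest_descent(L, grad_L, w0, eps=1e-6, max_iter=1000):
    """Steepest descent with a backtracking (Armijo) line search in place of
    the exact minimization over eta."""
    w = np.asarray(w0, dtype=float)
    for _ in range(max_iter):
        v = grad_L(w)
        if np.linalg.norm(v) < eps:            # stopping criterion ||v|| < eps
            break
        eta, L_w, vv = 1.0, L(w), np.dot(v, v)
        while L(w - eta * v) > L_w - 0.5 * eta * vv:
            eta *= 0.5                         # shrink until sufficient decrease
        w = w - eta * v
    return w

# usage on a toy objective L(w) = ||w - 1||^2
w_opt = steepest_descent(lambda w: np.sum((w - 1.0)**2),
                         lambda w: 2.0 * (w - 1.0),
                         np.zeros(3))
print(w_opt)                                   # ~ [1, 1, 1]
```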

18 L(w) = (1/(2σ²)) ‖w‖² − ∑_{n=1}^{N} [ ⟨w, φ(x^n, y^n)⟩ − log ∑_{y∈Y} e^{⟨w, φ(x^n, y)⟩} ]
∇_w L(w) = (1/σ²) w − ∑_{n=1}^{N} [ φ(x^n, y^n) − ∑_{y∈Y} ( e^{⟨w, φ(x^n, y)⟩} / ∑_{ȳ∈Y} e^{⟨w, φ(x^n, ȳ)⟩} ) φ(x^n, y) ]
  = (1/σ²) w − ∑_{n=1}^{N} [ φ(x^n, y^n) − ∑_{y∈Y} p(y|x^n, w) φ(x^n, y) ]
  = (1/σ²) w − ∑_{n=1}^{N} [ φ(x^n, y^n) − E_{y∼p(y|x^n, w)} φ(x^n, y) ]
∇²_w L(w) = (1/σ²) Id_{D×D} + ∑_{n=1}^{N} ( E_{y∼p(y|x^n, w)}[ φ(x^n, y) φ(x^n, y)^T ] − [ E_{y∼p(y|x^n, w)} φ(x^n, y) ][ E_{y∼p(y|x^n, w)} φ(x^n, y) ]^T )
18 / 39
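
For the multi-class logistic-regression special case used earlier, the expectation E_{y∼p(y|x^n,w)} φ(x^n, y) is just the softmax-weighted average of the class feature vectors, so the gradient can be written in a few lines (NumPy/SciPy assumed; names and shapes are illustrative):

```python
import numpy as np
from scipy.special import softmax

def grad_neg_cond_log_likelihood(w, X, Y, K, sigma2=1.0):
    """(1/sigma^2) w - sum_n [ phi(x_n, y_n) - E_{y ~ p(y|x_n, w)} phi(x_n, y) ]
    for the multi-class logistic regression special case."""
    N, d = X.shape
    W = w.reshape(K, d)
    P = softmax(X @ W.T, axis=1)       # P[n, y] = p(y | x_n, w)
    observed = np.zeros((K, d))        # sum_n phi(x_n, y_n), block-wise
    np.add.at(observed, Y, X)
    expected = P.T @ X                 # sum_n E_{y ~ p} phi(x_n, y), block-wise
    return w / sigma2 - (observed - expected).ravel()

X = np.random.randn(5, 3)
Y = np.array([0, 2, 1, 0, 2])
print(grad_neg_cond_log_likelihood(np.zeros(9), X, Y, K=3))
```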

19 L(w) = (1/(2σ²)) ‖w‖² − ∑_{n=1}^{N} [ ⟨w, φ(x^n, y^n)⟩ − log ∑_{y∈Y} e^{⟨w, φ(x^n, y)⟩} ]
L is C^∞-differentiable on all of R^D.
19 / 39

20 ∇_w L(w) = (1/σ²) w − ∑_{n=1}^{N} [ φ(x^n, y^n) − E_{y∼p(y|x^n, w)} φ(x^n, y) ]
For σ → ∞:
E_{y∼p(y|x^n, w)} φ(x^n, y) = φ(x^n, y^n) for all n  ⟺  ∇_w L(w) = 0, a critical point of L (local minimum/maximum/saddle point).
Interpretation: we aim for expectation matching, E_{y∼p} φ(x, y) = φ(x, y^obs), but discriminatively: only for x ∈ {x^1, ..., x^N}.
20 / 39

21 ∇²_w L(w) = (1/σ²) Id_{D×D} + ∑_{n=1}^{N} ( E_{y∼p(y|x^n, w)}[ φ(x^n, y) φ(x^n, y)^T ] − [ E_{y∼p(y|x^n, w)} φ(x^n, y) ][ E_{y∼p(y|x^n, w)} φ(x^n, y) ]^T )
The Hessian matrix is positive definite, so L(w) is convex, and ∇_w L(w) = 0 implies a global minimum.
21 / 39

22 Milestone I: Probabilistic Training (Conditional Random Fields)
p(y|x, w) is log-linear in w ∈ R^D.
Training: many probabilistic derivations lead to the same optimization problem: minimize the negative conditional log-likelihood, L(w).
L(w) is differentiable and convex; gradient descent will find the global optimum with ∇_w L(w) = 0.
Same structure as multi-class logistic regression.
For logistic regression: this is where the textbook ends; we're done.
For conditional random fields: we're not in safe waters yet!
22 / 39

23 Task: compute v = ∇_w L(w_cur) and evaluate L(w_cur − η v):
L(w) = (1/(2σ²)) ‖w‖² − ∑_{n=1}^{N} [ ⟨w, φ(x^n, y^n)⟩ − log ∑_{y∈Y} e^{⟨w, φ(x^n, y)⟩} ]
∇_w L(w) = (1/σ²) w − ∑_{n=1}^{N} [ φ(x^n, y^n) − ∑_{y∈Y} p(y|x^n, w) φ(x^n, y) ]
Problem: Y typically is very (exponentially) large:
binary image segmentation: |Y| = 2^{#pixels};
ranking N images: |Y| = N!, e.g. N = 1000: |Y| ≈ 10^{2568}.
We must use the structure in Y, or we're lost.
23 / 39

24 Solving the Training Optimization Problem Numerically
∇_w L(w) = (1/σ²) w − ∑_{n=1}^{N} [ φ(x^n, y^n) − E_{y∼p(y|x^n, w)} φ(x^n, y) ]
Computing the gradient (naive): O(K^M N D).
L(w) = (1/(2σ²)) ‖w‖² − ∑_{n=1}^{N} [ ⟨w, φ(x^n, y^n)⟩ − log Z(x^n, w) ]
Line search (naive): O(K^M N D) per evaluation of L.
N: number of samples. D: dimension of feature space. M: number of output nodes (100s to 1,000,000s). K: number of possible labels of each output node (2 to 100s).
24 / 39

25 Probabilistic Inference to the Rescue
Remember: in a graphical model with factors F, the features decompose:
φ(x, y) = ( φ_F(x, y_F) )_{F∈F}
E_{y∼p(y|x, w)}[ φ(x, y) ] = ( E_{y_F∼p(y_F|x, w)}[ φ_F(x, y_F) ] )_{F∈F}
E_{y_F∼p(y_F|x, w)}[ φ_F(x, y_F) ] = ∑_{y_F∈Y_F} p(y_F|x, w) φ_F(x, y_F)   (the p(y_F|x, w) are the factor marginals µ_F)
Each such sum has only K^{|F|} terms, much smaller than the complete joint distribution p(y|x, w); the factor marginals can be computed/approximated, e.g., with (loopy) belief propagation.
25 / 39
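
A sketch of how the expected feature vector is assembled once the factor marginals µ_F are available (e.g. from belief propagation, which is not shown here). It assumes each φ_F is already embedded into the full D-dimensional feature vector; all names and shapes are illustrative, not from the slides.

```python
import numpy as np

def expected_features(factor_marginals, factor_features):
    """E_y[phi(x, y)] = sum_F sum_{y_F} mu_F(y_F) phi_F(x, y_F).

    factor_marginals: one array mu_F per factor, shaped like the factor's
        label space, e.g. (K,) for a unary factor or (K, K) for a pairwise one.
    factor_features: matching arrays phi_F with one extra trailing axis of
        size D, holding the (embedded) feature vector for each labeling y_F."""
    D = factor_features[0].shape[-1]
    expec = np.zeros(D)
    for mu_F, phi_F in zip(factor_marginals, factor_features):
        # contract over all K^{|F|} joint states of the factor
        expec += np.tensordot(mu_F, phi_F, axes=mu_F.ndim)
    return expec

# toy example: one unary factor and one pairwise factor, K = 2, D = 3
mu = [np.array([0.3, 0.7]),
      np.array([[0.2, 0.1], [0.1, 0.6]])]
phi = [np.random.rand(2, 3), np.random.rand(2, 2, 3)]
print(expected_features(mu, phi))
```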

26 Solving the Training Optimization Problem Numerically
∇_w L(w) = (1/σ²) w − ∑_{n=1}^{N} [ φ(x^n, y^n) − E_{y∼p(y|x^n, w)} φ(x^n, y) ]
Computing the gradient: O(K^M N D) naively, O(M K^{|F_max|} N D) using the factor decomposition.
L(w) = (1/(2σ²)) ‖w‖² − ∑_{n=1}^{N} [ ⟨w, φ(x^n, y^n)⟩ − log ∑_{y∈Y} e^{⟨w, φ(x^n, y)⟩} ]
Line search: O(K^M N D) naively, O(M K^{|F_max|} N D) per evaluation of L.
N: number of samples (10s to 1,000,000s). D: dimension of feature space. M: number of output nodes. K: number of possible labels of each output node.
26 / 39

27 What if the training set D is too large (e.g. millions of examples)?
Simplify the model: train a simpler model (smaller factors). Less expressive power, so results might get worse rather than better.
Subsampling: create a random subset D′ ⊂ D and train the model using D′. Ignores all information in D \ D′.
Parallelize: train multiple models in parallel and merge them. Follows the multi-core/GPU trend, but how to actually merge the models? And it doesn't really save computation.
27 / 39

28 What if the training set D is too large (e.g. millions of examples)?
Stochastic Gradient Descent (SGD): keep maximizing p(w|D), but in each gradient descent step:
create a random subset D′ ⊂ D, often just 1-3 elements!
follow the approximate gradient
∇_w L̃(w) = (1/σ²) w − ∑_{(x^n, y^n)∈D′} [ φ(x^n, y^n) − E_{y∼p(y|x^n, w)} φ(x^n, y) ]
Line search is no longer possible; extra parameter: stepsize η.
SGD converges to argmin_w L(w) (if η is chosen right)!
SGD needs more iterations, but each one is much faster.
More: see L. Bottou, O. Bousquet: "The Tradeoffs of Large Scale Learning", NIPS.
28 / 39
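
A minimal SGD sketch (Python/NumPy assumed). `grad_on_batch` stands for the approximate gradient above restricted to the mini-batch, and the 1/(1+t) step-size decay is just one common choice, not something prescribed by the slides.

```python
import numpy as np

def sgd(grad_on_batch, data, w0, eta0=0.1, epochs=10, batch_size=1, seed=0):
    """Follow the gradient of the objective restricted to a small random
    subset D' of the training data in every step."""
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float)
    t = 0
    for _ in range(epochs):
        order = rng.permutation(len(data))
        for start in range(0, len(data), batch_size):
            batch = [data[i] for i in order[start:start + batch_size]]
            eta = eta0 / (1.0 + t)              # decaying stepsize schedule
            w = w - eta * grad_on_batch(w, batch)
            t += 1
    return w

# usage: per-example quadratic losses (w - a)^2 pull w towards the data mean
data = [1.0, 2.0, 3.0, 4.0]
print(sgd(lambda w, batch: sum(2.0 * (w - a) for a in batch),
          data, w0=np.zeros(1), eta0=0.5, epochs=100))   # approaches 2.5
```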

29 Solving the Training Optimization Problem Numerically
∇_w L(w) = (1/σ²) w − ∑_{n=1}^{N} [ φ(x^n, y^n) − E_{y∼p(y|x^n, w)} φ(x^n, y) ]
Computing the gradient: O(K^M N D) naively, O(M K² N D) if belief propagation is possible.
L(w) = (1/(2σ²)) ‖w‖² − ∑_{n=1}^{N} [ ⟨w, φ(x^n, y^n)⟩ − log ∑_{y∈Y} e^{⟨w, φ(x^n, y)⟩} ]
Line search: O(K^M N D) naively, O(M K² N D) per evaluation of L.
N: number of samples. D: dimension of feature space (φ_{i,j}: 1 to 10s, φ_i: 100s to 10,000s). M: number of output nodes. K: number of possible labels of each output node.
29 / 39

30 Typical feature functions in image segmentation:
φ_i(y_i, x) ∈ R^{1000}: local image features, e.g. bag-of-words; ⟨w_i, φ_i(y_i, x)⟩: local classifier (like logistic regression).
φ_{i,j}(y_i, y_j) = [y_i = y_j] ∈ R: test for same label; ⟨w_{ij}, φ_{ij}(y_i, y_j)⟩: penalizer for label changes (if w_{ij} > 0).
Combined: argmax_y p(y|x) is a smoothed version of the local cues.
(Figure: original image, local classification, local + smoothness.)
30 / 39
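
A sketch of the resulting score ⟨w, φ(x, y)⟩ for a 4-connected grid with these two kinds of features, assuming the unary scores ⟨w_i, φ_i(y_i, x)⟩ have already been computed (e.g. by a local classifier); names and shapes are illustrative.

```python
import numpy as np

def segmentation_score(y, unary, w_pair):
    """<w, phi(x, y)> for a 4-connected grid with Potts-style pairwise terms.

    y:      (H, W) integer label image
    unary:  (H, W, K) precomputed local scores <w_i, phi_i(y_i, x)>
    w_pair: weight on the same-label indicator [y_i = y_j]"""
    H, W = y.shape
    rows = np.arange(H)[:, None]
    cols = np.arange(W)[None, :]
    score = unary[rows, cols, y].sum()                 # sum_i <w_i, phi_i(y_i, x)>
    score += w_pair * np.sum(y[1:, :] == y[:-1, :])    # vertical neighbor pairs
    score += w_pair * np.sum(y[:, 1:] == y[:, :-1])    # horizontal neighbor pairs
    return score

y = np.array([[0, 0, 1],
              [0, 1, 1]])
unary = np.random.rand(2, 3, 2)                        # H=2, W=3, K=2 labels
print(segmentation_score(y, unary, w_pair=1.5))
```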

31 Typical feature functions in handwriting recognition:
φ_i(y_i, x) ∈ R^{1000}: image representation (pixels, gradients); ⟨w_i, φ_i(y_i, x)⟩: local classifier for whether x_i is letter y_i.
φ_{i,j}(y_i, y_j) = e_{y_i} ⊗ e_{y_j} (outer product of indicator vectors): letter/letter indicator; ⟨w_{ij}, φ_{ij}(y_i, y_j)⟩: encourage/suppress letter combinations.
Combined: argmax_y p(y|x) is a corrected version of the local cues.
(Figure: local classification, local + correction.)
31 / 39

32 Typical feature functions in pose estimation:
φ_i(y_i, x) ∈ R^{1000}: local image representation, e.g. HoG; ⟨w_i, φ_i(y_i, x)⟩: local confidence map.
φ_{i,j}(y_i, y_j) = good_fit(y_i, y_j) ∈ R: test for geometric fit; ⟨w_{ij}, φ_{ij}(y_i, y_j)⟩: penalizer for unrealistic poses.
Together: argmax_y p(y|x) is a sanitized version of the local cues.
(Figure: original image, local classification, local + geometry.)
[V. Ferrari, M. Marin-Jimenez, A. Zisserman: Progressive Search Space Reduction for Human Pose Estimation, CVPR 2008.]
32 / 39

33 Typical feature functions for CRFs in Computer Vision:
φ_i(y_i, x): local representation, high-dimensional; ⟨w_i, φ_i(y_i, x)⟩: local classifier.
φ_{i,j}(y_i, y_j): prior knowledge, low-dimensional; ⟨w_{ij}, φ_{ij}(y_i, y_j)⟩: penalize outliers.
Learning adjusts the parameters:
unary w_i: learn the local classifiers and their importance;
binary w_{ij}: learn the importance of smoothing/penalization.
argmax_y p(y|x) is a cleaned-up version of the local prediction.
33 / 39

34 Solving the Training Optimization Problem Numerically
Idea: split learning of the unary potentials into two parts: the local classifiers, and their importance.
Two-Stage Training:
pre-train local classifiers f^y_i(x) ≙ log p(y_i = y | x);
use φ_i(y_i, x) := f_i(x) ∈ R^K (low-dimensional);
keep φ_{ij}(y_i, y_j) as before;
perform CRF learning with φ_i and φ_{ij}.
Advantage: lower-dimensional feature space during inference, so it is faster; the f^y_i(x) can be stronger classifiers, e.g. non-linear SVMs.
Disadvantage: if the local classifiers are bad, CRF training cannot fix that.
34 / 39
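
A sketch of stage one, using scikit-learn's LogisticRegression as a stand-in for the pre-trained local classifier (an assumption; any classifier with log-probability outputs would do). The CRF learning of stage two is only indicated in a comment, and all names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def pretrain_unary(X_local, y_local):
    """Stage 1: train a local classifier and return f with
    f(x)[k] = log p(y_i = k | x), to be used as the K-dim unary feature."""
    clf = LogisticRegression(max_iter=1000).fit(X_local, y_local)
    return lambda x: clf.predict_log_proba(x.reshape(1, -1))[0]

# Stage 2 (not shown): use phi_i(y_i, x) := f(x_i), keep phi_ij as before,
# and run the usual CRF training, which now only learns how much to trust
# the local classifier relative to the pairwise terms.
rng = np.random.default_rng(0)
X_local = rng.normal(size=(200, 10))
y_local = (X_local[:, 0] > 0).astype(int)
f = pretrain_unary(X_local, y_local)
print(f(X_local[0]))          # [log p(y_i=0 | x), log p(y_i=1 | x)]
```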

35 Solving the Training Optimization Problem Numerically
CRF training is based on gradient-descent optimization. The faster we can do it, the better (more realistic) models we can use:
∇_w L(w) = (1/σ²) w − ∑_{n=1}^{N} [ φ(x^n, y^n) − ∑_{y∈Y} p(y|x^n, w) φ(x^n, y) ] ∈ R^D
A lot of research on accelerating CRF training:
problem: |Y| too large. Solutions: exploit structure ((loopy) belief propagation); smart sampling (contrastive divergence); use an approximate L (e.g. pseudo-likelihood).
problem: N too large. Solution: mini-batches (stochastic gradient descent).
problem: D too large. Solution: trained φ_unary (two-stage training).
35 / 39

36 Summary: CRF Learning
Given: a training set {(x^1, y^1), ..., (x^N, y^N)} ⊂ X × Y with (x^n, y^n) ∼ d(x, y) i.i.d., and a feature function φ : X × Y → R^D.
Task: find a parameter vector w such that (1/Z) exp( ⟨w, φ(x, y)⟩ ) ≈ d(y|x).
The CRF solution is derived by minimizing the negative conditional log-likelihood:
w* = argmin_w (1/(2σ²)) ‖w‖² − ∑_{n=1}^{N} [ ⟨w, φ(x^n, y^n)⟩ − log ∑_{y∈Y} e^{⟨w, φ(x^n, y)⟩} ]
Convex optimization problem: gradient descent works.
Training needs repeated runs of probabilistic inference.
36 / 39

37 Extra I: Beyond Fully Supervised Learning
So far, training was fully supervised: all variables were observed. In real life, some variables can be unobserved even during training:
missing labels in the training data;
latent variables, e.g. part location;
latent variables, e.g. part occlusion;
latent variables, e.g. viewpoint.
37 / 39

38 Graphical model: three types of variables:
x ∈ X always observed, y ∈ Y observed only in training, z ∈ Z never observed (latent).
Marginalization over Latent Variables:
Construct the conditional likelihood as usual:
p(y, z|x, w) = (1/Z(x, w)) exp( ⟨w, φ(x, y, z)⟩ )
Derive p(y|x, w) by marginalizing over z:
p(y|x, w) = ∑_{z∈Z} p(y, z|x, w) = (1/Z(x, w)) ∑_{z∈Z} exp( ⟨w, φ(x, y, z)⟩ )
38 / 39

39 Negative regularized conditional log-likelihood:
L(w) = λ ‖w‖² − ∑_{n=1}^{N} log p(y^n | x^n, w)
  = λ ‖w‖² − ∑_{n=1}^{N} log ∑_{z∈Z} p(y^n, z | x^n, w)
  = λ ‖w‖² − ∑_{n=1}^{N} log ∑_{z∈Z} exp( ⟨w, φ(x^n, y^n, z)⟩ ) + ∑_{n=1}^{N} log ∑_{z∈Z} ∑_{y∈Y} exp( ⟨w, φ(x^n, y, z)⟩ )
L is not convex in w, so it can have local minima. There is no agreed-on best way of treating latent variables.
39 / 39
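
A minimal sketch of one term of this objective, assuming the joint space Y × Z of a single example is small enough to enumerate explicitly (in practice one would again rely on structured inference); names and shapes are illustrative, not from the slides.

```python
import numpy as np
from scipy.special import logsumexp

def latent_nll_term(w, phi_table, y_obs):
    """-log p(y_obs | x, w) with the latent z marginalized out, for a single
    training example whose features phi(x, y, z) are enumerated in
    phi_table of shape (|Y|, |Z|, D)."""
    scores = np.tensordot(phi_table, w, axes=1)   # <w, phi(x, y, z)> for all y, z
    log_numer = logsumexp(scores[y_obs])          # log sum_z exp(<w, phi(x, y_obs, z)>)
    log_Z = logsumexp(scores)                     # log sum_{y, z} exp(<w, phi(x, y, z)>)
    return log_Z - log_numer

phi_table = np.random.rand(3, 4, 5)               # |Y| = 3, |Z| = 4, D = 5
print(latent_nll_term(np.ones(5), phi_table, y_obs=1))
```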
