MIA - Master on Artificial Intelligence


1 MIA - Master on Artificial Intelligence

2 Contents
1 Introduction
2 Unsupervised & semi-supervised approaches
3 Supervised Algorithms
4 Maximum Likelihood Estimation
5 Maximum Entropy Modeling

3 Introduction

4 Paradigms
Supervised
- n-gram models. Parameter estimation: MLE & smoothing.
- Algorithms: Naive Bayes, Decision Trees, SVMs, AdaBoost, Perceptron, log-linear, ...
Unsupervised and semi-supervised
- Similarity models: Clustering, EBL.
- Prediction models: Expectation Maximization (EM).
- Bootstrapping, Co-training, Active learning, ...

5 Other relevant considerations
- Batch vs. on-line ML algorithms
- Parameter tuning: train/development data
- Evaluation: test data, N-fold cross-validation, Precision/Recall/F1

6 Unsupervised & semi-supervised approaches

7 Clustering
Single-link clustering of 22 frequent English words, represented as a dendrogram (leaves: be, not, he, I, it, this, the, his, a, and, but, in, on, with, for, at, from, of, to, as, is, was).
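
A hedged sketch of how such a clustering can be produced in Python: the word list comes from the slide, but the feature vectors are random stand-ins (a real experiment would use co-occurrence counts), and SciPy's single-link agglomerative clustering builds the dendrogram.

```python
# Single-link (minimum-distance) agglomerative clustering of word vectors.
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

words = ["be", "not", "he", "I", "it", "this", "the", "his", "a", "and", "but",
         "in", "on", "with", "for", "at", "from", "of", "to", "as", "is", "was"]

rng = np.random.default_rng(0)
vectors = rng.random((len(words), 10))    # toy vectors; use co-occurrence features in practice

Z = linkage(vectors, method="single", metric="euclidean")   # single-link merge history

# dendrogram() draws the tree when matplotlib is available; here we just inspect the merges.
dendrogram(Z, labels=words, no_plot=True)
for left, right, dist, size in Z[:5]:
    print(f"merge {int(left)} + {int(right)} at distance {dist:.3f} (cluster size {int(size)})")
```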

8 The EM algorithm
Start with a guess for the values of your model parameters.
Step E: Compute the distribution of the missing/latent data given the observed data and the current guess of the model parameters. Use this distribution to compute the expectation of the likelihood function with respect to the unobserved variables.
Step M: Maximize the resulting expected likelihood function, which contains no unobserved variables, as you would in the fully observed case, to get a new estimate of the model parameters.
Repeat steps E and M until convergence (no further changes).

9 The EM algorithm - Example
Three coins with probabilities of heads (λ, p_1, p_2).
Hidden variable: coin 0 (bias λ), with outcomes Y = {H, T}.
If Y = H, flip coin 1 (bias p_1) three times; if Y = T, flip coin 2 (bias p_2) three times.
Observed sequences: X = {HHT, HTT, TTT, HHH}

10 The EM algorithm - Example
Start with a guessed model μ = (λ, p_1, p_2).
Step E - Expectation: use the current model parameters μ to compute the probability distribution of the hidden data given the observations:
P_μ(H|x_i) = P_μ(x_i, H) / P_μ(x_i)    P_μ(T|x_i) = P_μ(x_i, T) / P_μ(x_i)    for all x_i ∈ X
where P_μ(x_i, H), P_μ(x_i, T) and P_μ(x_i) are computed from the current model:
P_μ(HHT, H) = λ p_1^2 (1 - p_1)
P_μ(HTT, T) = (1 - λ) p_2 (1 - p_2)^2
... etc ...
P_μ(x_i) = P_μ(x_i, H) + P_μ(x_i, T)    for all x_i ∈ X
Compute the expected number of occurrences of each hidden-variable value:
E[Y = H] = Σ_i P_μ(H|x_i)    E[Y = T] = Σ_i P_μ(T|x_i)

11 The EM algorithm - Example
Step M - Maximization: use the expectations computed above to obtain new MLE estimates of the model parameters, given the observations X = {HHT, HTT, TTT, HHH}:
λ = E[Y=H] / N
p_1 = (2·P_μ(H|HHT) + 1·P_μ(H|HTT) + 0·P_μ(H|TTT) + 3·P_μ(H|HHH)) / (3·E[Y=H])
p_2 = (2·P_μ(T|HHT) + 1·P_μ(T|HTT) + 0·P_μ(T|TTT) + 3·P_μ(T|HHH)) / (3·E[Y=T])
(each numerator counts expected heads, and each chosen coin is flipped three times per sequence)
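
A runnable sketch of the E and M steps for this three-coin example (my own illustration; the starting parameter values are arbitrary). The posterior P_μ(H|x_i) acts as a fractional count, and the factor 3 in the M-step accounts for the three flips per sequence.

```python
# EM for the three-coin model: coin 0 (bias lam) selects coin 1 (p1) or coin 2 (p2),
# which is then flipped three times. X holds the observed sequences from the slides.
X = ["HHT", "HTT", "TTT", "HHH"]

def em_three_coins(lam, p1, p2, iters=100):
    for _ in range(iters):
        # E-step: posterior P(Y=H | x_i) under the current parameters.
        post_H = []
        for x in X:
            h = x.count("H")
            joint_H = lam * p1**h * (1 - p1)**(3 - h)        # P_mu(x_i, H)
            joint_T = (1 - lam) * p2**h * (1 - p2)**(3 - h)  # P_mu(x_i, T)
            post_H.append(joint_H / (joint_H + joint_T))
        # M-step: MLE re-estimation from the expected counts.
        heads = [x.count("H") for x in X]
        exp_H = sum(post_H)                                  # E[Y = H]
        exp_T = len(X) - exp_H                               # E[Y = T]
        lam = exp_H / len(X)
        p1 = sum(g * h for g, h in zip(post_H, heads)) / (3 * exp_H)
        p2 = sum((1 - g) * h for g, h in zip(post_H, heads)) / (3 * exp_T)
    return lam, p1, p2

print(em_three_coins(lam=0.3, p1=0.6, p2=0.5))
```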

12 Bootstrapping: Self-training
Input: L_0, a (small) set of labeled examples; U, a (large) set of unlabelled examples
Output: m, a learned model

T = L_0                          // Start with a reduced set of labelled examples
while not convergence_achieved() do
    m = learn(T)                 // Learn a model from the available labeled examples
    n = label(U, m)              // Use the learned model to label new examples
    n = filter(n, γ)             // Filter the labeled examples by a confidence threshold
    T = T ∪ n                    // Add the examples passing the filter to the training set
endwhile

Convergence may be defined as a fixed number of iterations, or as the point where performance on a development set stops improving.
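
A minimal Python sketch of this self-training loop, using scikit-learn's LogisticRegression as a stand-in base learner; the learner choice, the confidence threshold gamma and the fixed iteration cap are assumptions made for illustration.

```python
# Self-training: repeatedly label the unlabelled pool and keep only the
# confidently labeled examples (confidence >= gamma) for retraining.
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_lab, y_lab, X_unlab, gamma=0.95, max_iter=10):
    X_lab, y_lab, X_unlab = np.array(X_lab), np.array(y_lab), np.array(X_unlab)
    model = None
    for _ in range(max_iter):                                        # convergence_achieved()
        model = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)  # m = learn(T)
        if len(X_unlab) == 0:
            break
        proba = model.predict_proba(X_unlab)                         # n = label(U, m)
        keep = proba.max(axis=1) >= gamma                            # n = filter(n, gamma)
        if not keep.any():
            break
        y_new = model.classes_[proba[keep].argmax(axis=1)]
        X_lab = np.vstack([X_lab, X_unlab[keep]])                    # T = T ∪ n
        y_lab = np.concatenate([y_lab, y_new])
        X_unlab = X_unlab[~keep]
    return model
```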

13 Bootstrapping: Co-training
Input: L_0, a (small) set of labeled examples; U, a (large) set of unlabelled examples
Output: m, a learned model

T = L_0                          // Start with a reduced set of labelled examples
while not convergence_achieved() do
    m_1 = learn(T, view_1)       // Learn one model from each view of the labeled examples
    m_2 = learn(T, view_2)
    n_1 = label(U, m_1)          // Use each learned model to label new examples
    n_2 = label(U, m_2)
    n = filter(n_1, n_2, γ)      // Filter the labeled examples by a confidence threshold
    T = T ∪ n                    // Add the new examples to the training set
endwhile
m = best(m_1, m_2)

Both views must be conditionally independent and sufficient.
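
A co-training sketch in the same style (my own illustration): two Naive Bayes models, each trained on its own feature view, label the pool for each other; the view index lists, the base learner, and the filter rule (keep examples where the more confident view reaches gamma) are assumptions.

```python
# Co-training: each model sees only its own feature view; examples labeled
# confidently by either view are added to the shared training set.
import numpy as np
from sklearn.naive_bayes import GaussianNB

def co_train(X_lab, y_lab, X_unlab, view1, view2, gamma=0.9, max_iter=10):
    X_lab, y_lab, X_unlab = np.array(X_lab), np.array(y_lab), np.array(X_unlab)
    m1 = m2 = None
    for _ in range(max_iter):
        m1 = GaussianNB().fit(X_lab[:, view1], y_lab)          # m_1 = learn(T, view_1)
        m2 = GaussianNB().fit(X_lab[:, view2], y_lab)          # m_2 = learn(T, view_2)
        if len(X_unlab) == 0:
            break
        p1 = m1.predict_proba(X_unlab[:, view1])               # n_1 = label(U, m_1)
        p2 = m2.predict_proba(X_unlab[:, view2])               # n_2 = label(U, m_2)
        best = np.maximum(p1.max(axis=1), p2.max(axis=1))
        keep = best >= gamma                                   # n = filter(n_1, n_2, gamma)
        if not keep.any():
            break
        use_m1 = (p1.max(axis=1) >= p2.max(axis=1))[keep]
        labels = np.where(use_m1,
                          m1.classes_[p1[keep].argmax(axis=1)],
                          m2.classes_[p2[keep].argmax(axis=1)])
        X_lab = np.vstack([X_lab, X_unlab[keep]])              # T = T ∪ n
        y_lab = np.concatenate([y_lab, labels])
        X_unlab = X_unlab[~keep]
    return m1, m2                                              # m = best(m_1, m_2)
```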

14 Active learning
Input: L_0, a (small) set of labeled examples; U, a (large) set of unlabelled examples; oracle, a way to obtain the expected label for a given example
Output: m, a learned model

T = L_0                          // Start with a reduced set of labelled examples
while not convergence_achieved() do
    m = learn(T)                 // Learn a model from the available labeled examples
    n = label(U, m)              // Use the learned model to label new examples
    n = select(n)                // Select the best examples to be labeled
    n = oracle(n)                // Get a supervised label for the selected examples
    T = T ∪ n                    // Add the new examples to the training set
endwhile

Different measures are used for example selection: confidence of the model, error reduction, expected model change, ...
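
A minimal pool-based active-learning sketch (my own illustration) using least-confidence selection; the oracle callable, the batch size and the number of rounds are assumptions.

```python
# Active learning: ask the oracle about the examples the current model is
# least confident on, then retrain on the enlarged labeled set.
import numpy as np
from sklearn.linear_model import LogisticRegression

def active_learn(X_lab, y_lab, X_pool, oracle, rounds=5, batch=10):
    X_lab, y_lab, X_pool = np.array(X_lab), np.array(y_lab), np.array(X_pool)
    model = None
    for _ in range(rounds):                                          # convergence_achieved()
        model = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)  # m = learn(T)
        if len(X_pool) == 0:
            break
        conf = model.predict_proba(X_pool).max(axis=1)               # n = label(U, m)
        pick = np.argsort(conf)[:batch]                              # n = select(n): least confident
        y_new = np.array([oracle(x) for x in X_pool[pick]])          # n = oracle(n)
        X_lab = np.vstack([X_lab, X_pool[pick]])                     # T = T ∪ n
        y_lab = np.concatenate([y_lab, y_new])
        X_pool = np.delete(X_pool, pick, axis=0)
    return model
```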

15 Supervised Algorithms

16 Naive Bayes
The simplest probabilistic classifier.
NB generative model: the class y generates the features x_1, x_2, x_3, ..., x_n, where x_i is the i-th feature of example x.
Features are conditionally independent given the class y.

17-19 Naive Bayes (II)
P(y|x_1, ..., x_n) = (applying Bayes' rule) = P(y) P(x_1, ..., x_n|y) / P(x_1, ..., x_n)
posterior = prior · likelihood / evidence

NB(x) = argmax_y posterior = argmax_y P(y) P(x_1, ..., x_n|y) / P(x_1, ..., x_n)

P(x_1, ..., x_n) is a constant and the features are conditionally independent given y, thus:
NB(x) = argmax_y P(y) Π_{i=1..n} P(x_i|y)

20-21 Naive Bayes (III)
Training an NB classifier consists of estimating two probability distributions from the training data: P(y) and P(x_i|y).
Maximum likelihood estimates:
P(y) = counts(y) / num. examples
P(x_i|y) = counts(x_i, y) / counts(y)
In practice, smoothing is needed.
NB is simple and can be trained from small datasets (robustness)... but the independence assumptions are not realistic.
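
A small sketch of these estimates for discrete features, with add-one smoothing folded in (my own illustration; the toy examples at the bottom are made up).

```python
# Naive Bayes over discrete features:
#   P(y)       = counts(y) / num_examples
#   P(x_i | y) = (counts(x_i, y) + 1) / (counts(y) + |values_i|)   # add-one smoothing
# Prediction:  argmax_y  log P(y) + sum_i log P(x_i | y)
import math
from collections import Counter, defaultdict

def train_nb(examples):
    """examples: list of (features, label), features being a tuple of discrete values."""
    class_counts = Counter(label for _, label in examples)
    feat_counts = defaultdict(Counter)       # (position, label) -> Counter of values
    values = defaultdict(set)                # position -> set of seen values
    for feats, label in examples:
        for i, v in enumerate(feats):
            feat_counts[(i, label)][v] += 1
            values[i].add(v)
    return class_counts, feat_counts, values, len(examples)

def predict_nb(model, feats):
    class_counts, feat_counts, values, n = model
    best, best_score = None, float("-inf")
    for label, cy in class_counts.items():
        score = math.log(cy / n)                              # log P(y)
        for i, v in enumerate(feats):
            num = feat_counts[(i, label)][v] + 1              # smoothed counts(x_i, y)
            den = cy + len(values[i])
            score += math.log(num / den)                      # log P(x_i | y)
        if score > best_score:
            best, best_score = label, score
    return best

data = [(("sunny", "hot"), "no"), (("rainy", "mild"), "yes"),
        (("sunny", "mild"), "yes"), (("rainy", "hot"), "no")]
model = train_nb(data)
print(predict_nb(model, ("sunny", "mild")))    # -> "yes"
```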

22 Decision Trees
- Feature selection (information gain, Gini diversity, χ², ...)
- Stopping criterion
- Feature binarization, pruning, incremental learning, ...

23 Linear Classifiers
Vector space in R^n.
Define a hyperplane with a weight vector w and an offset (or threshold) b, used as a classification rule:
h(x) = sign(w·x + b) = +1 if Σ_{i=1..n} x_i w_i + b > 0, -1 otherwise

24 Linear Classifiers: Perceptron
Input: Training set {(x_i, y_i)}
Output: Weight vector w

w = 0
repeat
    for i = 1 to n do
        if y_i (w·x_i + b) ≤ 0 then
            w = w + y_i x_i
            b = b + y_i
        endif
    endfor
until average(y_i (w·x_i + b)) < ε

On-line learning algorithm with additive, error-driven updating.
Convergence is guaranteed if the training set is linearly separable.
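
A compact NumPy sketch of this algorithm (my own illustration); it stops after a fixed number of epochs or when a full pass makes no mistakes, a slightly simpler stopping rule than the average-margin test above.

```python
# Perceptron: additive, error-driven updates w += y_i * x_i, b += y_i
# whenever example i is misclassified, i.e. y_i * (w·x_i + b) <= 0.
import numpy as np

def perceptron(X, y, epochs=100):
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        errors = 0
        for x_i, y_i in zip(X, y):
            if y_i * (w @ x_i + b) <= 0:      # misclassified (or on the boundary)
                w += y_i * x_i
                b += y_i
                errors += 1
        if errors == 0:                       # converged: all examples separated
            break
    return w, b

# Toy linearly separable data with labels in {-1, +1}.
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])
w, b = perceptron(X, y)
print(np.sign(X @ w + b))                     # classification rule h(x) = sign(w·x + b)
```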

25 Linear Classifiers: SVM
Batch learning algorithm.
Margin maximization: minimize ||w||, subject to the constraints y_i (w·x_i + b) ≥ 1 for all i.

26 Linear Classifiers: Kernels
What if the training set is not linearly separable?

27 Linear Classifiers: Kernels
Use a mapping function f to make the data linearly separable.
Computing all f(x) may be too costly, but we actually only need the dot products f(x)·f(y).
Kernel functions efficiently compute K(x, y) = f(x)·f(y).

28 Linear Classifiers: Kernels
Identity (linear kernel): K(x, y) = x·y
Polynomial kernel: K(x, y) = (x·y + c)^d
Gaussian kernel (RBF): K(x, y) = exp(-γ ||x - y||²)
Sigmoid kernel: K(x, y) = tanh(α (x·y) + β)
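
The same four kernels written as plain NumPy functions (a sketch; the hyperparameter defaults and the demo vectors are arbitrary).

```python
# Kernel functions K(x, y) = f(x)·f(y), computed without ever building f(x).
import numpy as np

def linear_kernel(x, y):
    return float(np.dot(x, y))                               # x·y

def polynomial_kernel(x, y, c=1.0, d=3):
    return float((np.dot(x, y) + c) ** d)                    # (x·y + c)^d

def rbf_kernel(x, y, gamma=0.5):
    return float(np.exp(-gamma * np.sum((x - y) ** 2)))      # exp(-γ ||x - y||²)

def sigmoid_kernel(x, y, alpha=0.01, beta=0.0):
    return float(np.tanh(alpha * np.dot(x, y) + beta))       # tanh(α (x·y) + β)

x, y = np.array([1.0, 2.0]), np.array([0.5, -1.0])
for k in (linear_kernel, polynomial_kernel, rbf_kernel, sigmoid_kernel):
    print(k.__name__, k(x, y))
```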

29 Linear Classifiers: Kernels and the dual problem
To use a kernel, we need to formulate the classifier in dual form, i.e. in terms of dot products between examples.
Example: the Perceptron. Classification rule: ŷ = sgn(w·x + b)
Due to the update steps w ← w + y_i x_i and b ← b + y_i, we get:
w = Σ_{i=1..n} α_i y_i x_i    b = Σ_{i=1..n} α_i y_i
where α_i is the number of misclassifications of x_i.

30 Linear Classifiers: Kernels and the dual problem
Then, we can compute the perceptron prediction as:
ŷ = sgn((Σ_{i=1..n} α_i y_i x_i)·x + Σ_{i=1..n} α_i y_i)
  = sgn(Σ_{i=1..n} α_i y_i (x_i·x) + Σ_{i=1..n} α_i y_i)
  = sgn(Σ_{i=1..n} α_i y_i (x_i·x + 1))
Once the problem is formulated in terms of similarities (dot products) between examples, we can introduce the kernel:
ŷ = sgn(Σ_{i=1..n} α_i y_i (K(x_i, x) + 1))
Note that for K(x, y) = x·y, this formulation is equivalent to the original perceptron.
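
A sketch of this dual (kernel) perceptron (my own illustration): alpha[i] counts how often example i was misclassified, and prediction uses the kernel expansion with the "+1" term playing the role of the bias; the RBF kernel here is just an example choice.

```python
# Kernel (dual) perceptron: the weight vector w is never built explicitly;
# prediction uses  y_hat = sgn( sum_i alpha_i * y_i * (K(x_i, x) + 1) ).
import numpy as np

def K(x, z, gamma=0.5):                            # RBF kernel, one possible choice
    return np.exp(-gamma * np.sum((x - z) ** 2))

def decision(X, y, alpha, x):
    return sum(alpha[j] * y[j] * (K(X[j], x) + 1) for j in range(len(X)))

def train_kernel_perceptron(X, y, epochs=20):
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    alpha = np.zeros(len(X))
    for _ in range(epochs):
        for i in range(len(X)):
            if y[i] * decision(X, y, alpha, X[i]) <= 0:   # misclassified
                alpha[i] += 1                             # bump its dual weight
    return alpha

X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.2]])
y = np.array([-1, -1, 1, 1])
alpha = train_kernel_perceptron(X, y)
print(np.sign(decision(X, y, alpha, np.array([0.1, 0.0]))),
      np.sign(decision(X, y, alpha, np.array([1.1, 1.0]))))
```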

31 Maximum Likelihood Estimation

32 MLE & Smoothing
Estimate the probability of the target feature from observed data. The prediction task can be reduced to having good estimates of the conditional distribution:
P(Y|X) = P(X, Y) / P(X)
MLE (Maximum Likelihood Estimation):
P_MLE(x) = count(x) / N
P_MLE(y|x) = count(x, y) / count(x)
MLE assigns no probability mass to unseen events, which makes it unsuitable for NLP (data sparseness, Zipf's law).

33 Smoothing 1 - Adding Counts
Laplace's Law (adding one):
P_LAP(x) = (count(x) + 1) / (N + B)
For large values of B, too much probability mass is assigned to unseen events.
Lidstone's Law:
P_LID(x) = (count(x) + λ) / (N + Bλ)
Usually λ = 0.5 (Expected Likelihood Estimation). Equivalent to a linear interpolation between MLE and a uniform prior, with μ = N/(N + Bλ):
P_LID(x) = μ · count(x)/N + (1 - μ) · 1/B
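
A small sketch of these estimators on a toy word-count table (my own illustration; the tiny corpus and the number of bins B are made up).

```python
# MLE, Laplace (add-one) and Lidstone (add-lambda) unigram estimates.
# N = total observed tokens, B = number of possible event types (bins).
from collections import Counter

tokens = "the cat sat on the mat the end".split()
counts = Counter(tokens)
N = sum(counts.values())      # 8 observed tokens
B = 10                        # assumed number of distinct possible words

def p_mle(w):
    return counts[w] / N

def p_laplace(w):
    return (counts[w] + 1) / (N + B)

def p_lidstone(w, lam=0.5):
    return (counts[w] + lam) / (N + B * lam)

for w in ("the", "cat", "dog"):                 # "dog" is an unseen event
    print(w, p_mle(w), p_laplace(w), round(p_lidstone(w), 4))

# Lidstone as an interpolation between MLE and the uniform prior 1/B:
lam = 0.5
mu = N / (N + B * lam)
assert abs(p_lidstone("cat") - (mu * p_mle("cat") + (1 - mu) / B)) < 1e-12
```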

34 Smoothing 2 - Discounting Counts
Absolute Discounting:
P_ABS(x) = (count(x) - δ) / N            if count(x) > 0
           (B - N_0) δ / (N_0 N)         otherwise
Linear Discounting:
P_LIN(x) = (1 - α) count(x) / N          if count(x) > 0
           α / N_0                       otherwise

35 Maximum Entropy Modeling

36-37 Maximum Entropy / Log-linear Models
Maximum Entropy: an alternative estimation technique, able to deal with different kinds of evidence.
ME principle: do not assume anything about non-observed events. Find the most uniform (maximum entropy, least informed) probability distribution that matches the observations.
Example: translating the English prepositions in and on into French (dans, en, à). The observations fix only part of the joint distribution P(a, b):

P(a, b)   dans   en     à
in           ?    ?    0.3
on           ?    ?     ?

The slides then contrast one possible p(a, b) consistent with these observations with the maximum entropy p(a, b).

38 ME Modeling
Observed facts are constraints on the desired model p.
Constraints take the form of feature functions f_i : ε → {0, 1}.
The desired model must satisfy the constraints:
Σ_{x∈ε} p(x) f_i(x) = Σ_{x∈ε} p̃(x) f_i(x)    for all i
that is, the expectation of each f_i according to the model p matches the observed (empirical) expectation of f_i under p̃.

39 ME Modeling Example
Example: ε = {in, on} × {dans, en, à}
Observed fact: p(in, dans) + p(on, dans) = 0.4
Encoded as a constraint: E_p(f_1) = 0.4, where
f_1(a, b) = 1 if b = dans, 0 otherwise
E_p(f_1) = Σ_{(a,b)∈ε} p(a, b) f_1(a, b)
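
A tiny sketch that checks the constraint E_p(f_1) = 0.4 for one candidate distribution p(a, b); the candidate values below are my own, chosen only so that the dans column sums to 0.4 and the whole table sums to 1.

```python
# E_p(f1) = sum over (a, b) of p(a, b) * f1(a, b), with f1(a, b) = 1 iff b = "dans".
events = [(a, b) for a in ("in", "on") for b in ("dans", "en", "à")]

def f1(a, b):
    return 1.0 if b == "dans" else 0.0

# One candidate p(a, b) satisfying p(in, dans) + p(on, dans) = 0.4.
p = {("in", "dans"): 0.25, ("on", "dans"): 0.15,
     ("in", "en"): 0.10,   ("on", "en"): 0.20,
     ("in", "à"): 0.20,    ("on", "à"): 0.10}

E_f1 = sum(p[(a, b)] * f1(a, b) for a, b in events)
print(E_f1)        # 0.4, so this p satisfies the constraint E_p(f1) = 0.4
```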

40 ME Probability Model
There is an infinite set P of probability models consistent with the observations. We want to compute the maximum entropy model
p* = argmax_{p∈P} H(p)
H(p) = -Σ_{x∈ε} p(x) log p(x)

41 Parameter Estimation
Example: a maximum entropy model for translating prepositions from English to French.
- No constraints: the maximum entropy model is the uniform distribution over the (a, b) cells (total 1.0).
- With the constraint p(dans) + p(en) = 0.4: spread the mass 0.4 uniformly over the dans and en cells and the remaining 0.6 uniformly over the à cells.
- With the constraints p(dans) + p(en) = 0.4 and a fixed value for p(in): not so easy!

42 Parameter estimation
Exponential models (obtained by Lagrange multiplier optimization):
p(a|b) = (1/Z(b)) Π_{j=1..k} α_j^{f_j(a,b)},    α_j > 0
Z(b) = Σ_a Π_{j=1..k} α_j^{f_j(a,b)}
also formulated as
p(a|b) = (1/Z(b)) exp(Σ_{j=1..k} λ_j f_j(a, b)),    λ_j = ln α_j
Each model parameter weights the influence of one feature.
Several algorithms compute the optimal parameters: GIS, IIS, L-BFGS, ...
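
A minimal sketch of evaluating this log-linear form for the preposition example; the two feature functions and the λ values below are arbitrary placeholders, not parameters taken from the slides.

```python
# Conditional log-linear / maximum entropy model: p(a|b) ∝ exp(sum_j lambda_j * f_j(a, b)).
import math

A = ("dans", "en", "à")                                     # possible outputs a
features = [
    lambda a, b: 1.0 if a == "dans" else 0.0,               # f_1
    lambda a, b: 1.0 if (b == "in" and a == "à") else 0.0,  # f_2
]
lambdas = [0.5, 1.2]                                        # placeholder weights

def score(a, b):
    return sum(l * f(a, b) for l, f in zip(lambdas, features))   # sum_j lambda_j f_j(a, b)

def p(a, b):
    Z = sum(math.exp(score(a2, b)) for a2 in A)             # Z(b) normalizes over a
    return math.exp(score(a, b)) / Z

for a in A:
    print(f"p({a} | in) = {p(a, 'in'):.3f}")
print(sum(p(a, "in") for a in A))                           # sums to 1.0
```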

43 Improved Iterative Scaling (IIS)
Input: feature functions f_1 ... f_n, empirical distribution p̃(f_i)
Output: λ_i, the parameters of the optimal model p

Start with λ_i = 0 for all i ∈ {1 ... n}
Repeat
    For each i ∈ {1 ... n} do
        let Δλ_i be the solution to
            Σ_{a,b} p̃(b) p(a|b) f_i(a, b) exp(Δλ_i Σ_{j=1..n} f_j(a, b)) = p̃(f_i)
        λ_i ← λ_i + Δλ_i
    end for
Until all λ_i have converged

44 Applications to NLP Tasks
- Speech processing (Rosenfeld 94)
- Translation (Brown et al. 90)
- Morphology (Della Pietra et al. 95)
- Clause boundary detection (Reynar & Ratnaparkhi 97)
- PP-attachment (Ratnaparkhi et al. 94)
- PoS Tagging (Ratnaparkhi 96, Black et al. 99)
- Partial Parsing (Skut & Brants 98)
- Full Parsing (Ratnaparkhi 97, Ratnaparkhi 99)
- Text Categorization (Nigam et al. 99)

45 PoS Tagging (Ratnaparkhi 96)
Probabilistic model over H × T, with histories
h_i = (w_i, w_{i+1}, w_{i+2}, w_{i-1}, w_{i-2}, t_{i-1}, t_{i-2})
Example feature: f_j(h_i, t) = 1 if suffix(w_i) = "ing" and t = VBG, 0 otherwise
Compute p*(h, t) using GIS.
p(t|h) = exp(Σ_j λ_j f_j(h, t)) / Z(h)
Disambiguation algorithm: beam search for
argmax_{t_1...t_n} p(t_1 ... t_n | w_1 ... w_n) = argmax_{t_1...t_n} Π_{i=1..n} p(t_i|h_i)

46 Text Categorization (Nigam et al. 99)
Probabilistic model over W × C, with documents d = (w_1, w_2, ..., w_N)
Features: f_{w,c'}(d, c) = N(d, w)/N(d) if c = c', 0 otherwise
Compute p*(c|d) using IIS.
Disambiguation algorithm: select the class with the highest probability:
argmax_c P(c|d) = argmax_c exp(Σ_i λ_i f_i(d, c)) / Z(d) = argmax_c Σ_i λ_i f_i(d, c)

47 Sentence Boundaries (Reynar and Ratnaparkhi 97)
Feature templates:
1. The prefix
2. The suffix
3. The previous word
4. The next word
5. Whether the prefix or the suffix is in Abbreviations
6. Whether the previous or the next word is in Abbreviations
Example context: < b=no punc=. pref=Mr suff= prev=2010. next=Wayne >
Two classes: y and n.
Disambiguation algorithm: select the class with the highest probability:
argmax_c P(c|d) = argmax_c exp(Σ_i λ_i f_i(d, c)) / Z(d) = argmax_c Σ_i λ_i f_i(d, c)
