2017 Predictive Analytics Symposium


1 2017 Predictive Analytics Symposium Session 14, Introduction to Machine Learning Moderator: Robert Anders Larson, FSA, MAAA Presenter: Boyi Xie, Ph.D. SOA Antitrust Compliance Guidelines SOA Presentation Disclaimer

2 Introduction to Machine Learning Boyi Xie, SOA Predictive Analytics Symposium, 09/14/2017

3 Outline: What is Machine Learning; Empirical Risk Minimization; Cross Validation; Supervised Learning; Unsupervised Learning; Ranking; Bayesian Models; Applications

4 What is Machine Learning? An interdisciplinary field drawing on computer science, electrical engineering, math, statistics, physics, operations research, psychology, etc. It is a branch of artificial intelligence that focuses on the theory of learning algorithms and their application to problem solving.

5 Empirical Risk Minimization
Idea: minimize loss on the training data set. Empirical: use the training set to find the best fit.
Define a loss function $L(y, f(x))$ that measures how well we fit a single point.
The empirical risk is the average loss over the dataset: $R = \frac{1}{N}\sum_{i=1}^{N} L(y_i, f(x_i))$
Squared error: $L(y_i, f(x_i)) = \frac{1}{2}(y_i - f(x_i))^2$; absolute error: $L(y_i, f(x_i)) = |y_i - f(x_i)|$
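As an illustration (not code from the presentation), here is a minimal NumPy sketch of the empirical risk under the two losses above; the data and the linear candidate model are made-up assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 2.0 * x + 0.5 + rng.normal(scale=0.1, size=100)   # noisy, made-up linear data

def f(x, w=2.0, b=0.5):
    """Candidate model f(x) whose fit we want to score."""
    return w * x + b

pred = f(x)
squared_risk = np.mean(0.5 * (y - pred) ** 2)   # R = (1/N) * sum of squared-error losses
absolute_risk = np.mean(np.abs(y - pred))       # R = (1/N) * sum of absolute-error losses
print(squared_risk, absolute_risk)
```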

6 Empirical Risk Minimization. [Figure: fitting a polynomial model.]

7 Regularized Risk Minimization. We want to add a penalty on the complexity of the model, such as the size of its parameters. This gives us the regularized risk.

8 Evaluating Our Learned Function
We minimized empirical risk to get $\hat{\theta}$. How well does $f(x; \hat{\theta})$ perform on future data? It should generalize and have low true risk:
$R_{true}(\theta) = \int P(x, y)\, L(y, f(x; \theta))\, dx\, dy$
We can't compute the true risk, so instead we use the testing empirical risk. We randomly split the data into training and testing portions:
$\{(x_1, y_1), \dots, (x_N, y_N)\}$ and $\{(x_{N+1}, y_{N+1}), \dots, (x_{N+M}, y_{N+M})\}$
Find $\hat{\theta}$ with the training data: $R_{train}(\hat{\theta}) = \frac{1}{N}\sum_{i=1}^{N} L(y_i, f(x_i; \hat{\theta}))$
Evaluate it with the testing data: $R_{test}(\hat{\theta}) = \frac{1}{M}\sum_{i=N+1}^{N+M} L(y_i, f(x_i; \hat{\theta}))$
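A minimal sketch of this train/test protocol: fit $\hat{\theta}$ on the training portion only, then report the average loss on the held-out portion. The synthetic data and the least-squares fit are illustrative assumptions, not from the presentation.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.2, size=300)

idx = rng.permutation(len(y))
train, test = idx[:200], idx[200:]                            # random split: N train, M test

theta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)   # fit on training data only

def risk(X, y, theta):
    return np.mean(0.5 * (y - X @ theta) ** 2)                # average squared-error loss

print("train risk:", risk(X[train], y[train], theta))
print("test  risk:", risk(X[test], y[test], theta))
```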

9 Regularized Empirical Risk Minimization
Idea: minimize loss on the training data set plus a penalty on model complexity:
$R_{regularized}(\theta) = R_{empirical}(\theta) + Penalty(\theta) = \frac{1}{N}\sum_{i=1}^{N} L(y_i, f(x_i; \theta)) + \frac{\lambda}{2N}\lVert\theta\rVert^2$
Select the $\lambda$ that gives the lowest cost; $\lambda$ controls how strongly model simplicity is favored.
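A minimal sketch of this idea for squared-error loss (ridge regression), trying a grid of $\lambda$ values and keeping the one with the lowest risk on a held-out validation split. The data, the split, and the $\lambda$ grid are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 10))
y = X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

X_tr, y_tr = X[:150], y[:150]
X_va, y_va = X[150:], y[150:]

def fit_ridge(X, y, lam):
    # Minimizer of (1/N) * sum of squared losses + (lambda / 2N) ||theta||^2,
    # which has the closed form theta = (X^T X + lambda I)^{-1} X^T y.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

best = None
for lam in [0.0, 0.01, 0.1, 1.0, 10.0, 100.0]:
    theta = fit_ridge(X_tr, y_tr, lam)
    val_risk = np.mean(0.5 * (y_va - X_va @ theta) ** 2)      # risk on held-out data
    if best is None or val_risk < best[0]:
        best = (val_risk, lam)
print("selected lambda:", best[1], "validation risk:", best[0])
```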

10 Frequentists & Bayesians. Frequentists (Neyman, Pearson, Wald): the classical, objective view with no priors. Data are a repeatable random sample, so there is a frequency; the underlying parameters remain constant during this repeatable process. Frequentist inference estimates one best set of model parameters, often using the maximum-likelihood estimator (unbiased and minimum variance). Bayesians (Bayes, Laplace, de Finetti): unknown quantities are treated probabilistically, and the state of the world can always be updated. Data are observed from the realized sample; parameters are unknown and described probabilistically. Put a distribution (pdf) on all variables in the problem.

11 Models of Interest Over Time. [Timeline figure] A view of the development of machine learning, showing, among others: ANOVA, Fisher's linear discriminant, the naive Bayes classifier, the perceptron, the nearest neighbor algorithm, single-linkage clustering, the k-means algorithm, logistic regression and multinomial logistic regression, quadratic classifiers, CART, CHAID, C4.5, the bootstrap (bagging), boosting, backpropagation, the Boltzmann machine, self-organizing maps, the expectation-maximization algorithm, hidden Markov models, Bayesian networks, the Apriori algorithm, support vector machines, conditional random fields, random forests, gene expression programming, non-parametric Bayesian methods, latent Dirichlet allocation, MapReduce, word embeddings, and deep learning.

12 Supervised Learning. Labels are given for the data points. The goal is to maximize the log-likelihood of the data given the model, i.e. to develop a general rule based on inductive inference. Example models: perceptron, logistic regression, support vector machines, k-NN, decision trees, neural networks / deep learning.

13 Perceptron
Linear discriminant functions and decision hyperplanes: $g(x) = w^T x + w_0 = 0$
Assume the two classes $\{-1\}$ and $\{+1\}$ are linearly separable, i.e. there exists a hyperplane, defined by $w^T x = 0$, such that
$w^T x > 0 \;\forall x \in \{+1\}$ and $w^T x < 0 \;\forall x \in \{-1\}$
We approach the problem as an optimization task. Choose a cost function, the perceptron cost $J(w) = \sum_{x \in Y} \delta_x\, w^T x$, where $Y$ is the set of misclassified training samples and $\delta_x = -1$ if $x$ belongs to class $+1$, $\delta_x = +1$ if $x$ belongs to class $-1$.
Use gradient descent to iteratively minimize the cost function: $w(t+1) = w(t) - \rho_t \frac{\partial J(w)}{\partial w}$

14 Perceptron
Use gradient descent to find $w$ by iteratively minimizing the cost function:
$w(t+1) = w(t) - \rho_t \frac{\partial J(w)}{\partial w} = w(t) - \rho_t \sum_{x \in Y} \delta_x x$
Algorithm:
Initialize $w(0)$, choose a learning rate $\rho_0$, and set $t = 0$
Repeat
  Let $Y = \emptyset$
  For $i = 1$ to $N$: if $\delta_{x_i} w(t)^T x_i \geq 0$ then $Y = Y \cup \{x_i\}$
  Update $w$: $w(t+1) = w(t) - \rho_t \sum_{x \in Y} \delta_x x$
  Adjust $\rho_t$
  $t = t + 1$
Until $Y = \emptyset$
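A minimal NumPy sketch of this loop (not code from the presentation), using the fact that $\delta_x = -y$ for labels $y \in \{-1, +1\}$, so the update adds $\rho_t \sum_{x \in Y} y\, x$. The data, the fixed learning rate, and the iteration cap are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(loc=-2, size=(50, 2)), rng.normal(loc=+2, size=(50, 2))])
X = np.hstack([X, np.ones((100, 1))])            # append a 1 so w_0 is absorbed into w
y = np.array([-1] * 50 + [+1] * 50)              # labels in {-1, +1}

w = np.zeros(3)
rho = 0.1                                        # learning rate rho_t, held constant here
for t in range(100):
    scores = X @ w
    misclassified = y * scores <= 0              # the set Y of wrongly classified samples
    if not misclassified.any():
        break                                    # stop when Y is empty
    # Gradient step on J(w) = sum_{x in Y} delta_x w^T x with delta_x = -y:
    w = w + rho * (y[misclassified][:, None] * X[misclassified]).sum(axis=0)
print("final weights:", w)
```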

15 Logistic Regression
In logistic regression, the logarithm of the likelihood ratios is modeled via linear functions:
$\ln \frac{P(\omega_i \mid x)}{P(\omega_M \mid x)} = w_i^T x$, for $i = 1, 2, \dots, M-1$, where $M$ is the number of classes.
We also need to ensure $\sum_{i=1}^{M} P(\omega_i \mid x) = 1$.
Combining the above two equations, we have
$P(\omega_M \mid x) = \frac{1}{1 + \sum_{j=1}^{M-1} \exp(w_j^T x)}$
and the standard logistic form
$P(\omega_i \mid x) = \frac{\exp(w_i^T x)}{1 + \sum_{j=1}^{M-1} \exp(w_j^T x)}$, for $i = 1, 2, \dots, M-1$.
Also called multinomial logistic regression, or the maximum entropy model.
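A minimal sketch of these class probabilities in NumPy, with class M as the reference class whose weight vector is fixed at zero; the weights and the input are illustrative assumptions.

```python
import numpy as np

M, d = 4, 3
rng = np.random.default_rng(4)
W = rng.normal(size=(M - 1, d))                  # w_1 ... w_{M-1}; w_M is implicitly 0
x = rng.normal(size=d)

scores = W @ x                                   # w_i^T x for i = 1..M-1
denom = 1.0 + np.sum(np.exp(scores))
p = np.append(np.exp(scores), 1.0) / denom       # P(w_i|x) for i = 1..M-1, then P(w_M|x)
print(p, p.sum())                                # the probabilities sum to 1
```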

16 Support Vector Machines. Search for the hyperplane that gives the maximum possible margin. [Figures] A linear kernel on separable data; a Gaussian kernel on non-separable data using a soft-margin SVM.
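A minimal sketch of the two cases above using scikit-learn (an assumption; the presentation does not name a library): a linear kernel and a soft-margin SVM with an RBF (Gaussian) kernel. The data are made up for illustration.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(loc=-2, size=(50, 2)), rng.normal(loc=+2, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

linear_svm = SVC(kernel="linear", C=1.0).fit(X, y)            # maximum-margin hyperplane
rbf_svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)   # soft margin, Gaussian kernel
print("linear accuracy:", linear_svm.score(X, y))
print("rbf    accuracy:", rbf_svm.score(X, y))
```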

17 Nearest Neighbors. A non-parametric method used for classification and regression, where the decision is based on the k closest training examples in the feature space. [Figure] Example of k-NN classification with k = 3 (solid-line circle); the majority vote is the predicted class.
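A minimal sketch of k-NN classification by majority vote, written directly in NumPy; k = 3 mirrors the figure, but the data and query point are made-up assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
X_train = np.vstack([rng.normal(loc=-1, size=(30, 2)), rng.normal(loc=+1, size=(30, 2))])
y_train = np.array([0] * 30 + [1] * 30)

def knn_predict(x, X_train, y_train, k=3):
    dists = np.linalg.norm(X_train - x, axis=1)        # distances in feature space
    nearest = np.argsort(dists)[:k]                    # indices of the k closest examples
    votes = np.bincount(y_train[nearest])              # majority vote over their labels
    return int(np.argmax(votes))

print(knn_predict(np.array([0.2, 0.1]), X_train, y_train, k=3))
```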

18 Decision Tree. A class of nonlinear classifiers in which the feature space is split into unique regions, corresponding to the classes, in a sequential manner. Key elements in designing a decision tree algorithm: at each node, a set of candidate questions (features) to consider for splitting into descendant nodes; a splitting criterion, e.g. information gain or the Gini index; a stop-splitting rule, e.g. a minimum number of instances in a leaf; and a rule to assign each leaf to a specific class. Example decision tree algorithms: CART (Classification And Regression Trees), C4.5, CHAID.
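A minimal sketch of the two splitting criteria mentioned above, evaluated for one candidate binary split; the label arrays are illustrative assumptions.

```python
import numpy as np

def gini(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)                          # Gini impurity of a node

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, left, right):
    n = len(parent)                                      # entropy drop from the split
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

parent = np.array([0, 0, 0, 0, 1, 1, 1, 1])
left, right = parent[:5], parent[5:]                     # a candidate split at one node
print("Gini(parent):", gini(parent), "information gain:", information_gain(parent, left, right))
```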

19 Ensemble Methods: Bagging and Boosting. Can a weak learning algorithm be boosted into a strong one? Choose a base classifier, i.e. a weak classifier. A series of classifiers is then designed iteratively, each time employing the base classifier but using a different subset of the training set, according to a different weighting over the training samples that gives emphasis to the hardest (incorrectly classified) samples. The final classifier is obtained as a weighted average of the previously designed classifiers. Popular models: Random Forest, Gradient Boosting Machine.
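A minimal sketch of the two ensemble families named above, using scikit-learn implementations (an assumption; any gradient boosting library would do). The nonlinear target and the split are made up for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
X = rng.normal(size=(500, 5))
y = (X[:, 0] + X[:, 1] ** 2 > 1).astype(int)             # a nonlinear, made-up target

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)       # bagging
gbm = GradientBoostingClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)  # boosting
print("random forest test accuracy:", rf.score(X_te, y_te))
print("gradient boosting test accuracy:", gbm.score(X_te, y_te))
```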

20 Neural Network and Deep Learning. A family of statistical learning models inspired by biological neural networks, characterized by: the interconnection pattern between the different layers of neurons; the learning process for updating the weights of the interconnections; and the activation function that converts a neuron's weighted input to its output. [Figures] A (shallow) neural network and a deep network.

21 Supervised Learning vs. Unsupervised Learning
Recall that in a classification problem we maximize the log-likelihood of the data given the model:
$\ell = \sum_{n=1}^{N} \log p(x_n, y_n \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \log \pi_{y_n} \mathcal{N}(x_n \mid \mu_{y_n}, \Sigma_{y_n})$
If we don't know the class, we treat it as a hidden variable and maximize the log-likelihood of the unlabeled data:
$\ell = \sum_{n=1}^{N} \log p(x_n \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \log \sum_{y=1}^{K} p(x_n, y \mid \pi, \mu, \Sigma)$
Instead of a classification problem, we now have a clustering problem.

22 K-Means Clustering
K-Means solves a chicken-and-egg problem: if we knew the classes, we could fit the model (e.g. by maximum likelihood); if we knew the model, we could predict the classes. K-Means: guess a model, use it to classify the data, use the classified data as labeled data to update the model, and repeat, so as to minimize the cost function
$\min_{\mu} \min_{z} J(\mu_1, \dots, \mu_K, z_1, \dots, z_N) = \sum_{n=1}^{N} \sum_{i=1}^{K} z_n^{(i)} \lVert x_n - \mu_i \rVert^2$
1. Input the dataset $\{x_1, \dots, x_N\}$
2. Randomly initialize the means $\mu_1, \dots, \mu_K$
3. Find the closest mean for each point: $z_n^{(i)} = 1$ if $i = \arg\min_j \lVert x_n - \mu_j \rVert^2$, and $0$ otherwise
4. Update the means: $\mu_i = \frac{\sum_{n=1}^{N} x_n z_n^{(i)}}{\sum_{n=1}^{N} z_n^{(i)}}$
5. If any $z$ has changed, go to 3
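A minimal NumPy sketch of this loop: assign each point to its closest mean, recompute the means, and repeat until the assignments stop changing. The data and K = 3 are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(8)
X = np.vstack([rng.normal(loc=c, size=(50, 2)) for c in (-4, 0, 4)])
K = 3

mu = X[rng.choice(len(X), size=K, replace=False)]      # step 2: random initial means
z = np.full(len(X), -1)
while True:
    dists = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)
    z_new = dists.argmin(axis=1)                       # step 3: closest mean per point
    if np.array_equal(z_new, z):
        break                                          # step 5: stop when nothing changed
    z = z_new
    for i in range(K):
        if np.any(z == i):                             # step 4: update non-empty clusters
            mu[i] = X[z == i].mean(axis=0)
print("final means:\n", mu)
```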

23 Expectation-Maximization (EM)
EM is a soft version of K-Means. In K-Means the assignment is hard:
$z_n^{(i)} = 1$ if $i = \arg\min_j \lVert x_n - \mu_j \rVert^2 = \arg\max_j \mathcal{N}(x_n \mid \mu_j, I) = \arg\max_j p(x_n \mid \mu_j)$, and $0$ otherwise.
Instead, consider a soft, percentage assignment of data points.
Expectation (soft class assignment): $\tau_{n,i} = \frac{\pi_i^{(t)} \mathcal{N}(x_n \mid \mu_i^{(t)}, \Sigma_i^{(t)})}{\sum_j \pi_j^{(t)} \mathcal{N}(x_n \mid \mu_j^{(t)}, \Sigma_j^{(t)})}$
Maximization:
mean: $\mu_i^{(t+1)} = \frac{\sum_n \tau_{n,i}^{(t)} x_n}{\sum_n \tau_{n,i}^{(t)}}$
mixing proportions: $\pi_i^{(t+1)} = \frac{\sum_n \tau_{n,i}^{(t)}}{N}$
covariance: $\Sigma_i^{(t+1)} = \frac{\sum_n \tau_{n,i}^{(t)} (x_n - \mu_i^{(t+1)})(x_n - \mu_i^{(t+1)})^T}{\sum_n \tau_{n,i}^{(t)}}$
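A minimal sketch of these E and M updates for a Gaussian mixture, written in NumPy and SciPy; the data, K = 2, and the fixed number of iterations are illustrative assumptions.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(9)
X = np.vstack([rng.normal(loc=-3, size=(100, 2)), rng.normal(loc=+3, size=(100, 2))])
N, K = len(X), 2

pi = np.full(K, 1.0 / K)                                 # mixing proportions
mu = X[rng.choice(N, size=K, replace=False)]             # initial means
cov = np.array([np.eye(2) for _ in range(K)])            # initial covariances

for _ in range(50):
    # E-step: tau[n, i] proportional to pi_i * N(x_n | mu_i, Sigma_i), normalized over i
    tau = np.column_stack([pi[i] * multivariate_normal.pdf(X, mu[i], cov[i]) for i in range(K)])
    tau /= tau.sum(axis=1, keepdims=True)
    # M-step: re-estimate mixing proportions, means, and covariances
    Nk = tau.sum(axis=0)
    pi = Nk / N
    mu = (tau.T @ X) / Nk[:, None]
    for i in range(K):
        diff = X - mu[i]
        cov[i] = (tau[:, i, None] * diff).T @ diff / Nk[i]
print("means:\n", mu, "\nmixing proportions:", pi)
```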

24 Hierarchical Agglomerative Clustering. A bottom-up approach: each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy. The two key choices are the distance metric and the linkage criterion. Image: Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press.
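A minimal sketch of bottom-up agglomerative clustering with SciPy (an assumption; the presentation does not name a library), making the two choices called out above explicit: a Euclidean distance metric and average linkage. The data are made up for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(10)
X = np.vstack([rng.normal(loc=-3, size=(20, 2)), rng.normal(loc=+3, size=(20, 2))])

Z = linkage(X, method="average", metric="euclidean")   # iteratively merge the closest clusters
labels = fcluster(Z, t=2, criterion="maxclust")        # cut the hierarchy into 2 flat clusters
print(labels)
```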

25 Ranking
Goal: rank positive instances to the top and negative instances to the bottom.
An example application: rank instances according to their likelihood of being involved in a serious event, so that the top of the ranked list contains serious instances and the bottom contains non-serious instances. Serious events are rare.
More formally, the goal is to construct a ranking function $f$ that gives a real-valued score to each instance in $X$, that is, $f: X \to \mathbb{R}$. Note: because serious events are rare, predicting the actual probability of a rare event may not be feasible (or accurate). Thus we do not care about the actual score values, only the relative values between instances.
We can formalize a general form of the objective in terms of: x, a data instance; I, the instances of serious events; K, the instances of non-serious events; f, a function mapping an instance to a seriousness score; l, a loss function penalizing a small score for a serious event; and g, a price function penalizing a highly ranked non-serious event.
[Figure] A ranked list of instances, ordered from more serious (high $f(x_i)$) to less serious (low $f(x_k)$).

26 Bayesian Models: an example in topic modelling. Distinguish or cluster words by semantics. Consider the word groups {auto, engine, bonnet, tyres, lorry, boot} and {car, emissions, hood, make, model, trunk}: documents using these two vocabularies share few words (small cosine similarity), yet they are related; this is synonymy. Conversely, words such as "make", "model", and "emissions" also appear in documents about hidden Markov models, giving a large cosine similarity to documents that are not truly related; this is polysemy. The goal is to uncover the latent relation between documents and words.

27 Probabilistic Models: Bag of Words & Naive Bayes. Models related to the aspect model (topic model): a document is a mixture of underlying (latent) K aspects (topics), and each aspect is represented by a distribution over words, p(w|z). The family includes the unigram model, the mixture of unigrams model, probabilistic latent semantic indexing, and latent Dirichlet allocation. Mixture of unigrams model (naive Bayes): for each of the M documents, choose a topic z, then choose N words by drawing each one independently from a multinomial conditioned on z. There is one topic per document.

28 Probabilistic Models: PLSI. In the same family of aspect models as above. Hofmann '99 (probabilistic latent semantic indexing): for each word of document d in the training set, choose a topic z according to a multinomial conditioned on the index d, then generate the word by drawing from a multinomial conditioned on z. Documents can have multiple topics. Number of parameters: kM + kV.

29 Probabilistic Models: LDA. In the same family of aspect models as above. Blei '03 (latent Dirichlet allocation): for each document, choose θ ~ Dirichlet(α); then, for each word, choose a topic z_n ~ Multinomial(θ) and choose a word w_n from p(w_n | z_n, β), a multinomial probability conditioned on the topic z_n. The topic mixture weights θ are a hidden random variable. Number of parameters: k + kV.
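A minimal sketch of fitting an LDA topic model with scikit-learn (an assumption; the cited Blei et al. paper describes its own variational inference, not this library). The tiny corpus is made up purely for illustration and echoes the earlier car/HMM word example.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "auto engine tyres lorry",
    "car emissions hood trunk model",
    "hidden markov model emissions probability",
    "topic model dirichlet allocation words",
]
counts = CountVectorizer().fit(docs)
X = counts.transform(docs)                             # document-word count matrix

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
vocab = counts.get_feature_names_out()
for k, topic in enumerate(lda.components_):            # each row is a word distribution p(w|z)
    top = topic.argsort()[::-1][:5]
    print("topic", k, ":", [vocab[i] for i in top])
```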

30 Machine Learning Applications. Speech recognition (HMM, neural nets / deep learning, ...); computer vision (neural nets / deep learning, SVM, ...); time series prediction (HMM, Gaussian processes, Bayesian methods, ...); genomics (HMM, SVM, ...); natural language processing (HMM, CRF, Bayesian methods, deep learning, ...); information retrieval (entropy, SVM, clustering, ...); medical (decision trees, HMM, Bayesian methods, ...); behavior / games (reinforcement learning, Bayesian methods, deep learning, ...).

31 References
Christopher M. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2006.
Tony Jebara. Machine Learning course materials, Department of Computer Science, Columbia University.
Sergios Theodoridis and Konstantinos Koutroumbas. Pattern Recognition, Fourth Edition. Academic Press.
Daniel Jurafsky and James H. Martin. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition (1st ed.). Prentice Hall PTR, Upper Saddle River, NJ, USA.
Jiawei Han. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.
Cynthia Rudin. "The P-Norm Push: A Simple Convex Ranking Algorithm that Concentrates at the Top of the List." Journal of Machine Learning Research 10 (2009).
Thomas Hofmann. Probabilistic latent semantic analysis. In Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence (UAI '99), Kathryn B. Laskey and Henri Prade (Eds.). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1999.
David M. Blei, Andrew Y. Ng, and Michael I. Jordan. Latent Dirichlet Allocation. Journal of Machine Learning Research 3 (Jan 2003): 993-1022.
Wikipedia pages and images on machine learning topics.
Other papers published in academic journals and conferences.


33 Legal notice 2017 Swiss Re. All rights reserved. You are not permitted to create any modifications or derivative works of this presentation or to use it for commercial or other public purposes without the prior written permission of Swiss Re. The information and opinions contained in the presentation are provided as at the date of the presentation and are subject to change without notice. Although the information used was taken from reliable sources, Swiss Re does not accept any responsibility for the accuracy or comprehensiveness of the details given. All liability for the accuracy and completeness thereof or for any damage or loss resulting from the use of the information contained in this presentation is expressly excluded. Under no circumstances shall Swiss Re or its Group companies be liable for any financial or consequential loss relating to this presentation. 32
