Multi-label learning from batch and streaming data


1 Multi-label learning from batch and streaming data
Jesse Read, Télécom ParisTech / École Polytechnique
Summer School on Mining Big and Complex Data, 5 September 2016, Ohrid, Macedonia

2 Introduction. Binary classification: given an input x (an image), predict y ∈ {sunset, non-sunset}

3 Introduction. Multi-class classification: given an input x (an image), predict y ∈ {sunset, people, foliage, beach, urban}

4 Introduction. Multi-label classification: given an input x (an image), predict y ⊆ {sunset, people, foliage, beach, urban}; equivalently, y ∈ {0, 1}^5 is a binary indicator vector, i.e., multiple labels per instance instead of a single label.

5 Introduction
              K = 2         K > 2
  L = 1       binary        multi-class
  L > 1       multi-label   multi-output
Figure: For L target variables (labels), each taking one of K values. Multi-output is also known as multi-target, multi-dimensional.
multi-output can be cast to multi-label, just as multi-class can be cast to binary
the set of labels (L) is predefined (in contrast to tagging/keyword assignment)

6 Introduction. Table: Academic articles containing the phrase "multi-label classification" (Google Scholar, 2016), by year: occurrences in text and in title.

7 Single-label vs. Multi-label
Table: Single-label, Y ∈ {0, 1}: each instance (X1, ..., X5) carries a single binary target Y; the final (test) instance's Y is unknown (?).
Table: Multi-label, Y ⊆ {λ1, ..., λL}: each instance carries a set of labels, e.g., {λ2, λ3}, {λ1}, {λ2}, {λ1, λ4}, {λ4}; the test instance's labelset is unknown (?).

8 Single-label vs. Multi-label
Table: Single-label, Y ∈ {0, 1} (as before).
Table: Multi-label, [Y1, ..., YL] ∈ {0, 1}^L: the labelset is encoded as L binary columns Y1, Y2, Y3, Y4; for the test instance all four values are unknown (????).

9 Outline
1 Introduction
2 Applications
3 Methods
4 Label Dependence
5 Multi-label Classification in Data Streams

10 Text Categorization and Tag Recommendation For example, the IMDb dataset: Textual movie plot summaries associated with genres (labels). Also: Bookmarks, Bibtex, del.icio.us datasets.

11 Text Categorization and Tag Recommendation. For example, the IMDb dataset: textual movie plot summaries associated with genres (labels). Also: Bookmarks, Bibtex, del.icio.us datasets.
Binary representation: features X1, X2, ..., XD indicate word occurrences (abandoned, accident, ..., violent, wedding); labels Y1, Y2, ..., YL indicate genres (horror, romance, ..., comedy, action).

12 Labelling Images. Images are labelled to indicate multiple concepts, multiple objects, multiple people. E.g., associating scenes with the concepts {beach, sunset, foliage, field, mountain, urban}.

13 Labelling Audio. For example, labelling music with emotions, concepts, etc.: {amazed-surprised, happy-pleased, relaxing-calm, quiet-still, sad-lonely, angry-aggressive}.

14 Related Tasks
multi-output classification: outputs are nominal, e.g., targets rank, gender, group over inputs X1, ..., X5
multi-output regression: outputs are real-valued, e.g., targets price, age, percent
label ranking, i.e., preference learning: λ3 ≻ λ1 ≻ λ4 ≻ ... ≻ λ2
aka multi-target, multi-dimensional

15 Related Areas
multi-task learning: multiple tasks, shared representation
sequential learning: predict across time indices instead of across label indices (y1, y2, y3, y4 over x1, x2, x3, x4); each label may have a different input
structured output prediction: assume a particular structure among outputs, e.g., a pixel grid or a hierarchy

16 Streaming Multi-label Data. Many advanced applications must deal with data streams:
data arrives continuously, potentially infinitely
predictions must be made immediately
expect concept drift
For example: demand prediction, intrusion detection, pollution detection.

17 Demand Prediction. Outputs (labels) represent the demand at multiple points.
Figure: Stops in the greater Helsinki region; the Kutsuplus taxi service could be called to any of these.
We are interested in predicting, for each label in [y1, ..., yL],
  p(y_j = 1 | x): the probability of demand at the j-th node

18 Localization and Tracking. Outputs represent points in space which may contain an object (y_j = 1) or not (y_j = 0). Observations are given as x.
Figure: Modelled on a real-world scenario; a room with a single light source and a number of light sensors.
We are interested in predicting, for each label in [y1, ..., yL],
  y_j = 1 if the j-th tile is occupied

19 Route/Destination Forecasting. Figure: personal nodes of a traveller and the predicted trajectory.
L: the number of geographic points of interest
x: observed data (e.g., GPS, sensor activity, time of day)
p(y_j = 1 | x): the probability an object is present at the j-th node
{x_i, y_i}_{i=1}^N: training data

20 Outline
1 Introduction
2 Applications
3 Methods
4 Label Dependence
5 Multi-label Classification in Data Streams

21 Multi-label Classification. Model each label separately (x → y1, y2, y3, y4), and then
  ŷ_j = h_j(x) = argmax_{y_j ∈ {0,1}} p(y_j | x), for each index j = 1, ..., L
so that
  ŷ = h(x) = [ŷ1, ..., ŷ4]
           = [argmax_{y1 ∈ {0,1}} p(y1 | x), ..., argmax_{y4 ∈ {0,1}} p(y4 | x)]
           = [f1(x), ..., f4(x)]
           = f(Wx)
This is the Binary Relevance method (BR).

22 BR Transformation. Transform the dataset (inputs X with labels Y1, Y2, Y3, Y4 over instances x(1), ..., x(5)) into L separate binary problems, one for each label: (X → Y1), (X → Y2), (X → Y3), (X → Y4), and train with any off-the-shelf binary base classifier.
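As a concrete illustration, a minimal Python sketch of the BR transformation, assuming scikit-learn conventions (the class name and the choice of logistic regression as base learner are illustrative, not prescribed by the slides):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    class BinaryRelevance:
        def __init__(self, base=LogisticRegression):
            self.base = base          # any off-the-shelf binary classifier
            self.models = []

        def fit(self, X, Y):
            # Y is an (N, L) binary matrix: train one model per label column
            self.models = [self.base().fit(X, Y[:, j]) for j in range(Y.shape[1])]
            return self

        def predict(self, X):
            # Stack the L independent binary predictions into label vectors
            return np.column_stack([h.predict(X) for h in self.models])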

23 Why Not Binary Relevance? BR ignores label dependence, i.e., it assumes
  p(y | x) = ∏_{j=1}^{L} p(y_j | x)
which may not always hold!
Example (Film Genre Classification): p(y_romance | x) ≠ p(y_romance | x, y_horror)

24 Why Not Binary Relevance? BR ignores label dependence, i.e., it assumes p(y | x) = ∏_{j=1}^{L} p(y_j | x), which may not always hold!
Table: Average predictive performance (5-fold CV, EXACT MATCH) on Music, Scene, Yeast, Genbase, Medical, Enron and Reuters: a chain-based method (MCC) outperforms BR, e.g., 0.37 (MCC) vs. 0.29 (BR) on Reuters.

25 Modelling Label Dependence: Classifier Chains. Cascade the labels (x → y1 → y2 → y3 → y4), and
  p(y | x) = p(y1 | x) ∏_{j=2}^{L} p(y_j | x, y1, ..., y_{j-1})
  ŷ = argmax_{y ∈ {0,1}^L} p(y | x)

26 Bayes Optimal CC. Example: evaluate the probability p(y = [·, ·, ·] | x) of each complete label path, for all 2^L combinations, and return argmax_y p(y | x). But the search space of {0, 1}^L paths is too large.

27 CC Transformation. Similar to BR: make L binary problems, but include the previous labels as feature attributes: (X → Y1), (X, Y1 → Y2), (X, Y1, Y2 → Y3), (X, Y1, Y2, Y3 → Y4), and, again, apply any classifier (not necessarily a probabilistic one)!

28 Greedy CC. L classifiers for L labels. For a test instance x̃, classify
1 ŷ1 = h1(x̃)
2 ŷ2 = h2(x̃, ŷ1)
3 ŷ3 = h3(x̃, ŷ1, ŷ2)
4 ŷ4 = h4(x̃, ŷ1, ŷ2, ŷ3)
and return ŷ = [ŷ1, ..., ŷL]
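In the same spirit, a minimal sketch of a greedy classifier chain, again assuming scikit-learn conventions; training conditions on the true earlier labels, while inference conditions on the predicted ones:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    class ClassifierChain:
        def __init__(self, base=LogisticRegression):
            self.base = base
            self.models = []

        def fit(self, X, Y):
            # Augment the input with the true values of the earlier labels
            self.models = [self.base().fit(np.hstack([X, Y[:, :j]]), Y[:, j])
                           for j in range(Y.shape[1])]
            return self

        def predict(self, X):
            Y_hat = np.zeros((X.shape[0], len(self.models)), dtype=int)
            for j, h_j in enumerate(self.models):
                # Greedy inference: condition on the labels predicted so far
                Y_hat[:, j] = h_j.predict(np.hstack([X, Y_hat[:, :j]]))
            return Y_hat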

29 Example. For a test instance x̃ (chain x → y1 → y2 → y3): ŷ = h(x̃) = [?, ?, ?]

30 Example.
1 ŷ1 = h1(x̃) = argmax_{y1 ∈ {0,1}} p(y1 | x̃)  (here the two probabilities are 0.6 and 0.4)
ŷ = h(x̃) = [ŷ1, ?, ?]

31 Example.
1 ŷ1 = h1(x̃) = argmax_{y1 ∈ {0,1}} p(y1 | x̃)
2 ŷ2 = h2(x̃, ŷ1)
ŷ = h(x̃) = [ŷ1, ŷ2, ?]

32 Example. ŷ = h(x̃) = [ŷ1, ŷ2, ŷ3], via
1 ŷ1 = h1(x̃) = argmax_{y1 ∈ {0,1}} p(y1 | x̃)
2 ŷ2 = h2(x̃, ŷ1)
3 ŷ3 = h3(x̃, ŷ1, ŷ2)

33 Example. As above: ŷ1 = h1(x̃), ŷ2 = h2(x̃, ŷ1), ŷ3 = h3(x̃, ŷ1, ŷ2), giving ŷ = h(x̃) = [ŷ1, ŷ2, ŷ3].
Improves over BR; similar build time (if L < D); able to use any off-the-shelf classifier for h_j; parallelizable.
But errors may be propagated down the chain.

34 Example: Monte-Carlo Search for CC. Sample T label vectors along the chain, computing each probability as a product of the per-label probabilities (e.g., 0.252 and 0.288 for two sampled paths), and return argmax_{y_t} p(y_t | x).
Tractable, with similar accuracy to (Bayes Optimal) PCC.
Can use other search algorithms, e.g., beam search.
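A sketch of such Monte-Carlo inference over an already-trained chain (models as in the ClassifierChain sketch above); it assumes each base model exposes predict_proba with two classes, as scikit-learn probabilistic classifiers do:

    import numpy as np

    def mc_inference(models, x, T=50, rng=np.random.default_rng(0)):
        best_y, best_p = None, -1.0
        for _ in range(T):
            y, p = [], 1.0
            for h_j in models:
                xj = np.concatenate([x, y]).reshape(1, -1)
                p1 = h_j.predict_proba(xj)[0, 1]   # p(y_j = 1 | x, y_1..j-1)
                y_j = int(rng.random() < p1)       # sample the j-th label
                p *= p1 if y_j else (1.0 - p1)
                y.append(y_j)
            if p > best_p:                         # keep the most probable path
                best_y, best_p = y, p
        return np.array(best_y), best_p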

35 Does Label-order Matter? Consider the chains x → y1 → y2 → y3 → y4 vs. x → y4 → y2 → y3 → y1. In theory, the models are equivalent, since
  p(y | x) = p(y1 | x) p(y2 | y1, x) = p(y2 | x) p(y1 | y2, x)
but we are estimating p from finite and noisy data; thus
  p̂(y1 | x) p̂(y2 | y1, x) ≠ p̂(y2 | x) p̂(y1 | y2, x)
and in the greedy case,
  p̂(y2 | y1, x) ≠ p̂(y2 | ŷ1, x) = p̂(y2 | y1 = argmax_{y1} p̂(y1 | x), x)

36 Does Label-order Matter? In theory, the models are equivalent, since p(y | x) = p(y1 | x) p(y2 | y1, x) = p(y2 | x) p(y1 | y2, x), but we are estimating p from finite and noisy data; thus p̂(y1 | x) p̂(y2 | y1, x) ≠ p̂(y2 | x) p̂(y1 | y2, x), and in the greedy case, p̂(y2 | y1, x) ≠ p̂(y2 | ŷ1, x).
The approximations cause high variance on account of error propagation. We can
1 reduce variance with an ensemble of classifier chains
2 search the space of chain orders (a huge space, but a little search makes a difference)

37 Label Powerset Method (LP). One multi-class problem (taking many values): x → (y1, y2, y3, y4), and
  ŷ = argmax_{y ∈ Y} p(y | x), where Y ⊆ {0, 1}^L
Each value is a label vector, 2^L in total, but typically only those occurring in the training set are used (in practice, |Y| ≤ the size of the training set, and |Y| ≪ 2^L).

38 Label Powerset Method (LP). Transform the dataset (inputs X with labels Y1, Y2, Y3, Y4) into a single multi-class problem whose target takes up to 2^L possible values (one per distinct label vector), and train any off-the-shelf multi-class classifier.
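A minimal sketch of the LP transformation, again assuming scikit-learn conventions; each distinct label vector observed in training becomes one multi-class value:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    class LabelPowerset:
        def __init__(self, base=LogisticRegression):
            self.base = base

        def fit(self, X, Y):
            # Map each distinct row of Y to a single class index
            self.classes_, y_mc = np.unique(Y, axis=0, return_inverse=True)
            self.model = self.base().fit(X, y_mc)
            return self

        def predict(self, X):
            # Predict a class index, then map it back to its label vector
            return self.classes_[self.model.predict(X)]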

39 Issues with LP
complexity (up to 2^L combinations)
imbalance: few examples per class label
overfitting: how to predict a new value?
Example: in the Enron dataset, 44% of labelsets are unique (to a single training example or test instance). In the del.icio.us dataset, 98% are unique.

40 Meta Labels. Improving the label-powerset approach:
decomposition of the label set into M subsets of size k (k < L)
pruning, such that, e.g., Y_{1,2} takes only three of the four possible value combinations
combine together with the random subspace method and a voting scheme
(Labels Y1, Y2, Y3 are modelled via meta-labels such as Y_{1,2} and Y_{2,3} over inputs X1, X2, X3.)

41 Meta Labels. Improving the label-powerset approach: decomposition of the label set into M subsets of size k (k < L); pruning; combination with the random subspace method and a voting scheme.
  Method                   Inference Complexity
  Label Powerset           O(2^L · D)
  Pruned Sets              O(P · D)
  Decomposition / RAkEL    O(M · 2^k · D)
  Meta Labels              O(M · P' · D')
where P < 2^L, P' < 2^k, and D' < D.

42 Summary of Methods. Two views of a multi-label problem of L labels:
1 L binary problems
2 a multi-class problem with up to 2^L classes
Problem Transformation:
1 transform the data into subproblems (binary or multi-class)
2 apply some off-the-shelf base classifier
or, Algorithm Adaptation:
1 take a suitable single-label classifier (kNN, neural networks, decision trees, ...)
2 adapt it (if necessary) for multi-label classification

43 Outline
1 Introduction
2 Applications
3 Methods
4 Label Dependence
5 Multi-label Classification in Data Streams

44 Label Dependence in MLC. Common approach:
1 present methods to measure label dependence
2 find a structure that best represents this
and then apply classifiers, and compare the results to BR.
Example: link particular labels (nodes) together (CC-based methods), or form particular label subsets (LP-based methods).

45 Label Dependence in MLC. Common approach:
1 present methods to measure label dependence
2 find a structure that best represents this
and then apply classifiers, and compare the results to BR.
Problem: measuring label dependence is expensive, and models built on it often do not improve over models built on random dependence!
Problem: for some metrics (such as Hamming loss / label accuracy), knowledge of label dependence is theoretically unnecessary!

46 Marginal Label Dependence. Marginal dependence: when the joint is not the product of the marginals, i.e.,
  p(y2) ≠ p(y2 | y1), equivalently p(y1) p(y2) ≠ p(y1, y2)
(graphically, a link Y1 — Y2). Estimate from co-occurrence frequencies in the training data.

47 Example: Marginal Label Dependence. Figure: Music dataset, mutual information between the labels (amazed, happy, relaxing, quiet, sad, angry).

48 Example: Marginal Label Dependence. Figure: Scene dataset, mutual information between the labels (beach, sunset, foliage, field, mountain, urban).

49 Marginal Label Dependence. Marginal dependence: when the joint is not the product of the marginals, i.e., p(y1) p(y2) ≠ p(y1, y2). Estimate from co-occurrence frequencies in the training data.
Used for regularization/constraints:
1 ŷ = h(x) makes a prediction
2 ŷ' = g(ŷ) regularizes the prediction

50 Conditional Label Dependence. But at classification time, we condition on the input! Conditional dependence, conditioned on the input observation x:
  p(y2 | y1, x) ≠ p(y2 | x)
Have to build and measure models. Indications of conditional dependence: the performance of LP/CC exceeds that of BR; errors among the binary models are correlated. But what does this mean?

51 Conditional Label Independence. But at classification time, we condition on the input! Conditional independence, conditioned on the input observation x: for example,
  p(y2) ≠ p(y2 | y1) but p(y2 | x) = p(y2 | y1, x)
Have to build and measure models. Indications of conditional dependence: the performance of LP/CC exceeds that of BR; errors among the binary models are correlated. But what does this mean?

52 The LOGICAL Problem. Example (the LOGICAL toy problem): inputs X1, X2 ∈ {0, 1} and labels
  Y1 = X1 OR X2,  Y2 = X1 AND X2,  Y3 = X1 XOR X2
Each label is a logical operation (independent of the others!).
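The toy data itself is easy to generate; a small sketch (the sample size is arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.integers(0, 2, size=(1000, 2))        # two binary inputs
    Y = np.column_stack([X[:, 0] | X[:, 1],       # Y1 = OR
                         X[:, 0] & X[:, 1],       # Y2 = AND
                         X[:, 0] ^ X[:, 1]])      # Y3 = XOR

Running BR with a linear base classifier on (X, Y) reproduces the failure on the XOR label, since no linear boundary separates XOR.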

53 The LOGICAL Problem. Figure: decision boundaries of BR (left), CC (middle), LP (right) on the or, and, xor labels.
Table: The LOGICAL problem, base classifier logistic regression.
  Metric          BR     CC     LP
  HAMMING SCORE   0.83   1.00   1.00
  EXACT MATCH     0.50   1.00   1.00
Dependence is introduced by an inadequate model! Dependence depends on the model.

54 The LOGICAL Problem. Figure: Binary Relevance (BR), with per-label models for Y_OR, Y_AND, Y_XOR over (x1, x2): a linear decision boundary (solid line, estimated with logistic regression) is not viable for the Y_XOR label.

55 Solution via Structure. Figure: Classifier chains (CC): conditioning Y_XOR on (x1, x2, y_OR) makes a linear model applicable to Y_XOR.

56 Solution via Multi-class Decomposition. Figure: Label Powerset (LP), with classes Y_[OR,AND,XOR] over (x1, x2): solvable with one-vs-one multi-class decomposition for any (e.g., linear) base classifier.

57 Solution via Conditional Independence. Figure: solution via a latent structure (e.g., a random RBF layer) mapping φ(x) to a new input space z, creating independence: p(y_XOR | z, y_OR, y_AND) ≈ p(y_XOR | z).

58 Solution via a Suitable Base-classifier. Figure: solution via a non-linear classifier (e.g., a decision tree): split on x1 at the root, then on x2; the leaves hold examples, where y = [y_AND, y_OR, y_XOR].

59 Detecting Dependence. Conditional label dependence and the choice of base model are inseparable:
  y_j = h_j(x) + ε_j
  y_k = h_k(x) + ε_k
Figure: the joint distribution of errors ε = [ε_OR, ε_XOR] from logistic regression (left) and a decision tree (right).

60 A Fresh Look at Problem Transformation. Figure: standard methods can be viewed as ("deep"/cascaded) basis functions on the label space, e.g., labels Y1, Y2, Y3 modelled via an inner layer of meta-labels Y_{1,2}, Y_{2,3} over inputs X1, X2, X3.

61 Label Dependence: Summary
Marginal dependence: useful for regularization
Conditional dependence: depends on the model; may be introduced by the model
Should be considered together: base classifier, label structure, inner-layer structure
An open problem; much existing research is relevant (latent-variable models, neural networks, deep learning, ...)

62 Outline
1 Introduction
2 Applications
3 Methods
4 Label Dependence
5 Multi-label Classification in Data Streams

63 Classification in Data Streams. Setting:
the sequence is potentially infinite
high speed of arrival
the stream is one-way
Implications:
work in limited memory
adapt to concept drift

64 Multi-label Streams: Methods
1 Batch-incremental ensemble
2 Problem transformation with an incremental base learner
3 Multi-label kNN
4 Multi-label incremental decision trees
5 Neural networks

65 Batch-Incremental Ensemble. Build regular multi-label models h1, h2, ... on batches/windows of instances (x, y), typically in a [weighted] ensemble, and combine them to predict ŷ for a new x.
A common approach in the literature, and can be surprisingly effective; free choice of base classifier (e.g., C4.5, SVM).
What batch size to use?
Too small: models are insufficient
Too large: slow to adapt
Too many batches: too slow

66 Problem Transformation with an Incremental Base Learner. Use an incremental learner (naive Bayes, SGD, Hoeffding trees) with any problem transformation method (BR, LP, CC, ...).
Simple implementation
Risk of overfitting (e.g., with classifier chains)
Concept drift may invalidate the structure
Limited choice of base learner (must be incremental)

67 Multi-label kNN. Maintain a dynamic buffer of instances, and compare each test instance x̃ to its k nearest neighbours Ne(x̃):
  ŷ_j = 1 if (1/k) Σ_{i : x(i) ∈ Ne(x̃)} y_j^(i) > 0.5, and 0 otherwise
Efficient w.r.t. L, but not w.r.t. D; limited buffer size, so not suitable for all problems.
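A minimal sketch of this prediction rule over a sliding buffer (Euclidean distance is an illustrative choice):

    import numpy as np

    def mlknn_predict(X_buf, Y_buf, x, k=5):
        # Distances from the query to every instance in the buffer
        d = np.linalg.norm(X_buf - x, axis=1)
        nn = np.argsort(d)[:k]                    # indices of the k nearest
        # Label j is predicted 1 if more than half of the neighbours have it
        return (Y_buf[nn].mean(axis=0) > 0.5).astype(int)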

68 ML Incremental Decision Trees. A small sample can suffice to choose a splitting attribute (the Hoeffding bound gives guarantees). As in a regular tree, but with a modified splitting criterion, e.g., the multi-label entropy
  H_ML(S) = − Σ_{j=1}^{L} Σ_{c ∈ {0,1}} P(y_j = c) log2 P(y_j = c)
Examples with multiple labels collect at the leaves (e.g., a tree splitting on x1, then on x2, with label vectors held at the leaves).
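A small sketch of this criterion: the multi-label entropy is simply a sum of per-label binary entropies:

    import numpy as np

    def multilabel_entropy(Y):
        # Y: (N, L) binary label matrix of the examples at a node
        p = Y.mean(axis=0)                    # P(y_j = 1) for each label j
        p = np.clip(p, 1e-12, 1 - 1e-12)      # guard against log(0)
        return float(np.sum(-p * np.log2(p) - (1 - p) * np.log2(1 - p)))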

69 ML Incremental Decision Trees. Fast, and usually competitive. But the tree may grow very conservatively, and we need to replace it (or part of it) when the concept changes. Remedy: place multi-label classifiers at the leaves of the tree and wrap it in an ensemble.

70 Neural Networks. Each label is a node: ŷ = f(Wx) over inputs x1, ..., x5 and output nodes y1, ..., y4. Trained with SGD:
  W_{j,k}^{(t+1)} = W_{j,k}^{(t)} + λ g_{j,k}, where g = −∇E(W)
Can be applied natively to streams. One layer = BR; should use hidden layers to model label dependence / improve performance. Hyper-parameter tuning can be tedious. Relatively poor performance in empirical comparisons on standard data streams (improving now with recent advances in SGD and more common use of basis expansion).
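A minimal sketch of one SGD step for a single-layer (BR-equivalent) network with one sigmoid output per label; the cross-entropy error is an assumption of this sketch, not spelled out on the slide:

    import numpy as np

    def sgd_step(W, x, y, lr=0.1):
        # Forward pass: one sigmoid output per label
        y_hat = 1.0 / (1.0 + np.exp(-W @ x))
        # For cross-entropy error, dE/dW = (y_hat - y) x^T; step downhill
        W -= lr * np.outer(y_hat - y, x)
        return W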

71 Multi-label Data Streams: Issues
Overfitting
Class imbalance
Multi-dimensional concept drift
Labelled examples difficult to obtain (semi-supervised setting)
Dynamic label set
Time dependence

72 Multi-label Concept Drift. Consider the relative frequencies of the j-th and k-th labels,
  [ p_j      p_{j,k} ]
  [ p_{j,k}  p_k     ]
(if marginal independence holds, then p_{j,k} = p_j p_k). Possible drift:
p_j increases (the j-th label becomes relatively more frequent)
p_j and p_k both decrease (label cardinality decreasing)
p_{j,k} changes relative to p_j p_k (change in the marginal dependence relation between the labels)
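A small sketch of monitoring these pairwise statistics over a sliding window:

    import numpy as np

    def pairwise_stats(Y_win, j, k):
        # Y_win: (W, L) binary labels in the current window
        p_j = Y_win[:, j].mean()
        p_k = Y_win[:, k].mean()
        p_jk = (Y_win[:, j] * Y_win[:, k]).mean()
        # Under marginal independence p_jk ~= p_j * p_k; a drifting gap
        # signals a change in the dependence relation between j and k
        return p_j, p_k, p_jk - p_j * p_k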

73 Multi-label Concept Drift. And when conditioned on the input x, we consider the relative frequencies/values of the j-th and k-th errors,
  [ ε_j      ε_{j,k} ]
  [ ε_{j,k}  ε_k     ]
(if conditional independence holds, then ε_{j,k} ≈ ε_j ε_k). Possible drift:
ε_j increases (more errors on the j-th label)
ε_j and ε_k both increase (more errors overall)
ε_{j,k} changes relative to ε_j, ε_k (change in the conditional dependence relation)

74 Example. Recall the distribution of errors ε = [ε_OR, ε_XOR]. This shape may change over time, and structures may need to be adjusted to cope.

75 Dealing with Concept Drift. Possible approaches:
Just ignore it: batch models must be replaced anyway, and kNN and SGD adapt; in other cases, use weighted ensembles / a fading factor
Monitor a predictive-performance statistic with a change detector (e.g., window-based detection, ADWIN) and reset the models
Monitor the distribution with a change detector (e.g., window-based, KL divergence) and reset/recalibrate the models (similar to single-labelled data, except the measurement is more complex)

76 Dealing with Unlabelled Instances
Ignore instances with no label
Use active learning to get good labels
Use predicted labels (self-training)
Use an unsupervised process, for example clustering or latent-variable representations

77 Dealing with Unlabelled Instances. Use an unsupervised process, for example clustering or latent-variable representations:
1 z_t = g(x_t)  (propagate the input to a latent representation)
2 ŷ_t = h(z_t)  (predict)
3 update g with (x_t, z_t)
4 update h with (z_t, y_t), if y_t is available
Can also be combined into one single model.
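A schematic sketch of this loop, with a fixed random projection standing in for the unsupervised map g (cf. the random-RBF idea earlier) and one incremental linear model per label as h; these concrete choices, and all dimensions, are illustrative assumptions:

    import numpy as np
    from sklearn.linear_model import SGDClassifier

    rng = np.random.default_rng(0)
    L, D, d = 4, 20, 10
    R = rng.normal(size=(D, d))                  # fixed random map: z = g(x)
    h = [SGDClassifier(loss="log_loss") for _ in range(L)]
    seen = [False] * L

    def process(x_t, y_t=None):
        z_t = np.tanh(x_t @ R).reshape(1, -1)    # 1. z_t = g(x_t)
        # 2. predict each label (default 0 until that model has seen data)
        y_hat = np.array([h[j].predict(z_t)[0] if seen[j] else 0
                          for j in range(L)])
        # 3./4. g is fixed here; h is updated only when the true labels arrive
        if y_t is not None:
            for j in range(L):
                h[j].partial_fit(z_t, [int(y_t[j])], classes=[0, 1])
                seen[j] = True
        return y_hat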

78 Summary
Multi-label classification is an active area of research, relevant to many real-world problems
Methods that deal appropriately with label dependence can achieve significant gains over a naive approach
Many multi-label problems come in the form of a data stream, incurring particular challenges

79 Multi-label learning from batch and streaming data. Jesse Read, Télécom ParisTech / École Polytechnique. Summer School on Mining Big and Complex Data, 5 September 2016, Ohrid, Macedonia.
