CSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18
Overfitting
Overfitting is fitting the training data more than is warranted: fitting noise rather than signal.
Estimating $E_{out}$ instead of $E_{in}$
$E_{out}(h) = E_{in}(h) + \text{overfit penalty}$
Regularization estimates the overfit penalty.
Regularization
Constrain hypothesis sets to prevent them from being able to fit noise.
Learning algorithms are optimization problems, and regularization imposes constraints on that optimization.
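Regularization
minimize $E_{aug}(w) = E_{in}(w) + \lambda \, \Omega(w)$
Ridge: $\Omega(w) = \sum_{i=0}^{d} w_i^2$
Low order: $\Omega(w) = \sum_{i=0}^{d} \gamma_i w_i^2$, with a weight $\gamma_i$ that grows with the order $i$
Lasso: $\Omega(w) = \sum_{i=0}^{d} |w_i|$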
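To make the effect of the penalty concrete, here is a minimal sketch of ridge regression with a single weight and no intercept, solved in closed form. The function name `ridge_1d`, the toy data, and the use of a summed (rather than averaged) squared error are assumptions made for this illustration and do not come from the slides.

```python
def ridge_1d(xs, ys, lam):
    """Minimize sum((w * x - y)^2) + lam * w^2 over a single weight w.

    Setting the derivative to zero gives w = sum(x*y) / (sum(x^2) + lam).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


# Hypothetical toy data, roughly y = 2x plus noise.
xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.2]

w_unreg = ridge_1d(xs, ys, 0.0)   # ordinary least squares
w_reg = ridge_1d(xs, ys, 10.0)    # penalized: the weight is pulled toward zero
```

A larger `lam` shrinks the weight further, which is the sense in which the penalty constrains the hypothesis set.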
Estimating $E_{out}$ instead of $E_{in}$
$E_{out}(h) = E_{in}(h) + \text{overfit penalty}$
Validation estimates $E_{out}(h)$ directly.
Test sets
Estimate $E_{out}(g)$ using the error on some test dataset $D_{test}$: $E_{test}(g)$.
If $D_{test}$ is not involved in the training process, then
$P\left[\,|E_{test}(g) - E_{out}(g)| > \epsilon\,\right] \le 2 e^{-2 \epsilon^2 K}$, where $K = |D_{test}|$.
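The single-hypothesis Hoeffding bound, $2 e^{-2 \epsilon^2 K}$, can be turned around to give an error bar for a chosen confidence level. The sketch below does both; the function names and the parameter `delta` (the allowed failure probability) are assumptions introduced for this example.

```python
import math


def hoeffding_bound(eps, k):
    """Upper bound on P[|E_test - E_out| > eps] for a test set of size k."""
    return 2.0 * math.exp(-2.0 * eps ** 2 * k)


def error_bar(delta, k):
    """Smallest eps for which the Hoeffding bound is at most delta."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * k))


# With 1000 test points and 95% confidence, E_out is within this much of E_test.
eps_1000 = error_bar(0.05, 1000)
# Quadrupling the test set halves the error bar.
eps_4000 = error_bar(0.05, 4000)
```

The error bar shrinks like $1/\sqrt{K}$, which is why more test data gives a tighter bound.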
Picking $K$
More test data leads to a tighter bound on $E_{out}(g^-)$, but fewer training data generally means the learned $g^-$ is worse, i.e. $E_{out}(g^-)$ tends to increase as $N - K$ decreases.
$E_{out}(g) \le E_{out}(g^-) \le E_{test}(g^-) + O\left(\frac{1}{\sqrt{K}}\right)$ (with high probability)
Return $g$ but bound $E_{out}(g)$ using $E_{test}(g^-) + O\left(\frac{1}{\sqrt{K}}\right)$.
Practical rule of thumb: $K = \frac{N}{5}$
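Validation set
$D_{train}$ is used to build a finite set of candidate hypotheses: $\mathcal{H}_{val} = \{g_1^-, g_2^-, \dots, g_M^-\}$.
$D_{val}$ is used to select the hypothesis from $\mathcal{H}_{val}$: $g_{m^*}^-$.
$P\left[\,|E_{val}(g_{m^*}^-) - E_{out}(g_{m^*}^-)| > \epsilon\,\right] \le 2 M e^{-2 \epsilon^2 K}$
$E_{val}(g_{m^*}^-) - O\left(\sqrt{\frac{\ln M}{K}}\right) \le E_{out}(g_{m^*}^-) \le E_{val}(g_{m^*}^-) + O\left(\sqrt{\frac{\ln M}{K}}\right)$ with high probability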
$E_{in}$ vs. $E_{val}$ vs. $E_{test}$
Quantity      Bias                Relationship to $E_{out}$
$E_{in}$      Incredibly biased   VC bound
$E_{val}$     Slightly biased     Hoeffding's bound (multiple hypotheses)
$E_{test}$    Not biased          Hoeffding's bound (single hypothesis)
Three Learning Principles
Occam's Razor: The simplest model that fits the data is also the most plausible.
Sampling Bias: If the data is sampled in a biased way, learning will produce a similarly biased outcome.
Data Snooping: If a data set has affected any step in the learning process, its ability to assess the outcome has been compromised.
Decision Tree: Example
[Figure: an example decision tree. Branch labels include Tired / Not Tired, Backpack / Both, Lunchbox, Rain / No Rain, and Before, After / During; the leaves predict Metro, Drive, or Bike.]
ID3 Learning Algorithm
Initialize the tree as a single leaf that contains all labels.
While there is an impure leaf (not all labels are the same):
  Pick an arbitrary impure leaf.
  Find the feature, $x_j$, with the largest information gain relative to the labels in that leaf.
  Create a child (or split) for each unique value of $x_j$.
  Assign each label in the original leaf to one of its children depending on its corresponding $x_j$ value.
  The original leaf is no longer a leaf; all of its children are new leaves.
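The feature-selection step of ID3 can be sketched as follows. This assumes the usual entropy-based definition of information gain, which the slide does not spell out; the helper names and the tiny commute dataset are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_feature(rows, labels):
    """Index of the feature with the largest information gain."""
    gains = [information_gain([r[j] for r in rows], labels)
             for j in range(len(rows[0]))]
    return max(range(len(gains)), key=gains.__getitem__)


# Hypothetical data: (weather, energy) -> commute choice.
rows = [("rain", "tired"), ("rain", "rested"), ("dry", "tired"), ("dry", "rested")]
labels = ["metro", "metro", "bike", "bike"]
split_on = best_feature(rows, labels)  # weather separates the labels perfectly
```

Here the first feature has a gain of one bit and the second has a gain of zero, so ID3 would split on the first.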
Decision Tree / ID3 Pros
Intuitive / explainable.
Can handle categorical and real-valued features.
Automatically performs feature selection.
The ID3 algorithm has a preference for shorter trees (simpler hypotheses).
Decision Tree / ID3 Cons
The ID3 algorithm is greedy, so there is no optimality guarantee.
Overfitting!
Heuristics ("regularization"): do not split leaves that are past a fixed depth, or that have fewer than a fixed number of labels, or where the maximal information gain is less than a fixed threshold.
Pruning ("validation"): evaluate each split using a validation set and remove the one that most improves the validation error.
Bagging
Short for "Bootstrap aggregating".
Combines the predictions of many independent hypotheses to reduce variance.
Bootstrapping: a statistical method for estimating properties of a distribution, given (potentially a small number of) samples from that distribution. It relies on resampling the samples with replacement many, many times.
Aggregating: combining multiple hypotheses, $h_1, h_2, \dots, h_M$, to arrive at a single hypothesis.
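The two halves of bagging can be sketched separately: drawing a bootstrap sample, and aggregating predictions. Majority voting is assumed here as the aggregation rule for classification; the function names are illustrative only.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) points from data uniformly at random, with replacement."""
    return [rng.choice(data) for _ in data]


def majority_vote(predictions):
    """Most common prediction among the hypotheses."""
    return Counter(predictions).most_common(1)[0][0]


def bagged_predict(hypotheses, x):
    """Aggregate a list of hypotheses (callables) by majority vote."""
    return majority_vote([h(x) for h in hypotheses])


rng = random.Random(0)          # seeded so the sketch is repeatable
data = list(range(10))
sample = bootstrap_sample(data, rng)   # same size as data, duplicates allowed

# Three hypothetical constant hypotheses; two of them vote +1.
vote = bagged_predict([lambda x: 1, lambda x: -1, lambda x: 1], x=0)
```

In a full bagging procedure each hypothesis would be trained on its own bootstrap sample before the votes are combined.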
Split-Feature Randomization
Predictions made by trees trained on similar datasets are highly correlated.
To decorrelate these predictions, randomly limit the features available at each iteration of the ID3 algorithm.
Every time the ID3 algorithm goes to split an impure leaf, randomly select $m < d$ features and only allow the algorithm to use one of those $m$ features.
For classification, a common choice is $m = \sqrt{d}$.
For regression, a common choice is $m = \frac{d}{3}$.
Random Forests
Input: $D = \{(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)\}$, $B$, $m$
For $b = 1, 2, \dots, B$:
  Create a dataset, $D_b$, by sampling $N$ points from $D$ with replacement.
  Learn a decision tree, $g_b$, using $D_b$ and the ID3 algorithm with split-feature randomization.
Output: $g$, the aggregated hypothesis
Boosting
Another ensemble method (like bagging) that combines the predictions of multiple hypotheses.
Aims to reduce the bias of a "weak" or highly biased hypothesis set (can also reduce variance).
Intuition: iteratively reweight inputs, giving more weight to inputs that are difficult to predict correctly.
Fundamentally requires that we have access to weak learners that are better than random chance.
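AdaBoost
Input: $D$ (with $y_i \in \{-1, +1\}$), $T$
Initialize input weights: $\omega_0^{(1)}, \dots, \omega_0^{(N)} = \frac{1}{N}$
For $t = 1, \dots, T$:
1. Train a weak learner (hypothesis), $h_t$, by minimizing the weighted training error.
2. Compute the weighted training error of $h_t$: $\epsilon_t = \sum_{i=1}^{N} \omega_{t-1}^{(i)} \, [\![ h_t(x_i) \ne y_i ]\!]$
3. Compute the importance of $h_t$: $\alpha_t = \frac{1}{2} \log \frac{1 - \epsilon_t}{\epsilon_t}$
4. Update the weights: $\omega_t^{(i)} = \frac{\omega_{t-1}^{(i)}}{Z_t} \times e^{-\alpha_t}$ if $h_t(x_i) = y_i$ and $\frac{\omega_{t-1}^{(i)}}{Z_t} \times e^{\alpha_t}$ if $h_t(x_i) \ne y_i$; equivalently, $\omega_t^{(i)} = \frac{\omega_{t-1}^{(i)} e^{-\alpha_t y_i h_t(x_i)}}{Z_t}$, where $Z_t$ normalizes the weights to sum to 1.
Output: an aggregated hypothesis $g_T(x) = \mathrm{sign}(H_T(x)) = \mathrm{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$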
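The loop above can be sketched in a few lines. This illustration assumes one-dimensional inputs and uses threshold "decision stumps" as the weak learners; the stump construction, the exhaustive threshold search, and the toy data are assumptions of this example, not part of the slides.

```python
import math


def stump(threshold, sign):
    """Weak learner on 1-D inputs: predicts `sign` if x > threshold, else -sign."""
    return lambda x: sign if x > threshold else -sign


def adaboost(xs, ys, rounds):
    n = len(xs)
    w = [1.0 / n] * n                                  # uniform initial weights
    thresholds = [min(xs) - 1.0] + sorted(set(xs))
    ensemble = []
    for _ in range(rounds):
        # Step 1: pick the stump with the smallest weighted training error.
        best = None
        for t in thresholds:
            for s in (+1, -1):
                h = stump(t, s)
                err = sum(wi for xi, yi, wi in zip(xs, ys, w) if h(xi) != yi)
                if best is None or err < best[0]:
                    best = (err, h)
        err, h = best                                  # Step 2: its weighted error
        err = min(max(err, 1e-12), 1.0 - 1e-12)        # guard against log(0)
        alpha = 0.5 * math.log((1.0 - err) / err)      # Step 3: importance
        # Step 4: reweight, then normalize so the weights sum to 1.
        w = [wi * math.exp(-alpha * yi * h(xi)) for xi, yi, wi in zip(xs, ys, w)]
        z = sum(w)
        w = [wi / z for wi in w]
        ensemble.append((alpha, h))
    return ensemble


def predict(ensemble, x):
    score = sum(alpha * h(x) for alpha, h in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical data that no single stump classifies correctly.
xs = [1, 2, 3, 4, 5, 6]
ys = [+1, +1, -1, -1, +1, +1]
one_round = [predict(adaboost(xs, ys, 1), x) for x in xs]
three_rounds = [predict(adaboost(xs, ys, 3), x) for x in xs]
```

On this data a single stump misclassifies two points, while the weighted vote of three stumps fits every training point, which illustrates how boosting reduces the bias of a weak hypothesis set.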
Why AdaBoost?
1. If you only have access to weak learners (because of computational constraints),
2. and want your final hypothesis to be a weighted combination of weak learners (because weak learners are not great on their own),
3. then AdaBoost greedily minimizes the exponential loss, $e(h(x), y) = e^{-y h(x)}$ (because the exponential loss upper bounds the binary error).
Nearest Neighbor Intuition
Classify a point as the label of the most similar training point.
Use Euclidean distance as the similarity metric: $d(x, x') = \|x - x'\| = \sqrt{\sum_{i=1}^{d} (x_i - x'_i)^2}$
The Nearest Neighbor Hypothesis
$g(x) = y_{[1]}(x)$, the label of the training point nearest to $x$
Generalization of Nearest Neighbor
Claim: $E_{out}$ for the nearest neighbor hypothesis is not much worse than the best possible $E_{out}$!
Formally: with high probability, $E_{out}(g) \le 2 E_{out}(g^*)$ as $N \to \infty$.
Interpretation: half of the data's predictive power is in the nearest neighbor!
$k$-Nearest Neighbors ($k$NN)
Classify a point as the most common label among the labels of the $k$ nearest training points.
When $k = 1$, $g$ is the nearest neighbor hypothesis: complicated decision boundaries; may overfit.
When $k = N$, $g$ always predicts the most common label in the training dataset: no decision boundaries; may underfit.
$k$ controls the complexity of the hypothesis set, so $k$ affects how well the learned hypothesis will generalize.
Practical rules of thumb: $k = 3$; $k = \sqrt{N}$; cross-validation.
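A brute-force version of this classifier fits in a few lines. The sketch below assumes Euclidean distance and breaks ties by whatever order the sort and the vote counter happen to produce; the function name and toy data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """train: list of (point, label) pairs; points and query are equal-length tuples."""
    nearest = sorted(train, key=lambda pl: math.dist(pl[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical training set: three 'a' points near the origin, two 'b' points far away.
train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"), ((5, 5), "b"), ((5, 6), "b")]

near_b_k1 = knn_predict(train, (4.5, 5), k=1)            # nearest point is a 'b'
near_b_k3 = knn_predict(train, (4, 4), k=3)              # two 'b' votes beat one 'a'
near_b_kN = knn_predict(train, (4.5, 5), k=len(train))   # k = N: majority label overall
```

With $k = N$ the prediction ignores the query entirely and returns the most common training label, matching the underfitting extreme described above.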
$k$NN Pros and Cons
Pros:
  Intuitive / explainable.
  No training / retraining.
  Provably near-optimal in terms of $E_{out}$.
Cons:
  Computationally expensive.
  Always needs to store all data: $O(Nd)$.
  Computing $g(x)$ requires computing $d(x, x_n)$ for every training point and finding the $k$ closest points: $O(Nd + N \log k)$.
  Suffers from the curse of dimensionality.
Curse of Dimensionality
The fundamental assumption of $k$NN is that "similar" points, or points close to one another, should have the same label.
The closer two points are, the more confident we can be that they will have the same label.
As the number of dimensions of the input grows, it becomes less likely that two random points will be close.
As the number of dimensions of the input grows, it takes more points to "cover" the input space.
Curing the Curse of Dimensionality
More data.
Fewer dimensions.
Blessing of non-uniformity: data from the real world is rarely uniformly distributed across the input space.
Computational Cost of $k$NN
No training required!
Memory: $O(Nd)$
Computing $g(x)$: $O(Nd + N \log k)$
Idea: preprocess inputs in order to speed up predictions.
  Reduce the number of inputs held in memory by eliminating redundancies.
  Organize inputs in data structures that make searching for nearest neighbors more efficient.
Data Condensing
Reduce the number of inputs while maintaining the same predictions on all inputs.
Let $g_D$ be the $k$NN hypothesis when trained on $D$.
$S \subseteq D$ is training-set consistent if $g_S(x_n) = g_D(x_n)$ for all $x_n \in D$.
Training-set consistency is a much weaker constraint than decision-boundary consistency.
Organizing the Inputs
Intuition: split the inputs into clusters, groups of points that are close to one another but far from other groups.
If an input point is really close to one group of points and really far from the other groups, then we can skip searching through the other groups and just look for nearest neighbors in the close group!
We want cluster centers to be far apart and cluster radii to be small.
Radial Basis Functions (RBF)
$k$NN only considers some points and weights them equally.
RBFs consider all points but weight them unequally.
Intuition: all points are useful but some points are more useful than others!
Bonus: no need to choose $k$.
$g(x) = \mathrm{sign}(\cdot)$ of a weighted sum over the labels of all $N$ training points.
Maximal Margin Linear Separators
The margin of a separating hyperplane is the distance between the hyperplane and the nearest training point.
Questions:
  How can we efficiently find a maximal-margin linear separator?
  Why are linear separators with larger margins better?
  What can we do if the data is not linearly separable?
Maximizing the Margin
minimize $\frac{1}{2} w^T w$
subject to $y_n (w^T x_n + w_0) \ge 1$ for all $n \in \{1, \dots, N\}$
This optimization problem can be solved (approximately) using quadratic programming (QP) in $O(d^3)$ time.
Let $\mathcal{H}_\rho$ = linear separators with minimum margin $\rho$. If the input space is a $d$-dimensional sphere of radius $R$, then $d_{vc}(\mathcal{H}_\rho) \le \min\left(d, \left\lceil \frac{R^2}{\rho^2} \right\rceil\right) + 1$.
Linearly Inseparable Data
What can we do if the data is not linearly separable?
Accept some non-zero in-sample error. How much in-sample error should we tolerate?
Apply a non-linear transformation that shifts the data into a space where it is linearly separable. How can we pick a non-linear transformation?
Soft-Margin SVMs
minimize $\frac{1}{2} w^T w + C \sum_{n=1}^{N} \xi_n$
subject to $y_n (w^T x_n + w_0) \ge 1 - \xi_n$ for all $(x_n, y_n) \in D$
subject to $\xi_n \ge 0$ for all $n \in \{1, \dots, N\}$
$\xi_n$ is the "soft" error on the $n$-th training point.
If $\xi_n > 1$, then $y_n (w^T x_n + w_0) < 0$: $(x_n, y_n)$ is incorrectly classified.
If $0 < \xi_n < 1$, then $y_n (w^T x_n + w_0) > 0$: $(x_n, y_n)$ is correctly classified but inside the margin.
$\sum_{n=1}^{N} \xi_n$ is the "soft" in-sample error.
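For a fixed separator, the smallest slack that satisfies both constraints is $\xi_n = \max(0, 1 - y_n (w^T x_n + w_0))$. The sketch below computes it and names the three regimes described above; the function names, the example separator, and the example points are assumptions for illustration.

```python
def slack(w, w0, x, y):
    """Smallest xi with y * (w.x + w0) >= 1 - xi and xi >= 0."""
    margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + w0)
    return max(0.0, 1.0 - margin)


def describe(xi):
    if xi == 0.0:
        return "correct, on or outside the margin"
    if xi < 1.0:
        return "correct, inside the margin"
    if xi == 1.0:
        return "on the separating hyperplane"
    return "misclassified"


# Hypothetical separator: the vertical line x_1 = 0, with margin boundaries at x_1 = +/-1.
w, w0 = (1.0, 0.0), 0.0
xi_far = slack(w, w0, (2.0, 0.0), +1)      # margin 2
xi_inside = slack(w, w0, (0.5, 0.0), +1)   # margin 0.5
xi_wrong = slack(w, w0, (-1.0, 0.0), +1)   # margin -1
soft_error = xi_far + xi_inside + xi_wrong
```

The parameter $C$ in the objective then trades this total soft error against the size of the margin.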
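Nonlinear Dual SVMs
Decide on a transformation $\Phi: \mathcal{X} \to \mathcal{Z}$.
Find a maximal-margin separating hyperplane in the transformed space, $(\tilde{w}, \tilde{w}_0)$, by solving the QP:
minimize $\frac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{N} \alpha_n \alpha_m y_n y_m \Phi(x_n)^T \Phi(x_m) - \sum_{n=1}^{N} \alpha_n$
subject to $\sum_{n=1}^{N} y_n \alpha_n = 0$
subject to $\alpha_n \ge 0$ for all $n \in \{1, \dots, N\}$
Return the corresponding predictor in the original space: $g(x) = \mathrm{sign}\left(\sum_{\alpha_n > 0} \alpha_n y_n \Phi(x_n)^T \Phi(x) + \tilde{w}_0\right)$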
Perceptrons vs. SVMs
Model         Input space         $E_{in}$   Generalization
Perceptrons   Low-dimensional     High       Good
Perceptrons   High-dimensional    Low        Bad
SVMs          Low-dimensional     High       Good
SVMs          High-dimensional    Low        Okay
$d_{vc}(\mathcal{H}) = d + 1$ vs. $d_{vc}(\mathcal{H}_\rho) \le \min\left(d, \left\lceil \frac{R^2}{\rho^2} \right\rceil\right) + 1$
Efficiency
Depending on the transformation, $\Phi$, and the dimensionality of the original input space, $d$, computing $\Phi(x)$ can be computationally expensive.
Computing $\Phi_Q(x)$ requires $O(d^Q)$ time.
High-dimensional transformations can result in good hypotheses (as long as they don't overfit), but high-dimensional transformations are expensive.
Approach: instead of computing $\Phi(x)$, find a function $K_\Phi$ s.t. $K_\Phi(x, x') = \Phi(x)^T \Phi(x')$ for all $x, x' \in \mathcal{X}$.
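A standard instance of this idea is the second-order polynomial kernel, $K(x, x') = (1 + x^T x')^2$, which equals an inner product in a six-dimensional feature space when $d = 2$. The sketch below checks that identity numerically; the choice of kernel, the explicit feature map `phi`, and the test points are assumptions of this example rather than content from the slides.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def phi(x):
    """Explicit second-order feature map for a 2-D input."""
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return (1.0, r2 * x1, r2 * x2, x1 * x1, x2 * x2, r2 * x1 * x2)


def poly_kernel(x, xp):
    """Same inner product, computed without ever building phi(x)."""
    return (1.0 + dot(x, xp)) ** 2


x, xp = (1.0, 2.0), (3.0, -1.0)
explicit = dot(phi(x), phi(xp))   # work grows with the feature dimension
implicit = poly_kernel(x, xp)     # one d-dimensional dot product and a square
```

Both routes give the same value, but the kernel only needs a dot product in the original space, which is what makes high-dimensional transformations affordable.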
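Nonlinear Dual SVMs
Decide on a (valid) kernel function $K_\Phi$.
Find a maximal-margin separating hyperplane in the transformed space, $(\tilde{w}, \tilde{w}_0)$, by solving the QP:
minimize $\frac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{N} \alpha_n \alpha_m y_n y_m K_\Phi(x_n, x_m) - \sum_{n=1}^{N} \alpha_n$
subject to $\sum_{n=1}^{N} y_n \alpha_n = 0$
subject to $\alpha_n \ge 0$ for all $n \in \{1, \dots, N\}$
Return the corresponding predictor in the original space: $g(x) = \mathrm{sign}\left(\sum_{\alpha_n > 0} \alpha_n y_n K_\Phi(x_n, x) + \tilde{w}_0\right)$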
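[Figure: the decision regions of two perceptrons, $h_1$ and $h_2$, and of their combination.]
$f(x) = \mathrm{OR}\left(\mathrm{AND}(h_1(x), \overline{h_2}(x)), \mathrm{AND}(\overline{h_1}(x), h_2(x))\right)$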
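Building a Network
$f(x) = \mathrm{OR}\left(\mathrm{AND}(h_1(x), \overline{h_2}(x)), \mathrm{AND}(\overline{h_1}(x), h_2(x))\right)$
[Figure: a network that computes $f(x)$ from the inputs $x_1, x_2$ through the perceptrons $h_1(x)$ and $h_2(x)$, with each edge labeled by its weight.]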
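The composition above can be checked directly by wiring perceptrons together. In this sketch, AND and OR are themselves perceptrons over $\pm 1$ inputs with biases of $-1.5$ and $+1.5$, negation flips the sign, and $h_1$, $h_2$ simply read the sign of each coordinate; these particular weights and the choice of $h_1$, $h_2$ are assumptions for illustration.

```python
def sign(s):
    return 1 if s > 0 else -1


def perceptron(weights, bias):
    """Return h(inputs) = sign(bias + weights . inputs)."""
    return lambda inputs: sign(bias + sum(w * v for w, v in zip(weights, inputs)))


# Logic gates over +/-1 values, each implemented as a single perceptron.
AND = perceptron([1, 1], -1.5)   # +1 only when both inputs are +1
OR = perceptron([1, 1], 1.5)     # -1 only when both inputs are -1


def NOT(v):
    return -v


def combine(h1, h2):
    """f(x) = OR(AND(h1, NOT h2), AND(NOT h1, h2))."""
    def f(x):
        a, b = h1(x), h2(x)
        return OR([AND([a, NOT(b)]), AND([NOT(a), b])])
    return f


h1 = perceptron([1, 0], 0)   # sign of the first coordinate
h2 = perceptron([0, 1], 0)   # sign of the second coordinate
f = combine(h1, h2)

outputs = [f(x) for x in [(1, 1), (1, -1), (-1, 1), (-1, -1)]]
```

The result is $+1$ exactly when $h_1$ and $h_2$ disagree, a function that no single perceptron can represent.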
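Feed-Forward Neural Network (NN)
Replace the hard sign function with a soft, differentiable approximation, $\theta$.
[Figure: a feed-forward network with a bias node in each layer, $\theta$ units in the hidden and output layers, and final output $h(x)$.]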
Architecture
The architecture of a NN is the vector of dimensionalities: $d = [d^{(0)}, d^{(1)}, \dots, d^{(L)}]$
The NN has $L$ layers: $L - 1$ hidden layers and 1 output layer.
Layer $l$ has dimension $d^{(l)}$.
Layer $l$ has $d^{(l)} + 1$ nodes, counting the bias node.
Every architecture corresponds to a hypothesis set.
A hypothesis is specified by setting all the weights.
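Weights, Signals and Outputs
The weights between layer $l - 1$ and layer $l$ are a matrix: $W^{(l)} \in \mathbb{R}^{(d^{(l-1)} + 1) \times d^{(l)}}$
$w_{ij}^{(l)}$ is the weight between node $i$ in layer $l - 1$ and node $j$ in layer $l$.
Every node has an incoming signal, $s_j^{(l)}$, and an outgoing output, $x_j^{(l)}$:
$x^{(l)} = \begin{bmatrix} 1 \\ \theta(s^{(l)}) \end{bmatrix}$ and $s^{(l)} = (W^{(l)})^T x^{(l-1)}$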
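Forward Propagation
Input: weights $W^{(1)}, \dots, W^{(L)}$ and a query point $x$
Initialize $x^{(0)} = \begin{bmatrix} 1 \\ x \end{bmatrix}$
For $l = 1, \dots, L$:
  $s^{(l)} = (W^{(l)})^T x^{(l-1)}$
  $x^{(l)} = \begin{bmatrix} 1 \\ \theta(s^{(l)}) \end{bmatrix}$
Output: $x^{(1)}, \dots, x^{(L)}$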
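Backpropagation
Input: weights $W^{(1)}, \dots, W^{(L)}$ and a point $x$ with label $y$
Run forward propagation to get $x^{(1)}, \dots, x^{(L)}$
Initialize $\delta^{(L)} = 2 (x^{(L)} - y) (1 - (x^{(L)})^2)$
For $l = L - 1, \dots, 1$:
  Compute $\delta^{(l)} = (W^{(l+1)} \delta^{(l+1)}) \otimes (1 - x^{(l)} \otimes x^{(l)})$, dropping the bias component
Output: $\delta^{(1)}, \dots, \delta^{(L)}$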
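Computing Gradients
Input: $W^{(1)}, \dots, W^{(L)}$ and $D = \{(x_1, y_1), \dots, (x_N, y_N)\}$
Initialize $E_{in} = 0$ and $G^{(l)} = 0 \cdot W^{(l)}$ for $l = 1, \dots, L$
For $n = 1, \dots, N$:
  Run forward propagation to get $x^{(0)}, \dots, x^{(L)}$
  Run backpropagation to get $\delta^{(1)}, \dots, \delta^{(L)}$
  Increment $E_{in}$: $E_{in} = E_{in} + \frac{1}{N} (x^{(L)} - y_n)^2$
  For $l = 1, \dots, L$:
    Compute $G^{(l)}(x_n) = x^{(l-1)} (\delta^{(l)})^T$
    Increment $G^{(l)}$: $G^{(l)} = G^{(l)} + \frac{1}{N} G^{(l)}(x_n)$
Output: $G^{(1)}, \dots, G^{(L)}$, the gradients of $E_{in}$ w.r.t. $W^{(1)}, \dots, W^{(L)}$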
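Forward propagation, backpropagation, and the per-point gradient can be sketched together and checked against a finite-difference estimate. This illustration assumes $\tanh$ as the soft threshold in every layer (including the output), a single output node, and squared error on one training point; the function names, the tiny 2-2-1 network, and its weights are hypothetical.

```python
import math


def forward(weights, x):
    """Layer outputs xs[0..L]; every layer except the last gets a leading bias 1."""
    xs = [[1.0] + list(x)]
    for l, W in enumerate(weights):
        s = [sum(W[i][j] * xs[-1][i] for i in range(len(W))) for j in range(len(W[0]))]
        out = [math.tanh(v) for v in s]
        xs.append(out if l == len(weights) - 1 else [1.0] + out)
    return xs


def backward(weights, xs, y):
    """Sensitivities: deltas[l] is d e / d s for the layer fed by weights[l]."""
    out = xs[-1][0]
    deltas = [[2.0 * (out - y) * (1.0 - out * out)]]
    for l in range(len(weights) - 1, 0, -1):
        W, x_l = weights[l], xs[l]
        # Row 0 of W belongs to the bias node, which has no incoming signal.
        d = [(1.0 - x_l[i] ** 2) * sum(W[i][j] * deltas[0][j] for j in range(len(W[0])))
             for i in range(1, len(W))]
        deltas.insert(0, d)
    return deltas


def gradients(weights, x, y):
    """Gradient of e = (output - y)^2 w.r.t. every weight: x^(l-1) * delta^(l)^T."""
    xs = forward(weights, x)
    deltas = backward(weights, xs, y)
    return [[[xs[l][i] * deltas[l][j] for j in range(len(W[0]))] for i in range(len(W))]
            for l, W in enumerate(weights)]


def loss(weights, x, y):
    return (forward(weights, x)[-1][0] - y) ** 2


def numeric_gradient(weights, x, y, l, i, j, h=1e-6):
    """Central finite difference for a single weight (restores the weight afterwards)."""
    original = weights[l][i][j]
    weights[l][i][j] = original + h
    up = loss(weights, x, y)
    weights[l][i][j] = original - h
    down = loss(weights, x, y)
    weights[l][i][j] = original
    return (up - down) / (2.0 * h)


# Hypothetical 2-2-1 network; each matrix has one row per input node (bias first).
weights = [[[0.1, -0.2], [0.3, 0.4], [-0.5, 0.2]],
           [[0.2], [-0.3], [0.5]]]
x, y = (0.7, -1.2), 1.0
G = gradients(weights, x, y)
max_gap = max(abs(G[l][i][j] - numeric_gradient(weights, x, y, l, i, j))
              for l, W in enumerate(weights)
              for i in range(len(W)) for j in range(len(W[0])))
```

Averaging these per-point gradients over all $N$ training points gives the gradient of $E_{in}$; the finite-difference comparison is a common sanity check rather than part of the algorithm.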
Complexity
Both forward and backpropagation contain matrix multiplications involving $W^{(1)}, \dots, W^{(L)}$, so both take time $O(|W^{(1)}| + \dots + |W^{(L)}|)$.
Computing $G^{(1)}, \dots, G^{(L)}$ requires running forward and backpropagation for each training point, $(x_n, y_n) \in D$.
Each iteration of gradient descent for a neural network takes time $O(N (|W^{(1)}| + \dots + |W^{(L)}|))$.
Use stochastic gradient descent instead!
Also use parallelization and GPUs / TPUs!
Stochastic Gradient Descent for Neural Networks
Input: $D = \{(x_1, y_1), \dots, (x_N, y_N)\}$, $\eta_0$
Initialize all weights, $W_0^{(1)}, \dots, W_0^{(L)}$, to small, random numbers and set $t = 0$
While some termination condition is not satisfied:
  Randomly select a point $(x_n, y_n) \in D$
  For $l = 1, \dots, L$:
    Compute $G^{(l)} = \nabla_{W^{(l)}} e(h(x_n), y_n)$ at the current weights $W_t^{(1)}, \dots, W_t^{(L)}$
    Update $W_{t+1}^{(l)} = W_t^{(l)} - \eta_0 G^{(l)}$
  Increment $t$: $t = t + 1$
Output: $W_t^{(1)}, \dots, W_t^{(L)}$
Initialization and Termination
Initialization: Randomness is good for non-convex optimization. Initialize weights by sampling from $\mathcal{N}(0, \sigma^2)$.
Termination: For complicated surfaces, the gradient's magnitude is not a good metric for proximity to a minimum. A simple solution: combine multiple termination criteria, e.g. stop if enough iterations have passed and the improvement in error is small.
CSCI-567: Machine Learning (Spring 2019) Prof. Victor Adamchik U of Southern California Mar. 19, 2019 March 19, 2019 1 / 43 Administration March 19, 2019 2 / 43 Administration TA3 is due this week March
More informationBagging and Other Ensemble Methods
Bagging and Other Ensemble Methods Sargur N. Srihari srihari@buffalo.edu 1 Regularization Strategies 1. Parameter Norm Penalties 2. Norm Penalties as Constrained Optimization 3. Regularization and Underconstrained
More informationThe exam is closed book, closed notes except your one-page cheat sheet.
CS 189 Fall 2015 Introduction to Machine Learning Final Please do not turn over the page before you are instructed to do so. You have 2 hours and 50 minutes. Please write your initials on the top-right
More informationLearning Ensembles. 293S T. Yang. UCSB, 2017.
Learning Ensembles 293S T. Yang. UCSB, 2017. Outlines Learning Assembles Random Forest Adaboost Training data: Restaurant example Examples described by attribute values (Boolean, discrete, continuous)
More informationFinal Exam, Fall 2002
15-781 Final Exam, Fall 22 1. Write your name and your andrew email address below. Name: Andrew ID: 2. There should be 17 pages in this exam (excluding this cover sheet). 3. If you need more room to work
More informationDiscriminative Models
No.5 Discriminative Models Hui Jiang Department of Electrical Engineering and Computer Science Lassonde School of Engineering York University, Toronto, Canada Outline Generative vs. Discriminative models
More informationKernelized Perceptron Support Vector Machines
Kernelized Perceptron Support Vector Machines Emily Fox University of Washington February 13, 2017 What is the perceptron optimizing? 1 The perceptron algorithm [Rosenblatt 58, 62] Classification setting:
More informationDecision Trees. Danushka Bollegala
Decision Trees Danushka Bollegala Rule-based Classifiers In rule-based learning, the idea is to learn a rule from train data in the form IF X THEN Y (or a combination of nested conditions) that explains
More informationMidterm Exam, Spring 2005
10-701 Midterm Exam, Spring 2005 1. Write your name and your email address below. Name: Email address: 2. There should be 15 numbered pages in this exam (including this cover sheet). 3. Write your name
More informationLecture 3: Decision Trees
Lecture 3: Decision Trees Cognitive Systems II - Machine Learning SS 2005 Part I: Basic Approaches of Concept Learning ID3, Information Gain, Overfitting, Pruning Lecture 3: Decision Trees p. Decision
More informationday month year documentname/initials 1
ECE471-571 Pattern Recognition Lecture 13 Decision Tree Hairong Qi, Gonzalez Family Professor Electrical Engineering and Computer Science University of Tennessee, Knoxville http://www.eecs.utk.edu/faculty/qi
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1394 1 / 34 Table
More informationAlgorithm-Independent Learning Issues
Algorithm-Independent Learning Issues Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2007 c 2007, Selim Aksoy Introduction We have seen many learning
More informationStatistics and learning: Big Data
Statistics and learning: Big Data Learning Decision Trees and an Introduction to Boosting Sébastien Gadat Toulouse School of Economics February 2017 S. Gadat (TSE) SAD 2013 1 / 30 Keywords Decision trees
More informationClassification: Decision Trees
Classification: Decision Trees These slides were assembled by Byron Boots, with grateful acknowledgement to Eric Eaton and the many others who made their course materials freely available online. Feel
More informationFinal Exam, Machine Learning, Spring 2009
Name: Andrew ID: Final Exam, 10701 Machine Learning, Spring 2009 - The exam is open-book, open-notes, no electronics other than calculators. - The maximum possible score on this exam is 100. You have 3
More informationMachine Learning Basics Lecture 4: SVM I. Princeton University COS 495 Instructor: Yingyu Liang
Machine Learning Basics Lecture 4: SVM I Princeton University COS 495 Instructor: Yingyu Liang Review: machine learning basics Math formulation Given training data x i, y i : 1 i n i.i.d. from distribution
More informationUNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013
UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 Exam policy: This exam allows two one-page, two-sided cheat sheets; No other materials. Time: 2 hours. Be sure to write your name and
More informationEnsemble Methods for Machine Learning
Ensemble Methods for Machine Learning COMBINING CLASSIFIERS: ENSEMBLE APPROACHES Common Ensemble classifiers Bagging/Random Forests Bucket of models Stacking Boosting Ensemble classifiers we ve studied
More informationMachine Learning. Kernels. Fall (Kernels, Kernelized Perceptron and SVM) Professor Liang Huang. (Chap. 12 of CIML)
Machine Learning Fall 2017 Kernels (Kernels, Kernelized Perceptron and SVM) Professor Liang Huang (Chap. 12 of CIML) Nonlinear Features x4: -1 x1: +1 x3: +1 x2: -1 Concatenated (combined) features XOR:
More informationFrom Binary to Multiclass Classification. CS 6961: Structured Prediction Spring 2018
From Binary to Multiclass Classification CS 6961: Structured Prediction Spring 2018 1 So far: Binary Classification We have seen linear models Learning algorithms Perceptron SVM Logistic Regression Prediction
More informationIntroduction to ML. Two examples of Learners: Naïve Bayesian Classifiers Decision Trees
Introduction to ML Two examples of Learners: Naïve Bayesian Classifiers Decision Trees Why Bayesian learning? Probabilistic learning: Calculate explicit probabilities for hypothesis, among the most practical
More informationRecitation 9. Gradient Boosting. Brett Bernstein. March 30, CDS at NYU. Brett Bernstein (CDS at NYU) Recitation 9 March 30, / 14
Brett Bernstein CDS at NYU March 30, 2017 Brett Bernstein (CDS at NYU) Recitation 9 March 30, 2017 1 / 14 Initial Question Intro Question Question Suppose 10 different meteorologists have produced functions
More informationNeural networks and support vector machines
Neural netorks and support vector machines Perceptron Input x 1 Weights 1 x 2 x 3... x D 2 3 D Output: sgn( x + b) Can incorporate bias as component of the eight vector by alays including a feature ith
More informationArtificial Intelligence Roman Barták
Artificial Intelligence Roman Barták Department of Theoretical Computer Science and Mathematical Logic Introduction We will describe agents that can improve their behavior through diligent study of their
More informationVariance Reduction and Ensemble Methods
Variance Reduction and Ensemble Methods Nicholas Ruozzi University of Texas at Dallas Based on the slides of Vibhav Gogate and David Sontag Last Time PAC learning Bias/variance tradeoff small hypothesis
More informationMachine Learning. Lecture 9: Learning Theory. Feng Li.
Machine Learning Lecture 9: Learning Theory Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2018 Why Learning Theory How can we tell
More informationIntroduction to Machine Learning Midterm Exam
10-701 Introduction to Machine Learning Midterm Exam Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes, but
More informationDiscriminative Models
No.5 Discriminative Models Hui Jiang Department of Electrical Engineering and Computer Science Lassonde School of Engineering York University, Toronto, Canada Outline Generative vs. Discriminative models
More information