Fast and Effective Limited Pass Learning for Large Data Quantities. Presented by: Nayyar Zaidi
1 Fast and Effective Limited Pass Learning for Large Data Quantities Presented by: Nayyar Zaidi
2 [Figure: RMSE vs. training set size.]
3 [Figure: RMSE vs. training set size.]
4 [Figure: timeline/scale diagram of machine learning - from computational models of neural networks, artificial neural networks, SVMs (linear), support vector machines, kernel methods, the perceptron, deep learning, random forests and boosting, and relational databases / data warehousing ("small machine learning"), through to "big machine learning" driven by the WWW, the iPhone and Kaggle, along an axis of data scale.]
5 Good Old-Fashioned Machine Learning (GOFML): 1) Regularization 2) Non-parametric methods (nearest neighbour, tree-based methods) 3) Power of ensembles (random forest, boosting) 4) Kernel theory 5) Batch optimization methods 6) Bayesian vs. frequentist 7) Feature selection. Large-Scale Machine Learning (LSML): 1) Feature engineering 2) Deep learning 3) SGD 4) Minimal pass learning 5) Automatic regularization 6) Minimal tuning. Future Machine Learning (FML): 1) Single pass learning 2) Automatic feature engineering 3) No tuning parameters.
6 (Same taxonomy as the previous slide, introducing the abbreviations LSML and FML.)
7 GOFML vs. LSML. [Figure: RMSE vs. training set size.]
8 Objectives of the Talk: summarize the properties of Large-Scale Machine Learning (LSML) algorithms; propose two fast and effective limited-pass learning algorithms. Outline of the Talk: Introduction; Background (NB, RF and Bayesian networks); Algorithm I: FewPLA; Algorithm II: Selective ALR; Discussion. Target Audience: research scientists, machine learning practitioners, Ph.D. students, final-year undergraduate students.
9 Three Properties of LSML: minimal pass learning, minimal tuning parameters, low-bias learning.
10 Two Extremes. Naive Bayes (NB): high-bias, low-variance; extremely easy to train; single pass; minimal tuning parameters. Random Forest (RF): low-bias, high-variance; multiple passes; some tuning parameters. [Figure: NB and RF placed on axes of bias of the learner, number of passes through the data, and number of tuning parameters.]
11 Naive Bayes: a Bayesian network that factorizes the joint distribution P(X, Y); parameters are maximum-likelihood estimates of the log-likelihood; good for small data. Random Forest: a non-parametric method; bagged data + bagged variables; trees are grown to full depth; many variants, e.g. gradient boosting, give state-of-the-art results; good for large datasets. [Figure: NB network structure with class Y as parent of X_1, ..., X_n.]
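Naive Bayes's single-pass training is just count accumulation, which is why the talk places it at the easy extreme. A minimal sketch for discrete attributes; the `n_vals` smoothing denominator is an illustrative choice, not something the talk specifies:

```python
import math
from collections import defaultdict

def train_naive_bayes(data):
    """One pass over (features, label) pairs, accumulating the counts
    that define the maximum-likelihood NB parameters."""
    class_counts = defaultdict(int)
    feat_counts = defaultdict(int)  # keyed by (attribute index, value, label)
    n = 0
    for x, y in data:
        n += 1
        class_counts[y] += 1
        for i, v in enumerate(x):
            feat_counts[(i, v, y)] += 1
    return n, class_counts, feat_counts

def nb_predict(x, n, class_counts, feat_counts, n_vals=2, alpha=1.0):
    """Pick the class maximising log P(y) + sum_i log P(x_i | y),
    with Laplace smoothing; n_vals is the number of values per attribute."""
    best, best_lp = None, float("-inf")
    for y, cy in class_counts.items():
        lp = math.log(cy / n)
        for i, v in enumerate(x):
            lp += math.log((feat_counts[(i, v, y)] + alpha) / (cy + n_vals * alpha))
        if lp > best_lp:
            best, best_lp = y, lp
    return best
```

Because training only accumulates counts, the model can be built in a single sequential scan of arbitrarily large data.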
12 Comparison between NB and RF. [Figures: bias, variance, 0-1 loss, training time and classification time of NB vs. RF, on all datasets vs. big datasets.] Semi-naive Bayes methods (AnDE, TAN, KDB, etc.) sit between the two extremes.
13 Bayesian Network Classifiers. A BN is characterised by two sets of parameters: B = (G, Θ). Learning a BN = structure learning + parameter learning. The graph has nice properties. Structure learning: K2 and many variants. Parameter learning: accumulating counts to maximise the log-likelihood.
$$P_B(y, \mathbf{x}) = \theta_y \prod_{i=1}^{n} \theta_{x_i \mid y, \Pi_i(\mathbf{x})}$$
$$P_B(y \mid \mathbf{x}) = \frac{P_B(y, \mathbf{x})}{P_B(\mathbf{x})} = \frac{\theta_y \prod_{i=1}^{n} \theta_{x_i \mid y, \Pi_i(\mathbf{x})}}{\sum_{y' \in \mathcal{Y}} \theta_{y'} \prod_{i=1}^{n} \theta_{x_i \mid y', \Pi_i(\mathbf{x})}}$$
$$\mathrm{LL}(B) = \sum_{j=1}^{N} \log P_B(y^{(j)}, \mathbf{x}^{(j)}) = \sum_{j=1}^{N} \Big( \log \theta_{y^{(j)}} + \sum_{i=1}^{n} \log \theta_{x_i^{(j)} \mid y^{(j)}, \Pi_i(\mathbf{x}^{(j)})} \Big)$$
14 k-Dependence Bayesian Estimator (KDB). Algorithm: 1) calculate the mutual information MI(X_i; Y) for all attributes; 2) calculate the conditional mutual information MI(X_i; X_j | Y); 3) sort all attributes by MI(X_i; Y); 4) for every i-th attribute: make class Y its parent, set K = min(i-1, k), and choose K parents from attributes 1 to i-1 based on their MI(X_i; X_j | Y) scores. [Figure: bias of NB, KDB1 and KDB2.]
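Once the mutual-information tables are computed (a separate counting pass, omitted here), the KDB structure-learning steps above reduce to a small parent-selection routine. A sketch, assuming precomputed `mi_y` and `cmi` tables:

```python
def kdb_structure(mi_y, cmi, k):
    """Assign parents for KDB given precomputed scores.

    mi_y[i]    : MI(X_i; Y) for each attribute i
    cmi[(i,j)] : MI(X_i; X_j | Y)
    Returns parents[i] = list of attribute parents for X_i
    (the class parent Y is implicit for every attribute).
    """
    # Step 3: order attributes by mutual information with the class.
    order = sorted(range(len(mi_y)), key=lambda i: mi_y[i], reverse=True)
    parents = {}
    for pos, i in enumerate(order):
        earlier = order[:pos]          # attributes already placed
        kk = min(pos, k)               # K = min(i-1, k)
        # Step 4: pick the kk earlier attributes with highest MI(X_i; X_j | Y).
        parents[i] = sorted(earlier, key=lambda j: cmi[(i, j)], reverse=True)[:kk]
    return parents
```

The resulting parent sets define the factorisation P(Y) ∏ P(X_i | Y, parents(X_i)) used by the later slides.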
15 [Figure: Covtype error of KDB (K=0..4) vs. NB and RF.] Comparative analysis of the performance of KDB with NB and RF.
16 [Figure: Covtype error of KDB vs. NB and RF.] Comparative analysis of the performance of KDB with NB and RF.
17 KDB - Model Structure.
$$P(\mathbf{X}, Y) = P(Y)\,P(X_1 \mid Y)\,P(X_2 \mid Y, X_1)\,P(X_3 \mid Y, X_1, X_2)$$
[Figure: an example of parameter structure for KDB (tries), branching on the class and on parent attribute values.]
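The trie-shaped parameter structure in the figure can be mimicked with nested dictionaries: each level keys on the class or on one parent's value, with counts at the leaves. A toy sketch; nested dicts stand in for whatever trie representation the actual implementation uses:

```python
def trie_add(trie, path, delta=1):
    """Walk/extend a nested-dict trie along `path` - e.g.
    (class value, parent values..., attribute value) - and
    bump the count stored at the leaf."""
    node = trie
    for key in path[:-1]:
        node = node.setdefault(key, {})
    node[path[-1]] = node.get(path[-1], 0) + delta

def trie_get(trie, path, default=0):
    """Read a leaf count back, returning `default` for unseen paths."""
    node = trie
    for key in path[:-1]:
        node = node.get(key)
        if node is None:
            return default
    return node.get(path[-1], default)
```

Unseen parent-value combinations simply never materialise, which is what keeps the structure compact relative to a dense conditional probability table.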
18 Selective KDB (SKDB). Adds a third pass to KDB; selects the best k and the best number of attributes; good for reducing the size of KDB; a maximum k_max has to be specified. SKDB exploits the nested structure of KDB models over attributes and values of k using leave-one-out cross-validation.
$$P_{K=0}(\mathbf{X}, Y) = P(X_1 \mid Y)\,P(X_2 \mid Y)\,P(X_3 \mid Y)\,P(X_4 \mid Y)\,P(X_5 \mid Y)$$
$$P_{K=1}(\mathbf{X}, Y) = P(X_1 \mid Y)\,P(X_2 \mid Y, X_1)\,P(X_3 \mid Y, X_1)\,P(X_4 \mid Y, X_2)\,P(X_5 \mid Y, X_4)$$
$$P_{K=2}(\mathbf{X}, Y) = P(X_1 \mid Y)\,P(X_2 \mid Y, X_1)\,P(X_3 \mid Y, X_1, X_2)\,P(X_4 \mid Y, X_2, X_3)\,P(X_5 \mid Y, X_4, X_3)$$
$$P_{K=3}(\mathbf{X}, Y) = P(X_1 \mid Y)\,P(X_2 \mid Y, X_1)\,P(X_3 \mid Y, X_1, X_2)\,P(X_4 \mid Y, X_2, X_3, X_1)\,P(X_5 \mid Y, X_4, X_3, X_2)$$
$$P_{K=4}(\mathbf{X}, Y) = P(X_1 \mid Y)\,P(X_2 \mid Y, X_1)\,P(X_3 \mid Y, X_1, X_2)\,P(X_4 \mid Y, X_2, X_3, X_1)\,P(X_5 \mid Y, X_4, X_3, X_2, X_1)$$
19 [Figure: trie for $P(X_4 \mid Y, X_3)$ at K=1, branching on the values of $X_3$.]
20 [Figure: trie for $P(X_4 \mid Y, X_3, X_2)$ at K=2, branching on the values of $X_3$ and $X_2$.]
21 [Figure: trie for $P(X_4 \mid Y, X_3, X_2, X_1)$ at K=3, branching on the values of $X_3$, $X_2$ and $X_1$.]
22 Selective KDB. The first and second passes build the model; the third pass then begins: for each data point x, subtract it from the counts table; for each k (0 to K) and each attribute i (in the ordered set), accumulate LF[k][i] += LossFunction(P[y | x], y). Select the k and i with the best accumulated loss from the table, and trim the data structure. 1. Martinez, A., Webb, G., Li, S. and Zaidi, N. Scalable Learning of Bayesian Network Classifiers, JMLR, pp. 1-35, 2016.
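The third pass above can be sketched as follows. `counts.remove`/`counts.add` and `score` are assumed interfaces standing in for the count-table updates and the loss evaluation, not the paper's actual API; `score(counts, x, y, k, i)` is taken to return the loss of classifying x using parameter k and the first i attributes, with x's own counts excluded:

```python
def skdb_third_pass(data, counts, score, kmax, n_attrs):
    """Simplified sketch of SKDB's selection pass: leave-one-out
    cross-validation over every (k, number-of-attributes) pair,
    done incrementally by subtracting and restoring each point."""
    loss = [[0.0] * (n_attrs + 1) for _ in range(kmax + 1)]
    for x, y in data:
        counts.remove(x, y)                      # leave this point out
        for k in range(kmax + 1):
            for i in range(n_attrs + 1):
                loss[k][i] += score(counts, x, y, k, i)
        counts.add(x, y)                         # restore the counts
    # Pick the (k, i) cell with the smallest accumulated loss.
    best_k, best_i = min(((k, i) for k in range(kmax + 1)
                          for i in range(n_attrs + 1)),
                         key=lambda ki: loss[ki[0]][ki[1]])
    return best_k, best_i
```

Because the nested models share one set of count tables, all (k, i) candidates are scored in the same single pass, which is what keeps selection cheap.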
23 [Figure: Covtype error of KDB and SKDB (K=0..4) vs. NB and RF.] Comparative analysis of the performance of KDB and SKDB.
24 [Figure: Covtype number of parameters of KDB vs. SKDB (K=0..4).] Comparative analysis of the number of parameters of KDB and SKDB.
25 Discriminative Semi-naive Bayes Classifiers. Each generative model has a discriminative counterpart: 1) NB -> NBd 2) TAN -> TANd 3) KDB -> KDBd 4) AnDE -> AnDEd 5) BN -> BNd. Instead of the log-likelihood, optimise the conditional log-likelihood:
$$P_B(y, \mathbf{x}) = \theta_y \prod_{i=1}^{n} \theta_{x_i \mid y, \Pi_i(\mathbf{x})}$$
$$\mathrm{LL}(B) = \sum_{j=1}^{N} \log P_B(y^{(j)}, \mathbf{x}^{(j)}) = \sum_{j=1}^{N} \Big( \log \theta_{y^{(j)}} + \sum_{i=1}^{n} \log \theta_{x_i^{(j)} \mid y^{(j)}, \Pi_i(\mathbf{x}^{(j)})} \Big)$$
$$\mathrm{CLL}(B) = \sum_{j=1}^{N} \log P_B(y^{(j)} \mid \mathbf{x}^{(j)}) = \sum_{j=1}^{N} \Big( \log P_B(y^{(j)}, \mathbf{x}^{(j)}) - \log \sum_{y' \in \mathcal{Y}} P_B(y', \mathbf{x}^{(j)}) \Big) = \sum_{j=1}^{N} \Big( \log \theta_{y^{(j)}} + \sum_{i=1}^{n} \log \theta_{x_i^{(j)} \mid y^{(j)}, \Pi_i(\mathbf{x}^{(j)})} - \log \sum_{y' \in \mathcal{Y}} \theta_{y'} \prod_{i=1}^{n} \theta_{x_i^{(j)} \mid y', \Pi_i(\mathbf{x}^{(j)})} \Big)$$
1. Zaidi, N., Webb, G., Carman, M., Petitjean, F., Buntine, W., Hynes, M. and De Sterck, H. Efficient Parameter Learning of Bayesian Network Classifiers, Machine Learning, Volume 106, pp. 1-44, 2016.
26 $$P(\mathbf{X}, Y) = P(Y)\,P(X_1 \mid Y)\,P(X_2 \mid Y, X_1)\,P(X_3 \mid Y, X_1, X_2)$$
[Figure: an example of parameter structure for dKDB (tries), storing counts, parameters and gradients at each node.]
27 [Figure: Covtype error of KDB and dKDB (K=0..4) vs. NB and RF.] Comparative analysis of the performance of KDB and dKDB.
28 [Figure: Covtype error of KDB, SKDB and dKDB (K=0..4) vs. NB and RF.] Comparative analysis of the performance of KDB, SKDB and dKDB.
29 [Figure: Covtype error of KDB, SKDB, dKDB and SdKDB (K=0..4) vs. NB and RF.] Comparative analysis of the performance of KDB, SKDB, dKDB and SdKDB.
30 Discriminative Semi-naive Bayes Classifiers. [Figure: NB, 1-DB, 2-DB, K-DB and their discriminative versions 1-DBd, 2-DBd, K-DBd, plus RF, placed on axes of bias of the learner, number of passes through the data, and number of tuning parameters.] 1. Zaidi, N. and Webb, G. Fast and Efficient Single Pass Bayesian Learning, Advances in Knowledge Discovery and Data Mining. 2. Martinez, A., Webb, G., Li, S. and Zaidi, N. Scalable Learning of Bayesian Network Classifiers, JMLR, pp. 1-35, 2016.
31 The Equivalence. Naive Bayes, logistic regression, and a weighted classifier all share the same softmax form:
$$P_{\mathrm{NB}}(y \mid \mathbf{x}) = \frac{\exp\big(\log \theta_y + \sum_i \log \theta_{y,i,x_i}\big)}{\sum_{c=1}^{C} \exp\big(\log \theta_c + \sum_j \log \theta_{c,j,x_j}\big)}$$
$$P_{\mathrm{LR}}(y \mid \mathbf{x}) = \frac{\exp\big(\beta_y + \sum_i \beta_{y,i,x_i}\big)}{\sum_{c=1}^{C} \exp\big(\beta_c + \sum_j \beta_{c,j,x_j}\big)}$$
$$P_{\mathrm{WC}}(y \mid \mathbf{x}) = \frac{\exp\big(w_y \log \theta_y + \sum_i w_{y,i,x_i} \log \theta_{y,i,x_i}\big)}{\sum_{c=1}^{C} \exp\big(w_c \log \theta_c + \sum_j w_{c,j,x_j} \log \theta_{c,j,x_j}\big)}$$
1. Zaidi, N., Carman, M., Cerquides, J. and Webb, G. Naive-Bayes Inspired Effective Pre-Conditioners for Speeding-up Logistic Regression, ICDM, 2013. 2. Zaidi, N. and Webb, G. Preconditioning an Artificial Neural Network Using Naive Bayes, Advances in Knowledge Discovery and Data Mining, 2016.
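The equivalence can be checked numerically: fixing logistic-regression weights to NB's log-probabilities reproduces the NB posterior exactly. A minimal sketch of the shared softmax form:

```python
import math

def nb_posterior(x, log_prior, log_lik):
    """NB posterior written as a softmax over linear scores.

    log_prior[y]       = log P(y)
    log_lik[(y, i, v)] = log P(X_i = v | y)
    Setting LR's weights beta to exactly these log-probabilities
    yields the same predictive distribution.
    """
    scores = {y: lp + sum(log_lik[(y, i, v)] for i, v in enumerate(x))
              for y, lp in log_prior.items()}
    z = max(scores.values())                       # stabilise the exponentials
    exps = {y: math.exp(s - z) for y, s in scores.items()}
    total = sum(exps.values())
    return {y: e / total for y, e in exps.items()}
```

Discriminatively training the weights away from the NB values is what turns the generative model into its "d" counterpart, with the NB estimates acting as a good starting point / pre-conditioner.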
32 Discriminative Semi-naive Bayes Methods. Three parameterisations of the conditional log-likelihood are considered: $\mathrm{CLL}_d(B)$, optimising free parameters directly; $\mathrm{CLL}_e(B)$, optimising in the log-parameter space $\log \theta_{x_i \mid y, \Pi_i(\mathbf{x})}$; and $\mathrm{CLL}_w(B)$, which learns weights $w$ over fixed generative log-probabilities:
$$\mathrm{CLL}_w(B) = \sum_{j=1}^{N} \Big( w_{y^{(j)}} \log \theta_{y^{(j)}} + \sum_{i=1}^{n} w_{x_i^{(j)} \mid y^{(j)}, \Pi_i} \log \theta_{x_i^{(j)} \mid y^{(j)}, \Pi_i(\mathbf{x}^{(j)})} - \log \sum_{y' \in \mathcal{Y}} \theta_{y'}^{w_{y'}} \prod_{i=1}^{n} \theta_{x_i^{(j)} \mid y', \Pi_i(\mathbf{x}^{(j)})}^{w_{x_i^{(j)} \mid y', \Pi_i}} \Big)$$
33 [Figures: Covtype negative log-likelihood vs. number of iterations for dKDB (K) vs. wKDB (K), K = 0..4.]
34 FewPLA: Discriminative Selective K-DB. Salient features: 1. Nine-pass learning (2-pass KDB, 1-pass SKDB, 5 passes for SGD optimisation). 2. Low bias. 3. Minimal tuning parameters: step sizes via adaptive gradients (AdaGrad, with the initial step size tuned on a hold-out set, or No More Pesky Learning Rates); regularization: adaptive, fixed, or none. [Figure: MNIST error of SKDB and FewPLA (K=0..5) vs. NB and RF.]
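The AdaGrad step mentioned above can be sketched as follows; `eta` is an illustrative initial step size (the talk tunes it on a hold-out set), not a recommended value:

```python
import math

def adagrad_update(w, g, h, eta=0.05, eps=1e-8):
    """One AdaGrad step. Per-coordinate step sizes shrink with the
    accumulated squared gradient, which is the slide's point about
    avoiding a hand-designed step-size schedule.

    w, g, h are same-length lists: weights, current gradient, and
    running sum of squared gradients. Returns updated (w, h).
    """
    h = [hi + gi * gi for hi, gi in zip(h, g)]
    w = [wi - eta * gi / (math.sqrt(hi) + eps)
         for wi, gi, hi in zip(w, g, h)]
    return w, h
```

Coordinates with consistently large gradients get their step sizes damped quickly, while rarely-updated coordinates keep larger steps, which suits the sparse higher-order features used later in the talk.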
35 [Figure: Covtype error of KDB, SKDB, dKDB and SdKDB (K=0..4) vs. NB and RF.] Comparative analysis of the performance of KDB, SKDB, dKDB and SdKDB.
36 Story So Far. Dropping the class from the conditioning sets turns the class-conditional factorisation into a plain chain factorisation:
$$P_{K=2}(\mathbf{X}, Y) = P(X_1 \mid Y)\,P(X_2 \mid Y, X_1)\,P(X_3 \mid Y, X_1, X_2)\,P(X_4 \mid Y, X_2, X_3)\,P(X_5 \mid Y, X_4, X_3)$$
$$P_{K=2}(\mathbf{X}, Y) = P(X_1)\,P(X_2 \mid X_1)\,P(X_3 \mid X_2, X_1)\,P(X_4 \mid X_3, X_2)\,P(X_5 \mid X_4, X_3)$$
37 $$P_{K=2}(\mathbf{X}, Y) = P(X_1)\,P(X_2 \mid X_1)\,P(X_3 \mid X_2, X_1)\,P(X_4 \mid X_3, X_2)\,P(X_5 \mid X_4, X_3)$$
38 Higher-order Logistic Regression (LRn). The model takes every feature subset of size n, $\alpha \in \binom{A}{n}$, as a higher-order feature:
$$P_{\mathrm{LR}^n}(y \mid \mathbf{x}) = \frac{\exp\big(\beta_y + \sum_{\alpha \in \binom{A}{n}} \beta_{y, \mathbf{x}_\alpha}\big)}{\sum_{y' \in \mathcal{Y}} \exp\big(\beta_{y'} + \sum_{\alpha \in \binom{A}{n}} \beta_{y', \mathbf{x}_\alpha}\big)}$$
39 Higher-order Logistic Regression LRn. [Figure: LR1, LR2 and LRK placed alongside 1-DB/2-DB/K-DB and their discriminative versions, NB and RF, on axes of bias of the learner, number of passes through the data, and number of tuning parameters.]
40 [Figure: Covtype error of KDB, dKDB and ALR (K=0..4) vs. NB and RF.] Comparative analysis of the performance of LRn and dKDB.
41 Accelerated Logistic Regression (ALRn). ALRn parameterises LRn's softmax with weights over fixed AnJE log-probabilities:
$$P_{\mathrm{LR}^n}(y \mid \mathbf{x}) = \frac{\exp\big(\beta_y + \sum_{\alpha \in \binom{A}{n}} \beta_{y, \mathbf{x}_\alpha}\big)}{\sum_{c \in \mathcal{C}} \exp\big(\beta_c + \sum_{\alpha \in \binom{A}{n}} \beta_{c, \mathbf{x}_\alpha}\big)}$$
$$P_{\mathrm{AnJE}}(y \mid \mathbf{x}) = \frac{\exp\big(\log \theta_y + \sum_{\alpha \in \binom{A}{n}} \log \theta_{y, \mathbf{x}_\alpha}\big)}{\sum_{c \in \mathcal{C}} \exp\big(\log \theta_c + \sum_{\alpha \in \binom{A}{n}} \log \theta_{c, \mathbf{x}_\alpha}\big)}$$
$$P_{\mathrm{ALR}^n}(y \mid \mathbf{x}) = \frac{\exp\big(w_y \log \theta_y + \sum_{\alpha \in \binom{A}{n}} w_{y, \mathbf{x}_\alpha} \log \theta_{y, \mathbf{x}_\alpha}\big)}{\sum_{c \in \mathcal{C}} \exp\big(w_c \log \theta_c + \sum_{\alpha \in \binom{A}{n}} w_{c, \mathbf{x}_\alpha} \log \theta_{c, \mathbf{x}_\alpha}\big)}$$
1. Zaidi, N., Webb, G., Carman, M., Petitjean, F. and Cerquides, J. ALRn: Accelerated Higher-Order Logistic Regression, Machine Learning, Volume 104, 2016.
42 The same construction, with the AnJE log-probabilities scaled by $1/w$, where $w = A/n$, to correct for each attribute appearing in multiple order-$n$ feature subsets.
43 [Figure: prequential learning curves of LRn, ALRn and AnJE.]
44 Selective Accelerated Higher-order Logistic Regression. Salient features: 1. Two-pass learning. 2. Low bias. 3. Minimal tuning parameters. How to do selection to reduce the size of the model? Indexing, exact or approximate (feature hashing); selection by mutual information or by frequency; automatic selection via LOOCV as in SKDB, or by validating on a sample of the data.
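The feature-hashing option mentioned above can be sketched as follows. This is the signed variant of the hashing trick; `n_buckets` is an illustrative default, not a value from the talk:

```python
def hash_features(pairs, n_buckets=2 ** 18):
    """Map sparse (name, value) features into a fixed-size vector,
    so higher-order interaction terms can be indexed without an
    explicit (and potentially huge) feature dictionary."""
    vec = [0.0] * n_buckets
    for name, value in pairs:
        h = hash(name)
        idx = h % n_buckets
        # A second hash bit picks a sign, which keeps collisions
        # unbiased in expectation (signed hashing).
        sign = 1.0 if (h >> 1) % 2 == 0 else -1.0
        vec[idx] += sign * value
    return vec
```

Hashing trades a controllable amount of collision noise for constant memory, which is what makes indexing all order-n feature combinations feasible at scale. (Note Python salts `hash` for strings across runs; a real implementation would use a fixed hash such as MurmurHash for reproducibility.)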
45 [Figure: Covtype error of ALR2 with count-, MI- and CV-based selection vs. NB and RF.] Comparative analysis of the performance of sALR2 (count, MI and CV).
46 [Figure: Covtype error of ALR3 with MI- and CV-based selection vs. NB and RF.] Comparative analysis of the performance of sALR3 (MI and CV).
47 [Figure: decreasing bias across NB, AnDE, KDB, SKDB, dKDB, SdKDB, RF, LRn, sLRn, hLRn, shLRn, FM.] Minimal pass, minimal tuning parameters.
48 Burning Issues. 1. Discretization: discretization leads to better results. 2. Multiple classes: optimizing softmax leads to better-calibrated probabilities. 3. SGD: AdaGrad; cross-validate eta. 4. Regularization: L2 regularisation with lambda = 0.1 works well; adaptive regularization. 5. Indexing: hashing; feature transformation. 6. Non-stationary data: low-bias models for fast decay, high-bias models for slow decay. 7. Adaptive models: start with a high-bias model and shift gears as more data arrives; hierarchical models.
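The discretization point above can be illustrated with a simple equal-frequency discretizer; the talk does not prescribe a specific method, so this is just one common choice:

```python
def equal_freq_bins(values, n_bins=5):
    """Equal-frequency discretization: cut points at quantiles of the
    training values, so each bin holds roughly the same number of
    points. Returns the cut points and a discretizer function."""
    s = sorted(values)
    n = len(s)
    cuts = [s[(i * n) // n_bins] for i in range(1, n_bins)]

    def discretize(v):
        # Bin index = number of cut points at or below v.
        b = 0
        for c in cuts:
            if v >= c:
                b += 1
        return b

    return cuts, discretize
```

Equal-frequency cuts avoid empty bins on skewed attributes, which keeps the count tables of the Bayesian network classifiers well populated.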
49 Aquila Audax. Salient features: implements KDB, ALR, FM; objective functions: MSE, CLL, HL; multiple classes (optimizes softmax); SGD: AdaGrad, AdaDelta; regularization: adaptive regularization; creates features on the run; feature selection: counts, MI, LOOCV, hashing; and others.
50 Collaborators Offline Discussions Github: nayyarzaidi LinkedIn: nayyar_zaidi URL: Questions?
More informationIntroduction to Gaussian Process
Introduction to Gaussian Process CS 778 Chris Tensmeyer CS 478 INTRODUCTION 1 What Topic? Machine Learning Regression Bayesian ML Bayesian Regression Bayesian Non-parametric Gaussian Process (GP) GP Regression
More informationMachine Learning Gaussian Naïve Bayes Big Picture
Machine Learning 10-701 Tom M. Mitchell Machine Learning Department Carnegie Mellon University January 27, 2011 Today: Naïve Bayes Big Picture Logistic regression Gradient ascent Generative discriminative
More informationCSC 578 Neural Networks and Deep Learning
CSC 578 Neural Networks and Deep Learning Fall 2018/19 3. Improving Neural Networks (Some figures adapted from NNDL book) 1 Various Approaches to Improve Neural Networks 1. Cost functions Quadratic Cross
More informationStatistical Machine Learning from Data
January 17, 2006 Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Multi-Layer Perceptrons Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole
More informationIntroduction to Machine Learning
1, DATA11002 Introduction to Machine Learning Lecturer: Teemu Roos TAs: Ville Hyvönen and Janne Leppä-aho Department of Computer Science University of Helsinki (based in part on material by Patrik Hoyer
More informationMIDTERM SOLUTIONS: FALL 2012 CS 6375 INSTRUCTOR: VIBHAV GOGATE
MIDTERM SOLUTIONS: FALL 2012 CS 6375 INSTRUCTOR: VIBHAV GOGATE March 28, 2012 The exam is closed book. You are allowed a double sided one page cheat sheet. Answer the questions in the spaces provided on
More informationSupport Vector Machines
Support Vector Machines Le Song Machine Learning I CSE 6740, Fall 2013 Naïve Bayes classifier Still use Bayes decision rule for classification P y x = P x y P y P x But assume p x y = 1 is fully factorized
More informationAdvanced statistical methods for data analysis Lecture 2
Advanced statistical methods for data analysis Lecture 2 RHUL Physics www.pp.rhul.ac.uk/~cowan Universität Mainz Klausurtagung des GK Eichtheorien exp. Tests... Bullay/Mosel 15 17 September, 2008 1 Outline
More informationLast updated: Oct 22, 2012 LINEAR CLASSIFIERS. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
Last updated: Oct 22, 2012 LINEAR CLASSIFIERS Problems 2 Please do Problem 8.3 in the textbook. We will discuss this in class. Classification: Problem Statement 3 In regression, we are modeling the relationship
More informationSUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION
SUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION 1 Outline Basic terminology Features Training and validation Model selection Error and loss measures Statistical comparison Evaluation measures 2 Terminology
More informationCS 6375 Machine Learning
CS 6375 Machine Learning Nicholas Ruozzi University of Texas at Dallas Slides adapted from David Sontag and Vibhav Gogate Course Info. Instructor: Nicholas Ruozzi Office: ECSS 3.409 Office hours: Tues.
More informationOnline Learning and Sequential Decision Making
Online Learning and Sequential Decision Making Emilie Kaufmann CNRS & CRIStAL, Inria SequeL, emilie.kaufmann@univ-lille.fr Research School, ENS Lyon, Novembre 12-13th 2018 Emilie Kaufmann Online Learning
More informationEngineering Part IIB: Module 4F10 Statistical Pattern Processing Lecture 5: Single Layer Perceptrons & Estimating Linear Classifiers
Engineering Part IIB: Module 4F0 Statistical Pattern Processing Lecture 5: Single Layer Perceptrons & Estimating Linear Classifiers Phil Woodland: pcw@eng.cam.ac.uk Michaelmas 202 Engineering Part IIB:
More informationMIDTERM: CS 6375 INSTRUCTOR: VIBHAV GOGATE October,
MIDTERM: CS 6375 INSTRUCTOR: VIBHAV GOGATE October, 23 2013 The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run
More informationClassification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012
Classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Topics Discriminant functions Logistic regression Perceptron Generative models Generative vs. discriminative
More informationOverview of gradient descent optimization algorithms. HYUNG IL KOO Based on
Overview of gradient descent optimization algorithms HYUNG IL KOO Based on http://sebastianruder.com/optimizing-gradient-descent/ Problem Statement Machine Learning Optimization Problem Training samples:
More informationMachine Learning Tom M. Mitchell Machine Learning Department Carnegie Mellon University. September 20, 2012
Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University September 20, 2012 Today: Logistic regression Generative/Discriminative classifiers Readings: (see class website)
More informationCS 484 Data Mining. Classification 7. Some slides are from Professor Padhraic Smyth at UC Irvine
CS 484 Data Mining Classification 7 Some slides are from Professor Padhraic Smyth at UC Irvine Bayesian Belief networks Conditional independence assumption of Naïve Bayes classifier is too strong. Allows
More informationMultilayer Perceptron
Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Single Perceptron 3 Boolean Function Learning 4
More informationFrom perceptrons to word embeddings. Simon Šuster University of Groningen
From perceptrons to word embeddings Simon Šuster University of Groningen Outline A basic computational unit Weighting some input to produce an output: classification Perceptron Classify tweets Written
More information> DEPARTMENT OF MATHEMATICS AND COMPUTER SCIENCE GRAVIS 2016 BASEL. Logistic Regression. Pattern Recognition 2016 Sandro Schönborn University of Basel
Logistic Regression Pattern Recognition 2016 Sandro Schönborn University of Basel Two Worlds: Probabilistic & Algorithmic We have seen two conceptual approaches to classification: data class density estimation
More informationLinear Models for Classification: Discriminative Learning (Perceptron, SVMs, MaxEnt)
Linear Models for Classification: Discriminative Learning (Perceptron, SVMs, MaxEnt) Nathan Schneider (some slides borrowed from Chris Dyer) ENLP 12 February 2018 23 Outline Words, probabilities Features,
More information