Fast and Effective Limited Pass Learning for Large Data Quantities. Presented by: Nayyar Zaidi


1 Fast and Effective Limited Pass Learning for Large Data Quantities Presented by: Nayyar Zaidi

2 [Figure: RMSE vs. training set size]

3 [Figure: RMSE vs. training set size]

4 [Diagram: machine learning methods arranged along a data-scale axis — computational models of neural networks, artificial neural networks, SVMs (linear), kernel perceptron, random forest, boosting, deep learning — running from small machine learning (relational databases, data warehousing) to big machine learning (WWW, iPhone, Kaggle)]

5 Good Old-Fashioned Machine Learning (GOFML):
1) Regularization
2) Non-parametric methods (nearest neighbour, tree-based methods)
3) Power of ensembles (random forest, boosting)
4) Kernel theory
5) Batch optimization methods
6) Bayesian vs. frequentist
7) Feature selection
Large Scale Machine Learning (LSML):
1) Feature engineering
2) Deep learning
3) SGD
4) Minimal pass learning
5) Automatic regularization
6) Minimal tuning
Future Machine Learning (FML):
1) Single pass learning
2) Automatic feature engineering
3) No tuning parameters

6 (Same three lists as the previous slide, now under the abbreviations GOFML, LSML and FML.)

7 GOFML vs. LSML [Figure: RMSE vs. training set size for GOFML and LSML learners]

8 Objectives of the Talk: summarize the properties of Large-Scale Machine Learning (LSML) algorithms; propose two fast and effective limited-pass learning algorithms.
Outline of the Talk: Introduction; Background (NB, RF and Bayesian networks); Algorithm I: FewPLA; Algorithm II: Selective ALR; Discussion.
Target Audience: research scientists, machine learning practitioners, Ph.D. students, final-year undergraduate students.

9 Three Properties of LSML: minimal pass learning; minimal tuning parameters; low-bias learning.

10 Two Extremes [Diagram: axes are #passes through the data, bias of the learner, and #tuning parameters]
Naive Bayes (NB): high-bias, low-variance; extremely easy to train; single pass; minimal tuning parameters.
Random Forest (RF): low-bias, high-variance; multiple passes; some tuning parameters.

11 Naive Bayes vs. Random Forest
Naive Bayes: a Bayesian network [diagram: class Y pointing to attributes X_1, X_2, ..., X_n]; factorizes the joint distribution P(X, Y); parameters are maximum-likelihood estimates of the log-likelihood; good for small data.
Random Forest: a non-parametric method; bagged data + bagged variables; trees are grown to full depth; many variants, e.g. gradient boosting, give state-of-the-art results; good for large datasets.

12 Comparison between NB and RF [Figure: four panels — variance, bias, training time, classification time — comparing NB and RF on all datasets vs. big datasets]. Semi-naive Bayes methods: AnDE, TAN, KDB, etc.

13 Bayesian Network Classifiers [diagram: a Bayesian network over Y and the attributes]
A BN is characterised by two sets of parameters: B = (G, θ). Learning a BN = structure learning + parameter learning. The graph has nice properties. Structure learning: K2 and many variants. Parameter learning: accumulating counts, then maximising the log-likelihood.
$P_B(y, \mathbf{x}) = \theta_y \prod_{i=1}^{n} \theta_{x_i \mid y, \Pi_i(\mathbf{x})}$
$P_B(y \mid \mathbf{x}) = \frac{P_B(y, \mathbf{x})}{P_B(\mathbf{x})} = \frac{\theta_y \prod_{i=1}^{n} \theta_{x_i \mid y, \Pi_i(\mathbf{x})}}{\sum_{y' \in \mathcal{Y}} \theta_{y'} \prod_{i=1}^{n} \theta_{x_i \mid y', \Pi_i(\mathbf{x})}}$
$\mathrm{LL}(B) = \sum_{j=1}^{N} \log P_B(y^{(j)}, \mathbf{x}^{(j)}) = \sum_{j=1}^{N} \Big( \log \theta_{y^{(j)}} + \sum_{i=1}^{n} \log \theta_{x_i^{(j)} \mid y^{(j)}, \Pi_i(\mathbf{x}^{(j)})} \Big)$
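A rough sketch of the "accumulating counts" step: the maximum-likelihood θ are smoothed relative frequencies over each attribute's (value, class, parent-values) combinations, so a single pass over the data suffices. All names here are illustrative assumptions, not the talk's implementation.

```python
import math
from collections import defaultdict

def learn_parameters(data, parents, n_attrs):
    """One pass: accumulate counts for each attribute given (class, parents).

    data: iterable of (x, y) pairs, x a tuple of discrete attribute values.
    parents: parents[i] is the list of parent attribute indices of X_i.
    """
    joint = defaultdict(int)         # counts of (i, x_i, y, parent values)
    cond = defaultdict(int)          # counts of (i, y, parent values)
    class_counts = defaultdict(int)
    n = 0
    for x, y in data:
        class_counts[y] += 1
        n += 1
        for i in range(n_attrs):
            pa = tuple(x[p] for p in parents[i])
            joint[(i, x[i], y, pa)] += 1
            cond[(i, y, pa)] += 1
    return joint, cond, class_counts, n

def log_theta(joint, cond, i, xi, y, pa, n_values, alpha=1.0):
    """Smoothed MLE of log theta_{x_i | y, Pi_i(x)} (Laplace smoothing)."""
    num = joint.get((i, xi, y, pa), 0) + alpha
    den = cond.get((i, y, pa), 0) + alpha * n_values
    return math.log(num / den)
```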

14 k-Dependence Bayesian Estimator (KDB) [Figure: bias of NB, KDB1, KDB2]
Algorithm (sketched in code below):
Calculate the mutual information MI(X_i; Y) for all attributes.
Calculate the conditional mutual information MI(X_i; X_j | Y).
Sort all attributes by MI(X_i; Y).
For every i-th attribute: make the class Y its parent; set K = min(i-1, k); choose K parents from attributes 1 to i-1 based on their MI(X_i; X_j | Y) scores.
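A sketch of that selection loop, assuming the MI(X_i; Y) and MI(X_i; X_j | Y) tables were computed in an earlier counting pass (names are illustrative):

```python
def kdb_structure(mi_y, cmi, k):
    """Build KDB parent sets.

    mi_y: dict attr -> MI(X_i; Y)
    cmi:  dict (i, j) -> MI(X_i; X_j | Y), assumed present for all pairs
    k:    maximum number of attribute parents per attribute
    Returns parents[i]: attribute parents of X_i (class Y is implicitly
    a parent of every attribute).
    """
    order = sorted(mi_y, key=mi_y.get, reverse=True)   # sort by MI(X_i; Y)
    parents = {}
    for pos, i in enumerate(order):
        K = min(pos, k)            # at most k parents; fewer for early attributes
        candidates = order[:pos]   # only attributes already placed
        # pick the K candidates with the highest MI(X_i; X_j | Y)
        best = sorted(candidates, key=lambda j: cmi[(i, j)], reverse=True)[:K]
        parents[i] = best
    return parents
```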

15 [Figure: Covtype error for KDB at K = 0, 1, 2, 3, 4 vs. NB and RF] Comparative analysis of the performance of KDB with NB and RF

16 [Figure: Covtype error for KDB at K = 0, 1, 2, 3, 4 vs. NB and RF] Comparative analysis of the performance of KDB with NB and RF

17 KDB - Model Structure
$P(\mathbf{X}, Y) = P(Y)\,P(X_1 \mid Y)\,P(X_2 \mid Y, X_1)\,P(X_3 \mid Y, X_1, X_2)$
[Diagram: an example of the parameter structure for KDB (tries), branching on attribute and parent values such as X_1 = a, b, c and X_2 = a, b]
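The tries on this slide can be mimicked with a small dictionary-based structure, one trie per (attribute, class), storing a count at every node along the path of parent values. This is an illustrative sketch, not the actual implementation:

```python
class CountTrie:
    """Counts indexed by (parent values..., x_i) paths."""
    def __init__(self):
        self.children = {}
        self.count = 0

    def add(self, path):
        """path = parent values in order, then the attribute's own value."""
        node = self
        node.count += 1
        for v in path:
            node = node.children.setdefault(v, CountTrie())
            node.count += 1

    def get(self, path):
        node = self
        for v in path:
            node = node.children.get(v)
            if node is None:
                return 0
        return node.count

# P(X3 = v | y, X1 = a, X2 = b) then falls out as a ratio of a node's count
# to its prefix's count (smoothing omitted):
#   trie.get(('a', 'b', v)) / trie.get(('a', 'b'))
```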

18 Selective KDB (SKDB)
Adds an additional third pass to KDB. Selects the best k and the best number of attributes. Good for reducing the size of KDB. A maximum k_max has to be specified. SKDB exploits the nested structure of KDB models across attributes and values of k by using leave-one-out cross-validation:
$P_{K=0}(\mathbf{X}, Y) = P(Y)\,P(X_1 \mid Y)\,P(X_2 \mid Y)\,P(X_3 \mid Y)\,P(X_4 \mid Y)\,P(X_5 \mid Y)$
$P_{K=1}(\mathbf{X}, Y) = P(Y)\,P(X_1 \mid Y)\,P(X_2 \mid Y, X_1)\,P(X_3 \mid Y, X_1)\,P(X_4 \mid Y, X_2)\,P(X_5 \mid Y, X_4)$
$P_{K=2}(\mathbf{X}, Y) = P(Y)\,P(X_1 \mid Y)\,P(X_2 \mid Y, X_1)\,P(X_3 \mid Y, X_1, X_2)\,P(X_4 \mid Y, X_2, X_3)\,P(X_5 \mid Y, X_4, X_3)$
$P_{K=3}(\mathbf{X}, Y) = P(Y)\,P(X_1 \mid Y)\,P(X_2 \mid Y, X_1)\,P(X_3 \mid Y, X_1, X_2)\,P(X_4 \mid Y, X_2, X_3, X_1)\,P(X_5 \mid Y, X_4, X_3, X_2)$
$P_{K=4}(\mathbf{X}, Y) = P(Y)\,P(X_1 \mid Y)\,P(X_2 \mid Y, X_1)\,P(X_3 \mid Y, X_1, X_2)\,P(X_4 \mid Y, X_2, X_3, X_1)\,P(X_5 \mid Y, X_4, X_3, X_2, X_1)$

19 $P(X_4 \mid Y, X_3)$ at K = 1 [Diagram: trie branching on X_3 ∈ {a, b}]

20 $P(X_4 \mid Y, X_3, X_2)$ at K = 2 [Diagram: trie branching on X_3, then X_2]

21 $P(X_4 \mid Y, X_3, X_2, X_1)$ at K = 3 [Diagram: trie branching on X_3, then X_2, then X_1]

22 Selective KDB
First pass: order the attributes. Second pass: accumulate the counts. Third pass (leave-one-out):
For each data point x: subtract it from the counts table; then for each k (0 to K) and for each attribute i (in the ordered set): LF[k][i] += LossFunction(P[y | x], y).
Select the k and i with the best value in the loss-function (LF) table, then trim the data structure accordingly (a code sketch follows).
1. Martinez, A., Webb, G., Li, S. and Zaidi, N. Scalable Learning of Bayesian Network Classifiers, JMLR, pp. 1-35, 2016.
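A sketch of that third pass: because the counts table was built from the whole dataset, each point can be "left out" by subtracting it, scored under every nested (k, #attributes) model, then added back. The `counts` and `models` helpers are assumptions for illustration, not the reference implementation.

```python
def selective_pass(data, counts, models, loss):
    """Third pass of SKDB: leave-one-out scores for every (k, i) pair.

    models.posterior(counts, x, k, i) is assumed to return P(y | x) using
    only the first i attributes (in MI order) at dependence level k.
    """
    K, n_attrs = models.k_max, models.n_attrs
    LF = [[0.0] * (n_attrs + 1) for _ in range(K + 1)]
    for x, y in data:
        counts.subtract(x, y)                 # leave this point out
        for k in range(K + 1):
            for i in range(1, n_attrs + 1):
                p = models.posterior(counts, x, k, i)
                LF[k][i] += loss(p, y)
        counts.add(x, y)                      # restore the counts
    # best (k, i): lowest accumulated leave-one-out loss
    best_k, best_i = min(
        ((k, i) for k in range(K + 1) for i in range(1, n_attrs + 1)),
        key=lambda ki: LF[ki[0]][ki[1]])
    return best_k, best_i, LF
```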

23 [Figure: Covtype error for KDB and SKDB at K = 0, 1, 2, 3, 4 vs. NB and RF] Comparative analysis of the performance of KDB and SKDB

24 [Figure: Covtype number of parameters for KDB and SKDB at K = 0, 1, 2, 3, 4] Comparative analysis of the number of parameters of KDB and SKDB

25 Discriminative Semi-naive Bayes Classifiers
Discriminative variants: 1) NB -> NBd 2) TAN -> TANd 3) KDB -> KDBd 4) AnDE -> AnDEd 5) BN -> BNd
$P_B(y, \mathbf{x}) = \theta_y \prod_{i=1}^{n} \theta_{x_i \mid y, \Pi_i(\mathbf{x})}$
$\mathrm{LL}(B) = \sum_{j=1}^{N} \log P_B(y^{(j)}, \mathbf{x}^{(j)}) = \sum_{j=1}^{N} \Big( \log \theta_{y^{(j)}} + \sum_{i=1}^{n} \log \theta_{x_i^{(j)} \mid y^{(j)}, \Pi_i(\mathbf{x}^{(j)})} \Big)$
$\mathrm{CLL}(B) = \sum_{j=1}^{N} \log P_B(y^{(j)} \mid \mathbf{x}^{(j)}) = \sum_{j=1}^{N} \Big( \log P_B(y^{(j)}, \mathbf{x}^{(j)}) - \log \sum_{y' \in \mathcal{Y}} P_B(y', \mathbf{x}^{(j)}) \Big) = \sum_{j=1}^{N} \Big( \log \theta_{y^{(j)}} + \sum_{i=1}^{n} \log \theta_{x_i^{(j)} \mid y^{(j)}, \Pi_i(\mathbf{x}^{(j)})} - \log \sum_{y' \in \mathcal{Y}} \theta_{y'} \prod_{i=1}^{n} \theta_{x_i^{(j)} \mid y', \Pi_i(\mathbf{x}^{(j)})} \Big)$
1. Zaidi, N., Webb, G., Carman, M., Petitjean, F., Buntine, W., Hynes, M. and De Sterck, H. Efficient Parameter Learning of Bayesian Network Classifiers, Machine Learning, Volume 106, pp. 1-44, 2016.
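To make optimizing the CLL concrete: its per-example gradient has the familiar "observed minus expected" form, so one SGD step looks like the sketch below. The `features(x, c)` helper, which enumerates the active parameter indices of x under class c, is an assumption for illustration.

```python
import math

def cll_sgd_step(beta, x, y, classes, features, eta=0.01):
    """One SGD (ascent) step on the conditional log-likelihood.

    beta: dict mapping feature keys -> log-space parameter.
    features(x, c): active feature keys of example x under class c,
                    e.g. keys like (i, x_i, c, parent values).
    """
    # softmax over classes of the linear scores
    scores = {c: sum(beta.get(f, 0.0) for f in features(x, c)) for c in classes}
    m = max(scores.values())
    Z = sum(math.exp(s - m) for s in scores.values())
    post = {c: math.exp(scores[c] - m) / Z for c in scores}
    # gradient: indicator of the true class minus the model posterior
    for c in classes:
        g = (1.0 if c == y else 0.0) - post[c]
        for f in features(x, c):
            beta[f] = beta.get(f, 0.0) + eta * g
    return beta
```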

26 $P(\mathbf{X}, Y) = P(Y)\,P(X_1 \mid Y)\,P(X_2 \mid Y, X_1)\,P(X_3 \mid Y, X_1, X_2)$
[Diagram: an example of the parameter structure for dKDB (tries), with counts, parameters and gradients stored at each node]

27 [Figure: Covtype error for KDB and dKDB at K = 0, 1, 2, 3, 4 vs. NB and RF] Comparative analysis of the performance of KDB and dKDB

28 [Figure: Covtype error for KDB, SKDB and dKDB at K = 0, 1, 2, 3, 4 vs. NB and RF] Comparative analysis of the performance of KDB, SKDB and dKDB

29 [Figure: Covtype error for KDB, SKDB, dKDB and sdKDB at K = 0, 1, 2, 3, 4 vs. NB and RF] Comparative analysis of the performance of KDB, SKDB, dKDB and sdKDB

30 Discriminative Semi-naive Bayes Classifiers
[Diagram: NB, 1-DB, 2-DB, K-DB and their discriminative counterparts 1-DBd, 2-DBd, K-DBd, through to RF, placed along axes of #passes through the data, bias of the learner, and #tuning parameters]
1. Zaidi, N. and Webb, G. Fast and Efficient Single Pass Bayesian Learning, Advances in Knowledge Discovery and Data Mining, 2012.
2. Martinez, A., Webb, G., Li, S. and Zaidi, N. Scalable Learning of Bayesian Network Classifiers, JMLR, pp. 1-35, 2016.

31 The Equivalence
$P_{\mathrm{NB}}(y \mid \mathbf{x}) = \frac{\exp\big(\log \theta_y + \sum_i \log \theta_{y,i,x_i}\big)}{\sum_{c=1}^{C} \exp\big(\log \theta_c + \sum_j \log \theta_{c,j,x_j}\big)}$
$P_{\mathrm{LR}}(y \mid \mathbf{x}) = \frac{\exp\big(\beta_y + \sum_i \beta_{y,i,x_i}\big)}{\sum_{c=1}^{C} \exp\big(\beta_c + \sum_j \beta_{c,j,x_j}\big)}$
$P_{\mathrm{WC}}(y \mid \mathbf{x}) = \frac{\exp\big(w_y \log \theta_y + \sum_i w_{y,i,x_i} \log \theta_{y,i,x_i}\big)}{\sum_{c=1}^{C} \exp\big(w_c \log \theta_c + \sum_j w_{c,j,x_j} \log \theta_{c,j,x_j}\big)}$
1. Zaidi, N., Carman, M., Cerquides, J. and Webb, G. Naive-Bayes Inspired Effective Pre-Conditioners for Speeding-up Logistic Regression, ICDM, 2014.
2. Zaidi, N. and Webb, G. Preconditioning an Artificial Neural Network Using Naive Bayes, Advances in Knowledge Discovery and Data Mining, 2016.
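The equivalence is easy to verify numerically: plugging the NB log-parameters into the LR softmax reproduces the NB posterior exactly. A toy check with made-up probabilities:

```python
import math

def softmax_posterior(bias, weights):
    """P(y | x) from a per-class bias plus summed per-class feature weights."""
    scores = [bias[c] + sum(weights[c]) for c in range(len(bias))]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    return [e / sum(exps) for e in exps]

# NB parameters for 2 classes and 2 attributes (toy values)
log_prior = [math.log(0.6), math.log(0.4)]
log_lik = [[math.log(0.2), math.log(0.7)],   # log theta_{y=0, i, x_i}
           [math.log(0.5), math.log(0.1)]]   # log theta_{y=1, i, x_i}

# NB posterior computed directly from the joint ...
joint = [math.exp(log_prior[c] + sum(log_lik[c])) for c in (0, 1)]
nb_post = [j / sum(joint) for j in joint]
# ... equals LR with beta_c = log theta_c and beta_{c,i,x_i} = log theta_{c,i,x_i}
lr_post = softmax_posterior(log_prior, log_lik)
assert all(abs(a - b) < 1e-12 for a, b in zip(nb_post, lr_post))
```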

32 Discriminative Semi-naive Bayes Methods — three parameterizations of the CLL objective:
$\mathrm{CLL}^d(B) = \sum_{j=1}^{N} \Big( \log \theta_{y^{(j)}} + \sum_{i=1}^{n} \log \theta_{x_i^{(j)} \mid y^{(j)}, \Pi_i(\mathbf{x}^{(j)})} - \log \sum_{y' \in \mathcal{Y}} \theta_{y'} \prod_{i=1}^{n} \theta_{x_i^{(j)} \mid y', \Pi_i(\mathbf{x}^{(j)})} \Big)$
$\mathrm{CLL}^e(B) = \sum_{j=1}^{N} \Big( \beta_{y^{(j)}} + \sum_{i=1}^{n} \beta_{x_i^{(j)} \mid y^{(j)}, \Pi_i(\mathbf{x}^{(j)})} - \log \sum_{y' \in \mathcal{Y}} \exp\big( \beta_{y'} + \sum_{i=1}^{n} \beta_{x_i^{(j)} \mid y', \Pi_i(\mathbf{x}^{(j)})} \big) \Big)$
$\mathrm{CLL}^w(B) = \sum_{j=1}^{N} \Big( w_{y^{(j)}} \log \theta_{y^{(j)}} + \sum_{i=1}^{n} w_{x_i^{(j)}, \Pi_i} \log \theta_{x_i^{(j)} \mid y^{(j)}, \Pi_i(\mathbf{x}^{(j)})} - \log \sum_{y' \in \mathcal{Y}} \exp\big( w_{y'} \log \theta_{y'} + \sum_{i=1}^{n} w_{x_i^{(j)}, \Pi_i} \log \theta_{x_i^{(j)} \mid y', \Pi_i(\mathbf{x}^{(j)})} \big) \Big)$

33 [Figure: Covtype negative log-likelihood vs. number of iterations for dKDB and wKDB at K = 0, 1, 2, 3, 4]

34 FewPLA: Discriminative Selective K-DB
Salient features: 1. Nine-pass learning 2. Low bias 3. Minimal tuning parameters
Passes: 2-pass KDB + 1-pass SKDB + 5 passes for SGD optimisation.
Step-size tuning: adaptive gradients (AdaGrad); tune the initial step size on a hold-out set; No Pesky Learning Rates.
Regularization: adaptive / none / fixed.
[Figure: MNIST error for SKDB, FewPLA, NB and RF at K = 0, 1, 2, 3, 4, 5]

35 [Figure: Covtype error for KDB, SKDB, dKDB and sdKDB at K = 0, 1, 2, 3, 4 vs. NB and RF] Comparative analysis of the performance of KDB, SKDB, dKDB and sdKDB

36 Story So Far
$P_{K=2}(\mathbf{X}, Y) = P(Y)\,P(X_1 \mid Y)\,P(X_2 \mid Y, X_1)\,P(X_3 \mid Y, X_1, X_2)\,P(X_4 \mid Y, X_2, X_3)\,P(X_5 \mid Y, X_4, X_3)$
$P_{K=2}(\mathbf{X}) = P(X_1)\,P(X_2 \mid X_1)\,P(X_3 \mid X_2, X_1)\,P(X_4 \mid X_3, X_2)\,P(X_5 \mid X_4, X_3)$

37 $P_{K=2}(\mathbf{X}) = P(X_1)\,P(X_2 \mid X_1)\,P(X_3 \mid X_2, X_1)\,P(X_4 \mid X_3, X_2)\,P(X_5 \mid X_4, X_3)$

38 Higher-order Logistic Regression (LRn)
$P_{\mathrm{LR}^n}(y \mid \mathbf{x}) = \frac{\exp\big(\beta_y + \sum_{\alpha \in \binom{A}{n}} \beta_{y, x_\alpha}\big)}{\sum_{y' \in \mathcal{Y}} \exp\big(\beta_{y'} + \sum_{\alpha \in \binom{A}{n}} \beta_{y', x_\alpha}\big)}$
where $\binom{A}{n}$ denotes the set of all size-$n$ subsets of the attributes $A$, and $x_\alpha$ the joint value that subset $\alpha$ takes in $\mathbf{x}$.
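LRn's parameters are indexed by size-n attribute subsets and the joint values they take, so generating the active features of an example is a one-liner with itertools. A sketch, not the talk's implementation:

```python
from itertools import combinations

def higher_order_features(x, n):
    """All size-n attribute-subset features of example x.

    Each feature key records the subset of attribute indices and the
    joint value it takes, e.g. ((0, 2), ('a', 'b')) for n = 2.
    """
    idx = range(len(x))
    return [(subset, tuple(x[i] for i in subset))
            for subset in combinations(idx, n)]

# Example: an x with 4 attributes has C(4, 2) = 6 pairwise features
x = ('a', 'b', 'a', 'c')
print(higher_order_features(x, 2))
```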

39 Higher-order Logistic Regression (LRn) [Diagram: NB, 1-DB, 2-DB, K-DB, their discriminative variants 1-DBd, 2-DBd, K-DBd, and LR1, LR2, LRK, through to RF, placed along axes of #passes through the data, bias of the learner, and #tuning parameters]

40 [Figure: Covtype error for KDB, dKDB and ALR at K = 0, 1, 2, 3, 4 vs. NB and RF] Comparative analysis of the performance of LRn and dKDB

41 Accelerated Logistic Regression (ALRn)
$P_{\mathrm{LR}^n}(y \mid \mathbf{x}) = \frac{\exp\big(\beta_y + \sum_{\alpha \in \binom{A}{n}} \beta_{y, x_\alpha}\big)}{\sum_{c \in \mathcal{C}} \exp\big(\beta_c + \sum_{\alpha \in \binom{A}{n}} \beta_{c, x_\alpha}\big)}$
$P_{\mathrm{AnJE}}(y \mid \mathbf{x}) = \frac{\exp\big(\log \theta_y + \sum_{\alpha \in \binom{A}{n}} \log \theta_{y, x_\alpha}\big)}{\sum_{c \in \mathcal{C}} \exp\big(\log \theta_c + \sum_{\alpha \in \binom{A}{n}} \log \theta_{c, x_\alpha}\big)}$
$P_{\mathrm{ALR}^n}(y \mid \mathbf{x}) = \frac{\exp\big(\beta_y \log \theta_y + \sum_{\alpha \in \binom{A}{n}} \beta_{y, x_\alpha} \log \theta_{y, x_\alpha}\big)}{\sum_{c \in \mathcal{C}} \exp\big(\beta_c \log \theta_c + \sum_{\alpha \in \binom{A}{n}} \beta_{c, x_\alpha} \log \theta_{c, x_\alpha}\big)}$
1. Zaidi, N., Webb, G., Carman, M., Petitjean, F. and Cerquides, J. ALRn: Accelerated Higher-Order Logistic Regression, Machine Learning, Volume 104, 2016.

42 Accelerated Logistic Regression (ALRn)
The AnJE log-probabilities are scaled by a weight $w$ that corrects for each attribute appearing in multiple size-$n$ subsets:
$P_{\mathrm{AnJE}}(y \mid \mathbf{x}) = \frac{\exp\big(\frac{1}{w}\big(\log \theta_y + \sum_{\alpha \in \binom{A}{n}} \log \theta_{y, x_\alpha}\big)\big)}{\sum_{c \in \mathcal{C}} \exp\big(\frac{1}{w}\big(\log \theta_c + \sum_{\alpha \in \binom{A}{n}} \log \theta_{c, x_\alpha}\big)\big)}, \qquad w = \binom{|A|}{n} \Big/ \frac{|A|}{n}$
1. Zaidi, N., Webb, G., Carman, M., Petitjean, F. and Cerquides, J. ALRn: Accelerated Higher-Order Logistic Regression, Machine Learning, Volume 104, 2016.
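One way to read the acceleration: the generative AnJE log-probabilities act as a pre-conditioner, so the discriminative weights can start at 1.0 (making the initial model exactly AnJE) and SGD only has to learn corrections. A hedged sketch of the score computation under that reading; the `log_theta` table and its key layout are assumptions for illustration:

```python
def alrn_scores(x, classes, log_theta, beta, subsets):
    """Class scores for ALRn: each weight multiplies an AnJE log-probability.

    log_theta: dict of AnJE log-parameters from the counting pass.
    beta:      discriminative weights, initialised to 1.0 so the initial
               model coincides with AnJE before any SGD updates.
    subsets:   the size-n attribute subsets (tuples of indices).
    """
    scores = {}
    for c in classes:
        s = beta.get(('class', c), 1.0) * log_theta.get(('class', c), 0.0)
        for subset in subsets:
            key = (c, subset, tuple(x[i] for i in subset))
            # unseen value combinations contribute nothing
            s += beta.get(key, 1.0) * log_theta.get(key, 0.0)
        scores[c] = s
    return scores   # pass through a softmax to obtain P(y | x)
```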

43 [Figure: prequential learning curves of LRn, ALRn and AnJE]

44 Selective Accelerated Higher-order Logistic Regression
Salient features: 1. Two-pass learning 2. Low bias 3. Minimal tuning parameters
How to do selection to reduce the size of the model?
Indexing — approximate: feature hashing (sketched below).
Selection criteria: mutual information; frequency (counts).
Automatic selection: LOOCV as in SKDB, or validate on a sample of the data.
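Of the approximate indexing options above, feature hashing avoids storing a dictionary over all higher-order value combinations by mapping every feature key into a fixed-size weight table. A minimal sketch (bucket count and key layout are illustrative):

```python
import hashlib

def hashed_index(feature_key, n_buckets=2**20):
    """Map an arbitrary higher-order feature key to a fixed-size table.

    A stable hash (hashlib, not Python's salted built-in hash) is used so
    that a saved model indexes the same buckets when reloaded.
    """
    digest = hashlib.md5(repr(feature_key).encode()).digest()
    return int.from_bytes(digest[:8], 'little') % n_buckets

# weights live in one flat array instead of a dict keyed by combinations;
# hash collisions are tolerated as a small amount of noise
weights = [0.0] * (2**20)
key = ((0, 3), ('a', 'c'), 'class1')   # (attribute subset, values, class)
weights[hashed_index(key)] += 0.1
```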

45 [Figure: Covtype error for NB, RF, ALR, ALR2 (count), ALR2 (MI) and ALR2 (CV)] Comparative analysis of the performance of sALR2 (count, MI and CV)

46 [Figure: Covtype error for NB, RF, ALR, ALR3 (MI) and ALR3 (CV)] Comparative analysis of the performance of sALR3 (MI and CV)

47 [Diagram: models ordered by decreasing bias — NB, AnDE, KDB, sKDB, dKDB, sdKDB, LRn, sLRn, hLRn, shLRn, FM, RF] Minimal passes, minimal tuning parameters

48 Burning Issues
1. Discretization: discretization leads to better results.
2. Multiple classes: optimizing softmax leads to better-calibrated probabilities.
3. SGD: AdaGrad; cross-validate eta (see the AdaGrad sketch below).
4. Regularization: L2 regularisation with lambda equal to 0.1 works well; adaptive regularization.
5. Indexing: hashing; feature transformation.
6. Non-stationary data: low-bias models for fast decay; high-bias models for slow decay.
7. Adaptive models: start with a high-bias model and then shift gears with more data; hierarchical models.
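Since AdaGrad appears under both SGD and step-size tuning, here is the per-coordinate update for reference; eta0 is the initial step size one would cross-validate or tune on a hold-out set, as the slide suggests.

```python
import math

def adagrad_update(w, g, h, eta0=0.1, eps=1e-8):
    """Per-coordinate AdaGrad: the effective step size of each parameter
    shrinks with its accumulated squared gradients.

    w: parameters, g: current gradient, h: running sum of squared gradients.
    """
    for i in range(len(w)):
        h[i] += g[i] * g[i]
        w[i] -= eta0 * g[i] / (math.sqrt(h[i]) + eps)
    return w, h
```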

49 Aquila Audax
Salient features: implements KDB, ALR, FM. Objective functions: MSE, CLL, HL. Multiple classes: optimizes softmax. SGD: AdaGrad, AdaDelta. Regularization: adaptive regularization. Creates features on the fly. Feature selection: counts, MI, LOOCV, hashing. Others.

50 Collaborators / Offline Discussions
GitHub: nayyarzaidi. LinkedIn: nayyar_zaidi. URL:
Questions?
