1 History of statistical/machine learning. 2 Supervised learning. 3 Two approaches to supervised learning. 4 The general learning procedure


Overview
- Breiman, L. (2001). Statistical modeling: The two cultures.
- Statistical learning reading group, Alexander Ly, Psychological Methods, University of Amsterdam, 27 October.
- Outline: 1 History of statistical/machine learning; 2 Supervised learning; 3 Two approaches to supervised learning; 4 The general learning procedure; 5 Model complexity; 6 Conclusion.

Definition of machine learning
- Arthur Samuel (1959): the field of study that gives computers the ability to learn without being explicitly programmed.
- Tom M. Mitchell (1997): a computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.

History
- Evolved from artificial intelligence and pattern recognition.
- Success stories (from the 1990s onwards): spam filters, optical character recognition, natural language processing (search engines), and recommender systems; the Netflix challenge (2006) was won by AT&T Labs Research.

History (continued)
- Summary: the successes correlated with the rise of the internet, and machine learning reinvented statistics, using machine learning terms for statistical concepts.
- Ignored by most statisticians, except for Breiman and for Tibshirani, Hastie, and Friedman (the Efron/Stanford school).

Terminology: statistics vs machine learning
  Statistics        Machine learning
  Estimation        Learning
  Data point        Example/Instance
  Regression        Supervised learning
  Classification    Supervised learning
  Covariate         Feature
  Response          Label

Breiman
- Career: 1954 PhD in probability; 1960s consulting; from the 1980s onwards professor at UC Berkeley.
- Breiman's consulting experience: prediction problems; live with the data before modelling; the solution should be either an algorithmic or a data model; predictive accuracy is the criterion for the quality of the model; computers are necessary in practice.

Problem statement: supervised learning
- Based on n pairs of data (x_1, y_1), ..., (x_n, y_n), where the x_i are features and the y_i are correct labels, predict a future label y_new given x_new.
- Example (classification): features x_1 = (nose, mouth) with label y_1 = "yes, face"; features x_2 = (doorbell, hinge) with label y_2 = "no face".
- Examples of algorithmic techniques/models: classification and regression trees, bagging, random forests.
- In the paper Breiman focusses on supervised learning.

Problem statement: supervised learning (continued)
- Based on n pairs of data (x_1, y_1), ..., (x_n, y_n), where the x_i are features and the y_i are correct labels, predict a future label y_new given x_new.
- The unknown f: y = f(x) + ε. For prediction, discover the unknown f that relates the features (covariates) x, the input, to the labels (dependent variables) y, the output. (A minimal numerical sketch follows below.)

Goal of machine learning
- Predict future instances based on already observed data, that is, prediction based on past data. Example: "Based on previous data I predict that it will rain tomorrow."
- Give a prediction accuracy. Example: "Based on previous data I'm 67% sure that it will rain tomorrow."
- Use data models, or even algorithmic models, to discover the relationship f provided by nature.

Machine learning vs the "standard" statistical inference culture
- Machine learning. Goal: prediction of new data (learn f from the data). Approach: the data are always right; there are no models, only algorithms. Passive: the data are already collected. "Big" data.
- "Standard" approach in psychology. Goal: evaluation of theory (evaluate a known f). Approach: the model is right, the data could be wrong; evaluate the theory by comparing two models. Active: confirmatory analysis based on the model (design of experiments, power analysis, etc.). "Small" data.
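As a minimal numerical sketch of this setting (not from the slides; the data-generating line, the noise level, and the choice of a linear candidate f are assumptions made purely for illustration), learning f and predicting a new label could look like this:

```python
import numpy as np

# Hypothetical pairs (x_i, y_i); in reality they come from nature's unknown f.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 * x + 1.0 + rng.normal(0, 1, size=50)   # plays the role of y = f(x) + error

# "Learn" f: within the class of straight lines, pick the one with the smallest squared loss.
slope, intercept = np.polyfit(x, y, deg=1)

def f_trained(x_new):
    return slope * x_new + intercept

# Predict the label of a yet unseen instance x_new.
print(f_trained(4.2))
```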

4 "Standard" approach in pscholog Goal: evaluation of theor (evaluate a known f ) X Man time no theor most research has an eplorator flavour (80% according to Lakens) Approach: The model is right the data could be wrong. Evaluate theor b comparing two models X What if the model is wrong? Active: Confirmator analsis based on the model: Design of eperiments power analsis etc etc X Design of eperiments power analsis etc etc wrong if the model assumption is wrong "Small" data X Mechanical turk fmri data genetics cito OSF international collaboration man labs etc. Breiman s: Critique on the data modelling approach Wrong presumption Let data be generated according to the data model f where f is linear/logistic regression/... Eample: Uncritical use of linear regression to for instance bimodal data. Breiman s: Critique on the data modelling approach Wrong presumption Let data be generated according to the data model f where f is linear/logistic regression/... Breiman s: Critique on the data modelling approach Wrong presumption Let data be generated according to the data model f where f is linear/logistic regression/... f is linear f is logistic Focus on modelling the sampling distribution of the error not the f : {z} obs known z} { f ( {z} )= N(0 obs 2 ) f is a "network" Eample: in ANOVA sums of squared error R 2 etc. If the errors are big it is implied that the theor is bad. To quantif "big" require sampling distribution.

Breiman's critique of the data-modelling approach (continued)
- To calculate sampling distributions one is stuck with simple (linear) models.
- The conclusions are about the model's mechanism, not about nature's mechanism; if the model is a poor emulation of nature, the conclusions will be wrong.

Breiman's experience from consulting
- The classical (linear) models are tractable but typically yield bad predictions.
- Live with the data before modelling.
- Algorithmic models are also good.
- Predictive accuracy on a test set is the criterion for how good the model is.
- Computers are important.

Recall the setting
- Problem: with which f does nature generate the data y = f(x) + ε?
- Goal: prediction based on past data; learn (estimate) the unknown f, mapping input x to output y, and give a prediction accuracy.
- "Live with the data before modelling": the data are split in three parts, starting with a training set to learn f from the data.

Recall the setting (continued)
- Problem: with which f does nature generate the data y = f(x) + ε?
- Goal: prediction based on past data; learn (estimate) the unknown f and give a prediction accuracy.
- "Live with the data before modelling": the data are split in three parts. Training set: to learn f from the data. Validation set: to do model selection. Test set: to estimate the prediction accuracy.

The general learning procedure
- Recall that the data set consists of n pairs (x_1, y_1), ..., (x_n, y_n).
- Randomly select the training samples, say n_train of them.
- The training set is used to learn f from the data. The corresponding formula is y = f(x) + ε; filling in the true y_train,i gives y_train,i = f(x_train,i) + ε_i. Find that f for which the training loss

    (1/n_train) Σ_{i=1}^{n_train} Loss( y_train,i , f(x_train,i) )

  is smallest, and call the minimiser f_trained. (A code sketch of this step follows below.)
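A small sketch of this training step (the data-generating polynomial, the noise level, the sample sizes, and the quadratic candidate class are invented for the example):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=2000)
y = 3 + 2 * x - 0.2 * x**2 + rng.normal(0, 1, size=2000)   # stand-in for nature's y = f(x) + error

# Randomly select training samples, say 1000 of them; keep the rest aside for later.
idx = rng.permutation(len(x))
train, rest = idx[:1000], idx[1000:]

# Learn f on the training set: least squares minimises the average squared loss.
f_trained = np.poly1d(np.polyfit(x[train], y[train], deg=2))

train_loss = np.mean((y[train] - f_trained(x[train])) ** 2)
print("average training loss:", round(train_loss, 3))
```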

The general learning procedure (continued)
- Use the other samples as test samples, say n_test of them.
- The training set was used to learn f from the data; the test set is used to derive the prediction accuracy. Fill in f_trained and apply it to the test features to yield y_implied = f_trained(x_test,i), and estimate the error between y_implied and the true y_test,i:

    ε_estim = (1/n_test) Σ_{i=1}^{n_test} Loss( y_test,i , f_trained(x_test,i) ).

- Generalise to a new instance x_new. Final answer: for yet unseen features x_new, generate

    y_new = f_trained(x_new) ± ε_estim,

  where ε_estim is the generalisation error; the error so estimated serves as the prediction accuracy.

Remarks
- The only assumption is that the data (x_i, y_i) are iid, that is, that they are generated with the same (true, fixed, but unknown) f. The data are assumed to be true.
- There is no specification of the loss function: the loss function replaces the assumptions on the error (typically a Gaussian error in "standard" statistics).
- There is no specification of the collection F of functions f that we believe to be viable.

Loss functions
- For categorical data, typically the zero-one (all or nothing) loss:

    Loss( y_i , f(x_i) ) = 0 if y_i = f(x_i), and 1 if y_i ≠ f(x_i).

- Note the hard rule here: the observations y_i "supervise" the learning. (A small sketch follows below.)
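A minimal sketch of the zero-one loss and of the test-set error estimate (the labels and the predictions below are made up, and f_trained is assumed to have been obtained already):

```python
import numpy as np

# Zero-one loss, averaged over the test set: 0 when a prediction matches, 1 otherwise.
def zero_one_error(y_true, y_pred):
    return np.mean(y_true != y_pred)

# Hypothetical test labels and the labels implied by some already-trained f.
y_test    = np.array(["face", "face", "no face", "face", "no face"])
y_implied = np.array(["face", "no face", "no face", "face", "face"])

# Estimated prediction accuracy = average loss on the held-out test samples.
eps_estim = zero_one_error(y_test, y_implied)
print("estimated prediction error:", eps_estim)   # 0.4 for these made-up labels
```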

Loss functions (continued)
- For continuous data, typically the mean-squared error loss:

    Loss( Y , f(x) ) = E[ (Y − f(x))² ],

  where E is the expectation (average) with respect to the true relationship f between X and Y.
- In practice the expectation E is replaced by the empirical average with respect to the data, which are assumed to be true:

    Loss( Y , f(x) ) = (1/n) Σ_{i=1}^{n} ( y_i − f(x_i) )².

Bias-variance trade-off and overfitting
- For the mean-squared error loss the expected loss decomposes as

    E[ (Y − f(x))² ] = Var( f(x) ) + [ Bias( f(x) ) ]² + Var( ε ).

  Both Var and E, and thus the Bias, are with respect to the true f.
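The decomposition can be checked numerically. The sketch below is a simulation with made-up settings (the "true" f, the noise level, and the degree-9 candidate class are all assumptions): it repeatedly refits a polynomial to fresh training sets and estimates the variance and squared bias of the fit at a single point x0.

```python
import numpy as np

rng = np.random.default_rng(3)
f_true = lambda x: 1 + 2 * x - 0.5 * x**2     # assumed "true" f for the simulation
sigma, x0, degree = 1.0, 1.5, 9               # noise sd, evaluation point, candidate complexity

preds = []
for _ in range(2000):
    x = rng.uniform(0, 3, size=30)
    y = f_true(x) + rng.normal(0, sigma, size=30)
    fit = np.poly1d(np.polyfit(x, y, deg=degree))
    preds.append(fit(x0))
preds = np.array(preds)

print("Var(f_hat(x0))   :", round(preds.var(), 3))
print("Bias(f_hat(x0))^2:", round((preds.mean() - f_true(x0)) ** 2, 4))
print("Var(error)       :", sigma ** 2)
# The expected squared error at x0 is approximately the sum of the three terms above.
```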

Bias-variance trade-off and overfitting (continued)
- In the decomposition, Var( f(x) ) + [ Bias( f(x) ) ]² is the structural error and Var( ε ) is the unavoidable error.
- Recall that f_trained comes from minimising this loss. The more complicated a candidate f, the smaller the apparent unavoidable error, as everything is treated as structural. Problem: overfitting.
- Overfitting occurs if the f's under consideration are too complicated. An over-complicated f generalises badly and is recognised by low bias and high variance.

Overfitting example
- The true data-generating mechanism is y = f(x) + ε, where ε follows a Laplace distribution (thicker tails than the normal). [The slides show two figures: the raw data, and the raw data together with the true function f.]

Overfitting example (continued)
- An over-complicated f has low bias and high variance: the data are fitted with a polynomial of order nine. [Figure of the degree-9 fit.]
- Observe a new test sample/instance: x_test = 6.5 with y_test = 66.
- There is a large loss between y_test and the implied value f_trained(x_test). [Figure of the test point against the fitted curve.] (A short reproduction sketch follows below.)
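The overfitting picture can be reproduced in a few lines. In the sketch below the degree-2 truth, the Laplace noise scale, and the sample sizes are assumptions chosen to mimic the slides' example; only the qualitative comparison between the degree-2 and degree-9 fits is the point.

```python
import numpy as np

rng = np.random.default_rng(4)
f_true = lambda x: 2 + 3 * x - 0.8 * x**2       # degree-2 truth, invented coefficients
noise = lambda n: rng.laplace(0, 3, size=n)      # Laplace errors: thicker tails than normal

x_train = rng.uniform(0, 10, size=15)
y_train = f_true(x_train) + noise(15)

# Fit an appropriately simple and an over-complicated candidate f.
fit2 = np.poly1d(np.polyfit(x_train, y_train, deg=2))
fit9 = np.poly1d(np.polyfit(x_train, y_train, deg=9))

# Fresh test instances reveal the over-complicated fit's poor generalisation.
x_test = rng.uniform(0, 10, size=200)
y_test = f_true(x_test) + noise(200)
print("degree 2 test MSE:", round(np.mean((y_test - fit2(x_test)) ** 2), 1))
print("degree 9 test MSE:", round(np.mean((y_test - fit9(x_test)) ** 2), 1))
```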

Model complexity
- The true data generation was done with a polynomial of degree 2, so a polynomial of degree 9 is too complex. Hence the collection of candidate f's should be restricted to those f's of at most degree 2.
- In reality we do not know the "complexity" of the true f. How should the collection of candidates F be chosen?
- Example of a meaningless prediction from a poor choice: "your expected survival is 12 years ± 50 years".

Two types of model complexity
- The structure of f: linear or polynomial (still a summation of terms, thus linear), which is nothing new, the same as in statistics; or neural networks (non-linear), support vector machines, generalised additive models, kernel smoothing, splines, reproducing kernel Hilbert space regression, trees.
- The number of features: n < p.

12 "Standard" setting Regularisation The functions f s in F are linear More data than features p < n. The functions f s in F are linear More features than data n < p. Hazard of overfitting Control the compleit with additional tuning parameter (for instance lasso) Hence F = F Eample: f () = p p + {z } {z} (1) Over complicated structure tuning parameter Here acts as a penalt for compleit. How to tune? Regularisation The functions f s in F are linear More features than data n < p. Hazard of overfitting Control the compleit with additional tuning parameter (for instance lasso) Hence F = F Eample: f () = p p + {z } {z} (1) Over complicated structure tuning parameter Here acts as a penalt for compleit. Tune with validation set aka repeat the general procedure man times with different (fied). Split data into three sets. Training set is used to learn f from the data. Validation set to tune (model selection). Test set is used to derive the prediction accurac. Partition. For the lasso = 0 no regularisation for = 1 onl the constant function is viable. Sa = For each fied follow the general procedure f.

Tuning with the validation set (continued)
- The training set is used to learn f_λ from the data; the validation set is used to tune λ (model selection); the test set is used to derive the prediction accuracy.
- Say λ = 0, 2, 2², ..., 2¹⁰. We then get the trained family

    F = { f_{0,trained}, f_{2,trained}, f_{2²,trained}, ..., f_{2¹⁰,trained} }.   (2)

- Pick the f_{λ,trained} with the lowest average loss on the validation set:

    f_val = argmin_λ (1/n_val) Σ_{i=1}^{n_val} Loss( f_{λ,trained}(x_val,i) , y_val,i ).

- Estimate the prediction accuracy by averaging the loss on the test set (a code sketch of the full three-set procedure follows after the conclusion):

    ε_est = (1/n_test) Σ_{i=1}^{n_test} Loss( f_val(x_test,i) , y_test,i ).

Conclusion
- This talk gave the general idea of supervised learning and the roles of the training, validation, and test sets.
- Which specific classes F to take? This seminar discusses a couple of them: neural networks (non-linear), support vector machines, generalised additive models, kernel smoothing, splines, reproducing kernel Hilbert space regression, trees.
- How to minimise the loss? Technicalities.
- The selection method used here was "select the best"; other methods are bagging and boosting.
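Putting the three sets together for the lasso example, the sketch below follows the train/validate/test recipe end to end (the simulated data, the λ grid, and the use of scikit-learn's Lasso are assumptions for illustration, not the slides' own code):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(5)
n, p = 300, 50
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = [3, -2, 1.5, 1, -1]                   # only a few features actually matter
y = X @ beta + rng.normal(0, 1, size=n)

# Split the data into training, validation, and test sets.
idx = rng.permutation(n)
tr, va, te = idx[:150], idx[150:225], idx[225:]

lambdas = [0.001, 0.01, 0.1, 1.0]                # illustrative grid of tuning parameters
val_loss = {}
for lam in lambdas:
    f_lam = Lasso(alpha=lam, max_iter=10_000).fit(X[tr], y[tr])     # learn f_lambda on the training set
    val_loss[lam] = np.mean((y[va] - f_lam.predict(X[va])) ** 2)    # average loss on the validation set

best_lam = min(val_loss, key=val_loss.get)       # model selection: lowest validation loss
f_val = Lasso(alpha=best_lam, max_iter=10_000).fit(X[tr], y[tr])

# Estimate the prediction accuracy on the untouched test set.
eps_est = np.mean((y[te] - f_val.predict(X[te])) ** 2)
print("chosen lambda:", best_lam, "  estimated test error:", round(eps_est, 3))
```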

Further organisation
- The name was changed from "machine learning reading group" to "statistical learning seminar": reading groups die, and a seminar does not require everyone to read everything.
- It requires a small peak of effort when preparing a talk, and it is not necessary to understand everything.
- Talks can be practical or theoretical.
- It is still good to read things in advance; websites with YouTube clips are also available.
