1 History of statistical/machine learning. 2 Supervised learning. 3 Two approaches to supervised learning. 4 The general learning procedure. 5 Model complexity. 6 Conclusion
Overview. Breiman, L. (2001). Statistical modeling: The two cultures. Statistical learning reading group, Alexander Ly, Psychological Methods, University of Amsterdam, 27 October. Outline: 1 History of statistical/machine learning. 2 Supervised learning. 3 Two approaches to supervised learning. 4 The general learning procedure. 5 Model complexity. 6 Conclusion.

Definition of machine learning. Arthur Samuel (1959): the field of study that gives computers the ability to learn without being explicitly programmed. Tom M. Mitchell (1997): a computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.

History. Machine learning evolved from artificial intelligence and pattern recognition. Success stories (from the 1990s onwards): spam filters, optical character recognition, natural language processing (search engines), and recommender systems, e.g. the Netflix challenge (2006), won by AT&T Labs Research.
History, summary. The success of machine learning correlated with the rise of the internet; machine learning reinvented statistics, coining machine learning terms for statistical concepts. It was ignored by most statisticians, except for Breiman and for Tibshirani, Hastie, and Friedman (the Efron/Stanford school).

Statistics vs machine learning terminology:
- Estimation -> Learning
- Data point -> Example/Instance
- Regression -> Supervised learning
- Classification -> Supervised learning
- Covariate -> Feature
- Response -> Label

Breiman. Career: 1954 PhD in probability; 1960s consulting; 1980s onwards professor at UC Berkeley. Breiman's consulting experience: prediction problems; live with the data before modelling; the solution should be either an algorithmic or a data model; predictive accuracy is the criterion for the quality of the model; computers are necessary in practice.

Problem statement: supervised learning. Based on n pairs of data (x_1, y_1), ..., (x_n, y_n), where the x_i are features and the y_i are correct labels, predict future labels y_new given x_new. Example (classification): features x_1 = (nose, mouth), label y_1 = yes, a face; features x_2 = (doorbell, hinge), label y_2 = no, not a face. Examples of algorithmic techniques/models: classification and regression trees, bagging, random forests. In the paper, Breiman focusses on supervised learning.
Problem statement, continued. The data are assumed to be generated as y = f(x) + epsilon, with f unknown. For prediction: discover the unknown f that relates the features x (covariates, the input) to the labels y (dependent variables, the output).

Goal of machine learning: predict future instances based on already observed data.
- Prediction based on past data. Example: "Based on previous data, I predict that it will rain tomorrow."
- Give a prediction accuracy. Example: "Based on previous data, I am 67% sure that it will rain tomorrow."
- Use data models or even algorithmic models to discover the relationship f provided by nature.

Machine learning vs the "standard" statistical inference culture:
- Machine learning. Goal: prediction of new data (learn f from the data). Approach: the data are always right; there are no models, only algorithms. Passive: the data are already collected. "Big" data.
- "Standard" approach in psychology. Goal: evaluation of theory (evaluate a known f). Approach: the model is right, the data could be wrong; evaluate the theory by comparing two models. Active: confirmatory analysis based on the model (design of experiments, power analysis, etc.). "Small" data.
The "standard" approach in psychology, with counterpoints:
- Goal: evaluation of theory (evaluate a known f). But many times there is no theory; most research has an exploratory flavour (80% according to Lakens).
- Approach: the model is right, the data could be wrong; evaluate the theory by comparing two models. But what if the model is wrong?
- Active: confirmatory analysis based on the model (design of experiments, power analysis, etc.). But the design of experiments and the power analysis are themselves wrong if the model assumption is wrong.
- "Small" data. But consider Mechanical Turk, fMRI data, genetics, Cito, OSF, international collaborations, Many Labs, etc.

Breiman's critique of the data modelling approach. Wrong presumption: let the data be generated according to the data model y = f(x) + epsilon, where f is linear regression, logistic regression, a "network", ... Example: uncritical use of linear regression on, for instance, bimodal data. The focus is on modelling the sampling distribution of the error, not f itself:

    y_obs - f(x_obs) = epsilon ~ N(0, sigma^2),

where y_obs and x_obs are observed and f is taken as known. Example: in ANOVA, sums of squared errors, R^2, etc. If the errors are big, it is implied that the theory is bad; to quantify "big" requires a sampling distribution.
Breiman's experience from consulting:
- To calculate sampling distributions, one is stuck with simple (linear) models.
- Conclusions are about the model's mechanism, not about nature's mechanism; if the model is a poor emulation of nature, the conclusions will be wrong.
- The classical (linear) models are tractable but typically yield bad predictions.
- Live with the data before modelling; algorithmic models are also good; predictive accuracy on a test set is the criterion for how good the model is; computers are important.

Recall the setting. Problem: with which f does nature generate the data y = f(x) + epsilon? Goal: prediction based on past data; learn (estimate) the unknown f that maps input x to output y, and give a prediction accuracy. "Live with the data before modelling." The data are split in three parts, starting with a training set to learn f from the data.
The data are split in three parts:
- a training set, to learn f from the data;
- a validation set, to do model selection;
- a test set, to estimate the prediction accuracy.

The general learning procedure. Recall that the data set consists of n pairs. Randomly select the training samples, say n_train of them. The training set is used to learn f from the data: in the formula y = f(x) + epsilon, fill in the training pairs, y_train,i = f(x_train,i) + epsilon, and find that f for which the average loss

    (1/n_train) * sum_{i=1}^{n_train} Loss(y_train,i, f(x_train,i))

is smallest. Call the minimiser f_trained.
Use the other samples as test samples. The test set is used to derive the prediction accuracy: fill in f_trained, apply it to x_test,i to yield y_implied = f_trained(x_test,i), and estimate the error between y_implied and the true y_test,i:

    epsilon_estim = (1/n_test) * sum_{i=1}^{n_test} Loss(y_test,i, f_trained(x_test,i)).

Generalise to a new instance x_new: for yet unseen features x_new, generate

    y_new = f_trained(x_new) +/- epsilon_estim,

where f_trained(x_new) is the implied label and epsilon_estim is the generalisation error. The error epsilon_estim so estimated serves as the prediction accuracy.

Remarks:
- The only assumption is that the data (x_i, y_i) are iid, that is, they are generated with the same (true, fixed, but unknown) f. The data are assumed to be true.
- There is no specification of the loss function; the loss function replaces the assumptions on the error (typically Gaussian error in "standard" statistics).
- There is no specification of the collection F of functions f that we believe to be viable.
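The train/test procedure can be sketched in a few lines of Python. This is a minimal illustration only: the quadratic "nature", the noise level, the 80/20 split, and the use of polynomial least squares as the learner are my own assumptions, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated "nature": the true f and the Gaussian noise level here are
# illustrative assumptions.
def true_f(x):
    return 1.0 + 2.0 * x - 0.5 * x ** 2

n = 100
x = rng.uniform(0.0, 6.0, size=n)
y = true_f(x) + rng.normal(scale=1.0, size=n)

# Randomly select the training samples; the remaining pairs are the test set.
idx = rng.permutation(n)
train, test = idx[:80], idx[80:]

# Training set: find the f (here a degree-2 polynomial) minimising the
# average squared loss over the training pairs.
f_trained = np.poly1d(np.polyfit(x[train], y[train], deg=2))

# Test set: the average loss between implied and true labels estimates
# the prediction accuracy (the generalisation error).
eps_estim = np.mean((y[test] - f_trained(x[test])) ** 2)

# Generalise to a yet-unseen instance x_new.
x_new = 3.0
print(f"y_new = {f_trained(x_new):.2f} +/- {np.sqrt(eps_estim):.2f}")
```

Because the learner only ever sees the training pairs, eps_estim is an honest estimate of how well f_trained will do on new data.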
Loss functions. For categorical data, typically the zero-one (all or nothing) loss:

    Loss(y_i, f(x_i)) = 0 if y_i = f(x_i), and 1 if y_i != f(x_i).

Note the hard rule here: the observations y_i "supervise" the learning. For continuous data, typically the mean-squared error loss:

    Loss(Y, f(X)) = E[(Y - f(X))^2],

where E is the expectation (average) with respect to the true relationship f between X and Y. In practice the expectation E is replaced by the empirical average with respect to the data being true:

    Loss(y, f(x)) = (1/n) * sum_{i=1}^{n} (y_i - f(x_i))^2.
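Both losses are one-liners in code; a small sketch (the function names are my own):

```python
import numpy as np

def zero_one_loss(y, y_hat):
    """Zero-one (all or nothing) loss for categorical labels:
    0 where the prediction matches, 1 where it does not, averaged."""
    return np.mean(np.asarray(y) != np.asarray(y_hat))

def squared_error_loss(y, y_hat):
    """Empirical mean-squared error loss for continuous labels:
    the expectation E is replaced by the average over the data."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return np.mean((y - y_hat) ** 2)

print(zero_one_loss(["face", "no face"], ["face", "face"]))  # 0.5
print(squared_error_loss([1.0, 2.0], [1.0, 4.0]))            # 2.0
```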
Bias-variance trade-off and overfitting. For continuous data with the mean-squared error loss, the loss decomposes as

    E[(Y - f(X))^2] = Var(f(X)) + [Bias(f(X))]^2 + Var(epsilon),

where both Var and E (and thus the bias) are with respect to the true f. The first two terms form the structural error; Var(epsilon) is the unavoidable error. Recall that f_trained comes from minimising this loss. The more complicated a candidate f, the smaller the unavoidable error appears, as everything is seen as structural. Problem: overfitting. Overfitting occurs when the f's under consideration are too complicated; an over-complicated f generalises badly and is recognised by low bias and high variance.

Example. True data-generating function f(x) = ..., where epsilon follows a Laplace distribution (thicker tails than normal). [Figures: the raw data, and the raw data with the true function f.]
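The decomposition above can be written out in full. Here Y = f(X) + epsilon with E[epsilon] = 0 and epsilon independent of X, and f-hat denotes the trained candidate (the slides write f for both; the hat is added here for clarity):

```latex
\begin{aligned}
\mathbb{E}\big[(Y - \hat f(X))^2\big]
  &= \mathbb{E}\big[(f(X) + \varepsilon - \hat f(X))^2\big] \\
  &= \mathbb{E}\big[(f(X) - \hat f(X))^2\big] + \operatorname{Var}(\varepsilon)
     && (\varepsilon \text{ independent, mean } 0) \\
  &= \underbrace{\operatorname{Var}\big(\hat f(X)\big)
     + \big[\operatorname{Bias}\big(\hat f(X)\big)\big]^2}_{\text{structural error}}
     \; + \underbrace{\operatorname{Var}(\varepsilon)}_{\text{unavoidable error}},
\end{aligned}
```

where Bias(f-hat(X)) = E[f-hat(X)] - f(X), and the cross term vanishes because epsilon has mean zero and is independent of X.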
Overfitting, continued: an over-complicated f has low bias and high variance. [Figures: the data fitted with a polynomial of order nine; a newly observed test sample/instance with x_test = 6.5 and y_test = 66; the resulting large loss between y_test and the implied label y_implied.]
Model complexity. The true data generation was done with a polynomial of degree 2; a polynomial of degree 9 is too complex. Hence the collection of candidate f's should be restricted to those f's with maximum degree 2. In reality, we do not know the "complexity" of the true f. How to choose the collection of candidates F? Example conclusion: your expected survival is 12 years +/- 50 years, a meaningless prediction.

Two types of model complexity:
- The structure of f: linear, or polynomial (but still a summation of terms, thus linear), which is nothing new, the same as in statistics; or neural networks (non-linear), support vector machines, generalised additive models, kernel smoothing splines, reproducing kernel Hilbert space regression, trees.
- The number of features: n < p.
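The degree-9 versus degree-2 example can be reproduced numerically. A sketch with assumed coefficients, sample sizes, and noise scale; the Laplace noise (thicker tails than normal) follows the slides and is available directly in NumPy:

```python
import numpy as np

rng = np.random.default_rng(1)

# True f is a degree-2 polynomial; the particular coefficients are
# illustrative assumptions.
def true_f(x):
    return 2.0 + 1.5 * x - 0.3 * x ** 2

# A small training set and a larger test set, both with Laplace noise.
x_train = rng.uniform(0.0, 8.0, size=15)
y_train = true_f(x_train) + rng.laplace(scale=1.0, size=15)
x_test = rng.uniform(0.0, 8.0, size=200)
y_test = true_f(x_test) + rng.laplace(scale=1.0, size=200)

# Fit the right complexity (degree 2) and an over-complicated one (degree 9).
for deg in (2, 9):
    f_hat = np.poly1d(np.polyfit(x_train, y_train, deg=deg))
    train_err = np.mean((y_train - f_hat(x_train)) ** 2)
    test_err = np.mean((y_test - f_hat(x_test)) ** 2)
    print(f"degree {deg}: train error {train_err:.2f}, test error {test_err:.2f}")
```

The degree-9 fit is guaranteed a training error no larger than the degree-2 fit (it contains degree 2 as a special case), yet it typically generalises much worse on the test set: low bias, high variance.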
The "standard" setting vs regularisation. In the "standard" setting, the functions f in F are linear and there are more data than features, p < n. Under regularisation, the functions f in F are still linear, but there are more features than data, n < p: a hazard of overfitting. Control the complexity with an additional tuning parameter lambda (for instance, the lasso); hence F = F_lambda. Example:

    f_lambda(x) = beta_1 x_1 + ... + beta_p x_p,    (1)

where the sum is the over-complicated structure and lambda is the tuning parameter; lambda acts as a penalty for complexity. How to tune lambda? Tune it with the validation set, that is, repeat the general procedure many times, each time with a different (fixed) lambda. Split the data into three sets: the training set is used to learn f_lambda from the data; the validation set to tune lambda (model selection); the test set to derive the prediction accuracy. For the lasso, lambda = 0 means no regularisation, while for large lambda only the constant function remains viable. Say lambda in {0, 2, 2^2, ..., 2^10}. For each fixed lambda, follow the general procedure to obtain f_lambda.
With lambda in {0, 2, 2^2, ..., 2^10} we get

    F = {f_0,trained, f_2,trained, f_{2^2},trained, ..., f_{2^10},trained}.    (2)

Pick the f_lambda,trained with the lowest average loss on the validation set:

    f_val = argmin_lambda (1/n_val) * sum_{i=1}^{n_val} Loss(f_lambda,trained(x_val,i), y_val,i).

Estimate the prediction accuracy by averaging the loss on the test set:

    epsilon_est = (1/n_test) * sum_{i=1}^{n_test} Loss(f_val(x_test,i), y_test,i).

Conclusion. This talk gave the general idea of supervised learning and the roles of the training, validation, and test sets. Which specific classes F to take? This seminar discusses a couple of them: neural networks (non-linear), support vector machines, generalised additive models, kernel smoothing splines, reproducing kernel Hilbert space regression, trees. How to minimise the loss? Technicalities. The selection method here was "select the best"; other methods are bagging and boosting.
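The whole train/validate/test loop for the lasso can be sketched end to end. The grid lambda in {0, 2, 2^2, ..., 2^10} follows the slides; the simulated data, the split sizes, and the bare-bones coordinate-descent solver are my own illustrative assumptions (a real analysis would use a tested solver such as scikit-learn's):

```python
import numpy as np

rng = np.random.default_rng(2)

def lasso_fit(X, y, lam, n_iter=200):
    """Coordinate-descent lasso (a bare-bones sketch, not a production
    solver): minimise (1/n) * ||y - X beta||^2 + lam * ||beta||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]   # residual without feature j
            rho = X[:, j] @ r
            # Soft-thresholding update for coordinate j.
            beta[j] = np.sign(rho) * max(abs(rho) - lam * n / 2.0, 0.0) / col_sq[j]
    return beta

# Simulated data: p = 20 features, only 3 truly relevant (illustrative).
n, p = 120, 20
X = rng.normal(size=(n, p))
true_beta = np.zeros(p)
true_beta[:3] = [3.0, -2.0, 1.5]
y = X @ true_beta + rng.normal(size=n)

# Partition into training, validation, and test sets.
Xtr, ytr = X[:60], y[:60]
Xval, yval = X[60:90], y[60:90]
Xte, yte = X[90:], y[90:]

# The grid from the slides: lambda in {0, 2, 2^2, ..., 2^10}.
grid = [0.0] + [2.0 ** k for k in range(1, 11)]

# Train one f_lambda per grid value; pick the one with the lowest
# average loss on the validation set (model selection).
fits = {lam: lasso_fit(Xtr, ytr, lam) for lam in grid}
best = min(grid, key=lambda lam: np.mean((yval - Xval @ fits[lam]) ** 2))

# The test set gives the final prediction accuracy of the selected model.
test_err = np.mean((yte - Xte @ fits[best]) ** 2)
print("selected lambda:", best, "test error:", round(test_err, 2))
```

Note that the test set is touched only once, after lambda has been fixed; reusing it for tuning would make epsilon_est an over-optimistic estimate of the prediction accuracy.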
Further organisation. The name changed from "machine learning reading group" to "statistical learning seminar": reading groups die; a seminar does not require everyone to read everything, only a small peak of effort when preparing a talk; it is not necessary to understand everything; talks can be practical or theoretical; it is still good to read things in advance. A website with YouTube clips is also available.
More informationQualifying Exam in Machine Learning
Qualifying Exam in Machine Learning October 20, 2009 Instructions: Answer two out of the three questions in Part 1. In addition, answer two out of three questions in two additional parts (choose two parts
More informationRandom Forests. These notes rely heavily on Biau and Scornet (2016) as well as the other references at the end of the notes.
Random Forests One of the best known classifiers is the random forest. It is very simple and effective but there is still a large gap between theory and practice. Basically, a random forest is an average
More informationLinear regression. Linear regression is a simple approach to supervised learning. It assumes that the dependence of Y on X 1,X 2,...X p is linear.
Linear regression Linear regression is a simple approach to supervised learning. It assumes that the dependence of Y on X 1,X 2,...X p is linear. 1/48 Linear regression Linear regression is a simple approach
More informationChapter 3. Expectations
pectations - Chapter. pectations.. Introduction In this chapter we define five mathematical epectations the mean variance standard deviation covariance and correlation coefficient. We appl these general
More informationBagging. Ryan Tibshirani Data Mining: / April Optional reading: ISL 8.2, ESL 8.7
Bagging Ryan Tibshirani Data Mining: 36-462/36-662 April 23 2013 Optional reading: ISL 8.2, ESL 8.7 1 Reminder: classification trees Our task is to predict the class label y {1,... K} given a feature vector
More information9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering
Types of learning Modeling data Supervised: we know input and targets Goal is to learn a model that, given input data, accurately predicts target data Unsupervised: we know the input only and want to make
More informationBig Data, Machine Learning, and Causal Inference
Big Data, Machine Learning, and Causal Inference I C T 4 E v a l I n t e r n a t i o n a l C o n f e r e n c e I F A D H e a d q u a r t e r s, R o m e, I t a l y P a u l J a s p e r p a u l. j a s p e
More informationChemistry 141 Laboratory Lab Lecture Notes for Kinetics Lab Dr Abrash
Q: What is Kinetics? Chemistr 141 Laborator Lab Lecture Notes for Kinetics Lab Dr Abrash Kinetics is the stud of rates of reaction, i.e., the stud of the factors that control how quickl a reaction occurs.
More informationEXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING
EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING DATE AND TIME: June 9, 2018, 09.00 14.00 RESPONSIBLE TEACHER: Andreas Svensson NUMBER OF PROBLEMS: 5 AIDING MATERIAL: Calculator, mathematical
More informationMachine Learning. Regression basics. Marc Toussaint University of Stuttgart Summer 2015
Machine Learning Regression basics Linear regression, non-linear features (polynomial, RBFs, piece-wise), regularization, cross validation, Ridge/Lasso, kernel trick Marc Toussaint University of Stuttgart
More informationClassification: The rest of the story
U NIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN CS598 Machine Learning for Signal Processing Classification: The rest of the story 3 October 2017 Today s lecture Important things we haven t covered yet Fisher
More informationAnnouncements. Proposals graded
Announcements Proposals graded Kevin Jamieson 2018 1 Bayesian Methods Machine Learning CSE546 Kevin Jamieson University of Washington November 1, 2018 2018 Kevin Jamieson 2 MLE Recap - coin flips Data:
More informationStochastic Gradient Descent
Stochastic Gradient Descent Machine Learning CSE546 Carlos Guestrin University of Washington October 9, 2013 1 Logistic Regression Logistic function (or Sigmoid): Learn P(Y X) directly Assume a particular
More informationCS325 Artificial Intelligence Chs. 18 & 4 Supervised Machine Learning (cont)
CS325 Artificial Intelligence Cengiz Spring 2013 Model Complexity in Learning f(x) x Model Complexity in Learning f(x) x Let s start with the linear case... Linear Regression Linear Regression price =
More informationTutorial on Machine Learning for Advanced Electronics
Tutorial on Machine Learning for Advanced Electronics Maxim Raginsky March 2017 Part I (Some) Theory and Principles Machine Learning: estimation of dependencies from empirical data (V. Vapnik) enabling
More informationStephen Scott.
1 / 35 (Adapted from Ethem Alpaydin and Tom Mitchell) sscott@cse.unl.edu In Homework 1, you are (supposedly) 1 Choosing a data set 2 Extracting a test set of size > 30 3 Building a tree on the training
More informationGeneralization Error on Pruning Decision Trees
Generalization Error on Pruning Decision Trees Ryan R. Rosario Computer Science 269 Fall 2010 A decision tree is a predictive model that can be used for either classification or regression [3]. Decision
More informationIntroduction to the regression problem. Luca Martino
Introduction to the regression problem Luca Martino 2017 2018 1 / 30 Approximated outline of the course 1. Very basic introduction to regression 2. Gaussian Processes (GPs) and Relevant Vector Machines
More informationWALD LECTURE II LOOKING INSIDE THE BLACK BOX. Leo Breiman UCB Statistics
1 WALD LECTURE II LOOKING INSIDE THE BLACK BOX Leo Breiman UCB Statistics leo@stat.berkeley.edu ORIGIN OF BLACK BOXES 2 Statistics uses data to explore problems. Think of the data as being generated by
More informationMACHINE LEARNING 2012 MACHINE LEARNING
1 MACHINE LEARNING kernel CCA, kernel Kmeans Spectral Clustering 2 Change in timetable: We have practical session net week! 3 Structure of toda s and net week s class 1) Briefl go through some etension
More informationEnsemble Methods for Machine Learning
Ensemble Methods for Machine Learning COMBINING CLASSIFIERS: ENSEMBLE APPROACHES Common Ensemble classifiers Bagging/Random Forests Bucket of models Stacking Boosting Ensemble classifiers we ve studied
More informationThe Force Table Introduction: Theory:
1 The Force Table Introduction: "The Force Table" is a simple tool for demonstrating Newton s First Law and the vector nature of forces. This tool is based on the principle of equilibrium. An object is
More informationDirect Learning: Linear Regression. Donglin Zeng, Department of Biostatistics, University of North Carolina
Direct Learning: Linear Regression Parametric learning We consider the core function in the prediction rule to be a parametric function. The most commonly used function is a linear function: squared loss:
More information20 Unsupervised Learning and Principal Components Analysis (PCA)
116 Jonathan Richard Shewchuk 20 Unsupervised Learning and Principal Components Analysis (PCA) UNSUPERVISED LEARNING We have sample points, but no labels! No classes, no y-values, nothing to predict. Goal:
More informationISyE 691 Data mining and analytics
ISyE 691 Data mining and analytics Regression Instructor: Prof. Kaibo Liu Department of Industrial and Systems Engineering UW-Madison Email: kliu8@wisc.edu Office: Room 3017 (Mechanical Engineering Building)
More informationIntroduction to Machine Learning Midterm Exam
10-701 Introduction to Machine Learning Midterm Exam Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes, but
More informationLecture : Probabilistic Machine Learning
Lecture : Probabilistic Machine Learning Riashat Islam Reasoning and Learning Lab McGill University September 11, 2018 ML : Many Methods with Many Links Modelling Views of Machine Learning Machine Learning
More informationSample questions for Fundamentals of Machine Learning 2018
Sample questions for Fundamentals of Machine Learning 2018 Teacher: Mohammad Emtiyaz Khan A few important informations: In the final exam, no electronic devices are allowed except a calculator. Make sure
More informationBAGGING PREDICTORS AND RANDOM FOREST
BAGGING PREDICTORS AND RANDOM FOREST DANA KANER M.SC. SEMINAR IN STATISTICS, MAY 2017 BAGIGNG PREDICTORS / LEO BREIMAN, 1996 RANDOM FORESTS / LEO BREIMAN, 2001 THE ELEMENTS OF STATISTICAL LEARNING (CHAPTERS
More informationRegularization. CSCE 970 Lecture 3: Regularization. Stephen Scott and Vinod Variyam. Introduction. Outline
Other Measures 1 / 52 sscott@cse.unl.edu learning can generally be distilled to an optimization problem Choose a classifier (function, hypothesis) from a set of functions that minimizes an objective function
More informationCS7267 MACHINE LEARNING
CS7267 MACHINE LEARNING ENSEMBLE LEARNING Ref: Dr. Ricardo Gutierrez-Osuna at TAMU, and Aarti Singh at CMU Mingon Kang, Ph.D. Computer Science, Kennesaw State University Definition of Ensemble Learning
More informationLearning From Data: Modelling as an Optimisation Problem
Learning From Data: Modelling as an Optimisation Problem Iman Shames April 2017 1 / 31 You should be able to... Identify and formulate a regression problem; Appreciate the utility of regularisation; Identify
More information