Learning From Data Lecture 15 Reflecting on Our Path - Epilogue to Part I


Learning From Data Lecture 15: Reflecting on Our Path - Epilogue to Part I
What We Did · The Machine Learning Zoo · Moving Forward
M. Magdon-Ismail, CSCI 4100/6100

recap: Three Learning Principles

Occam's razor: simpler is better, provided the simpler hypothesis is falsifiable.
[Figure: three scientists fit resistivity ρ versus temperature T; Scientist 1's fit is not falsifiable, while Scientists 2 and 3 produce falsifiable fits.]

Sampling bias: ensure that the training and test distributions are the same, or else acknowledge/account for it. You cannot sample from one bin and use your estimates for another bin.

Data snooping: you are charged for every choice influenced by D, so choose the learning process (usually H) before looking at D. We know the price of choosing g from H.
[Diagram: the hypothesis set H and the data D feed the learning process, which outputs g; your choices are what you are charged for.]

© AML Creator: Malik Magdon-Ismail. Reflecting on Our Path: 2/10

Our Plan

1. What is learning? Output g ≈ f after looking at data (x_n, y_n).
2. Can we do it? E_in ≈ E_out: simple H, finite d_vc, large N. E_in ≈ 0: good H, algorithms.
3. How to do it? Linear models, nonlinear transforms. Algorithms: PLA, pseudoinverse, gradient descent.
4. How to do it well? Overfitting: stochastic and deterministic noise. Cures: regularization, validation.
5. General principles? Occam's razor, sampling bias, data snooping.
6. Advanced techniques.
7. Other learning paradigms.

(Margin labels on the slide: concepts, theory, practice.)
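As a concrete reminder of item 3, the two one-line algorithms can be sketched as follows: the perceptron learning algorithm (PLA) and one-step linear regression via the pseudoinverse. This is a minimal sketch on illustrative toy data; the function names and dataset are not from the lecture.

```python
import numpy as np

def pla(X, y, max_iters=1000):
    """Perceptron Learning Algorithm: repeatedly pick a misclassified
    point and update w <- w + y_n * x_n until none remain."""
    w = np.zeros(X.shape[1])
    for _ in range(max_iters):
        misclassified = np.where(np.sign(X @ w) != y)[0]
        if misclassified.size == 0:
            break
        n = misclassified[0]
        w = w + y[n] * X[n]
    return w

def linear_regression(X, y):
    """One-step linear regression: w = X^+ y, the pseudoinverse solution."""
    return np.linalg.pinv(X) @ y

# Toy linearly separable data with a bias coordinate x_0 = 1.
X = np.array([[1.0, 2.0, 2.0], [1.0, 3.0, 3.0],
              [1.0, -2.0, -2.0], [1.0, -3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w_pla = pla(X, y)
w_reg = linear_regression(X, y)
```

On separable data like this, PLA converges to a separating w, and the regression weights (thresholded through sign) also classify correctly, which is why regression is often used to initialize classification.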

Learning From Data: It's A Jungle Out There

[Word cloud of topics:] overfitting, stochastic noise, K-means, augmented error, ill-posed, distribution-free learning, stochastic gradient descent, exploration, Gaussian processes, Lloyd's algorithm, deterministic noise, bootstrapping, data snooping, linear regression, bagging, unlabelled data, expectation-maximization, logistic regression, CART, Rademacher complexity, transfer learning, VC dimension, reinforcement learning, learning curve, exploitation, Q-learning, nonlinear transformation, sampling bias, support vectors, neural networks, Markov Chain Monte Carlo (MCMC), Mercer's theorem, linear models, ordinal regression, AdaBoost, decision trees, SVM, graphical models, bioinformatics, training versus testing, Gibbs sampling, extrapolation, no free lunch, cross validation, HMMs, RBF, bias-variance tradeoff, data contamination, perceptron learning, deep learning, PAC-learning, error measures, biometrics, active learning, multiclass, MDL, types of learning, one versus all, unsupervised learning, weak learning, is learning feasible?, momentum, RKHS, conjugate gradients, online learning, Occam's razor, noisy targets, mixture of experts, kernel methods, ranking, multi-agent systems, boosting, AIC, ensemble methods, classification, PCA, kernel-PCA, permutation complexity, LLE, regularization, primal-dual, collaborative filtering, semi-supervised learning, clustering, Levenberg-Marquardt, weight decay, Big Data, Boltzmann machine

Navigating the Jungle: Theory

THEORY: VC analysis, bias-variance, Rademacher complexity, SRM

Navigating the Jungle: Techniques

THEORY: VC analysis, bias-variance, Rademacher complexity, SRM
TECHNIQUES: Models, Methods

Navigating the Jungle: Models

THEORY: VC analysis, bias-variance, Rademacher complexity, SRM
TECHNIQUES:
- Models: linear, neural networks, SVM, similarity, Gaussian processes, graphical models, bilinear/SVD
- Methods

Navigating the Jungle: Methods

THEORY: VC analysis, bias-variance, Rademacher complexity, SRM
TECHNIQUES:
- Models: linear, neural networks, SVM, similarity, Gaussian processes, graphical models, bilinear/SVD
- Methods: regularization, validation, aggregation, preprocessing
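Two of the methods above, regularization and validation, work naturally together: fit with weight decay on a training split, then pick the regularization parameter by validation error. A hedged toy sketch; the polynomial transform, split sizes, and lambda grid are illustrative assumptions, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def ridge_fit(Z, y, lam):
    """Weight decay: w = (Z^T Z + lam I)^{-1} Z^T y."""
    d = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(d), Z.T @ y)

# Noisy 1-D target passed through a polynomial (nonlinear) transform.
x = rng.uniform(-1, 1, size=30)
y = x**2 + 0.1 * rng.normal(size=30)
Z = np.vander(x, 8)                      # degree-7 polynomial features

# Holdout validation: train on 20 points, validate on 10, choose lambda.
Z_train, y_train = Z[:20], y[:20]
Z_val, y_val = Z[20:], y[20:]
errors = {}
for lam in [0.0, 1e-3, 1e-1, 1.0]:
    w = ridge_fit(Z_train, y_train, lam)
    errors[lam] = np.mean((Z_val @ w - y_val) ** 2)
best_lam = min(errors, key=errors.get)
```

The design choice mirrors the course's message: regularization constrains the fit, and validation (not the training error) decides how much constraint to apply.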

Navigating the Jungle: Paradigms

THEORY: VC analysis, bias-variance, Rademacher complexity, SRM
TECHNIQUES:
- Models: linear, neural networks, SVM, similarity, Gaussian processes, graphical models, bilinear/SVD
- Methods: regularization, validation, aggregation, preprocessing
PARADIGMS: supervised, unsupervised, reinforcement, active, online, unlabeled, transfer learning, big data

Moving Forward

1. What is learning? Output g ≈ f after looking at data (x_n, y_n).
2. Can we do it? E_in ≈ E_out: simple H, finite d_vc, large N. E_in ≈ 0: good H, algorithms.
3. How to do it? Linear models, nonlinear transforms. Algorithms: PLA, pseudoinverse, gradient descent.
4. How to do it well? Overfitting: stochastic and deterministic noise. Cures: regularization, validation.
5. General principles? Occam's razor, sampling bias, data snooping.
6. Advanced techniques: similarity, neural networks, SVMs, preprocessing and aggregation.
7. Other learning paradigms: unsupervised, reinforcement.

(Margin labels on the slide: concepts, theory, practice.)
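Of the three algorithms in item 3, gradient descent is the one that carries forward beyond linear regression. A minimal sketch on logistic regression with the cross-entropy error; the step size, iteration count, and toy data are illustrative assumptions.

```python
import numpy as np

def cross_entropy_error(w, X, y):
    """E_in(w) = (1/N) sum_n ln(1 + exp(-y_n w^T x_n))."""
    return np.mean(np.log(1.0 + np.exp(-y * (X @ w))))

def gradient_descent(X, y, eta=0.5, iters=500):
    """Fixed-step gradient descent on the cross-entropy error."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        # Gradient: -(1/N) sum_n y_n x_n / (1 + exp(y_n w^T x_n))
        s = y * (X @ w)
        grad = -((y / (1.0 + np.exp(s)))[:, None] * X).mean(axis=0)
        w -= eta * grad
    return w

# Toy separable data with a bias coordinate x_0 = 1.
X = np.array([[1.0, 2.0], [1.0, 1.0], [1.0, -1.0], [1.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w = gradient_descent(X, y)
```

Unlike PLA and the pseudoinverse, the same loop applies to any differentiable error measure, which is why it reappears in the advanced-techniques part of the course (e.g. for neural networks).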