Chap 1. Overview of Statistical Learning (HTF, , 2.9)
Yongdai Kim, Seoul National University
0. Learning vs. statistical learning

Learning procedure:
- Construct a claim by observing data or using logic.
- Perform experiments.
- Draw conclusions.

Statistical learning procedure:
- Collect data.
- Analyze the data.
- Find new rules.

Let the data tell you something.
Why is statistical learning necessary?
- We already know most of the rules that our brains can imagine.
- Life (nature, socio-economic status, human behavior, biology, etc.) is more complex than we have thought.
- Our world is changing too fast for us to keep up using logic alone.
- Due to digitalization, the amount of data is increasing very fast, and most of the information in these huge data sets remains undiscovered.

Sample questions:
- What are the risk factors for heart failure?
- Are there genes that characterize differences between various races?
- How does the stock market behave?
- Which chemical compounds are effective against a specific disease?
- Who are the valuable customers for our company?
- What are the influential factors for changes in the amount of ozone?
- Are there patterns in the content of spam mails?

In statistical learning, the common objective is to find causes for a given phenomenon. A common feature of these problems is that the set of possible causes we can think of is very large, so the classical learning procedure suffers from time limitations unless we are lucky.
Machine learning vs. statistical learning (personal view)
- Machine learning is a method to educate a machine (computer). It covers two kinds of tasks: tasks without errors (e.g., rule-based learning) and tasks with errors.
- Statistical learning is the subset of machine learning that deals with tasks with errors.
Statistical view of statistical learning
- Analysis of ultra-high dimensional data.
- Methods to overcome the curse of dimensionality.
Supervised and unsupervised learning

Supervised learning:
- Use the inputs to predict the values of the outputs.
- Examples: regression and classification.

Unsupervised learning:
- Use only the inputs to describe the data.
- Examples: clustering, PCA.
1. Basic set-up of supervised learning

- Input (covariate): $x \in \mathbb{R}^p$.
- Output (response): $y \in \mathcal{Y}$.
- System (model): $y = \phi(x, \epsilon)$.
- Loss function: $l(y, a)$.
- Assumption: $f$ belongs to a family of functions $\mathcal{F}$.
- Learning set (data): $\mathcal{L} = \{(y_i, x_i),\ i = 1, \ldots, n\}$, assumed to be a random sample of $(Y, X) \sim P$.
- Objective: find $f_0 = \arg\min_{f \in \mathcal{F}} E_{(Y,X)}\, l(Y, f(X))$.
- Predictor (estimator): $\hat f(x) = f(x, \mathcal{L})$.
- Prediction: for a new input $x$, predict the unknown $y$ by $\hat f(x)$.
- If $y$ is categorical: classification.
- If $y$ is continuous: regression.
2. From least squares to nearest neighbors (for regression)

Least squares
- Assumption: $f(x) \in \{\beta_0 + \sum_{k=1}^{p} x_k \beta_k\}$.
- Estimate $\beta = (\beta_0, \beta_1, \ldots, \beta_p)$ by the $\hat\beta$ that minimizes the residual sum of squares
$$\mathrm{RSS}(\beta) = \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{k=1}^{p} x_{ki}\beta_k \Bigr)^2,$$
where $x_{ki}$ denotes the $k$-th component of $x_i$.
- $f(x, \mathcal{L}) = \hat\beta_0 + \sum_{k=1}^{p} x_k \hat\beta_k$; a code sketch follows.
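As a quick illustration (not from the slides), a minimal numpy sketch of the least-squares fit; `fit_ls` and `predict_ls` are hypothetical helper names:

```python
import numpy as np

def fit_ls(X, y):
    """Least squares: prepend an intercept column and minimize RSS(beta)."""
    Xa = np.column_stack([np.ones(len(X)), X])   # columns [1, x_1, ..., x_p]
    beta, *_ = np.linalg.lstsq(Xa, y, rcond=None)
    return beta                                  # (beta_0, beta_1, ..., beta_p)

def predict_ls(beta, X):
    Xa = np.column_stack([np.ones(len(X)), X])
    return Xa @ beta
```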
Nearest neighbors (NN)
- $N_k(x)$: the neighborhood of $x$ defined by the $k$ closest points $x_i$ in the training sample.
- $f(x, \mathcal{L}) = \frac{1}{k} \sum_{x_i \in N_k(x)} y_i$.
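A matching sketch of the NN predictor, again with illustrative names:

```python
import numpy as np

def knn_regress(x0, X, y, k):
    """k-NN regression: average the responses of the k training points
    closest to the query point x0 (Euclidean distance)."""
    dist = np.linalg.norm(X - x0, axis=1)   # distances from x0 to all x_i
    neighbors = np.argsort(dist)[:k]        # indices of N_k(x0)
    return y[neighbors].mean()
```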
Simulation 1
- Model: $y = x + \epsilon$ with $\epsilon \sim N(0, 1)$.
- Training sample size is 100. The test error is calculated on a test sample of size … .
- Result (NN with k = 1, 5, 15):

  Method   Training error   Test error
  Linear        …               …
  1-NN          …               …
  5-NN          …               …
  15-NN         …               …
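Since the numeric entries above are incomplete, here is a sketch that reruns the experiment; it assumes $x \sim \mathrm{Uniform}(0, 1)$ and a test sample of size 1,000, neither of which is recorded on the slide:

```python
import numpy as np

rng = np.random.default_rng(0)

def knn(xq, xtr, ytr, k):
    # k-NN regression predictions at the query points xq
    return np.array([ytr[np.argsort(np.abs(xtr - x0))[:k]].mean() for x0 in xq])

def simulate(f, n_train=100, n_test=1000):
    xtr = rng.uniform(0, 1, n_train); ytr = f(xtr) + rng.normal(size=n_train)
    xte = rng.uniform(0, 1, n_test);  yte = f(xte) + rng.normal(size=n_test)
    # least squares with an intercept
    A = np.column_stack([np.ones(n_train), xtr])
    beta, *_ = np.linalg.lstsq(A, ytr, rcond=None)
    models = {"Linear": lambda x: beta[0] + beta[1] * x}
    for k in (1, 5, 15):
        models[f"{k}-NN"] = lambda x, k=k: knn(x, xtr, ytr, k)
    for name, m in models.items():
        print(f"{name:7s} train {np.mean((ytr - m(xtr))**2):.3f}"
              f"  test {np.mean((yte - m(xte))**2):.3f}")

simulate(lambda x: x)  # Simulation 1; use lambda x: x*(1 - x) for Simulation 2
```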
Plot
[Figure: four panels for Simulation 1; linear regression and nearest neighbors with k = 1, k = 5, k = 15; y plotted against x.]
Simulation 2
- Model: $y = x(1 - x) + \epsilon$ with $\epsilon \sim N(0, 1)$.
- Training sample size is 100. The test error is calculated on a test sample of size … .
- Result (NN with k = 1, 5, 15):

  Method   Training error   Test error
  Linear        …               …
  1-NN          …               …
  5-NN          …               …
  15-NN         …               …
Plot
[Figure: four panels for Simulation 2; linear regression and nearest neighbors with k = 1, k = 5, k = 15; y plotted against x.]
Comments
- The linear model is the best when the true model is linear, and the worst when the true model is nonlinear.
- NN performs reasonably well regardless of what the true function is.
- Training error is not a good estimate of the test error.
- Complicated models do not always perform well.
- The neighborhood size $k$ controls the complexity of the predictor.
LS vs. NN

                     LS                              NN
  Assumption         linear                          none
  Data size          small to medium                 large
  Interpretation     easy                            almost impossible
  Predictability     good when the truth is simple   stable regardless of the truth
  Tuning parameter   none                            the neighborhood size k
3. Statistical decision theory

Regression
- The training sample $\mathcal{L}$ is a random sample from the joint distribution $P(y, x)$.
- Let $l(y, f(x))$ be a loss function for penalizing errors in prediction. The most popular loss function is squared error loss: $l(y, f(x)) = (y - f(x))^2$.
- The expected prediction error of $f$, $EPE(f)$, is defined as
$$EPE(f) = E(Y - f(X))^2,$$
where $(Y, X) \sim P(y, x)$.
- Theorem: $f_0(x) = E(Y \mid X = x)$ minimizes $EPE(f)$. $E(Y \mid X = x)$ is called the regression function.
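A short proof of the theorem (a standard argument, spelled out here for completeness): condition on $X$ and minimize pointwise. Writing
$$EPE(f) = E_X\, E_{Y \mid X}\bigl[(Y - f(X))^2 \mid X\bigr]
\quad\text{and}\quad
E\bigl[(Y - c)^2 \mid X = x\bigr] = \operatorname{Var}(Y \mid X = x) + \bigl(E[Y \mid X = x] - c\bigr)^2,$$
the inner expectation is minimized for each fixed $x$ by $c = E[Y \mid X = x]$, so $f_0(x) = E(Y \mid X = x)$.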
For the NN method, $f$ is estimated by
$$\hat f(x) = \mathrm{Ave}(y_i \mid x_i \in N_k(x)).$$
Two approximations are involved:
- the expectation is approximated by averaging over sample data;
- conditioning at a point is relaxed to conditioning on some region close to the target point.

Theorem: under regularity conditions, $\hat f(x) \to f_0(x)$ for all $x \in \mathbb{R}^p$ as $n \to \infty$, $k \to \infty$, and $k/n \to 0$. The condition $k/n \to 0$ means that the model complexity should increase more slowly than the sample size.
For LS, $f$ is assumed to be a linear function:
$$f(x) = \beta_0 + \sum_{i=1}^{p} x_i \beta_i.$$
The $f$ with $\beta = \bigl(E(XX^T)\bigr)^{-1} E(XY)$ minimizes the EPE. The LS estimator replaces these expectations by averages over the training sample.
Classification
- $y \in \{1, \ldots, J\}$. For a given loss function $L$, the EPE is defined as $E(L(Y, f(X)))$.
- Since
$$EPE(f) = E_X \sum_{j=1}^{J} L(j, f(X))\, P(Y = j \mid X),$$
$f(x) = \arg\min_{k=1,\ldots,J} \sum_{j=1}^{J} L(j, k)\, P(Y = j \mid X = x)$ minimizes the EPE.
- If $L(y, f(x)) = I(y \neq f(x))$, $f(x)$ becomes
$$f(x) = \arg\max_{j=1,\ldots,J} P(Y = j \mid X = x). \qquad (1)$$
- This predictor is called the Bayes rule (Bayes classifier), and its EPE is called the Bayes rate.
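For 0-1 loss the argmin reduces to an argmax; the intermediate step, not shown on the slide, is
$$EPE(f) = E_X \sum_{j=1}^{J} I\bigl(j \neq f(X)\bigr)\, P(Y = j \mid X) = E_X \bigl[\, 1 - P\bigl(Y = f(X) \mid X\bigr) \bigr],$$
which is minimized pointwise by choosing $f(x)$ to be the class with the largest posterior probability, giving (1).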
Estimating the Bayes classifier via function estimation
- First estimate $\phi_j(x) = P(Y = j \mid X = x)$, then estimate the Bayes classifier by replacing $P(Y = j \mid X = x)$ with $\hat\phi_j(x)$ in (1).
- The NN estimate of $\phi_j$ is
$$\hat\phi_j(x) = \frac{1}{k} \sum_{x_i \in N_k(x)} I(y_i = j).$$
- Linear models do not fit well for estimating $\phi_j$, since $\phi_j$ must take values between 0 and 1. Logistic regression is a promising alternative.
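A sketch of the plug-in classifier combining the NN estimate with (1); the function names are illustrative and labels are assumed coded $0, \ldots, J-1$ rather than $1, \ldots, J$:

```python
import numpy as np

def knn_posterior(x0, X, y, k, n_classes):
    """k-NN estimate of phi_j(x0) = P(Y = j | X = x0): the fraction of
    the k nearest neighbors carrying label j."""
    neighbors = np.argsort(np.linalg.norm(X - x0, axis=1))[:k]
    return np.bincount(y[neighbors], minlength=n_classes) / k

def knn_bayes_rule(x0, X, y, k, n_classes):
    # plug-in Bayes rule (1): pick the class with the largest estimated posterior
    return np.argmax(knn_posterior(x0, X, y, k, n_classes))
```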
4. Curse of dimensionality

When $p$ is large, the concept of a neighborhood does not work for local averaging.

Phenomenon 1
- $X = (X_1, \ldots, X_p) \sim \mathrm{Uniform}[0, 1]^p$.
- Consider a hypercubical neighborhood about a target point that captures a fraction $r$ of the sample. The expected edge length is $e_p(r) = r^{1/p}$.
- $e_{10}(0.01) \approx 0.63$ and $e_{10}(0.1) \approx 0.80$: to capture 1% or 10% of the data to form a local average, we must cover 63% or 80% of the range of each input variable. Such neighborhoods are no longer local.
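The edge-length formula in one loop:

```python
# Edge length e_p(r) = r**(1/p) of a hypercube capturing a fraction r
# of uniform data in [0, 1]^p.
for p in (1, 2, 10):
    for r in (0.01, 0.1):
        print(f"p={p:2d}  r={r:4.2f}  edge={r ** (1 / p):.2f}")
# p=10 gives 0.63 and 0.80: the "local" cube spans most of every axis.
```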
Phenomenon 2
- $X = (X_1, \ldots, X_p)$ is uniform in a $p$-dimensional unit ball centered at the origin. For sample size $n$, let $R_i = \sqrt{\sum_{k=1}^{p} X_{ki}^2}$ for $i = 1, \ldots, n$, and let $R_{(1)} = \min_i \{R_i\}$. Then the median of $R_{(1)}$ is $\bigl(1 - (1/2)^{1/n}\bigr)^{1/p}$.
- For $n = 500$ and $p = 10$, the median is approximately 0.52, more than halfway to the boundary.
- Most data points are closer to the boundary of the sample space than to the origin.
- Prediction is much more difficult near the edges, since one must extrapolate rather than interpolate.
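The median formula is easy to evaluate; a quick sketch with an illustrative helper name:

```python
def median_nearest_radius(n, p):
    """Median of R_(1): distance from the origin to the closest of n
    points drawn uniformly from the p-dimensional unit ball."""
    return (1 - 0.5 ** (1 / n)) ** (1 / p)

print(median_nearest_radius(500, 10))   # ~0.52: even the closest point
                                        # sits more than halfway to the boundary
```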
Phenomenon 3
- Suppose $X \sim \mathrm{Uniform}[-1, 1]^p$ and the true relation is $Y = f(X) = \exp(-8\|X\|^2)$.
- Consider the 1-NN estimate at $x = 0$. The bias of the estimator is $1 - \exp(-8\|x_{(1)}\|^2)$, where $x_{(1)}$ is the training point with the smallest norm.
- Since $\|X\|^2 = \sum_{i=1}^{p} X_i^2 \ge X_{(p)}^2$, where $X_{(p)} = \max_{1 \le i \le p} |X_i|$, and $X_{(p)}^2 \to 1$ as $p \to \infty$, the bias tends to increase as $p$ increases.
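A Monte-Carlo sketch of the growing 1-NN bias; the sample size and replication count are arbitrary choices, not values from the slide:

```python
import numpy as np

rng = np.random.default_rng(0)

def bias_1nn_at_origin(p, n=1000, reps=200):
    """Monte-Carlo bias of the 1-NN estimate of f(x) = exp(-8 ||x||^2)
    at x = 0, with X uniform on [-1, 1]^p and noiseless Y = f(X)."""
    b = []
    for _ in range(reps):
        X = rng.uniform(-1, 1, size=(n, p))
        r2 = (X ** 2).sum(axis=1).min()      # squared norm of the nearest point
        b.append(1.0 - np.exp(-8 * r2))      # f(0) - f(x_(1))
    return np.mean(b)

for p in (1, 2, 5, 10):
    print(p, round(bias_1nn_at_origin(p), 3))  # bias climbs toward 1 with p
```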
5. Overfitting and the bias-variance tradeoff
- As we have seen, in the NN method the neighborhood size $k$ controls the complexity of the predictor. The question is how to choose $k$.
- If we knew $P(y, x)$, we could choose $k$ by minimizing the EPE (test error)
$$EPE(\hat f_k) = E(Y - \hat f_k(X))^2,$$
where $\hat f_k$ is the $k$-NN estimate of $f$. Unfortunately, we do not know $P(y, x)$.
- One naive answer is to estimate the EPE of $\hat f_k$ by the residual sum of squares (training error)
$$\sum_{i=1}^{n} (y_i - \hat f_k(x_i))^2.$$
- The training error is a downward-biased estimator of the test error, since the data set is used twice (once for constructing $\hat f$ and once for calculating the training error).
- Moreover, the training error keeps decreasing as $k$ gets smaller, while the test error decreases initially and increases later. This means that overly complicated models (models that fit the training data too closely, i.e., overfitted models) show poor performance; a numeric illustration follows.
- This seemingly mysterious phenomenon can be explained by the bias-variance decomposition. Several ways of choosing the model complexity (i.e., $k$ in the NN method) will be explained later.
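A sketch of the training/test gap across a grid of $k$; the setup follows Simulation 2, with the same assumed x-distribution and test size as before:

```python
import numpy as np

rng = np.random.default_rng(1)

# Training error falls monotonically as k shrinks (more complexity),
# while test error is U-shaped.
n = 100
xtr = rng.uniform(0, 1, n);    ytr = xtr * (1 - xtr) + rng.normal(size=n)
xte = rng.uniform(0, 1, 1000); yte = xte * (1 - xte) + rng.normal(size=1000)

def knn(xq, k):
    return np.array([ytr[np.argsort(np.abs(xtr - x0))[:k]].mean() for x0 in xq])

for k in (1, 3, 5, 15, 25, 50):
    tr = np.mean((ytr - knn(xtr, k)) ** 2)
    te = np.mean((yte - knn(xte, k)) ** 2)
    print(f"k={k:3d}  train {tr:.3f}  test {te:.3f}")
```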
Bias-variance tradeoff (for regression)

Suppose $Y = f(X) + \epsilon$ with $E(\epsilon) = 0$ and $\operatorname{Var}(\epsilon) = \sigma^2$. For a given training sample $\mathcal{L}$, the test error of $f(x, \mathcal{L})$ is
$$TE = E_{\mathcal{L}}\, E_{(Y,X)}\bigl((Y - f(X, \mathcal{L}))^2\bigr),$$
which decomposes as
$$TE = E_{(Y,X)}\bigl((Y - f(X))^2\bigr) + E_X\bigl((f(X) - E_{\mathcal{L}} f(X, \mathcal{L}))^2\bigr) + E_X\, E_{\mathcal{L}}\bigl((f(X, \mathcal{L}) - E_{\mathcal{L}} f(X, \mathcal{L}))^2\bigr)$$
$$= \sigma^2 + E_X\bigl(\mathrm{Bias}_{\mathcal{L}}(X)^2 + \mathrm{Variance}_{\mathcal{L}}(X)\bigr).$$
In general, as the model gets more complicated, the bias decreases and the variance increases.

Example: $k$-NN method. Write $f(x, \mathcal{L}) = \frac{1}{k} \sum_{l=1}^{k} \bigl(f(x_{(l)}) + \epsilon_{(l)}\bigr)$, where the subscript $(l)$ indexes the sequence of nearest neighbors of $x$. Then
$$\mathrm{Bias}_{\mathcal{L}}(x) = f(x) - \frac{1}{k} \sum_{l=1}^{k} f(x_{(l)}) \quad\text{and}\quad \mathrm{Variance}_{\mathcal{L}}(x) = \frac{\sigma^2}{k}.$$
For $k = 1$ the bias is smallest and the variance largest, while for $k = n$ the bias is largest and the variance smallest.
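A Monte-Carlo check of the decomposition at a fixed point, under the same assumed setup as the simulations; the empirical variance should track $\sigma^2/k$, up to the extra randomness of the neighbor locations:

```python
import numpy as np

rng = np.random.default_rng(2)

f = lambda x: x * (1 - x)
x0, n, sigma, reps = 0.5, 100, 1.0, 2000

for k in (1, 5, 15):
    preds = []
    for _ in range(reps):
        x = rng.uniform(0, 1, n)
        y = f(x) + sigma * rng.normal(size=n)
        preds.append(y[np.argsort(np.abs(x - x0))[:k]].mean())  # k-NN at x0
    preds = np.array(preds)
    print(f"k={k:2d}  bias^2={(preds.mean() - f(x0)) ** 2:.4f}"
          f"  var={preds.var():.4f}  sigma^2/k={sigma ** 2 / k:.4f}")
```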
Plot
[Figure: test error and training error versus model complexity. Training error decreases monotonically; test error is U-shaped. Left side: high bias, low variance. Right side: low bias, high variance.]
6. Four situations in supervised learning
1. $p$ is small and $\mathcal{F}$ is parametric: standard regression and classification problems; MLE, least squares, robust estimators, etc.
2. $p$ is large and $\mathcal{F}$ is parametric: develop efficient methods for small and moderate samples; variable selection, shrinkage, Bayesian methods, etc.
3. $p$ is small and $\mathcal{F}$ is nonparametric: nonparametric regression; kernels, splines, wavelets, mixture models, etc.
4. $p$ is large and $\mathcal{F}$ is nonparametric: the main playground of data mining; decision trees, projection pursuit, MARS, neural networks, etc.