Introduction to Machine Learning and Cross-Validation

1 Introduction to Machine Learning and Cross-Validation
Jonathan Hersh, Chapman University
February 27, 2019

2 Plan
1 Introduction
2 Preliminary Terminology
3 Bias-Variance Trade-off
4 Cross-Validation
5 Conclusion

3-8 Machine learning versus econometrics

Machine Learning
- Developed to solve problems in computer science
- Prediction/classification
- Want: goodness of fit
- Huge datasets (many terabytes), large numbers of variables (1000s)
- "Whatever works"

Econometrics
- Developed to solve problems in economics
- Explicitly testing a theory
- Statistical significance more important than model fit
- Small datasets, few variables
- "It works in practice, but what about in theory?"

Can we use some of this machinery to solve problems in development economics?

9 Applications of machine learning in economics (due to Sendhil Mullainathan)
1. New data
2. Predictions for policy
3. Better econometrics

See Mullainathan and Spiess, "Machine Learning: An Applied Econometric Approach," Journal of Economic Perspectives (2017); Athey, "Beyond Prediction: Using Big Data for Policy Problems," Science (2017); Hersh, Jonathan, and Matthew Harding, "Big Data in Economics," IZA World of Labor (2018).

10-12 This course
- A primer in statistical learning theory, which grew out of statistics
- How does this differ from ML? Machine learning places more emphasis on large-scale applications and prediction accuracy; statistical learning emphasizes models and their interpretability. There is much overlap and cross-fertilization.
- Very little coding, but example code is provided: jonathan-hersh.com/machinelearningdev

13-18 Topics covered
1. Cross-validation [Chapter 2]
2. Shrinkage methods (ridge and LASSO) [Chapter 6]
3. Classification [Chapter 4; APM Chapters 11-12]
4. Tree-based methods (decision trees, bagging, random forests, boosting) [Chapter 8]
5. Unsupervised learning (PCA, k-means clustering, hierarchical clustering) [Chapter 10]
6. Caret automated machine learning [APM]

19-20 Plan
1 Introduction
2 Preliminary Terminology
3 Bias-Variance Trade-off
4 Cross-Validation
5 Conclusion

21-24 Supervised vs. unsupervised learning
- Def: Supervised learning. For every x_i we also observe a response y_i.
  Ex: estimating housing values by OLS or random forest
- Def: Unsupervised learning. For each observation we observe x_i only, but do not observe y_i.
  Ex: clustering customers into segments; using principal component analysis for dimension reduction
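To make the distinction concrete, here is a small R sketch (our own illustration using built-in datasets, not from the slides): one supervised regression where the response is observed, and two unsupervised methods where it is not.

# Supervised: a response (medv, median home value) is observed for every x_i
fit <- lm(medv ~ ., data = MASS::Boston)       # housing values by OLS

# Unsupervised: only the x's are observed; there is no response to predict
km <- kmeans(scale(USArrests), centers = 4)    # cluster observations into segments
pc <- prcomp(USArrests, scale. = TRUE)         # principal components for dimension reduction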

25-27 Test, training, and validation sets
- Training set: (observation-wise) subset of the data used to develop models
- Validation set: subset of the data used during intermediate stages to tune model parameters
- Test set: subset of the data (used sparingly) to approximate out-of-sample fit

28-29 Assessing model accuracy

Mean squared error measures how well model predictions match observed data:

$$\mathrm{MSE}(\hat f) = \frac{1}{N}\sum_{i=1}^{N}\Big(\underbrace{y_i}_{\text{data}} - \underbrace{\hat f(x_i)}_{\text{model}}\Big)^2$$

Training MSE vs. test MSE: good in-sample fit (low training MSE) can often obscure poor out-of-sample fit (high test MSE).
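A quick simulated illustration of this point (our own sketch, not from the slides): as a polynomial regression grows more flexible, training MSE keeps falling while test MSE eventually rises.

set.seed(1)
n <- 200
x <- runif(n, 0, 10)
y <- sin(x) + rnorm(n, sd = 0.5)      # true f(x) = sin(x), plus noise
train <- sample(n, n / 2)

mse <- function(y, yhat) mean((y - yhat)^2)

for (d in c(1, 3, 10, 20)) {          # polynomial degree = model flexibility
  fit <- lm(y ~ poly(x, d), subset = train)
  cat(sprintf("degree %2d  train MSE %.3f  test MSE %.3f\n", d,
              mse(y[train],  predict(fit, data.frame(x = x[train]))),
              mse(y[-train], predict(fit, data.frame(x = x[-train])))))
}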

30 Plan 1 Introduction 2 Preliminary Terminology 3 Bias-Variance Trade-off 4 Cross-Validation 5 Conclusion J.Hersh (Chapman ) Intro & CV February 27, / 29

31 Quick example on out-of-sample fit
Let's compare three estimators to see how estimator complexity affects out-of-sample fit:
1. f_1 = linear regression (in orange)
2. f_2 = third-order polynomial (in black)
3. f_3 = very flexible smoothing spline (in green)
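These three fits are easy to reproduce in base R; a minimal sketch on simulated data (the true function and tuning values are our own choices):

set.seed(2)
x <- runif(200, 0, 10)
y <- sin(x) + rnorm(200, sd = 0.3)

f1 <- lm(y ~ x)                    # linear regression
f2 <- lm(y ~ poly(x, 3))           # third-order polynomial
f3 <- smooth.spline(x, y, df = 30) # very flexible smoothing spline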

32 Case 1: True f(x) slightly complicated [figure]

33-34 Case 2: True f(x) not complicated at all [figures]

35 Case 3: True f(x) very complicated [figure]

36 How to select the right model complexity? [figure]

37 Generalizing this problem: the Bias-Variance tradeoff

38-42 Bias-Variance tradeoff in math

Prediction error:

$$\mathbb{E}\Big[\big(\underbrace{y}_{\text{data}} - \underbrace{\hat f(x)}_{\text{model}}\big)^2\Big] = \mathbb{E}\big[y^2 + \hat f^2 - 2y\hat f\big] \quad \text{(expanding terms)}$$

$$= \underbrace{\mathbb{E}[y^2]}_{\mathrm{Var}(y) + \mathbb{E}[y]^2} + \underbrace{\mathbb{E}[\hat f^2]}_{\mathrm{Var}(\hat f) + \mathbb{E}[\hat f]^2} - \mathbb{E}[2y\hat f]$$

$$= \mathrm{Var}(y) + \mathbb{E}[y]^2 + \mathrm{Var}(\hat f) + \mathbb{E}[\hat f]^2 - \mathbb{E}[2y\hat f] \quad \text{(definition of variance)}$$

Using $y = f(x) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$, so that $\mathbb{E}[y] = f$:

$$= \mathrm{Var}(y) + \mathrm{Var}(\hat f) + \mathbb{E}[\hat f]^2 - \mathbb{E}[2y\hat f] + f^2 \quad \text{(definition of } y\text{)}$$

43-46 Bias-Variance tradeoff in math (continued)

Since $\mathbb{E}[2y\hat f] = 2f\,\mathbb{E}[\hat f]$, the last three terms collect into a square:

$$= \underbrace{\mathrm{Var}(y)}_{=\,\mathbb{E}[(y - \mathbb{E}[y])^2]\,=\,\mathbb{E}[(y-f)^2]\,=\,\mathbb{E}[\varepsilon^2]\,=\,\mathrm{Var}(\varepsilon)\,=\,\sigma_\varepsilon^2} + \mathrm{Var}(\hat f) + \big(f - \mathbb{E}[\hat f]\big)^2$$

Prediction error:

$$\mathbb{E}\big[(y - \hat f)^2\big] = \underbrace{\sigma_\varepsilon^2}_{\text{irreducible error}} + \underbrace{\mathrm{Var}(\hat f)}_{\text{variance}} + \underbrace{\big(f - \mathbb{E}[\hat f]\big)^2}_{\text{bias}^2}$$

- Bias is minimized when $f = \mathbb{E}[\hat f]$
- But total error (variance + squared bias) may be minimized by some other $\hat f$
- An $\hat f(x)$ with smaller variance: fewer variables, smaller-magnitude coefficients
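A Monte Carlo sketch of this decomposition at a single point (our own illustration, not from the slides; the rigid model has high bias, the flexible one high variance):

set.seed(3)
f  <- function(x) sin(x)                    # true regression function
x0 <- 5                                     # evaluate bias/variance at x0

# For each simulated training set, record predictions at x0 from a
# degree-1 (rigid) and a degree-10 (flexible) polynomial fit
preds <- replicate(500, {
  x <- runif(100, 0, 10)
  y <- f(x) + rnorm(100, sd = 0.5)
  sapply(c(1, 10), function(d)
    predict(lm(y ~ poly(x, d)), newdata = data.frame(x = x0)))
})

# Squared bias and variance of each estimator's prediction at x0
apply(preds, 1, function(p) c(bias2 = (mean(p) - f(x0))^2, variance = var(p)))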

47 Plan
1 Introduction
2 Preliminary Terminology
3 Bias-Variance Trade-off
4 Cross-Validation
5 Conclusion

48-50 Cross-Validation
- Cross-validation is a tool to approximate out-of-sample fit
- In machine learning, many models have parameters that must be tuned; we adjust these parameters using cross-validation
- Also useful to select between large classes of models, e.g. random forest vs. lasso

51-54 K-fold Cross-Validation (CV)

K-fold CV algorithm:
1. Randomly divide the data into K equal-sized parts, or folds.
2. Leave out part k; fit the model to the other K - 1 parts.
3. Use the fitted model to obtain predictions for the left-out k-th part.
4. Repeat for k = 1, ..., K and combine the results.
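A minimal base-R implementation of these four steps (a sketch under our own assumptions: a data frame df whose response column is named y; the function name is ours, not from the course code):

kfold_cv_mse <- function(df, K = 5) {
  # Step 1: randomly assign each observation to one of K folds
  folds <- sample(rep(1:K, length.out = nrow(df)))
  fold_mse <- numeric(K)
  for (k in 1:K) {
    # Step 2: fit on the K - 1 folds that exclude fold k
    fit <- lm(y ~ ., data = df[folds != k, ])
    # Step 3: predict the held-out k-th fold
    pred <- predict(fit, newdata = df[folds == k, ])
    fold_mse[k] <- mean((df$y[folds == k] - pred)^2)
  }
  # Step 4: combine fold errors, weighting by fold size (n_k / N)
  weighted.mean(fold_mse, w = as.numeric(table(folds)))
}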

55 K-fold CV example [figure]

56-57 Details of K-fold CV

Let $n_k$ be the number of test observations in fold $k$, where $n_k = N/K$. The cross-validation error combines the errors across folds:

$$\mathrm{CV}_{(K)} = \sum_{k=1}^{K} \frac{n_k}{N}\,\mathrm{MSE}_k, \qquad \mathrm{MSE}_k = \frac{1}{n_k}\sum_{i \in C_k}\big(y_i - \hat y_i\big)^2$$

where $\mathrm{MSE}_k$ is the mean squared error of fold $k$. Setting $K = N$ is referred to as leave-one-out cross-validation (LOOCV).
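As an aside (a standard least-squares identity, not shown on the slide), LOOCV for a linear model has a closed form using the hat-matrix diagonal, so no refitting is required:

fit <- lm(mpg ~ wt + hp, data = mtcars)
# LOOCV error without refitting: scale each residual by 1 - leverage
loocv_mse <- mean((residuals(fit) / (1 - hatvalues(fit)))^2)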

58-59 Classical frequentist model selection

Akaike information criterion (AIC):

$$\mathrm{AIC} = -\frac{2}{N}\,\mathrm{loglik} + 2\,\frac{d}{N}$$

where $d$ is the number of parameters in our model.

Bayesian information criterion (BIC):

$$\mathrm{BIC} = -2\,\mathrm{loglik} + (\log N)\,d$$

Both penalize models with more parameters, in a somewhat arbitrary fashion. This usually helps with model selection, but still does not answer the important question of model assessment.
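In R both criteria are one-liners for fitted models via stats::AIC and stats::BIC (note that base R reports the unscaled form, -2 loglik + 2d, rather than the per-observation version above):

fit1 <- lm(mpg ~ wt, data = mtcars)
fit2 <- lm(mpg ~ wt + hp + disp, data = mtcars)

AIC(fit1, fit2)  # lower is better; penalty of 2 per parameter
BIC(fit1, fit2)  # heavier penalty of log(N) per parameter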

60-62 What is the optimal number of folds, K?
- Do you want big or small training folds?
- Because each training set is only (K - 1)/K as big as the full dataset, the estimates of the prediction error will be biased upward
- Bias is minimized when K = N (LOOCV), but LOOCV has higher variance
- There are no clear statistical rules for how to set K
- The convention of setting K = 5 or 10 in practice is a good trade-off between bias and variance for most problems
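Since the course uses caret [APM], here is how that K = 5 convention looks there (a sketch; the model and dataset are our own choices, not from the slides):

library(caret)

ctrl <- trainControl(method = "cv", number = 5)   # 5-fold cross-validation
cv_fit <- train(mpg ~ ., data = mtcars,
                method = "lm", trControl = ctrl)
cv_fit$results  # resampled RMSE / R-squared across the 5 folds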

63 Plan
1 Introduction
2 Preliminary Terminology
3 Bias-Variance Trade-off
4 Cross-Validation
5 Conclusion

64 Conclusion
- Econometric models are at risk of overfitting
- But what we want are theories that extend beyond the dataset we have on our computer
- Cross-validation is a key tool that allows us to adjust models so that they more closely match reality
