Introduction to Machine Learning and Cross-Validation
Introduction to Machine Learning and Cross-Validation
Jonathan Hersh (Chapman University)
February 27, 2019
Plan
1. Introduction
2. Preliminary Terminology
3. Bias-Variance Trade-off
4. Cross-Validation
5. Conclusion
Machine learning versus econometrics

Machine Learning:
- Developed to solve problems in computer science
- Prediction/classification
- Want: goodness of fit
- Huge datasets (many terabytes), large numbers of variables (1000s)
- "Whatever works"

Econometrics:
- Developed to solve problems in economics
- Explicitly testing a theory
- Statistical significance more important than model fit
- Small datasets, few variables
- "It works in practice, but what about in theory?"

Can we utilize some of this machinery to solve problems in development economics?
Applications of Machine Learning in economics (due to Sendhil Mullainathan)
1. New data
2. Predictions for policy
3. Better econometrics
See Mullainathan and Spiess, "Machine Learning: An Applied Econometric Approach," Journal of Economic Perspectives (2017); Athey, "Beyond Prediction: Using Big Data for Policy Problems," Science (2017); Hersh, Jonathan, and Matthew Harding, "Big Data in Economics," IZA World of Labor (2018).
This course
- Primer in statistical learning theory, which grew out of statistics
- How does this differ from ML? Machine learning places more emphasis on large-scale applications and prediction accuracy. Statistical learning covers ...
- There is much overlap and cross-fertilization
- Very little coding, but example code provided: jonathan-hersh.com/machinelearningdev
Topics covered
1. Cross-validation [Chapter 2]
2. Shrinkage methods (Ridge and LASSO) [Chapter 6]
3. Classification [Chapter 4; APM Chapters 11-12]
4. Tree-based methods (decision trees, bagging, random forests, boosting) [Chapter 8]
5. Unsupervised learning (PCA, k-means clustering, hierarchical clustering) [Chapter 10]
6. Caret automated machine learning [APM]
Supervised vs. unsupervised learning
- Def: In supervised learning, for every x_i we also observe a response y_i.
  Ex: Estimating housing values by OLS or random forest.
- Def: In unsupervised learning, for each observation we only observe x_i, but do not observe y_i.
  Ex: Clustering customers into segments; using principal component analysis for dimension reduction.
Test, Training, and Validation Sets
- Training set: (observation-wise) subset of the data used to develop models
- Validation set: subset of the data used during intermediate stages to tune model parameters
- Test set: subset of the data (used sparingly) to approximate out-of-sample fit
Assessing model accuracy
Mean squared error measures how well model predictions match observed data:

    MSE(\hat{f}) = \frac{1}{N} \sum_{i=1}^{N} \big( \underbrace{y_i}_{\text{data}} - \underbrace{\hat{f}(x_i)}_{\text{model}} \big)^2

Training MSE vs. test MSE: good in-sample fit (low training MSE) can often obscure poor out-of-sample fit (high test MSE).
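A quick numerical sketch of that gap (a toy setup of our own, not from the slides: a sine-wave "truth" and polynomial fits): as the polynomial degree grows, training MSE keeps falling while test MSE can stop improving or worsen.

```python
# Illustrative sketch (our own toy setup, not from the slides): training MSE
# falls as model complexity grows, while test MSE can turn back up.
import numpy as np

rng = np.random.default_rng(0)

def truth(x):                      # assumed "true" signal for the demo
    return np.sin(2 * x)

x_tr = rng.uniform(0, 3, 50)
y_tr = truth(x_tr) + rng.normal(0, 0.3, 50)
x_te = rng.uniform(0, 3, 200)
y_te = truth(x_te) + rng.normal(0, 0.3, 200)

def mse(y, yhat):
    return float(np.mean((y - yhat) ** 2))

results = {}
for degree in (1, 3, 15):          # increasing model complexity
    coefs = np.polyfit(x_tr, y_tr, degree)     # fit on the training set only
    results[degree] = (mse(y_tr, np.polyval(coefs, x_tr)),   # training MSE
                       mse(y_te, np.polyval(coefs, x_te)))   # test MSE
    print(f"degree {degree:2d}: train MSE {results[degree][0]:.3f}, "
          f"test MSE {results[degree][1]:.3f}")
```

Comparing the training column against the test column for each degree makes the overfitting pattern concrete.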
Quick example on out-of-sample fit
Let's compare three estimators to see how estimator complexity affects out-of-sample fit:
1. f_1 = linear regression (in orange)
2. f_2 = third-order polynomial (in black)
3. f_3 = very flexible smoothing spline (in green)
Case 1: True f(x) slightly complicated [figure]
Case 2: True f(x) not complicated at all [figures]
Case 3: True f(x) very complicated [figure]
How to select the right model complexity? [figure]
Generalizing this problem: the Bias-Variance tradeoff [figure]
Bias-Variance Tradeoff in Math

Prediction error: E[(y - \hat{f})^2], where y is the data and \hat{f} = \hat{f}(x) is the model.

    E[(y - \hat{f})^2]
      = E[y^2 + \hat{f}^2 - 2 y \hat{f}]                                (expanding terms)
      = E[y^2] + E[\hat{f}^2] - E[2 y \hat{f}]
      = Var(y) + E[y]^2 + Var(\hat{f}) + E[\hat{f}]^2 - E[2 y \hat{f}]  (def'n of variance: E[z^2] = Var(z) + E[z]^2)
      = Var(y) + Var(\hat{f}) + f^2 + E[\hat{f}]^2 - 2 f E[\hat{f}]     (def'n of y: y = f(x) + \varepsilon, E[\varepsilon] = 0, so E[y] = f)
      = Var(y) + Var(\hat{f}) + (f - E[\hat{f}])^2

Since Var(y) = E[(y - E[y])^2] = E[(y - f)^2] = E[\varepsilon^2] = \sigma_\varepsilon^2, we get

    Prediction Error: E[(y - \hat{f})^2] = \sigma_\varepsilon^2 + Var(\hat{f}) + (f - E[\hat{f}])^2
                                           (irreducible error)  (variance)    (bias^2)

- Bias is minimized when f = E[\hat{f}]
- But total error (variance + bias) may be minimized by some other \hat{f}
- An \hat{f}(x) with smaller variance: fewer variables, smaller-magnitude coefficients
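The decomposition can be checked by simulation (a setup of our own, not from the slides): fix a test point x0, draw many independent training sets, fit an estimator on each, and compare the Monte Carlo estimate of E[(y - \hat{f})^2] against sigma^2 + variance + bias^2.

```python
# Simulation sketch (our own assumed setup): verify numerically that
# E[(y - fhat)^2] ≈ sigma_eps^2 + Var(fhat) + Bias^2 at a single point x0.
import numpy as np

rng = np.random.default_rng(1)
sigma, x0, n, reps = 0.5, 1.0, 40, 2000

def f(x):                              # assumed true regression function
    return np.sin(2 * x)

preds = np.empty(reps)
for r in range(reps):                  # many independent training sets ...
    x = rng.uniform(0, 3, n)
    y = f(x) + rng.normal(0, sigma, n)
    preds[r] = np.polyval(np.polyfit(x, y, 3), x0)   # ... cubic fhat at x0

variance = preds.var()                         # Var(fhat)
bias_sq = (preds.mean() - f(x0)) ** 2          # (f - E[fhat])^2
y_new = f(x0) + rng.normal(0, sigma, reps)     # fresh responses at x0
lhs = np.mean((y_new - preds) ** 2)            # Monte Carlo E[(y - fhat)^2]
rhs = sigma ** 2 + variance + bias_sq
print(f"E[(y - fhat)^2] = {lhs:.4f}  vs  sigma^2 + Var + Bias^2 = {rhs:.4f}")
```

The two printed numbers should agree up to Monte Carlo noise; swapping the cubic for a higher-degree fit raises the variance term, a linear fit raises the bias term.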
Cross-Validation
- Cross-validation is a tool to approximate out-of-sample fit
- In machine learning, many models have parameters that must be tuned; we adjust these parameters using cross-validation
- Also useful to select between large classes of models, e.g. random forest vs. lasso
K-fold Cross-Validation (CV)
K-fold CV Algorithm
1. Randomly divide the data into K equal-sized parts, or folds.
2. Leave out part k; fit the model to the other K - 1 parts.
3. Use the fitted model to obtain predictions for the left-out k-th part.
4. Repeat for k = 1, ..., K and combine the results.
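The four steps above can be sketched in a few lines of NumPy (function and variable names are ours, not from the slides), here scoring polynomial degrees by averaged held-out MSE:

```python
# Minimal K-fold CV sketch (illustrative; our own naming), following the
# algorithm above: shuffle, split, hold out each fold in turn, combine.
import numpy as np

def kfold_cv_mse(x, y, degree, K=5, seed=0):
    """Weighted average held-out MSE of a degree-`degree` polynomial fit."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), K)   # step 1: random folds
    total = 0.0
    for k in range(K):
        test = folds[k]                                  # step 2: leave out fold k
        train = np.concatenate([folds[j] for j in range(K) if j != k])
        coefs = np.polyfit(x[train], y[train], degree)   #         fit on the rest
        pred = np.polyval(coefs, x[test])                # step 3: predict fold k
        mse_k = np.mean((y[test] - pred) ** 2)
        total += len(test) / len(x) * mse_k              # step 4: combine (n_k/N weights)
    return float(total)

rng = np.random.default_rng(2)
x = rng.uniform(0, 3, 120)
y = np.sin(2 * x) + rng.normal(0, 0.3, 120)
scores = {d: kfold_cv_mse(x, y, d) for d in (1, 3, 10)}
print("5-fold CV MSE by polynomial degree:", scores)
```

Picking the degree with the lowest CV score is exactly the tuning use-case described above.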
K-Fold CV Example [figure]
Details of K-Fold CV
Let n_k be the number of test observations in fold k, where n_k \approx N/K. The cross-validation error combines the fold errors:

    CV = \sum_{k=1}^{K} \frac{n_k}{N} MSE_k,  where  MSE_k = \sum_{i \in C_k} (y_i - \hat{y}_i)^2 / n_k

is the mean squared error of fold k (C_k is the set of observations in fold k). Setting K = N is referred to as leave-one-out cross-validation (LOOCV).
Classical frequentist model selection
Akaike information criterion (AIC):

    AIC = -\frac{2}{N} loglik + 2 \frac{d}{N}

where d is the number of parameters in our model. Bayesian information criterion (BIC):

    BIC = -2 loglik + (\log N) d

Both penalize models with more parameters in a somewhat arbitrary fashion. This usually helps with model selection, but still does not answer the important question of model assessment.
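For a linear model with Gaussian errors, loglik is available in closed form from the residual sum of squares, so both criteria are easy to compute. A sketch (the helper and its naming are ours, not from the slides), comparing polynomial degrees when the truth is linear:

```python
# Sketch (our own helper, not from the slides): AIC and BIC for a Gaussian
# linear model, using loglik = -N/2 * (log(2*pi*sigma2_hat) + 1),
# where sigma2_hat = RSS/N is the MLE of the error variance.
import numpy as np

def aic_bic(y, yhat, d):
    """d counts fitted parameters (here: polynomial coefficients + sigma^2)."""
    N = len(y)
    sigma2 = np.sum((y - yhat) ** 2) / N
    loglik = -N / 2 * (np.log(2 * np.pi * sigma2) + 1)
    aic = -2 / N * loglik + 2 * d / N             # per-observation form
    bic = -2 * loglik + np.log(N) * d
    return float(aic), float(bic)

rng = np.random.default_rng(3)
x = rng.uniform(0, 3, 100)
y = 1 + 2 * x + rng.normal(0, 0.5, 100)           # the truth here is linear
crit = {}
for degree in (1, 3, 10):
    coefs = np.polyfit(x, y, degree)
    crit[degree] = aic_bic(y, np.polyval(coefs, x), d=degree + 2)
    print(f"degree {degree:2d}: AIC {crit[degree][0]:.3f}, BIC {crit[degree][1]:.3f}")
```

BIC's log(N) penalty is heavier than AIC's, so it leans harder toward the smaller (here, true) model.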
What is the Optimal Number of Folds, K?
- Do you want big or small training folds?
- Because each training set is only (K - 1)/K as big as the full dataset, the estimates of the prediction error will be biased upward
- Bias is minimized when K = N (LOOCV), but LOOCV has higher variance!
- There are no clear statistical rules for how to set K; the convention of K = 5 or 10 is in practice a good trade-off between bias and variance for most problems
Conclusion
- Econometric models are at risk of overfitting
- But what we want are theories that extend beyond the dataset we have on our computer
- Cross-validation is a key tool that allows us to adjust models so that they more closely match reality
Statistical aspects of prediction models with high-dimensional data Anne Laure Boulesteix Institut für Medizinische Informationsverarbeitung, Biometrie und Epidemiologie February 15th, 2017 Typeset by
More informationHigh-dimensional regression
High-dimensional regression Advanced Methods for Data Analysis 36-402/36-608) Spring 2014 1 Back to linear regression 1.1 Shortcomings Suppose that we are given outcome measurements y 1,... y n R, and
More informationChapter 6. Ensemble Methods
Chapter 6. Ensemble Methods Wei Pan Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455 Email: weip@biostat.umn.edu PubH 7475/8475 c Wei Pan Introduction
More informationLinear Regression Models. Based on Chapter 3 of Hastie, Tibshirani and Friedman
Linear Regression Models Based on Chapter 3 of Hastie, ibshirani and Friedman Linear Regression Models Here the X s might be: p f ( X = " + " 0 j= 1 X j Raw predictor variables (continuous or coded-categorical
More informationStatistics 262: Intermediate Biostatistics Model selection
Statistics 262: Intermediate Biostatistics Model selection Jonathan Taylor & Kristin Cobb Statistics 262: Intermediate Biostatistics p.1/?? Today s class Model selection. Strategies for model selection.
More informationMachine learning - HT Basis Expansion, Regularization, Validation
Machine learning - HT 016 4. Basis Expansion, Regularization, Validation Varun Kanade University of Oxford Feburary 03, 016 Outline Introduce basis function to go beyond linear regression Understanding
More informationLinear regression methods
Linear regression methods Most of our intuition about statistical methods stem from linear regression. For observations i = 1,..., n, the model is Y i = p X ij β j + ε i, j=1 where Y i is the response
More informationRecap from previous lecture
Recap from previous lecture Learning is using past experience to improve future performance. Different types of learning: supervised unsupervised reinforcement active online... For a machine, experience
More informationHigh-dimensional regression with unknown variance
High-dimensional regression with unknown variance Christophe Giraud Ecole Polytechnique march 2012 Setting Gaussian regression with unknown variance: Y i = f i + ε i with ε i i.i.d. N (0, σ 2 ) f = (f
More informationMachine Learning Basics: Maximum Likelihood Estimation
Machine Learning Basics: Maximum Likelihood Estimation Sargur N. srihari@cedar.buffalo.edu This is part of lecture slides on Deep Learning: http://www.cedar.buffalo.edu/~srihari/cse676 1 Topics 1. Learning
More informationSTAT 535 Lecture 5 November, 2018 Brief overview of Model Selection and Regularization c Marina Meilă
STAT 535 Lecture 5 November, 2018 Brief overview of Model Selection and Regularization c Marina Meilă mmp@stat.washington.edu Reading: Murphy: BIC, AIC 8.4.2 (pp 255), SRM 6.5 (pp 204) Hastie, Tibshirani
More informationPATTERN RECOGNITION AND MACHINE LEARNING
PATTERN RECOGNITION AND MACHINE LEARNING Chapter 1. Introduction Shuai Huang April 21, 2014 Outline 1 What is Machine Learning? 2 Curve Fitting 3 Probability Theory 4 Model Selection 5 The curse of dimensionality
More informationCOS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 9: LINEAR REGRESSION
COS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 9: LINEAR REGRESSION SEAN GERRISH AND CHONG WANG 1. WAYS OF ORGANIZING MODELS In probabilistic modeling, there are several ways of organizing models:
More informationEssence of Machine Learning (and Deep Learning) Hoa M. Le Data Science Lab, HUST hoamle.github.io
Essence of Machine Learning (and Deep Learning) Hoa M. Le Data Science Lab, HUST hoamle.github.io 1 Examples https://www.youtube.com/watch?v=bmka1zsg2 P4 http://www.r2d3.us/visual-intro-to-machinelearning-part-1/
More informationNon-linear Supervised High Frequency Trading Strategies with Applications in US Equity Markets
Non-linear Supervised High Frequency Trading Strategies with Applications in US Equity Markets Nan Zhou, Wen Cheng, Ph.D. Associate, Quantitative Research, J.P. Morgan nan.zhou@jpmorgan.com The 4th Annual
More informationMachine Learning. Ensemble Methods. Manfred Huber
Machine Learning Ensemble Methods Manfred Huber 2015 1 Bias, Variance, Noise Classification errors have different sources Choice of hypothesis space and algorithm Training set Noise in the data The expected
More informationData Mining Stat 588
Data Mining Stat 588 Lecture 02: Linear Methods for Regression Department of Statistics & Biostatistics Rutgers University September 13 2011 Regression Problem Quantitative generic output variable Y. Generic
More informationCharles H. Dyson School of Applied Economics and Management Cornell University, Ithaca, New York USA
WP 2017-15 October 2017 Working Paper Charles H. Dyson School of Applied Economics and Management Cornell University, Ithaca, New York 14853-7801 USA Can Machine Learning Improve Prediction? An Application
More informationINTRODUCTION TO DATA SCIENCE
INTRODUCTION TO DATA SCIENCE JOHN P DICKERSON Lecture #13 3/9/2017 CMSC320 Tuesdays & Thursdays 3:30pm 4:45pm ANNOUNCEMENTS Mini-Project #1 is due Saturday night (3/11): Seems like people are able to do
More informationAssignment 3. Introduction to Machine Learning Prof. B. Ravindran
Assignment 3 Introduction to Machine Learning Prof. B. Ravindran 1. In building a linear regression model for a particular data set, you observe the coefficient of one of the features having a relatively
More informationMethods and Criteria for Model Selection. CS57300 Data Mining Fall Instructor: Bruno Ribeiro
Methods and Criteria for Model Selection CS57300 Data Mining Fall 2016 Instructor: Bruno Ribeiro Goal } Introduce classifier evaluation criteria } Introduce Bias x Variance duality } Model Assessment }
More informationECE521 week 3: 23/26 January 2017
ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear
More informationHoldout and Cross-Validation Methods Overfitting Avoidance
Holdout and Cross-Validation Methods Overfitting Avoidance Decision Trees Reduce error pruning Cost-complexity pruning Neural Networks Early stopping Adjusting Regularizers via Cross-Validation Nearest
More informationPrediction & Feature Selection in GLM
Tarigan Statistical Consulting & Coaching statistical-coaching.ch Doctoral Program in Computer Science of the Universities of Fribourg, Geneva, Lausanne, Neuchâtel, Bern and the EPFL Hands-on Data Analysis
More informationLearning From Data Lecture 15 Reflecting on Our Path - Epilogue to Part I
Learning From Data Lecture 15 Reflecting on Our Path - Epilogue to Part I What We Did The Machine Learning Zoo Moving Forward M Magdon-Ismail CSCI 4100/6100 recap: Three Learning Principles Scientist 2
More information12 Statistical Justifications; the Bias-Variance Decomposition
Statistical Justifications; the Bias-Variance Decomposition 65 12 Statistical Justifications; the Bias-Variance Decomposition STATISTICAL JUSTIFICATIONS FOR REGRESSION [So far, I ve talked about regression
More informationQualifying Exam in Machine Learning
Qualifying Exam in Machine Learning October 20, 2009 Instructions: Answer two out of the three questions in Part 1. In addition, answer two out of three questions in two additional parts (choose two parts
More informationEnsemble Methods and Random Forests
Ensemble Methods and Random Forests Vaishnavi S May 2017 1 Introduction We have seen various analysis for classification and regression in the course. One of the common methods to reduce the generalization
More informationLogitBoost with Trees Applied to the WCCI 2006 Performance Prediction Challenge Datasets
Applied to the WCCI 2006 Performance Prediction Challenge Datasets Roman Lutz Seminar für Statistik, ETH Zürich, Switzerland 18. July 2006 Overview Motivation 1 Motivation 2 The logistic framework The
More informationNonparametric Methods
Nonparametric Methods Michael R. Roberts Department of Finance The Wharton School University of Pennsylvania July 28, 2009 Michael R. Roberts Nonparametric Methods 1/42 Overview Great for data analysis
More informationMachine Learning. Regression basics. Marc Toussaint University of Stuttgart Summer 2015
Machine Learning Regression basics Linear regression, non-linear features (polynomial, RBFs, piece-wise), regularization, cross validation, Ridge/Lasso, kernel trick Marc Toussaint University of Stuttgart
More informationMultiple (non) linear regression. Department of Computer Science, Czech Technical University in Prague
Multiple (non) linear regression Jiří Kléma Department of Computer Science, Czech Technical University in Prague Lecture based on ISLR book and its accompanying slides http://cw.felk.cvut.cz/wiki/courses/b4m36san/start
More informationMachine Learning - MT & 5. Basis Expansion, Regularization, Validation
Machine Learning - MT 2016 4 & 5. Basis Expansion, Regularization, Validation Varun Kanade University of Oxford October 19 & 24, 2016 Outline Basis function expansion to capture non-linear relationships
More informationCross-validation for detecting and preventing overfitting
Cross-validation for detecting and preventing overfitting A Regression Problem = f() + noise Can we learn f from this data? Note to other teachers and users of these slides. Andrew would be delighted if
More informationMachine Learning
Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 11, 2012 Today: Computational Learning Theory Probably Approximately Coorrect (PAC) learning theorem
More informationVBM683 Machine Learning
VBM683 Machine Learning Pinar Duygulu Slides are adapted from Dhruv Batra Bias is the algorithm's tendency to consistently learn the wrong thing by not taking into account all the information in the data
More informationPAC-learning, VC Dimension and Margin-based Bounds
More details: General: http://www.learning-with-kernels.org/ Example of more complex bounds: http://www.research.ibm.com/people/t/tzhang/papers/jmlr02_cover.ps.gz PAC-learning, VC Dimension and Margin-based
More information