Non-linear Supervised High Frequency Trading Strategies with Applications in US Equity Markets


1 Non-linear Supervised High Frequency Trading Strategies with Applications in US Equity Markets
Nan Zhou, Wen Cheng, Ph.D.
Associate, Quantitative Research, J.P. Morgan
The 4th Annual Modeling High Frequency Data in Finance Conference, July 2012

2 Outline
1. Overview
2. Data and Descriptive Analysis
3. (SKIPPED) Linear Model: Variable Selection by LASSO
4. MARS: Multivariate Adaptive Regression Splines
5. GBM: Generalized Boosted Models
6. More Results based on SPY 2011 December Level-1 Data
7. Discussions

3 Disclosures
This talk is for informational purposes only. Any comments or statements made herein do not necessarily reflect those of JPMorgan Chase & Co., its subsidiaries and affiliates.
Illustrations and discussions of predictive models based on high frequency financial data:
- Non-linear: $\beta x \rightarrow \sum_i \beta_i f(x, \theta_i)$
- Supervised: fixed holding period
Simple trading strategies based on high frequency predictions.
References (supervised vs. unsupervised):
- Creamer, G. (2012). Model calibration and automated trading agent for Euro futures. Quantitative Finance, 12(4).
- Moody, J. and Saffell, M. (2001). Learning to trade via direct reinforcement. IEEE Transactions on Neural Networks, 12(4).

4 Overview
Level-1 S&P 500 E-mini futures data (price and size values were lost in extraction):

Series  Date       Time        Type  Price  Size
1       7/18/2012  9:00:00 AM  ASK   ...    ...
2       7/18/2012  9:00:00 AM  BID   ...    ...
3       7/18/2012  9:00:00 AM  BID   ...    ...
4       7/18/2012  9:00:00 AM  BID   ...    ...
5       7/18/2012  9:00:00 AM  ASK   ...    ...
6       7/18/2012  9:00:00 AM  ASK   ...    ...
...     7/18/2012  1:59:59 PM  ASK   ...    ...
...     7/18/2012  1:59:59 PM  BID   ...    ...
...     7/18/2012  1:59:59 PM  BID   ...    ...
...     7/18/2012  2:00:00 PM  BID   ...    ...
...     7/18/2012  2:00:00 PM  BID   ...    ...
...     7/18/2012  2:00:00 PM  ASK   ...    ...

5 Overview
Target: predict the price movement over the next H seconds using historical information.
- Supervised learning: regression, neural networks, boosting, SVM, etc.
- Unsupervised learning: clustering, graphical models (networks), reinforcement learning, etc.

6 Overview
Inputs & Outputs. Mid price: $S_t = (\mathrm{bid}_t + \mathrm{ask}_t)/2$.
Inputs (absolute returns): $r^h_{t-l} = S_{t-l} - S_{t-l-h}$.
Outputs (forward absolute returns): $fr^H_t = S_{t+H} - S_t$ (e.g. H = 300 s).
Existing variables:
- Group 1: $\{r^1_t, r^{H/2}_t, r^H_t\}$
- Group 2: $\{r^{H/m}_t\}$, m = 1, 2, 5, 10, 20, 50, 100, 300
- Group 3: $\{r^{H/m}_t\}$, m = 1, ..., H
Technical indicators: RSI, MACD, Bollinger Bands, etc.
Potential variables: size, volatility, economic data, other stock/index prices, expert analysis, etc.
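To make the construction above concrete, here is a minimal R sketch (R is the environment the talk cites for its models). The names `bid`, `ask`, `lag_return`, and `fwd_return` are illustrative assumptions, not the authors' code; it builds the mid price, the Group 2 lagged returns, and the forward-return target for H = 300 s.

```r
lag_return <- function(S, h) {
  # absolute return over the past h observations: r_t^h = S_t - S_{t-h}
  c(rep(NA, h), S[(h + 1):length(S)] - S[1:(length(S) - h)])
}

fwd_return <- function(S, H) {
  # forward absolute return: fr_t^H = S_{t+H} - S_t
  c(S[(H + 1):length(S)] - S[1:(length(S) - H)], rep(NA, H))
}

mid <- (bid + ask) / 2        # S_t = (bid_t + ask_t) / 2, per-second series
H   <- 300                    # holding period in seconds

# Group 2 inputs: r_t^{H/m} for m in {1, 2, 5, 10, 20, 50, 100, 300}
ms <- c(1, 2, 5, 10, 20, 50, 100, 300)
X  <- sapply(ms, function(m) lag_return(mid, H / m))
colnames(X) <- paste0("r_", H / ms)
y  <- fwd_return(mid, H)      # regression target fr_t^H
```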

7 Overview
A Simple High Frequency Trading Strategy:
Predictions at time t: $\widehat{fr}^H_t = \hat S_{t+H} - S_t = f(\{r^l_t\})$.
Entry signals:
- Open a market buy order at time t if $\widehat{fr}^H_t > \text{cost}$;
- Open a market sell order at time t if $\widehat{fr}^H_t < -\text{cost}$;
- No action otherwise.
Exit signal: liquidate the position (opened at time t) with a market order at time t + H.
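The strategy reduces to a threshold rule on the predicted forward return. A hedged R sketch of a one-pass backtest follows; `pred`, `mid`, and the fixed `cost` are assumed inputs, and filling at the mid price is a simplifying assumption (a real market order crosses the spread).

```r
backtest <- function(pred, mid, H = 300, cost = 0.01) {
  n   <- length(mid)
  # entry signal: +1 (buy) above cost, -1 (sell) below -cost, else flat
  pos <- ifelse(pred > cost, 1, ifelse(pred < -cost, -1, 0))
  pnl <- rep(0, n)
  idx <- which(pos != 0 & seq_len(n) + H <= n)
  # each trade: enter at t, exit at t + H; cost is a round-trip charge
  pnl[idx] <- pos[idx] * (mid[idx + H] - mid[idx]) - cost
  cumsum(pnl)                 # cumulative P&L, booked at entry time
}
```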

8 Data and Descriptive Analysis

9 Data and Descriptive Analysis
Time Series: [figure]

10 Data and Descriptive Analysis
Distributions of Different Returns: [figure]

11 Data and Descriptive Analysis
Scatter Plots: $Y_t = S_{t+300} - S_t$ vs. $r^h_t = S_t - S_{t-h}$ [figure]

12 (SKIPPED) Linear Model: Variable Selection by LASSO

13 (SKIPPED) Linear Model: Variable Selection by LASSO
Generalized Linear Model
Model components:
- Response/dependent variable: $Y$
- Predictors/independent variables: $X_1, \ldots, X_p$
- Observations: $y_{n \times 1} = [y_1, \ldots, y_n]^\top$, $x_{n \times p} = [x_1, \ldots, x_p]$
Model setup: $Y \mid X \sim F_Y$, $E(Y) = \mu = g^{-1}(X\beta)$, $\mathrm{Var}(Y) = V(g^{-1}(X\beta))$, where $\beta_{p \times 1}$ is the unknown parameter vector and $g$ is the link function.
Examples:
- Linear regression: $Y \mid X \sim N(X\beta, \sigma^2)$, or $y_i = \beta_0 + \sum_{j=1}^p \beta_j x_{ij} + \epsilon_i$, where $\epsilon_i \overset{iid}{\sim} N(0, \sigma^2)$
- Logistic regression: $Y \mid X \sim \mathrm{Bernoulli}(p = g^{-1}(X\beta))$ with $g(\mu) = \log\frac{\mu}{1-\mu}$, or $P(y_i = 1) = \frac{\exp(\beta_0 + \sum_{j=1}^p \beta_j x_{ij})}{1 + \exp(\beta_0 + \sum_{j=1}^p \beta_j x_{ij})}$
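Both examples map directly onto R's model-fitting functions; a small sketch, assuming a data frame `df` with a numeric response `y` and predictor columns (names are placeholders, not from the slides):

```r
# Linear regression: Y | X ~ N(X beta, sigma^2), identity link
fit_lin <- lm(y ~ ., data = df)

# Logistic regression: Y | X ~ Bernoulli(g^{-1}(X beta)), logit link;
# the binary response "price moved up" is illustrative
fit_log <- glm(I(y > 0) ~ ., data = df, family = binomial(link = "logit"))

summary(fit_lin); summary(fit_log)
```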

14 (SKIPPED) Linear Model: Variable Selection by LASSO
Generalized Linear Model: Parameter Estimation
- Maximum likelihood, maximum quasi-likelihood, minimum loss function
- Least squares, iteratively reweighted least squares
- Bayesian methods
Overfitting

15 (SKIPPED) Linear Model: Variable Selection by LASSO
Overfitting:
Traditional methods:
- Forward, backward, stepwise selection
- Best subset selection: $R^2$, MSE, AIC, SBC
- Principal components regression
- Ridge regression: shrinkage estimation
Penalized methods:
- LASSO, SCAD, Dantzig selector
- SIS, ISIS

16 (SKIPPED) Linear Model: Variable Selection by LASSO
Ridge Regression: L2 Penalized Methods. $L(X, \beta)$: loss function.
Model: $y_i = \beta_0 + \sum_{j=1}^p \beta_j x_{ij} + \epsilon_i$, $\epsilon_i \overset{iid}{\sim} N(0, \sigma^2)$
Least squares regression:
$\hat\beta^{ls} = \arg\min_\beta \{L(X, \beta)\} = \arg\min_\beta \sum_{i=1}^n (y_i - \hat y_i)^2$, giving $\hat\beta^{ls} = (\tilde X^\top \tilde X)^{-1} \tilde X^\top \tilde Y$.
Ridge regression:
$\hat\beta^{ridge} = \arg\min_\beta \{ \sum_{i=1}^n (y_i - \hat y_i)^2 + \lambda \sum_{j=1}^p \beta_j^2 \}$, giving $\hat\beta^{ridge} = (\tilde X^\top \tilde X + \lambda I)^{-1} \tilde X^\top \tilde Y$.


18 (SKIPPED) Linear Model: Variable Selection by LASSO
Ridge Regression: L2 Penalized Methods
$\hat\beta^{ridge} = \arg\min_\beta \{ \|\tilde Y - \tilde X\beta\|^2 + \lambda \|\beta\|_{\ell_2}^2 \}$, where $\|\beta\|_{\ell_2} = \sqrt{\sum_{i=1}^p \beta_i^2}$;
equivalently, $\hat\beta^{ridge} = \arg\min_\beta \|\tilde Y - \tilde X\beta\|^2$ s.t. $\|\beta\|_{\ell_2} \le t$.
There is a one-to-one correspondence between $\lambda$ and $t$.
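The two closed forms above can be checked numerically; a sketch on simulated centered data (none of this is from the talk, and the tilde variables are the centered $X$ and $Y$):

```r
set.seed(1)
n <- 200; p <- 10
X <- matrix(rnorm(n * p), n, p)
y <- drop(X %*% rnorm(p) + rnorm(n))
Xc <- scale(X, center = TRUE, scale = FALSE)   # tilde-X: centered predictors
yc <- y - mean(y)                              # tilde-Y: centered response

lambda <- 1
beta_ls    <- solve(crossprod(Xc), crossprod(Xc, yc))                 # (X'X)^{-1} X'Y
beta_ridge <- solve(crossprod(Xc) + lambda * diag(p), crossprod(Xc, yc))

# ridge shrinks every coefficient toward zero relative to least squares
cbind(ls = beta_ls, ridge = beta_ridge)
```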

19 (SKIPPED) Linear Model: Variable Selection by LASSO
Ridge Regression: L2 Penalized Methods [figure]

20 (SKIPPED) Linear Model: Variable Selection by LASSO
LASSO: L1 Penalized Model
LASSO (Least Absolute Shrinkage and Selection Operator):
$y_i = \beta_0 + \sum_{j=1}^p \beta_j x_{ij} + \epsilon_i$, $\epsilon_i \overset{iid}{\sim} N(0, \sigma^2)$
$\hat\beta^{lasso} = \arg\min_\beta \{ \|\tilde Y - \tilde X\beta\|^2 + \lambda \|\beta\|_{\ell_1} \}$, where $\|\beta\|_{\ell_1} = \sum_{j=1}^p |\beta_j|$;
equivalently, $\hat\beta^{lasso} = \arg\min_\beta \|\tilde Y - \tilde X\beta\|^2$ s.t. $\|\beta\|_{\ell_1} \le t$.
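In R this is one call to `glmnet` (one of the packages listed on the references slide below); a minimal sketch, reusing the simulated `X` and `y` from the ridge example:

```r
library(glmnet)

cv  <- cv.glmnet(X, y, alpha = 1)   # alpha = 1 selects the L1 (lasso) penalty
fit <- glmnet(X, y, alpha = 1, lambda = cv$lambda.min)
coef(fit)                           # many coefficients are exactly zero
```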

21 (SKIPPED) Linear Model: Variable Selection by LASSO
LASSO: L1 Penalized Model [figure]

22 (SKIPPED) Linear Model: Variable Selection by LASSO
Comparison of L1 and L2 Penalized Models
L2 penalized estimation: $\hat\beta^{ridge} = \arg\min_\beta \|\tilde Y - \tilde X\beta\|^2$ s.t. $\|\beta\|_{\ell_2} \le t$
L1 penalized estimation: $\hat\beta^{lasso} = \arg\min_\beta \|\tilde Y - \tilde X\beta\|^2$ s.t. $\|\beta\|_{\ell_1} \le t$

23 (SKIPPED) Linear Model: Variable Selection by LASSO
LASSO References:
- The original paper: Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Royal Statist. Soc. B, 58(1).
- The least angle regression (LAR) algorithm for solving the Lasso: Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression. Ann. Statist., 32(2).
- Details and comparisons: Hastie, T., Tibshirani, R. and Friedman, J. The Elements of Statistical Learning, second edition, Springer.
Computations:
- LASSO in R: glmnet, lasso2, lars
- Relaxed LASSO in R: relaxo


25 MARS: Multivariate Adaptive Regression Splines

26 MARS: Multivariate Adaptive Regression Splines
Ideas:
- Linear regression: $Y = \beta_0 + \sum_{j=1}^p \beta_j X_j + \epsilon$
- MARS: instead of $X_j$, MARS uses a collection of new predictors in the form of piecewise linear basis functions: $\{(X_j - t)_+, (t - X_j)_+\}$, $j = 1, \ldots, p$, $t \in \{x_{1j}, \ldots, x_{Nj}\}$
Algorithm (see the R sketch below):
1. Stagewise forward selection: keeping the coefficients of variables already in the current model fixed, select the new basis-function pair that produces the largest decrease in training error. Repeat until a termination condition is met: the pre-determined maximum number of predictors is reached; adding any new predictor changes R-squared by less than 0.001; or a very high R-squared value is reached.
2. Backward pruning: find the subset that gives the lowest cross-validation error, or GCV.

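A short R sketch of the algorithm using the `earth` package, a standard MARS implementation (the talk does not name its software, so the package and tuning values are assumptions):

```r
library(earth)

# Forward pass adds hinge pairs {(x - t)_+, (t - x)_+}; the backward pass
# prunes terms by generalized cross-validation (GCV), as described above.
fit <- earth(x = X, y = y,
             degree = 1,            # additive model, no hinge interactions
             nk = 21,               # cap on basis functions in the forward pass
             pmethod = "backward")  # backward pruning
summary(fit)   # selected hinge terms and their coefficients
evimp(fit)     # variable importance of the surviving predictors
```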

28 MARS: Multivariate Adaptive Regression Splines
Cross Validation: [figure]

29 MARS: Multivariate Adaptive Regression Splines
Model Fitting: $Y = \sum_j \beta_j f(X_j)$ [figure]

30-34 MARS: Multivariate Adaptive Regression Splines
Results: [figures]

35 GBM: Generalized Boosted Models

36 GBM: Generalized Boosted Models
History of Boosting (Trevor Hastie, January 2003): [figure]

37 GBM: Generalized Boosted Models
AdaBoost:
1. Initialize the observation weights: $w_i = 1/N$, $i = 1, \ldots, N$.
2. For m = 1 to M:
   - Fit a classifier $G_m(x, \theta)$ to the training data using weights $w_i$ (stump, CART, ADT tree, etc.);
   - Compute the misclassification error: $err_m = \sum_i w_i I(y_i \ne G_m(x_i, \theta)) / \sum_i w_i$;
   - Compute $\alpha_m = \log\frac{1 - err_m}{err_m}$;
   - Update the weights: $w_i \leftarrow w_i \exp(\alpha_m I(y_i \ne G_m(x_i, \theta)))$.
3. Final output: $G(x) = \mathrm{sign}\left[\sum_m \alpha_m G_m(x, \theta)\right]$.
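The loop above is short enough to write out directly. A from-scratch R sketch with depth-1 `rpart` trees (stumps) as the weak learner $G_m$, assuming labels `y` coded in {-1, +1} and a feature matrix `X` (both placeholders, not the talk's data):

```r
library(rpart)

adaboost <- function(X, y, M = 50) {
  n  <- nrow(X)
  w  <- rep(1 / n, n)                       # step 1: uniform weights
  df <- data.frame(y = factor(y), X)
  fits  <- vector("list", M)
  alpha <- numeric(M)
  for (m in 1:M) {                          # step 2
    fits[[m]] <- rpart(y ~ ., data = df, weights = w,
                       control = rpart.control(maxdepth = 1))
    pred <- ifelse(predict(fits[[m]], df, type = "class") == "1", 1, -1)
    err  <- sum(w * (pred != y)) / sum(w)   # weighted error err_m
    alpha[m] <- log((1 - err) / err)        # classifier weight alpha_m
    w <- w * exp(alpha[m] * (pred != y))    # up-weight the misclassified
    w <- w / sum(w)
  }
  list(fits = fits, alpha = alpha)          # G(x) = sign(sum_m alpha_m G_m(x))
}
```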

38 GBM: Generalized Boosted Models
Equivalence: additive model with exponential loss (forward stagewise algorithm):
$G(x) = \sum_{m=1}^M \beta_m f(x, \theta_m)$
Forward stagewise algorithm:
1. Initialize $G(x) = 0$.
2. For m = 1 to M:
   - Calibrate the model: $(\beta_m, \theta_m) = \arg\min_{\beta, \theta} \sum_{i=1}^N L(y_i, G(x_i) + \beta f(x_i, \theta))$;
   - Update: $G(x) \leftarrow G(x) + \beta_m f(x, \theta_m)$.
Exponential loss: $L(y, f(x)) = \exp(-y f(x))$.

39 GBM: Generalized Boosted Models
Exponential Loss (Hastie, 2008): $L(y, f(x)) = \exp(-y f(x))$ [figure]
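The key fact behind this slide (standard in Hastie et al., The Elements of Statistical Learning; stated here for completeness, not taken from the slides) is that the population minimizer of the exponential loss is half the log-odds:

$$
f^*(x) \;=\; \arg\min_{f(x)} \, \mathbb{E}_{Y \mid x}\!\left[e^{-Y f(x)}\right]
\;=\; \frac{1}{2} \log \frac{\Pr(Y = 1 \mid x)}{\Pr(Y = -1 \mid x)},
$$

so $\mathrm{sign}(f^*(x))$ is the Bayes classifier, which is why boosting the exponential loss targets the optimal decision rule.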

40 GBM: Generalized Boosted Models
Gradient Boosting (forward stagewise algorithm):
1. Initialize $G(x) = \arg\min_a \sum_{i=1}^N L(y_i, a)$.
2. For m = 1 to M:
   - Calculate the negative gradient $r_i = -\left[\frac{\partial L(y_i, f)}{\partial f}\right]_{f = G(x_i)}$;
   - Fit a regression tree to $\{r_i\}$, giving terminal regions $\{R_1, \ldots, R_K\}$;
   - For k = 1 to K, compute $a_k = \arg\min_a \sum_{x_i \in R_k} L(y_i, G(x_i) + a)$;
   - Update: $G(x) \leftarrow G(x) + \sum_k a_k I(x \in R_k)$.
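In R this recipe is packaged in `gbm`; a hedged sketch with squared-error loss, for which the pseudo-residual is simply $r_i = y_i - G(x_i)$. The data frame and tuning values are illustrative, not the talk's settings:

```r
library(gbm)

df  <- data.frame(y = as.numeric(y), X)
fit <- gbm(y ~ ., data = df,
           distribution = "gaussian",  # squared-error loss
           n.trees = 1000,             # M: number of boosting iterations
           interaction.depth = 3,      # size of each regression tree
           shrinkage = 0.01,           # damping on each update of G(x)
           cv.folds = 5)
best <- gbm.perf(fit, method = "cv")   # iteration count minimizing CV error
pred <- predict(fit, newdata = df, n.trees = best)
```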

41 GBM: Generalized Boosted Models
Model Fitting: [figure]

42 GBM: Generalized Boosted Models
Results: [figure]

43 More Results based on SPY 2011 December Level-1 Data

44 Results based on SPY 2011 December Level-1 Data: [figures]

45 Discussions

46 Discussions
Generally speaking, our principles for building a successful quantitative trading strategy include:
- Modeling market sense and experience quantitatively (e.g. which information is useful and could be coded into the predictors, what the rationale for a strong correlation is, etc.);
- Keeping the model as simple as possible and generalizing it intuitively, instead of overfitting via a magic black box.
Discussion topics:
- Dynamic/online trading strategy; window sizes for training and testing
- Choice of holding period; expert system; asset allocation
- Homogeneity: different intra-day time periods
- Additional predictors; different definitions of price return
- Risk management

47 Discussions Questions?
