On prediction. Jussi Hakanen Post-doctoral researcher. TIES445 Data mining (guest lecture)

1 On prediction Jussi Hakanen Post-doctoral researcher

2 Learning outcomes
- To understand the basic principles of prediction
- To understand linear regression in prediction
- To be aware of the connection between least squares and optimization

3 Exercise
- Find out issues related to prediction in data mining
- Work in pairs or small groups
- Time: 10 minutes

4 Summary of the exercise
- How to measure the quality of prediction?
- How to avoid overlearning/overfitting?
- Supervised learning
- How to predict for missing data?
- Different applications (stock markets, biology, sport prediction, income in finance, energy/water consumption, ...)
- Underfitting
- Bias of the model
- Concept drift (inaccuracy of the prediction)
- Obtaining new knowledge from given data

5 Motivation
- How to estimate data
  - Within the ranges of the dataset?
  - Outside of the dataset?
- Handling missing data, outliers
- Predictive vs. descriptive models
- Prediction for numerical values (cf. classification)

6 Concepts
- Predictor (or independent or input) variables: $x = (x_1, \dots, x_N)^T$ ($N \times 1$)
- Response (or dependent or output) variables: $y = (y_1, \dots, y_M)^T$ ($M \times 1$)
- Regression model/function
  - A model/function describing the prediction used
- Linear regression
  - Regression model/function is linear

7 Prediction
- A set of data for which the values of predictor and response variables are known
  - $P$ data points $(x^j, y^j)$, $j = 1, \dots, P$
  - $x^j = (x_1^j, \dots, x_N^j)^T$, $y^j = (y_1^j, \dots, y_M^j)^T$
- Idea is to use prediction models to predict the response for values of the predictor variables for which the response is not known
  - Note: $P$ should be greater than or equal to $N$!
- Interpolation vs. extrapolation
  - Can give misleading results if not interpreted carefully!
- Accuracy is important
  - Different measures of accuracy
  - Can be used e.g. to choose between different models and/or to choose values for different parameters in the models
  - Can sometimes be sacrificed for a simpler model
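The warning about extrapolation can be illustrated with a small sketch. The data here are made up for illustration (samples of sin(x) on [0, 1], not taken from the lecture), and a straight line is fitted with NumPy's `np.linalg.lstsq`. The point is only that a model that looks accurate inside the range of the data can be far off outside it.

```python
import numpy as np

# Hypothetical training data: samples of sin(x) on the interval [0, 1].
x_train = np.linspace(0.0, 1.0, 11)
y_train = np.sin(x_train)

# Fit a straight line y ~ a_0 + a_1 * x by least squares.
X = np.column_stack([np.ones_like(x_train), x_train])
a, *_ = np.linalg.lstsq(X, y_train, rcond=None)

def predict(x):
    """Evaluate the fitted line at x."""
    return a[0] + a[1] * x

# Interpolation: a point inside the range of the training data.
err_inside = abs(predict(0.5) - np.sin(0.5))
# Extrapolation: a point far outside the range of the training data.
err_outside = abs(predict(3.0) - np.sin(3.0))

print(f"error at x = 0.5 (inside):  {err_inside:.3f}")
print(f"error at x = 3.0 (outside): {err_outside:.3f}")
```

On this toy data the error outside the sampled interval is much larger than the error inside it, because the line has no information about how the function bends beyond x = 1.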

8 Regression analysis
- Used for prediction and forecasting
- Parametric/non-parametric regression
  - Parametric: regression function depends on a finite number of unknown parameters
  - Non-parametric: regression function is in a set of functions (can be infinite dimensional)
- Linear and nonlinear regression
  - W.r.t. the parameters

9 Linear regression
- Model is linear w.r.t. the parameters
  - Not necessarily linear w.r.t. the predictor variables
- $\hat{y} = a_0 + \sum_{i=1}^{N} a_i x_i$
  - $\hat{y}$ is a predicted estimate of the mean value at $x$
  - $a_0, \dots, a_N$ are parameters
- Oldest and most widely used due to simplicity
- Typically, the model used is not exact, so an error exists
  - $y^j = \hat{y}^j + e^j$ for each data point $x^j$, $j = 1, \dots, P$
  - In matrix terms: $y = Xa + e$
- How to select values for the parameters $a$?
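To make "linear w.r.t. the parameters, not necessarily w.r.t. the predictor variables" concrete, here is a minimal sketch with one predictor and a quadratic model. The data and the coefficient values 0.5, -1.0 and 2.0 are invented for illustration. The model is nonlinear in x, but it is still a linear regression because each column of X multiplies exactly one parameter.

```python
import numpy as np

# Hypothetical noise-free data generated from y = 0.5 - 1.0*x + 2.0*x^2.
x = np.linspace(-2.0, 2.0, 25)
y = 0.5 - 1.0 * x + 2.0 * x**2

# Design matrix with columns 1, x, x^2: nonlinear in x, linear in a.
X = np.column_stack([np.ones_like(x), x, x**2])

# Least-squares estimate of the parameters (a_0, a_1, a_2).
a, *_ = np.linalg.lstsq(X, y, rcond=None)
print(a)
```

Because the data contain no noise, the fitted parameters agree with the generating coefficients up to rounding error.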

10 Least squares
- Determining orbits of bodies around the Sun from astronomical observations (Legendre, 1805; Gauss, 1809)
- Idea: minimize the sum of the squared errors
  - For problems with $P > N$
  - $\sum_{j=1}^{P} (e^j)^2 = \sum_{j=1}^{P} \left( y^j - \sum_{i=0}^{N} a_i x_i^j \right)^2$, with $x_0^j = 1$ so that $a_0$ is the constant term
  - $\min_a \sum_{j=1}^{P} \left( y^j - \sum_{i=0}^{N} a_i x_i^j \right)^2$
  - Optimization problem!
- Parameter values minimizing the above can be shown to be $a = (X^T X)^{-1} X^T y$
  - Direct solution requires $X^T X$ to be invertible (problems if $P$ is small or there are linear dependences between the $x_i$)
  - Typically $a$ is computed by numerical linear algebra
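The closed-form solution and the "numerical linear algebra" route can be compared in a short sketch. The function name `fit_least_squares` and the synthetic data are assumptions made for illustration. The first estimate solves the normal equations $(X^T X) a = X^T y$ directly, and the second uses NumPy's SVD-based `np.linalg.lstsq`, which does not form $X^T X$ explicitly.

```python
import numpy as np

def fit_least_squares(x, y):
    """Fit y ~ a_0 + sum_i a_i x_i by least squares.

    x has shape (P, N) (one row per data point), y has shape (P,).
    Returns the estimates from the normal equations and from lstsq.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so that a_0 acts as the constant term.
    X = np.column_stack([np.ones(len(x)), x])
    # Normal equations: solve (X^T X) a = X^T y; needs X^T X invertible.
    a_normal = np.linalg.solve(X.T @ X, X.T @ y)
    # SVD-based solver that avoids forming X^T X.
    a_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
    return a_normal, a_lstsq

# Hypothetical noise-free data: y = 1 + 2*x_1 - 3*x_2, with P = 20 and N = 2.
rng = np.random.default_rng(0)
x_data = rng.normal(size=(20, 2))
y_data = 1.0 + 2.0 * x_data[:, 0] - 3.0 * x_data[:, 1]

a_normal, a_lstsq = fit_least_squares(x_data, y_data)
print(a_normal)
print(a_lstsq)
```

On well-conditioned data the two estimates agree. When the columns of X are nearly linearly dependent, forming $X^T X$ squares the condition number, which is one reason a factorization-based solver is usually preferred.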

11 Example y = x

12 Example (cont.) y = x x 2

13 Notes
- The parameter values in linear regression can be interpreted as follows:
  - If the value of predictor variable $x_i$ is increased by one unit and the values of the other predictor variables remain the same, then $a_i$ denotes the change in the prediction
  - $\hat{y} = a_0 + \sum_{i=1}^{N} a_i x_i$

14 Connection to optimization
- Least squares ↔ unconstrained optimization problem
  - $\min_a \frac{1}{2} \sum_{j=1}^{P} (f_j(a))^2 = \min_a \frac{1}{2} \|f(a)\|^2$
  - Function $f_j(a) = y^j - h(a, x^j)$, where $h(a, x)$ is the model used
  - E.g. $h(a, x) = a_0 + \sum_{i=1}^{N} a_i x_i$
- Gauss-Newton method
  - Taylor (1st order): $f(a) \approx f(a^h) + \nabla f(a^h)^T (a - a^h)$
  - $a^{h+1} = a^h - \left( \nabla f(a^h) \nabla f(a^h)^T \right)^{-1} \nabla f(a^h) f(a^h)$
- Connection to Newton's method
  - Hessian of $\frac{1}{2} \|f(a)\|^2$: $\nabla f(a^h) \nabla f(a^h)^T + \sum_{j=1}^{P} \nabla^2 f_j(a^h) f_j(a^h)$
  - Gauss-Newton is equivalent to Newton's method except for the second-order term!
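A minimal sketch of the Gauss-Newton iteration, under some assumptions that are not in the slides: the model $h(a, x) = a_0 e^{a_1 x}$, the synthetic data, the starting point and the stopping rule are all chosen for illustration. In the code, `J` is the $P \times 2$ Jacobian of $f$, so `J.T` plays the role of $\nabla f$ and the update reads $a^{h+1} = a^h - (J^T J)^{-1} J^T f(a^h)$.

```python
import numpy as np

def gauss_newton(residual, jacobian, a0, iters=50, tol=1e-10):
    """Minimize 0.5 * ||f(a)||^2 with plain Gauss-Newton steps (no line search)."""
    a = np.asarray(a0, dtype=float)
    for _ in range(iters):
        f = residual(a)   # residual vector f(a), shape (P,)
        J = jacobian(a)   # Jacobian of f, shape (P, n_params); J.T corresponds to grad f
        # Gauss-Newton step: solve (J^T J) step = J^T f.
        step = np.linalg.solve(J.T @ J, J.T @ f)
        a = a - step
        if np.linalg.norm(step) < tol:
            break
    return a

# Hypothetical model h(a, x) = a_0 * exp(a_1 * x) with noise-free data.
x = np.linspace(0.0, 1.0, 15)
a_true = np.array([2.0, -1.5])
y = a_true[0] * np.exp(a_true[1] * x)

def residual(a):
    """f_j(a) = y^j - h(a, x^j)."""
    return y - a[0] * np.exp(a[1] * x)

def jacobian(a):
    """Partial derivatives of f_j with respect to a_0 and a_1."""
    e = np.exp(a[1] * x)
    return np.column_stack([-e, -a[0] * x * e])

a_hat = gauss_newton(residual, jacobian, a0=[1.0, -1.0])
print(a_hat)
```

Only first derivatives of the residuals are needed, which is the practical appeal compared with Newton's method. For a linear model $h$, the Jacobian is constant and a single step from any starting point reproduces the least-squares solution.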

15 Function approximation
- Prediction can be used in optimization for approximating the objective function
- Typically used when the evaluation of the objective function is time consuming
  - E.g. if the model is a partial differential equation that takes a significant amount of time to solve numerically
- Reduces the time for optimization, since typically a large number of function evaluations are required
- Examples of approximation models are polynomial approximation, radial basis functions (RBFs), Kriging and support vector regression

16 Regularization
- Previously, no requirements were made for the parameter values
  - Unconstrained optimization problem
  - E.g. need for constraining the size of the parameters
- Tikhonov regularization (ridge regression)
  - Add a constraint that $\|a\|_2$, the $L_2$ norm of the parameter vector, is not greater than a given value
  - Can be considered as an unconstrained optimization problem by adding a penalty term $\beta \|a\|_2^2$ to the objective function
- Lasso method (least absolute shrinkage and selection operator)
  - Add a constraint that $\|a\|_1$, the $L_1$ norm of the parameter vector, is not greater than a given value
  - Prefers solutions with fewer non-zeros
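The penalized form of ridge regression has a closed-form solution, sketched below under stated assumptions: the penalty is taken to be the squared norm $\beta \|a\|_2^2$, the objective is $\|y - Xa\|^2 + \beta \|a\|_2^2$ with no separate constant term, and the function name `ridge_fit` and the nearly collinear data are invented for illustration. Setting the gradient to zero gives $a = (X^T X + \beta I)^{-1} X^T y$.

```python
import numpy as np

def ridge_fit(X, y, beta):
    """Minimize ||y - X a||^2 + beta * ||a||_2^2 in closed form."""
    n_params = X.shape[1]
    # beta * I moves X^T X away from singularity when beta > 0.
    return np.linalg.solve(X.T @ X + beta * np.eye(n_params), X.T @ y)

# Hypothetical data with two nearly identical predictors.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3))
X[:, 2] = X[:, 0] + 1e-3 * rng.normal(size=30)   # column 2 almost copies column 0
y = X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=30)

a_ols = ridge_fit(X, y, beta=0.0)     # ordinary least squares
a_ridge = ridge_fit(X, y, beta=1.0)   # penalized solution

print("||a|| without penalty:", np.linalg.norm(a_ols))
print("||a|| with beta = 1:  ", np.linalg.norm(a_ridge))
```

The penalized solution has a smaller norm than the unpenalized one, which is the effect the constraint on $\|a\|_2$ is meant to achieve. The Lasso has no comparable closed form in general and is usually solved iteratively.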

17 Conclusions
- What were the key points from your perspective?
- What do you remember best?
  - Extrapolation and interpolation can be dangerous
  - Regularization is important?

18 Thank You! Dr. Jussi Hakanen Industrial Optimization Group Department of Mathematical Information Technology P.O. Box 35 (Agora) FI University of Jyväskylä


More information

Lecture 3: More on regularization. Bayesian vs maximum likelihood learning

Lecture 3: More on regularization. Bayesian vs maximum likelihood learning Lecture 3: More on regularization. Bayesian vs maximum likelihood learning L2 and L1 regularization for linear estimators A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting

More information

Lecture 3: Introduction to Complexity Regularization

Lecture 3: Introduction to Complexity Regularization ECE90 Spring 2007 Statistical Learning Theory Instructor: R. Nowak Lecture 3: Introduction to Complexity Regularization We ended the previous lecture with a brief discussion of overfitting. Recall that,

More information

IEOR165 Discussion Week 5

IEOR165 Discussion Week 5 IEOR165 Discussion Week 5 Sheng Liu University of California, Berkeley Feb 19, 2016 Outline 1 1st Homework 2 Revisit Maximum A Posterior 3 Regularization IEOR165 Discussion Sheng Liu 2 About 1st Homework

More information

CIS 520: Machine Learning Oct 09, Kernel Methods

CIS 520: Machine Learning Oct 09, Kernel Methods CIS 520: Machine Learning Oct 09, 207 Kernel Methods Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture They may or may not cover all the material discussed

More information

CS489/698: Intro to ML

CS489/698: Intro to ML CS489/698: Intro to ML Lecture 02: Linear Regression 1 I d rather die than telling you my password! Transfer success! 2 Outline Announcements Linear Regression Regularization Cross-validation 3 Outline

More information

Lec3p1, ORF363/COS323

Lec3p1, ORF363/COS323 Lec3 Page 1 Lec3p1, ORF363/COS323 This lecture: Optimization problems - basic notation and terminology Unconstrained optimization The Fermat-Weber problem Least squares First and second order necessary

More information

NUMERICAL COMPUTATION IN SCIENCE AND ENGINEERING

NUMERICAL COMPUTATION IN SCIENCE AND ENGINEERING NUMERICAL COMPUTATION IN SCIENCE AND ENGINEERING C. Pozrikidis University of California, San Diego New York Oxford OXFORD UNIVERSITY PRESS 1998 CONTENTS Preface ix Pseudocode Language Commands xi 1 Numerical

More information

Linear Models for Regression CS534

Linear Models for Regression CS534 Linear Models for Regression CS534 Example Regression Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict

More information

Chap 1. Overview of Statistical Learning (HTF, , 2.9) Yongdai Kim Seoul National University

Chap 1. Overview of Statistical Learning (HTF, , 2.9) Yongdai Kim Seoul National University Chap 1. Overview of Statistical Learning (HTF, 2.1-2.6, 2.9) Yongdai Kim Seoul National University 0. Learning vs Statistical learning Learning procedure Construct a claim by observing data or using logics

More information

Midterm exam CS 189/289, Fall 2015

Midterm exam CS 189/289, Fall 2015 Midterm exam CS 189/289, Fall 2015 You have 80 minutes for the exam. Total 100 points: 1. True/False: 36 points (18 questions, 2 points each). 2. Multiple-choice questions: 24 points (8 questions, 3 points

More information

COMS 4771 Regression. Nakul Verma

COMS 4771 Regression. Nakul Verma COMS 4771 Regression Nakul Verma Last time Support Vector Machines Maximum Margin formulation Constrained Optimization Lagrange Duality Theory Convex Optimization SVM dual and Interpretation How get the

More information

STK Statistical Learning: Advanced Regression and Classification

STK Statistical Learning: Advanced Regression and Classification STK4030 - Statistical Learning: Advanced Regression and Classification Riccardo De Bin debin@math.uio.no STK4030: lecture 1 1/ 42 Outline of the lecture Introduction Overview of supervised learning Variable

More information

Advanced Engineering Statistics - Section 5 - Jay Liu Dept. Chemical Engineering PKNU

Advanced Engineering Statistics - Section 5 - Jay Liu Dept. Chemical Engineering PKNU Advanced Engineering Statistics - Section 5 - Jay Liu Dept. Chemical Engineering PKNU Least squares regression What we will cover Box, G.E.P., Use and abuse of regression, Technometrics, 8 (4), 625-629,

More information

SCMA292 Mathematical Modeling : Machine Learning. Krikamol Muandet. Department of Mathematics Faculty of Science, Mahidol University.

SCMA292 Mathematical Modeling : Machine Learning. Krikamol Muandet. Department of Mathematics Faculty of Science, Mahidol University. SCMA292 Mathematical Modeling : Machine Learning Krikamol Muandet Department of Mathematics Faculty of Science, Mahidol University February 9, 2016 Outline Quick Recap of Least Square Ridge Regression

More information

Wiley. Methods and Applications of Linear Models. Regression and the Analysis. of Variance. Third Edition. Ishpeming, Michigan RONALD R.

Wiley. Methods and Applications of Linear Models. Regression and the Analysis. of Variance. Third Edition. Ishpeming, Michigan RONALD R. Methods and Applications of Linear Models Regression and the Analysis of Variance Third Edition RONALD R. HOCKING PenHock Statistical Consultants Ishpeming, Michigan Wiley Contents Preface to the Third

More information

Reverse engineering using computational algebra

Reverse engineering using computational algebra Reverse engineering using computational algebra Matthew Macauley Department of Mathematical Sciences Clemson University http://www.math.clemson.edu/~macaule/ Math 4500, Fall 2016 M. Macauley (Clemson)

More information

CS281 Section 4: Factor Analysis and PCA

CS281 Section 4: Factor Analysis and PCA CS81 Section 4: Factor Analysis and PCA Scott Linderman At this point we have seen a variety of machine learning models, with a particular emphasis on models for supervised learning. In particular, we

More information

Recap from previous lecture

Recap from previous lecture Recap from previous lecture Learning is using past experience to improve future performance. Different types of learning: supervised unsupervised reinforcement active online... For a machine, experience

More information