On prediction. Jussi Hakanen Post-doctoral researcher. TIES445 Data mining (guest lecture)


On prediction. Jussi Hakanen, post-doctoral researcher. jussi.hakanen@jyu.fi

Learning outcomes: to understand the basic principles of prediction; to understand linear regression in prediction; to be aware of the connection between least squares and optimization.

Exercise: find out issues related to prediction in data mining. Work in pairs or small groups. Time: 10 minutes.

Summary of the exercise: How to measure the quality of prediction? How to avoid overlearning/overfitting? Supervised learning. How to predict missing data? Different applications (stock markets, biology, sports prediction, income in finance, energy/water consumption, ...). Underfitting. Bias of the model. Concept drift (inaccuracy of the prediction). Obtaining new knowledge from given data.

Motivation: How to estimate data within the ranges of the dataset? Outside of the dataset? Handling missing data and outliers. Predictive vs. descriptive models. Prediction of numerical values (cf. classification).

Concepts: Predictor (or independent, or input) variables $x = (x_1, \dots, x_N)^T$ ($N \times 1$). Response (or dependent, or output) variables $y = (y_1, \dots, y_M)^T$ ($M \times 1$). Regression model/function: a model/function describing the prediction used. Linear regression: the regression model/function is linear.

Prediction: a set of data for which the values of the predictor and response variables are known, i.e. $P$ data points $(x^j, y^j)$, $j = 1, \dots, P$, with $x^j = (x_1^j, \dots, x_N^j)^T$ and $y^j = (y_1^j, \dots, y_M^j)^T$. The idea is to use prediction models to predict a value for predictor variables for which we do not know the response. Note: $P$ should be greater than or equal to $N$! Interpolation vs. extrapolation: can give misleading results if not interpreted carefully! Accuracy is important: different measures of accuracy can be used, e.g., to choose between different models and/or to choose values for the parameters in the models; accuracy can sometimes be sacrificed for a simpler model.

Regression analysis: used for prediction and forecasting. Parametric vs. non-parametric regression: in the parametric case the regression function depends on a finite number of unknown parameters; in the non-parametric case the regression function belongs to a set of functions (which can be infinite dimensional). Linear and nonlinear regression: linearity is with respect to the parameters.

Linear regression: the model is linear with respect to the parameters, not necessarily linear with respect to the predictor variables: $\hat{y} = a_0 + \sum_{i=1}^{N} a_i x_i$, where $\hat{y}$ is a predicted estimate of the mean value at $x$ and $a_0, \dots, a_N$ are the parameters. It is the oldest and most widely used approach due to its simplicity. Typically, the model used is not exact, so an error exists: $y^j = \hat{y}^j + e^j$ for each data point $x^j$, $j = 1, \dots, P$; in matrix terms, $y = Xa + e$. How to select values for the parameters $a$?

Least squares: originally used for determining the orbits of bodies around the Sun from astronomical observations (Legendre, 1805; Gauss, 1809). Idea: minimize the sum of the squared errors. For problems with $P > N$, $\sum_{j=1}^{P} (e^j)^2 = \sum_{j=1}^{P} \left( y^j - \sum_{i=0}^{N} a_i x_i^j \right)^2$ (with the convention $x_0^j = 1$), which gives the optimization problem $\min_a \sum_{j=1}^{P} \left( y^j - \sum_{i=0}^{N} a_i x_i^j \right)^2$. The parameter values minimizing the above can be shown to be $a = (X^T X)^{-1} X^T y$. The direct solution requires $X^T X$ to be invertible (problems arise if $P$ is small or there are linear dependences between the $x_i$). Typically $a$ is computed by numerical linear algebra.
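A minimal numerical sketch of the least-squares solution above in NumPy; the data points here are invented purely for illustration:

```python
import numpy as np

# Hypothetical data: P = 6 observations of N = 1 predictor variable.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1, 11.0])

# Design matrix X with a column of ones for the intercept a_0.
X = np.column_stack([np.ones_like(x), x])

# Normal equations: a = (X^T X)^{-1} X^T y.
a_normal = np.linalg.solve(X.T @ X, X.T @ y)

# In practice an SVD-based routine such as np.linalg.lstsq is preferred
# (numerically more stable than forming X^T X explicitly).
a_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(a_normal)   # [a_0, a_1]
print(a_lstsq)    # same parameters, computed differently
```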

Example: $y = 0.6777 + 3.0166x$.

Example (cont.): $y = 4.1579 - 0.0057x + 0.3053x^2$.
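The quadratic model above is still linear in the parameters, so the same least-squares machinery applies once an $x^2$ column is added to the design matrix. A sketch with synthetic data (the true coefficients here are assumed and will not reproduce the slide's values):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3.0, 3.0, 30)
# Noisy samples from an assumed quadratic y = 4.0 - 0.01 x + 0.3 x^2.
y = 4.0 - 0.01 * x + 0.3 * x**2 + rng.normal(scale=0.05, size=x.size)

# Columns 1, x, x^2: the model is linear in the parameters a_0, a_1, a_2.
X = np.column_stack([np.ones_like(x), x, x**2])
a, *_ = np.linalg.lstsq(X, y, rcond=None)
print(a)  # approximately [4.0, -0.01, 0.3], up to noise
```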

Notes: the parameter values in linear regression can be interpreted as follows: if the value of predictor variable $x_i$ is increased by one unit and the values of the other predictor variables remain the same, then $a_i$ denotes the change in the prediction $\hat{y} = a_0 + \sum_{i=1}^{N} a_i x_i$.

Connection to optimization: least squares is an unconstrained optimization problem, $\min_a \frac{1}{2} \sum_{j=1}^{P} (f_j(a))^2 = \min_a \frac{1}{2} \|f(a)\|^2$, where $f_j(a) = y^j - h(a, x^j)$ and $h(a, x)$ is the model used, e.g. $h(a, x) = a_0 + \sum_{i=1}^{N} a_i x_i$. Gauss-Newton method: using a first-order Taylor approximation $f(a) \approx f(a^h) + \nabla f(a^h)^T (a - a^h)$, the iteration is $a^{h+1} = a^h - \left( \nabla f(a^h) \nabla f(a^h)^T \right)^{-1} \nabla f(a^h) f(a^h)$. Connection to Newton's method: the Hessian of $\frac{1}{2}\|f(a)\|^2$ is $\nabla f(a^h) \nabla f(a^h)^T + \sum_{j=1}^{P} \nabla^2 f_j(a^h) f_j(a^h)$, so Gauss-Newton is equivalent to Newton's method except for the second-order term!
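As an illustration of the Gauss-Newton iteration described above, the sketch below fits an assumed nonlinear model $h(a, x) = a_0 (1 - e^{-a_1 x})$ to synthetic data; the model, the data, and the fixed iteration count are choices made only for this example:

```python
import numpy as np

def model(a, x):
    # Assumed example model h(a, x) = a_0 * (1 - exp(-a_1 * x)).
    return a[0] * (1.0 - np.exp(-a[1] * x))

def jacobian(a, x):
    # Partial derivatives of the residuals f_j(a) = y_j - h(a, x_j) w.r.t. a.
    d0 = -(1.0 - np.exp(-a[1] * x))        # d f_j / d a_0
    d1 = -a[0] * x * np.exp(-a[1] * x)     # d f_j / d a_1
    return np.column_stack([d0, d1])

# Synthetic data generated from known parameters (a_0, a_1) = (2.0, 0.5).
rng = np.random.default_rng(1)
x = np.linspace(0.1, 10.0, 40)
y = model(np.array([2.0, 0.5]), x) + rng.normal(scale=0.02, size=x.size)

a = np.array([1.0, 1.0])                      # initial guess
for _ in range(20):
    f = y - model(a, x)                       # residual vector f(a)
    J = jacobian(a, x)                        # Jacobian of f at a
    step = np.linalg.solve(J.T @ J, J.T @ f)  # Gauss-Newton step from (J^T J) s = J^T f
    a = a - step
print(a)  # should approach [2.0, 0.5]
```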

Function approximation: prediction can be used in optimization to approximate the objective function. This is typically done when evaluating the objective function is time consuming, e.g. if the model is a partial differential equation that takes a significant amount of time to solve numerically. It reduces the time needed for optimization, since typically a large number of function evaluations is required. Examples of approximation models are polynomial approximation, radial basis functions (RBFs), Kriging, and support vector regression.
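A small sketch of one such surrogate: a Gaussian radial basis function approximation of an assumed "expensive" objective. The objective, the sample points, and the shape parameter are illustrative assumptions, not part of the lecture:

```python
import numpy as np

def expensive_objective(x):
    # Stand-in for a costly simulation; assumed for illustration only.
    return np.sin(3.0 * x) + 0.5 * x**2

# Evaluate the expensive function at a few sample points (the RBF centers).
centers = np.linspace(-2.0, 2.0, 8)
values = expensive_objective(centers)

# Gaussian RBF: phi(r) = exp(-(eps * r)^2), with an assumed shape parameter.
eps = 1.5
def phi(r):
    return np.exp(-(eps * r) ** 2)

# Solve for the RBF weights: Phi w = values, with Phi[i, j] = phi(|c_i - c_j|).
Phi = phi(np.abs(centers[:, None] - centers[None, :]))
w = np.linalg.solve(Phi, values)

def surrogate(x):
    # Cheap approximation used in place of the expensive objective.
    return phi(np.abs(np.asarray(x)[:, None] - centers[None, :])) @ w

x_test = np.array([-1.3, 0.0, 0.7])
print(surrogate(x_test))
print(expensive_objective(x_test))
```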

Regularization: previously, no requirements were placed on the parameter values (an unconstrained optimization problem), but there may be a need to constrain the size of the parameters. Tikhonov regularization (ridge regression): add a constraint that $\|a\|_2$, the $L_2$ norm of the parameter vector, is not greater than a given value; this can be treated as an unconstrained optimization problem by adding a penalty term $\beta \|a\|_2^2$ to the objective function. Lasso (least absolute shrinkage and selection operator): add a constraint that $\|a\|_1$, the $L_1$ norm of the parameter vector, is not greater than a given value; the lasso prefers solutions with fewer non-zeros.
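With the squared $L_2$ penalty, ridge regression keeps a closed-form solution, $a = (X^T X + \beta I)^{-1} X^T y$. A sketch on assumed, nearly collinear data (for simplicity the intercept is omitted and all coefficients are penalized):

```python
import numpy as np

rng = np.random.default_rng(2)
P, N = 20, 5
X = rng.normal(size=(P, N))
X[:, 4] = X[:, 3] + 1e-3 * rng.normal(size=P)   # two nearly collinear columns
y = X @ np.array([1.0, 0.0, -2.0, 0.5, 0.5]) + rng.normal(scale=0.1, size=P)

beta = 1.0  # regularization strength; an assumed value
a_ridge = np.linalg.solve(X.T @ X + beta * np.eye(N), X.T @ y)

a_ols, *_ = np.linalg.lstsq(X, y, rcond=None)   # unregularized fit for comparison
print(a_ols)    # coefficients of the collinear columns are unstable
print(a_ridge)  # shrunk towards zero, less sensitive to the collinearity
```

For the lasso there is no closed-form solution; it is typically solved iteratively, e.g. by coordinate descent (scikit-learn's sklearn.linear_model.Lasso is one readily available implementation).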

Conclusions: What were the key points from your perspective? What do you remember best? Extrapolation and interpolation can be dangerous. Regularization is important.

Thank you! Dr. Jussi Hakanen, Industrial Optimization Group, http://www.mit.jyu.fi/optgroup/, Department of Mathematical Information Technology, P.O. Box 35 (Agora), FI-40014 University of Jyväskylä. jussi.hakanen@jyu.fi, http://users.jyu.fi/~jhaka/en/