Least Squares. Ken Kreutz-Delgado (Nuno Vasconcelos) ECE 175A Winter UCSD
|
|
- Timothy Hicks
- 5 years ago
- Views:
Transcription
1 Least Squares Ken Kreutz-Delgado (Nuno Vasconcelos) ECE 75A Winter 0 - UCSD
2 (Unweighted) Least Squares Assume linearity in the unnown, deterministic model parameters Scalar, additive noise model: y f ( x; ) ( x) E.g., for a line (f affine in x), f ( x; ) x 0 ( x) 0 x his can be generalized to arbitrary functions of x: ( x) 0( x) 0 ( x)
3 Examples Components of (x) can be arbitrary nonlinear functions of x that are linear in : Line Fitting (affine in x): f ( x; ) x 0 Polynomial Fitting (nonlinear in x): f ( x; ) i x i 0 runcated Fourier Series (nonlinear in x): i ( x) [ x] ( x) x f ( x; ) i cos( i x) i 0 ( x) cos( x) 3
4 (Unweighted)Least Squares Loss = Euclidean norm of model error: where y L( ) y ( x) y ( x ) 0 ( x) y n ( xn ) he loss function can also be rewritten as, n yi xi n y x i L( ) ( ) ( ) n E.g., for the line, L ) ( yi 0 x i i 4
5 Examples he most important component is the Design Matrix (x) Line Fitting: Polynomial Fitting: f ( x; ) x 0 x ( x) x n runcated Fourier Series: f ( x; ) i 0 x ( x) x n i x i f ( x; ) i cos( i x) i 0 cos( x) ( x) cos( xn) 5
6 (Unweighted) Least Squares One way to minimize L( ) y ( x) ˆ is to find a value such that L( ˆ ) y ( x) ˆ ( x) 0 and L( ˆ ) ( x) ( x) 0 hese conditions will hold when (x) is one-to-one, yielding ˆ ( x) ( x) ( x y ( ) xy ) hus, the least squares solution is determined by the Pseudoinverse of the Design Matrix (x): ( ) x ( x) ( x) ( x) 6
7 (Unweighted) Least Squares If the design matrix (x) is one-to-one (has full column ran) the least squares solution is conceptually easy to compute. his will nearly always be the case in practice. Recall the example of fitting a line in the plane: x x Γ ( x) Γ( x) n x x n x x x n Γ ( x) y y y n x x n xy y n 7
8 (Unweighted)Least squares Pseudoinverse solution = LS solution: ˆ ( x) y ( x) ( x) ( ) he LS line fit solution is which (naively) requires a x matrix inversion followed by a x vector post-multiplication x ˆ y x x xy x y 8
9 (Unweighted) Least Squares What about fitting th order polynomial? : f ( x; ) i x i 0 i x ( x) x n x ( x) ( x) x x n xn x n x x 9
10 (Unweighted) Least squares and y y ( x) y n x xn y n x y hus, when (x) is one-to-one (has full column ran): ˆ ( x) y ( x) ( x) ( x) y Mathematically, this is a very straightforward procedure. Numerically, this is generally NO how the solution is computed. (Viz. the sophisticated algorithms in Matlab) K x y K K x x x y 0
11 (Unweighted) Least Squares Note the computational costs when (naively) fitting a line ( st order polynomial): x ˆ y x x xy x matrix inversion, followed by a x vector post-multiplication when (naively) fitting a general th order polynomial: x y ˆ x x x y (+)x(+) matrix inversion, followed by a (+)x multiplication It is evident that the computational complexity is an increasing function of the number of parameters. More efficient (and numerically stable) algorithms are used in practice, but complexity scaling with number of parameters still remains true.
12 (Unweighted) Least Squares Suppose the dataset {x i } is constant in value over repeated, non-constant measurements of the dataset {y i }. e.g. consider some process (e.g., temperature) where the measurements {y i } are taen at the same locations {x i } every day hen the design matrix (x) is constant in value! In advance (off-line) compute: Everyday (on-line) re-compute: A x x x y ˆ A x y Hence, least squares sometimes can be implemented very efficiently. here are also efficient recursive updates, e.g. the Kalman filter.
13 Geometric Solution & Interpretation Alternatively, we can derive the least squares solution using geometric (Hilbert Space) considerations. Goal: minimize the size (norm) of the model prediction error (aa residual error), e ( ) = y - (x) : L( ) e( ) y ( x) Note that given a nown design matrix (x) the vector ( x) i i is a linear combination of the column vectors i. I.e. (x) is in the range space (column space) of (x) 3
14 (Hilbert Space) Geometric Interpretation he vector lives in the range (column space) of = (x) = [,, ] : y y ˆ ˆ Assume that y is as shown. Equivalent statements are: ˆ is the value of closest to y in the range of. ˆ is the orthogonal projection of y onto the range of. he residual e = y- is to the range of. 4
15 Geometric Interpretation of LS e = y - is to the range of iff y ˆ '' " y '' " y ˆ ˆ hus e = y - ˆ is in the nullspace of : y ˆ 0 ˆ y 0 y ˆ y 0 0 5
16 Geometric interpretation of LS Note: Nullspace Condition Normal Equation: y ˆ 0 ˆ y hus, if is one-to-one (has full column ran) we again get the pseudoinverse solution: ˆ ( x) y ( x) ( x) ( x) y 6
17 Probabilistic Interpretation We have seen that estimating a parameter by minimizing the least squares loss function is a special case of MLE his interpretation holds for many loss functions First, note that ˆ Now note that, because argminl y, f ( x; ) arg maxe L y, f ( x; ) L y, f ( x; ) e 0, y, x we can mae this exponential function into a Y X pdf by an appropriate normalization. 7
18 Prob. Interpretation of Loss Minimization I.e. by defining P ( y x; ) YX L y, f ( x ; ) L y, f ( x; ) L y, f ( x; ) ( x; ) e e e dy If the normalization constant (x;) does not depend on, (x;) = (x), then ˆ y f x arg maxe L, ( ; ) arg max P y x; YX which maes the problem a special case of MLE 8
19 Prob. Interpretation of Loss Minimization Note that for loss functions of the form, L( ) g[ y f ( x; ) ] the model f(x; ) only changes the mean of Y A shift in mean does not change the shape of a pdf, and therefore cannot change the value of the normalization constant Hence, for loss functions of the type above, it is always true that ˆ y f x arg maxe L, ( ; ) arg max P y x; YX 9
20 Regression his leads to the interpretation that we saw last time Which is the usual definition of a regression problem two random variables X and Y a dataset of examples D = {(x,y ),,(x n,y n )} a parametric model of the form y f ( x; ) where is a parameter vector, and a random variable that accounts for noise the pdf of the noise determines the loss in the other formulation 0
21 Regression Error pdf: Gaussian P ( ) Laplacian P ( ) Rayleigh P ( ) e e e Equivalent Loss Function: L distance L( x, y) ( y x) L distance L( x, y) y x Rayleigh distance L( x, y) ( y x) log( y x)
22 Regression We now how to solve the problem with losses Why would we want the added complexity incurred by introducing error probability models? he main reason is that this allows a data driven definition of the loss One good way to see this is the problem of weighted least squares Suppose that you now that not all measurements (x i,y i ) have the same importance his can be encoded in the loss function Remember that the unweighted loss is L y x y x ( ) ( ) i i
23 Regression o weigh different points differently we could use L w y ( x), w 0, i i i i i or even the more generic form L y ( x) W y ( x), W W 0, In the latter case the solution is (homewor) * he question is how do I now these weights? Without a probabilistic model one has little guidance on this. ( x) W ( x) ( x) W y 3
24 Regression he probabilistic equivalent * arg max e L y, f ( x; ) argmaxexp ( y ( x) ) W ( y ( x) ) is the MLE for a Gaussian pdf of nown covariance W In the case where the covariance W is diagonal we have i L y x i ( ) i In this case, each point is weighted by the inverse variance. i 4
25 Regression his maes sense Under the probabilistic formulation the variance i is the variance of the error associated with the i th observation his means that it is a measure of the uncertainty of the observation When y f ( x ; ) i i i W we are weighting each point by the inverse of its uncertainty (variance) We can also chec the goodness of this weighting matrix by plotting the histogram of the errors if W is chosen correctly, the errors should be Gaussian 5
26 Model Validation In fact, by analyzing the errors of the fit we can say a lot his is called model validation Leave some data on the side, and run it through the predictor Analyze the errors to see if there is deviation from the assumed model (Gaussianity for least squares) 6
27 Model Validation & Improvement Many times this will give you hints to alternative models that may fit the data better ypical problems for least squares (and solutions) 7
28 Model Validation & Improvement Example error histogram this does not loo Gaussian loo at the scatter plot of the error (y f(x, *)) increasing trend, maybe we should try log( y ) f ( x ; ) i i i 8
29 Model Validation & Improvement Example error histogram for the new model this loos Gaussian this model is probably better there are statistical tests that you can use to chec this objectively these are covered in statistics classes 9
30 Model Validation & Improvement Example error histogram his also does not loo Gaussian Checing the scatter plot now seems to suggest to try y f ( x ; ) i i i 30
31 Model Validation & Improvement Example error histogram for the new model Once again, seems to wor he residual behavior loos Gaussian However, It is NO always the case that such changes will wor. If not, maybe the problem is the assumption of Gaussianity itself Move away from least squares, try MLE with other error pdfs 3
32 END 3
Dimensionality Reduction and Principle Components
Dimensionality Reduction and Principle Components Ken Kreutz-Delgado (Nuno Vasconcelos) UCSD ECE Department Winter 2012 Motivation Recall, in Bayesian decision theory we have: World: States Y in {1,...,
More informationBayes Decision Theory - I
Bayes Decision Theory - I Nuno Vasconcelos (Ken Kreutz-Delgado) UCSD Statistical Learning from Data Goal: Given a relationship between a feature vector and a vector y, and iid data samples ( i,y i ), find
More informationDS-GA 1002 Lecture notes 10 November 23, Linear models
DS-GA 2 Lecture notes November 23, 2 Linear functions Linear models A linear model encodes the assumption that two quantities are linearly related. Mathematically, this is characterized using linear functions.
More informationECE521 week 3: 23/26 January 2017
ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear
More informationNormed & Inner Product Vector Spaces
Normed & Inner Product Vector Spaces ECE 174 Introduction to Linear & Nonlinear Optimization Ken Kreutz-Delgado ECE Department, UC San Diego Ken Kreutz-Delgado (UC San Diego) ECE 174 Fall 2016 1 / 27 Normed
More informationDimensionality Reduction and Principal Components
Dimensionality Reduction and Principal Components Nuno Vasconcelos (Ken Kreutz-Delgado) UCSD Motivation Recall, in Bayesian decision theory we have: World: States Y in {1,..., M} and observations of X
More informationCh. 12 Linear Bayesian Estimators
Ch. 1 Linear Bayesian Estimators 1 In chapter 11 we saw: the MMSE estimator takes a simple form when and are jointly Gaussian it is linear and used only the 1 st and nd order moments (means and covariances).
More information15 Singular Value Decomposition
15 Singular Value Decomposition For any high-dimensional data analysis, one s first thought should often be: can I use an SVD? The singular value decomposition is an invaluable analysis tool for dealing
More informationLinear Regression Linear Regression with Shrinkage
Linear Regression Linear Regression ith Shrinkage Introduction Regression means predicting a continuous (usually scalar) output y from a vector of continuous inputs (features) x. Example: Predicting vehicle
More informationPseudoinverse and Adjoint Operators
ECE 275AB Lecture 5 Fall 2008 V1.1 c K. Kreutz-Delgado, UC San Diego p. 1/1 Lecture 5 ECE 275A Pseudoinverse and Adjoint Operators ECE 275AB Lecture 5 Fall 2008 V1.1 c K. Kreutz-Delgado, UC San Diego p.
More informationLinear Regression Linear Regression with Shrinkage
Linear Regression Linear Regression ith Shrinkage Introduction Regression means predicting a continuous (usually scalar) output y from a vector of continuous inputs (features) x. Example: Predicting vehicle
More informationECE 275A Homework # 3 Due Thursday 10/27/2016
ECE 275A Homework # 3 Due Thursday 10/27/2016 Reading: In addition to the lecture material presented in class, students are to read and study the following: A. The material in Section 4.11 of Moon & Stirling
More informationLecture 4: Probabilistic Learning. Estimation Theory. Classification with Probability Distributions
DD2431 Autumn, 2014 1 2 3 Classification with Probability Distributions Estimation Theory Classification in the last lecture we assumed we new: P(y) Prior P(x y) Lielihood x2 x features y {ω 1,..., ω K
More informationLinear regression. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda
Linear regression DS GA 1002 Statistical and Mathematical Models http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall15 Carlos Fernandez-Granda Linear models Least-squares estimation Overfitting Example:
More informationLecture Notes 1: Vector spaces
Optimization-based data analysis Fall 2017 Lecture Notes 1: Vector spaces In this chapter we review certain basic concepts of linear algebra, highlighting their application to signal processing. 1 Vector
More informationManaging Uncertainty
Managing Uncertainty Bayesian Linear Regression and Kalman Filter December 4, 2017 Objectives The goal of this lab is multiple: 1. First it is a reminder of some central elementary notions of Bayesian
More informationLinear Prediction Theory
Linear Prediction Theory Joseph A. O Sullivan ESE 524 Spring 29 March 3, 29 Overview The problem of estimating a value of a random process given other values of the random process is pervasive. Many problems
More informationMetric-based classifiers. Nuno Vasconcelos UCSD
Metric-based classifiers Nuno Vasconcelos UCSD Statistical learning goal: given a function f. y f and a collection of eample data-points, learn what the function f. is. this is called training. two major
More informationThe Hilbert Space of Random Variables
The Hilbert Space of Random Variables Electrical Engineering 126 (UC Berkeley) Spring 2018 1 Outline Fix a probability space and consider the set H := {X : X is a real-valued random variable with E[X 2
More informationLinear Regression. Aarti Singh. Machine Learning / Sept 27, 2010
Linear Regression Aarti Singh Machine Learning 10-701/15-781 Sept 27, 2010 Discrete to Continuous Labels Classification Sports Science News Anemic cell Healthy cell Regression X = Document Y = Topic X
More information10-701/15-781, Machine Learning: Homework 4
10-701/15-781, Machine Learning: Homewor 4 Aarti Singh Carnegie Mellon University ˆ The assignment is due at 10:30 am beginning of class on Mon, Nov 15, 2010. ˆ Separate you answers into five parts, one
More informationLinear Models for Regression
Linear Models for Regression Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr
More informationClustering by Mixture Models. General background on clustering Example method: k-means Mixture model based clustering Model estimation
Clustering by Mixture Models General bacground on clustering Example method: -means Mixture model based clustering Model estimation 1 Clustering A basic tool in data mining/pattern recognition: Divide
More informationLinear Regression 1 / 25. Karl Stratos. June 18, 2018
Linear Regression Karl Stratos June 18, 2018 1 / 25 The Regression Problem Problem. Find a desired input-output mapping f : X R where the output is a real value. x = = y = 0.1 How much should I turn my
More informationSTAT420 Midterm Exam. University of Illinois Urbana-Champaign October 19 (Friday), :00 4:15p. SOLUTIONS (Yellow)
STAT40 Midterm Exam University of Illinois Urbana-Champaign October 19 (Friday), 018 3:00 4:15p SOLUTIONS (Yellow) Question 1 (15 points) (10 points) 3 (50 points) extra ( points) Total (77 points) Points
More informationECE 636: Systems identification
ECE 636: Systems identification Lectures 3 4 Random variables/signals (continued) Random/stochastic vectors Random signals and linear systems Random signals in the frequency domain υ ε x S z + y Experimental
More informationLinear Models in Machine Learning
CS540 Intro to AI Linear Models in Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu We briefly go over two linear models frequently used in machine learning: linear regression for, well, regression,
More informationSTA414/2104 Statistical Methods for Machine Learning II
STA414/2104 Statistical Methods for Machine Learning II Murat A. Erdogdu & David Duvenaud Department of Computer Science Department of Statistical Sciences Lecture 3 Slide credits: Russ Salakhutdinov Announcements
More information6.435, System Identification
SET 6 System Identification 6.435 Parametrized model structures One-step predictor Identifiability Munther A. Dahleh 1 Models of LTI Systems A complete model u = input y = output e = noise (with PDF).
More informationEM Algorithm & High Dimensional Data
EM Algorithm & High Dimensional Data Nuno Vasconcelos (Ken Kreutz-Delgado) UCSD Gaussian EM Algorithm For the Gaussian mixture model, we have Expectation Step (E-Step): Maximization Step (M-Step): 2 EM
More informationPractice Final Exam Solutions for Calculus II, Math 1502, December 5, 2013
Practice Final Exam Solutions for Calculus II, Math 5, December 5, 3 Name: Section: Name of TA: This test is to be taken without calculators and notes of any sorts. The allowed time is hours and 5 minutes.
More informationMathematical Formulation of Our Example
Mathematical Formulation of Our Example We define two binary random variables: open and, where is light on or light off. Our question is: What is? Computer Vision 1 Combining Evidence Suppose our robot
More informationMixture Models & EM. Nicholas Ruozzi University of Texas at Dallas. based on the slides of Vibhav Gogate
Mixture Models & EM icholas Ruozzi University of Texas at Dallas based on the slides of Vibhav Gogate Previously We looed at -means and hierarchical clustering as mechanisms for unsupervised learning -means
More informationESTIMATION THEORY. Chapter Estimation of Random Variables
Chapter ESTIMATION THEORY. Estimation of Random Variables Suppose X,Y,Y 2,...,Y n are random variables defined on the same probability space (Ω, S,P). We consider Y,...,Y n to be the observed random variables
More informationMixture Models & EM. Nicholas Ruozzi University of Texas at Dallas. based on the slides of Vibhav Gogate
Mixture Models & EM icholas Ruozzi University of Texas at Dallas based on the slides of Vibhav Gogate Previously We looed at -means and hierarchical clustering as mechanisms for unsupervised learning -means
More informationMachine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io
Machine Learning Lecture 4: Regularization and Bayesian Statistics Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 207 Overfitting Problem
More informationSTAT 100C: Linear models
STAT 100C: Linear models Arash A. Amini June 9, 2018 1 / 56 Table of Contents Multiple linear regression Linear model setup Estimation of β Geometric interpretation Estimation of σ 2 Hat matrix Gram matrix
More informationt x 1 e t dt, and simplify the answer when possible (for example, when r is a positive even number). In particular, confirm that EX 4 = 3.
Mathematical Statistics: Homewor problems General guideline. While woring outside the classroom, use any help you want, including people, computer algebra systems, Internet, and solution manuals, but mae
More informationROBOTICS 01PEEQW. Basilio Bona DAUIN Politecnico di Torino
ROBOTICS 01PEEQW Basilio Bona DAUIN Politecnico di Torino Probabilistic Fundamentals in Robotics Gaussian Filters Course Outline Basic mathematical framework Probabilistic models of mobile robots Mobile
More informationCopula Regression RAHUL A. PARSA DRAKE UNIVERSITY & STUART A. KLUGMAN SOCIETY OF ACTUARIES CASUALTY ACTUARIAL SOCIETY MAY 18,2011
Copula Regression RAHUL A. PARSA DRAKE UNIVERSITY & STUART A. KLUGMAN SOCIETY OF ACTUARIES CASUALTY ACTUARIAL SOCIETY MAY 18,2011 Outline Ordinary Least Squares (OLS) Regression Generalized Linear Models
More informationInverse of a Square Matrix. For an N N square matrix A, the inverse of A, 1
Inverse of a Square Matrix For an N N square matrix A, the inverse of A, 1 A, exists if and only if A is of full rank, i.e., if and only if no column of A is a linear combination 1 of the others. A is
More informationIntroduction to Machine Learning Fall 2017 Note 5. 1 Overview. 2 Metric
CS 189 Introduction to Machine Learning Fall 2017 Note 5 1 Overview Recall from our previous note that for a fixed input x, our measurement Y is a noisy measurement of the true underlying response f x):
More informationFoundation of Intelligent Systems, Part I. Regression 2
Foundation of Intelligent Systems, Part I Regression 2 mcuturi@i.kyoto-u.ac.jp FIS - 2013 1 Some Words on the Survey Not enough answers to say anything meaningful! Try again: survey. FIS - 2013 2 Last
More informationDATA ASSIMILATION FOR FLOOD FORECASTING
DATA ASSIMILATION FOR FLOOD FORECASTING Arnold Heemin Delft University of Technology 09/16/14 1 Data assimilation is the incorporation of measurement into a numerical model to improve the model results
More informationMachine Learning. Linear Models. Fabio Vandin October 10, 2017
Machine Learning Linear Models Fabio Vandin October 10, 2017 1 Linear Predictors and Affine Functions Consider X = R d Affine functions: L d = {h w,b : w R d, b R} where ( d ) h w,b (x) = w, x + b = w
More informationSingular Value Decomposition (SVD)
School of Computing National University of Singapore CS CS524 Theoretical Foundations of Multimedia More Linear Algebra Singular Value Decomposition (SVD) The highpoint of linear algebra Gilbert Strang
More information4 Bias-Variance for Ridge Regression (24 points)
Implement Ridge Regression with λ = 0.00001. Plot the Squared Euclidean test error for the following values of k (the dimensions you reduce to): k = {0, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500,
More informationTHE ROYAL STATISTICAL SOCIETY 2016 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE MODULE 5
THE ROAL STATISTICAL SOCIET 6 EAMINATIONS SOLUTIONS HIGHER CERTIFICATE MODULE The Society is providing these solutions to assist candidates preparing for the examinations in 7. The solutions are intended
More informationStochastic State Estimation
Chapter 7 Stochastic State Estimation Perhaps the most important part of studying a problem in robotics or vision, as well as in most other sciences, is to determine a good model for the phenomena and
More informationNeural networks: Associative memory
Neural networs: Associative memory Prof. Sven Lončarić sven.loncaric@fer.hr http://www.fer.hr/ipg 1 Overview of topics l Introduction l Associative memories l Correlation matrix as an associative memory
More informationMachine Learning Practice Page 2 of 2 10/28/13
Machine Learning 10-701 Practice Page 2 of 2 10/28/13 1. True or False Please give an explanation for your answer, this is worth 1 pt/question. (a) (2 points) No classifier can do better than a naive Bayes
More informationEE613 Machine Learning for Engineers. Kernel methods Support Vector Machines. jean-marc odobez 2015
EE613 Machine Learning for Engineers Kernel methods Support Vector Machines jean-marc odobez 2015 overview Kernel methods introductions and main elements defining kernels Kernelization of k-nn, K-Means,
More informationAutonomous Navigation for Flying Robots
Computer Vision Group Prof. Daniel Cremers Autonomous Navigation for Flying Robots Lecture 6.2: Kalman Filter Jürgen Sturm Technische Universität München Motivation Bayes filter is a useful tool for state
More informationExtension of the Sparse Grid Quadrature Filter
Extension of the Sparse Grid Quadrature Filter Yang Cheng Mississippi State University Mississippi State, MS 39762 Email: cheng@ae.msstate.edu Yang Tian Harbin Institute of Technology Harbin, Heilongjiang
More informationShort-Term Electricity Price Forecasting
1 Short-erm Electricity Price Forecasting A. Arabali, E. Chalo, Student Members IEEE, M. Etezadi-Amoli, LSM, IEEE, M. S. Fadali, SM IEEE Abstract Price forecasting has become an important tool in the planning
More information1 EM algorithm: updating the mixing proportions {π k } ik are the posterior probabilities at the qth iteration of EM.
Université du Sud Toulon - Var Master Informatique Probabilistic Learning and Data Analysis TD: Model-based clustering by Faicel CHAMROUKHI Solution The aim of this practical wor is to show how the Classification
More informationAdvanced Machine Learning Practical 4b Solution: Regression (BLR, GPR & Gradient Boosting)
Advanced Machine Learning Practical 4b Solution: Regression (BLR, GPR & Gradient Boosting) Professor: Aude Billard Assistants: Nadia Figueroa, Ilaria Lauzana and Brice Platerrier E-mails: aude.billard@epfl.ch,
More informationLinear regression COMS 4771
Linear regression COMS 4771 1. Old Faithful and prediction functions Prediction problem: Old Faithful geyser (Yellowstone) Task: Predict time of next eruption. 1 / 40 Statistical model for time between
More informationEstimation and Prediction Scenarios
Recursive BLUE BLUP and the Kalman filter: Estimation and Prediction Scenarios Amir Khodabandeh GNSS Research Centre, Curtin University of Technology, Perth, Australia IUGG 2011, Recursive 28 June BLUE-BLUP
More informationOverfitting, Bias / Variance Analysis
Overfitting, Bias / Variance Analysis Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 8, 207 / 40 Outline Administration 2 Review of last lecture 3 Basic
More informationHand Written Digit Recognition using Kalman Filter
International Journal of Electronics and Communication Engineering. ISSN 0974-2166 Volume 5, Number 4 (2012), pp. 425-434 International Research Publication House http://www.irphouse.com Hand Written Digit
More informationy Xw 2 2 y Xw λ w 2 2
CS 189 Introduction to Machine Learning Spring 2018 Note 4 1 MLE and MAP for Regression (Part I) So far, we ve explored two approaches of the regression framework, Ordinary Least Squares and Ridge Regression:
More informationSIMON FRASER UNIVERSITY School of Engineering Science
SIMON FRASER UNIVERSITY School of Engineering Science Course Outline ENSC 810-3 Digital Signal Processing Calendar Description This course covers advanced digital signal processing techniques. The main
More informationStatistical Machine Learning, Part I. Regression 2
Statistical Machine Learning, Part I Regression 2 mcuturi@i.kyoto-u.ac.jp SML-2015 1 Last Week Regression: highlight a functional relationship between a predicted variable and predictors SML-2015 2 Last
More information1 Cricket chirps: an example
Notes for 2016-09-26 1 Cricket chirps: an example Did you know that you can estimate the temperature by listening to the rate of chirps? The data set in Table 1 1. represents measurements of the number
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 89 Part II
More informationIntroduction to Machine Learning. PCA and Spectral Clustering. Introduction to Machine Learning, Slides: Eran Halperin
1 Introduction to Machine Learning PCA and Spectral Clustering Introduction to Machine Learning, 2013-14 Slides: Eran Halperin Singular Value Decomposition (SVD) The singular value decomposition (SVD)
More informationIntroduction: The Perceptron
Introduction: The Perceptron Haim Sompolinsy, MIT October 4, 203 Perceptron Architecture The simplest type of perceptron has a single layer of weights connecting the inputs and output. Formally, the perceptron
More informationEKF, UKF. Pieter Abbeel UC Berkeley EECS. Many slides adapted from Thrun, Burgard and Fox, Probabilistic Robotics
EKF, UKF Pieter Abbeel UC Berkeley EECS Many slides adapted from Thrun, Burgard and Fox, Probabilistic Robotics Kalman Filter Kalman Filter = special case of a Bayes filter with dynamics model and sensory
More informationVelocity (Kalman Filter) Velocity (knots) Time (sec) Kalman Filter Accelerations 2.5. Acceleration (m/s2)
KALMAN FILERING 1 50 Velocity (Kalman Filter) 45 40 35 Velocity (nots) 30 25 20 15 10 5 0 0 10 20 30 40 50 60 70 80 90 100 ime (sec) 3 Kalman Filter Accelerations 2.5 Acceleration (m/s2) 2 1.5 1 0.5 0
More informationEKF, UKF. Pieter Abbeel UC Berkeley EECS. Many slides adapted from Thrun, Burgard and Fox, Probabilistic Robotics
EKF, UKF Pieter Abbeel UC Berkeley EECS Many slides adapted from Thrun, Burgard and Fox, Probabilistic Robotics Kalman Filter Kalman Filter = special case of a Bayes filter with dynamics model and sensory
More informationLINEAR ALGEBRA 1, 2012-I PARTIAL EXAM 3 SOLUTIONS TO PRACTICE PROBLEMS
LINEAR ALGEBRA, -I PARTIAL EXAM SOLUTIONS TO PRACTICE PROBLEMS Problem (a) For each of the two matrices below, (i) determine whether it is diagonalizable, (ii) determine whether it is orthogonally diagonalizable,
More informationLecture Notes Special: Using Neural Networks for Mean-Squared Estimation. So: How can we evaluate E(X Y )? What is a neural network? How is it useful?
Lecture Notes Special: Using Neural Networks for Mean-Squared Estimation The Story So Far... So: How can we evaluate E(X Y )? What is a neural network? How is it useful? EE 278: Using Neural Networks for
More informationIntroduction to Bayesian Learning. Machine Learning Fall 2018
Introduction to Bayesian Learning Machine Learning Fall 2018 1 What we have seen so far What does it mean to learn? Mistake-driven learning Learning by counting (and bounding) number of mistakes PAC learnability
More informationconditional cdf, conditional pdf, total probability theorem?
6 Multiple Random Variables 6.0 INTRODUCTION scalar vs. random variable cdf, pdf transformation of a random variable conditional cdf, conditional pdf, total probability theorem expectation of a random
More informationCLASS NOTES Computational Methods for Engineering Applications I Spring 2015
CLASS NOTES Computational Methods for Engineering Applications I Spring 2015 Petros Koumoutsakos Gerardo Tauriello (Last update: July 27, 2015) IMPORTANT DISCLAIMERS 1. REFERENCES: Much of the material
More informationDS-GA 1002 Lecture notes 12 Fall Linear regression
DS-GA Lecture notes 1 Fall 16 1 Linear models Linear regression In statistics, regression consists of learning a function relating a certain quantity of interest y, the response or dependent variable,
More informationOverview. Probabilistic Interpretation of Linear Regression Maximum Likelihood Estimation Bayesian Estimation MAP Estimation
Overview Probabilistic Interpretation of Linear Regression Maximum Likelihood Estimation Bayesian Estimation MAP Estimation Probabilistic Interpretation: Linear Regression Assume output y is generated
More informationLinear Models for Regression CS534
Linear Models for Regression CS534 Example Regression Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict
More informationKalman Filter Computer Vision (Kris Kitani) Carnegie Mellon University
Kalman Filter 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University Examples up to now have been discrete (binary) random variables Kalman filtering can be seen as a special case of a temporal
More informationKalman Filter. Predict: Update: x k k 1 = F k x k 1 k 1 + B k u k P k k 1 = F k P k 1 k 1 F T k + Q
Kalman Filter Kalman Filter Predict: x k k 1 = F k x k 1 k 1 + B k u k P k k 1 = F k P k 1 k 1 F T k + Q Update: K = P k k 1 Hk T (H k P k k 1 Hk T + R) 1 x k k = x k k 1 + K(z k H k x k k 1 ) P k k =(I
More informationShape of Gaussians as Feature Descriptors
Shape of Gaussians as Feature Descriptors Liyu Gong, Tianjiang Wang and Fang Liu Intelligent and Distributed Computing Lab, School of Computer Science and Technology Huazhong University of Science and
More informationLinear Models for Regression. Sargur Srihari
Linear Models for Regression Sargur srihari@cedar.buffalo.edu 1 Topics in Linear Regression What is regression? Polynomial Curve Fitting with Scalar input Linear Basis Function Models Maximum Likelihood
More information4 Derivations of the Discrete-Time Kalman Filter
Technion Israel Institute of Technology, Department of Electrical Engineering Estimation and Identification in Dynamical Systems (048825) Lecture Notes, Fall 2009, Prof N Shimkin 4 Derivations of the Discrete-Time
More informationL11: Pattern recognition principles
L11: Pattern recognition principles Bayesian decision theory Statistical classifiers Dimensionality reduction Clustering This lecture is partly based on [Huang, Acero and Hon, 2001, ch. 4] Introduction
More information2D Image Processing. Bayes filter implementation: Kalman filter
2D Image Processing Bayes filter implementation: Kalman filter Prof. Didier Stricker Kaiserlautern University http://ags.cs.uni-kl.de/ DFKI Deutsches Forschungszentrum für Künstliche Intelligenz http://av.dfki.de
More informationProbabilistic Fundamentals in Robotics. DAUIN Politecnico di Torino July 2010
Probabilistic Fundamentals in Robotics Gaussian Filters Basilio Bona DAUIN Politecnico di Torino July 2010 Course Outline Basic mathematical framework Probabilistic models of mobile robots Mobile robot
More informationMODULE 8 Topics: Null space, range, column space, row space and rank of a matrix
MODULE 8 Topics: Null space, range, column space, row space and rank of a matrix Definition: Let L : V 1 V 2 be a linear operator. The null space N (L) of L is the subspace of V 1 defined by N (L) = {x
More informationLinear models: the perceptron and closest centroid algorithms. D = {(x i,y i )} n i=1. x i 2 R d 9/3/13. Preliminaries. Chapter 1, 7.
Preliminaries Linear models: the perceptron and closest centroid algorithms Chapter 1, 7 Definition: The Euclidean dot product beteen to vectors is the expression d T x = i x i The dot product is also
More informationLeast squares: the big idea
Notes for 2016-02-22 Least squares: the big idea Least squares problems are a special sort of minimization problem. Suppose A R m n where m > n. In general, we cannot solve the overdetermined system Ax
More informationVector Space Concepts
Vector Space Concepts ECE 174 Introduction to Linear & Nonlinear Optimization Ken Kreutz-Delgado ECE Department, UC San Diego Ken Kreutz-Delgado (UC San Diego) ECE 174 Fall 2016 1 / 25 Vector Space Theory
More informationPART II : Least-Squares Approximation
PART II : Least-Squares Approximation Basic theory Let U be an inner product space. Let V be a subspace of U. For any g U, we look for a least-squares approximation of g in the subspace V min f V f g 2,
More informationME 597: AUTONOMOUS MOBILE ROBOTICS SECTION 2 PROBABILITY. Prof. Steven Waslander
ME 597: AUTONOMOUS MOBILE ROBOTICS SECTION 2 Prof. Steven Waslander p(a): Probability that A is true 0 pa ( ) 1 p( True) 1, p( False) 0 p( A B) p( A) p( B) p( A B) A A B B 2 Discrete Random Variable X
More informationShannon meets Wiener II: On MMSE estimation in successive decoding schemes
Shannon meets Wiener II: On MMSE estimation in successive decoding schemes G. David Forney, Jr. MIT Cambridge, MA 0239 USA forneyd@comcast.net Abstract We continue to discuss why MMSE estimation arises
More informationTwo-Dimensional Signal Processing and Image De-noising
Two-Dimensional Signal Processing and Image De-noising Alec Koppel, Mark Eisen, Alejandro Ribeiro March 12, 2018 Until now, we considered (one-dimensional) discrete signals of the form x : [0, N 1] C of
More informationMATH 680 Fall November 27, Homework 3
MATH 680 Fall 208 November 27, 208 Homework 3 This homework is due on December 9 at :59pm. Provide both pdf, R files. Make an individual R file with proper comments for each sub-problem. Subgradients and
More informationBayesian decision theory. Nuno Vasconcelos ECE Department, UCSD
Bayesian decision theory Nuno Vasconcelos ECE Department, UCSD Notation the notation in DHS is quite sloppy e.g. show that ( error = ( error z ( z dz really not clear what this means we will use the following
More informationFactor Analysis and Kalman Filtering (11/2/04)
CS281A/Stat241A: Statistical Learning Theory Factor Analysis and Kalman Filtering (11/2/04) Lecturer: Michael I. Jordan Scribes: Byung-Gon Chun and Sunghoon Kim 1 Factor Analysis Factor analysis is used
More informationMultidisciplinary System Design Optimization (MSDO)
oday s opics Multidisciplinary System Design Optimization (MSDO) Approimation Methods Lecture 9 6 April 004 Design variable lining Reduced-Basis Methods Response Surface Approimations Kriging Variable-Fidelity
More informationSimple Linear Regression for the Climate Data
Prediction Prediction Interval Temperature 0.2 0.0 0.2 0.4 0.6 0.8 320 340 360 380 CO 2 Simple Linear Regression for the Climate Data What do we do with the data? y i = Temperature of i th Year x i =CO
More information