Figure 1: Visualising the input features. Figure 2: Visualising the input-output data pairs.
- Dina Potter
- 5 years ago
Regression

Data visualisation

(a) To plot the data in x_trn and x_tst, we can use the MATLAB function scatter. For example, the code for plotting x_trn is

    scatter(x_trn(:,1), x_trn(:,2));

[Figure 1: Visualising the input features. Panels: (a) x_trn, (b) x_tst.]

The ranges of the values of both features $x_1$ and $x_2$ are the same in x_trn and x_tst. A gap in $x_1$ exists in x_trn but not in x_tst.

(b) To plot the input-output data in 3-d, we can use the function scatter3. For example, the code for plotting the training data pair x_trn-y_trn is

    scatter3(x_trn(:,1), x_trn(:,2), y_trn);

[Figure 2: Visualising the input-output data pairs. Panels: (a) x_trn-y_trn pair, (b) x_tst-y_tst pair.]

The data forms a curved surface with a single peak. This indicates that it would be rather difficult to capture the x-y relation with a simple linear regression model.
Linear regression

(a) The maximum likelihood estimate (MLE) for the linear regression model is given, in matrix form, by

$$ w_{mle} = (X^\top X)^{-1} X^\top y_{trn}, \tag{1} $$

where $X$ is the design matrix

$$ X = \begin{pmatrix} 1 & x_{11} & x_{12} \\ \vdots & \vdots & \vdots \\ 1 & x_{N1} & x_{N2} \end{pmatrix}. \tag{2} $$

The code that implements eq. (1) is given by (MATLAB supports matrix manipulation nicely)

    X = [ones(num_trn,1) x_trn];   % construct design matrix
    w_mle = (X'*X)\X'*y_trn;       % MLE

The maximum likelihood estimator $\hat{w} = (\hat{w}_0, \hat{w}_1, \hat{w}_2)$ is

$$ \hat{w}_0 = .8, \quad \hat{w}_1 = .6, \quad \hat{w}_2 = .6. \tag{3} $$

(b) The mean squared error (MSE) is defined by

$$ \mathrm{MSE} = \frac{1}{N} \sum_{n=1}^{N} (y_n - \hat{y}_n)^2, \tag{4} $$

where $\hat{y} = \hat{f}(x) = \hat{w}_0 + \hat{w}_1 x_1 + \hat{w}_2 x_2$ is the estimated value given by the model. The code for calculating the MSE on both the training set and the test set is

    mle_trn_mse = mean((y_trn - X * w_mle).^2);     % MSE on training set
    y_mle_tst = [ones(num_tst,1) x_tst] * w_mle;    % calculate predicted value of y on the test set
    mle_tst_mse = mean((y_tst - y_mle_tst).^2);     % MSE on test set

and the result is

    mle_trn_mse = .896, mle_tst_mse =     (5)

To estimate and evaluate the dumb model $y = w_0$, we only need to estimate $w_0$ using the mean of the data y_trn, and calculate the MSEs

    y_dumb = mean(y_trn);                         % estimate dumb model
    dumb_trn_mse = mean((y_trn - y_dumb).^2);     % MSE on training set
    dumb_tst_mse = mean((y_tst - y_dumb).^2);     % MSE on test set

The result is

    dumb_trn_mse = .9, dumb_tst_mse = .6.     (6)

[You were not asked to comment on the results in the question, but it is interesting to do so. Note that mle_trn_mse is smaller than mle_tst_mse; this is unlikely to be due to overfitting given that there are only 3 parameters to fit. More likely it is due to the different x-data distribution between the training and test sets. Also, as one would expect, the dumb predictor has higher training and test error than the more complex linear model; the dumb model is obtained by setting $w_1 = w_2 = 0$ in the linear regression model.]

(c) The maximum likelihood estimator for the noise variance $\sigma_\eta^2$ is

$$ \hat{\sigma}_\eta^2 = \frac{1}{N} \sum_{n=1}^{N} (y_n - \hat{y}_n)^2. \tag{7} $$

This is identical to the MSE on the training set. Therefore, we have

    \hat{\sigma}_\eta^2 = mle_trn_mse = .896.     (8)

This estimated noise variance is much larger than the true value of $\sigma_\eta^2$, because this simple linear regression model cannot capture the curved surface of the data well.
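The linear-regression steps above (normal equations, MSE, and the constant-mean baseline) can be sketched in plain Python. This is an illustrative sketch rather than the assignment's MATLAB code: the synthetic data, the true weights (0.5, 2.0, -1.0), the noise level 0.1, and the helper names `solve`, `fit_linear`, `predict` and `mse` are all assumptions made for the example.

```python
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented copy, so the caller's matrices are left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_linear(xs, ys):
    """Least-squares weights (w0, w1, w2) from the normal equations X'X w = X'y."""
    design = [[1.0, x1, x2] for x1, x2 in xs]
    k = 3
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)


def predict(w, xs):
    """Model output w0 + w1*x1 + w2*x2 for every input pair."""
    return [w[0] + w[1] * x1 + w[2] * x2 for x1, x2 in xs]


def mse(ys, preds):
    """Mean squared error between targets and predictions."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Synthetic stand-in for (x_trn, y_trn); NOT the assignment's data.
rng = random.Random(0)
xs = [(rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)) for _ in range(200)]
ys = [0.5 + 2.0 * x1 - 1.0 * x2 + rng.gauss(0.0, 0.1) for x1, x2 in xs]

w_mle = fit_linear(xs, ys)
linear_mse = mse(ys, predict(w_mle, xs))   # also the ML estimate of the noise variance

y_mean = sum(ys) / len(ys)                 # the constant ("dumb") model y = w0
constant_mse = mse(ys, [y_mean] * len(ys))
```

Because the constant model is the special case $w_1 = w_2 = 0$, its training MSE can never be lower than that of the fitted linear model, which is what the comparison of `linear_mse` and `constant_mse` illustrates.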
Bayesian linear regression

(a) The function value at the input point $x^* = (1, 2)$ is

$$ f(x^*) = f^* = w_0 + w_1 + 2 w_2. \tag{9} $$

Since $w_0$, $w_1$, $w_2$ are Gaussian random variables, any linear combination of them is also Gaussian. Therefore, we have $f(x^*) \sim \mathcal{N}(\mu_f, \sigma_f^2)$ with mean $\mu_f$ and variance $\sigma_f^2$

$$ \mu_f = \mu_0 + \mu_1 + 2\mu_2 = 0 + 0 + 0 = 0, \tag{10} $$

$$ \sigma_f^2 = \sigma_0^2 + \sigma_1^2 + (2\sigma_2)^2 = 1 + 1 + 4 = 6. \tag{11} $$

(b) To plot the sampled function $f(x; w)$, we first create a grid for plotting, then sample the weights. For each sample we evaluate the function value on the grid and draw a surf plot (without colour, for better visualisation).

    [gridx1, gridx2] = meshgrid(-:.:, .:.:.);              % create grid
    gridy = randn + gridx1*randn + gridx2*randn;           % sample weights and evaluate f(x)
    surf(gridx1, gridx2, gridy, 'facecolor', 'none');      % plot

[Figure 3: Function $f(x; w)$ given three different samples of $w \sim \mathcal{N}(0, I_3)$. Panels: (a) sample 1, (b) sample 2, (c) sample 3.]

The function always gives a 2-d plane in the 3-d input-output space. Again, this property makes it difficult to use the linear regression model to capture our data. The range of output values on the grid is consistent with the characteristic variance of $f(x; w)$, as its variance at the point $(2, 1.5)$ (the point on the grid with the largest absolute value in both $x_1$ and $x_2$) is

$$ \sigma_f^2 = \sigma_0^2 + (2\sigma_1)^2 + (1.5\sigma_2)^2 = 7.25. \tag{12} $$

RBF regression

(a) Drawing RBF functions is similar to drawing linear functions. The extra step we need is to work out the outputs $\phi(x)$ of all RBF bases for an input point $x$. This is done by the function eval_rbf_bases.

    [gridx1, gridx2] = meshgrid(-.:.:., -:.:);                              % a larger grid for plot
    grid_dim = size(gridx1);
    grid_phi = eval_rbf_bases(rbf_net, [gridx1(:), gridx2(:)]);             % evaluate RBF outputs
    rbf_net.w = 1/sqrt(rbf_net.alpha) * randn(gridside^2, 1);               % sample weights
    rbf_net.b = 1/sqrt(rbf_net.alpha) * randn;
    gridy = rbf_net.b + grid_phi * rbf_net.w;                               % evaluate function y(x)
    surf(gridx1, gridx2, reshape(gridy, grid_dim), 'edgecolor', 'none');    % surf plot
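The RBF sampling step in (a) can be sketched in plain Python as follows. This is only a sketch under stated assumptions: the source does not show the body of eval_rbf_bases, so the isotropic Gaussian basis in `rbf_features`, the width 0.5, the 5 x 5 grid of centres on [-1, 1]^2, the plotting grid, and the function names are hypothetical choices, not the assignment's implementation.

```python
import math
import random


def rbf_features(point, centres, width):
    """Gaussian RBF outputs phi_k(x) for one 2-d input (assumed basis shape)."""
    x1, x2 = point
    return [math.exp(-((x1 - c1) ** 2 + (x2 - c2) ** 2) / (2.0 * width ** 2))
            for c1, c2 in centres]


def sample_rbf_function(points, centres, width, alpha, seed):
    """Draw bias and weights from N(0, 1/alpha) and evaluate y(x) at every point."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(alpha)          # prior standard deviation
    bias = scale * rng.gauss(0.0, 1.0)
    weights = [scale * rng.gauss(0.0, 1.0) for _ in centres]
    values = []
    for pt in points:
        phi = rbf_features(pt, centres, width)
        values.append(bias + sum(w * p for w, p in zip(weights, phi)))
    return values


# Hypothetical 5 x 5 grid of RBF centres and a slightly larger evaluation grid.
grid_side = 5
ticks = [-1.0 + 2.0 * i / (grid_side - 1) for i in range(grid_side)]
centres = [(a, b) for a in ticks for b in ticks]
plot_ticks = [-1.5 + 0.25 * i for i in range(13)]
points = [(a, b) for a in plot_ticks for b in plot_ticks]

# Same random draws, two prior precisions: only the vertical scale changes.
y_alpha_1 = sample_rbf_function(points, centres, width=0.5, alpha=1.0, seed=0)
y_alpha_100 = sample_rbf_function(points, centres, width=0.5, alpha=100.0, seed=0)
```

Reusing the seed makes the effect of the prior precision visible in isolation: with the same standard-normal draws, raising alpha from 1 to 100 multiplies every sampled weight, and hence every function value, by 1/sqrt(100) = 0.1.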
[Figure 4: Grid RBF function given three different samples of $w \sim \mathcal{N}(0, I_{26})$. Panels: (a) sample 1, (b) sample 2, (c) sample 3.]

Three sample functions are shown in Figure 4. The RBF function is nonlinear. More specifically, it is a combination of a set of Gaussian bumps/RBFs (in our case there are 25) on a horizontal plane. The size of each bump is determined by the weight associated with it, and the height of the vertical offset is determined by the bias term $w_0$ (i.e. rbf_net.b in the rbf_net object). Because of this nonlinearity, it should be able to describe the data better than the linear regression model. Observe that the lengthscales of variation in the plots are rather shorter than in the data, but this arises from choosing each weight independently in the prior. Thus the RBF network should be able to model the given data.

(b) The variance of the Gaussian prior over the weights $\mathcal{N}(0, \alpha^{-1} I_{26})$ is proportional to $1/\alpha$. As $\alpha$ increases, the variance decreases and the Gaussian prior is squeezed more tightly around 0. The sample of each weight will thus be closer to zero when $\alpha$ becomes larger. As a result, the vertical scaling of the plots is closer to zero, and the size of the Gaussian bumps decreases.

[Figure 5: Change $\alpha$, resample $w \sim \mathcal{N}(0, \alpha^{-1} I_{26})$ and replot the RBF function; panels (a), (b), (c) show three values of $\alpha$. Notice the scale on the y-axis.]

(c) The MSE of the maximum a posteriori (MAP) estimator on both the training and the test sets can be calculated by

    rbf_map_trn_mse = mean((rbf_net.b + trn_phi * rbf_net.w - y_trn).^2);
    tst_phi = eval_rbf_bases(rbf_net, x_tst);     % RBF outputs for test points
    rbf_map_tst_mse = mean((rbf_net.b + tst_phi * rbf_net.w - y_tst).^2);

The result is

    rbf_map_trn_mse = ., rbf_map_tst_mse = .7.     (13)

(d) With the new design matrix trn_phi, the MLE for the RBF model has the same form as the MLE for the linear regression model

    w_rbf_mle = (trn_phi'*trn_phi)\trn_phi'*y_trn;     % MLE (given the design matrix)

The MSEs on the training and the test sets are

    rbf_mle_trn_mse = .9, rbf_mle_tst_mse = .78.     (14)

The result is slightly better than that given by MAP. Because the training set is large relative to the 26 parameters to fit, we have a low risk of overfitting the data. However, doing MAP is not a bad choice on this problem, as it gives only a slightly worse MSE and it enables us to do a full probabilistic analysis of the model, such as providing the predictive variances. Both MSE scores of the RBF model are better than those of the linear regression model by an order of magnitude. This is because the RBF model successfully captures the nonlinearity. Note again the differences between training and test errors, which again may be due to the different x-data distribution between the training and test sets.

(e) Recall that the maximum likelihood estimate of the noise variance is identical to the MSE on the training set. Therefore,

    \hat{\sigma}_\eta^2 = rbf_mle_trn_mse = .9.     (15)

This estimate is much smaller than the one given by the linear regression model ($\hat{\sigma}_\eta^2$ = .896) but is still larger than the true value of $\sigma_\eta^2$.

(f) To visualise the predictive variances in a region, we first need to specify a grid, then evaluate the predictive variance for every point on that grid, and finally make the plot.
    [gridx1, gridx2] = meshgrid(-:.:, -:.:);                          % create grid
    grid_dim = size(gridx1);
    grid_phi = eval_rbf_bases(rbf_net, [gridx1(:), gridx2(:)]);
    grid_phi = [ones(size(grid_phi, 1), 1) grid_phi];
    var_pred = zeros(size(grid_phi, 1), 1);                           % predictive variance
    Vinv = trn_phi'*trn_phi/std_n^2 + rbf_net.alpha*eye(d);
    for ii = 1:size(grid_phi, 1)
        var_pred(ii) = grid_phi(ii,:) * inv(Vinv) * grid_phi(ii,:)' + std_n^2;
    end
    imagesc(-:.:, -:.:, reshape(var_pred, grid_dim));
    set(gca, 'YDir', 'normal')
    colorbar

It might be more convenient to visualise in units of standard deviation

    std_pred = sqrt(var_pred);
    imagesc(-:.:, -:.:, reshape(std_pred, grid_dim));

In the assignment we were given $V_N = \sigma_\eta^2 (\alpha \sigma_\eta^2 I + \Phi^\top \Phi)^{-1}$, which can be rewritten as $V_N = (\alpha I + \sigma_\eta^{-2} \Phi^\top \Phi)^{-1}$. In the code we have used inv(Vinv), although one could also use the / operator.

The lowest predictive variance is found in two areas, which are separated by the gap in $x_1$. The predictive variance in the gap area is relatively high. Comparing with Figure 1, we find that the two low-variance areas are where the data is, which helps to reduce the variance in prediction. In fact, the lowest predictive variance read from the figure is only slightly larger than the noise variance $\sigma_\eta^2$, which means that the uncertainty of the prediction at these low-variance points mainly results from the Gaussian noise $\eta$.

There are five RBF centers in the gap, but little training data that can be used to reduce the prior uncertainty over the weights associated with them. Thus the uncertainty of these weights remains large, leading to a large predictive variance.

Finally, for points that are far away from the RBFs, the RBF outputs are very small, and thus the uncertainty in the weights cannot pass to the prediction, except through $w_0$:

$$ f(x) = w_0 + \sum_{k=1}^{25} w_k \phi_k(x) \approx w_0, \quad \text{when } \phi_k(x) \approx 0. \tag{16} $$

Therefore, we see a constant predictive variance of $\alpha^{-1} + \sigma_\eta^2$ far from the central area.
[Figure 6: The predictive variance and standard deviation of the RBF model plotted on the grid. Panels: (a) predictive variance, (b) predictive standard deviation.]
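The qualitative behaviour of the predictive variance (low where there is data, high in a gap, flat far away) can be reproduced with a small 1-d toy example in plain Python. This is a hedged sketch, not the assignment's setup: the three centres, the width 0.3, alpha = 1, noise standard deviation 0.1, the training inputs, and the helper names are all assumptions chosen for illustration. It uses the same formula as the MATLAB code, $\phi(x)^\top V_N \phi(x) + \sigma_\eta^2$ with $V_N^{-1} = \alpha I + \sigma_\eta^{-2} \Phi^\top \Phi$, but solves a linear system instead of forming the inverse.

```python
import math


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def features(x, centres, width):
    """Bias feature followed by Gaussian RBF outputs for a scalar input x."""
    return [1.0] + [math.exp(-((x - c) ** 2) / (2.0 * width ** 2)) for c in centres]


def predictive_variance(x, train_x, centres, width, alpha, noise_std):
    """phi(x)' V_N phi(x) + sigma^2, with V_N^{-1} = alpha*I + Phi'Phi / sigma^2."""
    phi_train = [features(t, centres, width) for t in train_x]
    d = len(centres) + 1
    v_inv = [[sum(row[i] * row[j] for row in phi_train) / noise_std ** 2
              + (alpha if i == j else 0.0)
              for j in range(d)] for i in range(d)]
    phi = features(x, centres, width)
    v_phi = solve(v_inv, phi)               # V_N phi(x), without an explicit inverse
    return sum(p * q for p, q in zip(phi, v_phi)) + noise_std ** 2


# Hypothetical setup: data on [-2, -1] and [1, 2], nothing near the centre at 0.
centres = [-1.5, 0.0, 1.5]
train_x = [-2.0 + 0.05 * i for i in range(21)] + [1.0 + 0.05 * i for i in range(21)]
settings = dict(train_x=train_x, centres=centres, width=0.3, alpha=1.0, noise_std=0.1)

var_data = predictive_variance(-1.5, **settings)    # inside a data-rich region
var_gap = predictive_variance(0.0, **settings)      # centre with no nearby data
var_far_a = predictive_variance(50.0, **settings)   # far from every centre
var_far_b = predictive_variance(100.0, **settings)
```

In this toy setting the weight of the centre at 0 keeps almost all of its prior variance 1/alpha, so `var_gap` is close to 1/alpha + sigma^2, while `var_data` stays near the noise floor. Far from all centres only the bias feature is active, so the variance is the same constant at both far points; it equals the posterior variance of the bias plus sigma^2, which can never exceed 1/alpha + sigma^2.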
More informationIntroduction to Machine Learning Fall 2017 Note 5. 1 Overview. 2 Metric
CS 189 Introduction to Machine Learning Fall 2017 Note 5 1 Overview Recall from our previous note that for a fixed input x, our measurement Y is a noisy measurement of the true underlying response f x):
More informationBayesian Learning (II)
Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Bayesian Learning (II) Niels Landwehr Overview Probabilities, expected values, variance Basic concepts of Bayesian learning MAP
More informationSparse Linear Models (10/7/13)
STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine
More information6.867 Machine Learning
6.867 Machine Learning Problem set 1 Solutions Thursday, September 19 What and how to turn in? Turn in short written answers to the questions explicitly stated, and when requested to explain or prove.
More informationLinear Regression. Chapter 3
Chapter 3 Linear Regression Once we ve acquired data with multiple variables, one very important question is how the variables are related. For example, we could ask for the relationship between people
More informationLinear model selection and regularization
Linear model selection and regularization Problems with linear regression with least square 1. Prediction Accuracy: linear regression has low bias but suffer from high variance, especially when n p. It
More information4 Bias-Variance for Ridge Regression (24 points)
2 count = 0 3 for x in self.x_test_ridge: 4 5 prediction = np.matmul(self.w_ridge,x) 6 ###ADD THE COMPUTED MEAN BACK TO THE PREDICTED VECTOR### 7 prediction = self.ss_y.inverse_transform(prediction) 8
More informationLinear Regression Linear Regression with Shrinkage
Linear Regression Linear Regression ith Shrinkage Introduction Regression means predicting a continuous (usually scalar) output y from a vector of continuous inputs (features) x. Example: Predicting vehicle
More informationLecture 4 Propagation of errors
Introduction Lecture 4 Propagation of errors Example: we measure the current (I and resistance (R of a resistor. Ohm's law: V = IR If we know the uncertainties (e.g. standard deviations in I and R, what
More informationReferences. Lecture 3: Shrinkage. The Power of Amnesia. Ockham s Razor
References Lecture 3: Shrinkage Isabelle Guon guoni@inf.ethz.ch Structural risk minimization for character recognition Isabelle Guon et al. http://clopinet.com/isabelle/papers/sr m.ps.z Kernel Ridge Regression
More informationLeast Squares. Ken Kreutz-Delgado (Nuno Vasconcelos) ECE 175A Winter UCSD
Least Squares Ken Kreutz-Delgado (Nuno Vasconcelos) ECE 75A Winter 0 - UCSD (Unweighted) Least Squares Assume linearity in the unnown, deterministic model parameters Scalar, additive noise model: y f (
More informationParameter estimation! and! forecasting! Cristiano Porciani! AIfA, Uni-Bonn!
Parameter estimation! and! forecasting! Cristiano Porciani! AIfA, Uni-Bonn! Questions?! C. Porciani! Estimation & forecasting! 2! Cosmological parameters! A branch of modern cosmological research focuses
More informationCMU-Q Lecture 24:
CMU-Q 15-381 Lecture 24: Supervised Learning 2 Teacher: Gianni A. Di Caro SUPERVISED LEARNING Hypotheses space Hypothesis function Labeled Given Errors Performance criteria Given a collection of input
More informationCLASSIFICATION (DISCRIMINATION)
CLASSIFICATION (DISCRIMINATION) Caren Marzban http://www.nhn.ou.edu/ marzban Generalities This lecture relies on the previous lecture to some etent. So, read that one first. Also, as in the previous lecture,
More informationLinear regression. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda
Linear regression DS GA 1002 Statistical and Mathematical Models http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall15 Carlos Fernandez-Granda Linear models Least-squares estimation Overfitting Example:
More information