ECE521 Tutorial 2. Regression, GPs, Assignment 1. ECE521 Winter Credits to Alireza Makhzani, Alex Schwing, Rich Zemel for the slides.
1 ECE521 Tutorial 2: Regression, GPs, Assignment 1. ECE521 Winter 2016. Credits to Alireza Makhzani, Alex Schwing, Rich Zemel for the slides.
2 Outline
1. Linear regression, kNN
2. Linear classification
3. Overview of Gaussian Processes
3 Alireza Makhzani ECE521 Tutorial: KNN and Linear Regression 2 / 18
7 Gradient Descent Algorithm
Gradient descent is based on the observation that a function decreases fastest in the direction of the negative gradient:
$$x_{n+1} = x_n - \mu_n \nabla f(x_n)$$
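The update rule above can be sketched in a few lines of NumPy. This is a minimal illustration, not the tutorial's own code: the function name `gradient_descent`, the constant step size, and the quadratic test objective are all assumptions made for the example.

```python
import numpy as np

def gradient_descent(grad_f, x0, step_size=0.1, num_steps=100):
    """Iterate x_{n+1} = x_n - mu * grad_f(x_n) with a constant step size mu."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_steps):
        x = x - step_size * grad_f(x)
    return x

# Hypothetical objective f(x) = ||x - target||^2, whose gradient is 2 (x - target).
target = np.array([1.0, -2.0])
x_min = gradient_descent(lambda x: 2.0 * (x - target), x0=np.zeros(2))
```

With this objective every step shrinks the distance to `target` by a constant factor, so the iterate converges geometrically. A step size that is too large would make the same loop diverge.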
8 Momentum
Momentum simply adds a fraction $m$ of the previous weight update to the current one:
$$v_k = m_k v_{k-1} - \eta_k \nabla f(x_k), \qquad x_{k+1} = x_k + v_k \qquad (1)$$
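As a rough sketch of the momentum update, the loop below keeps a velocity `v` alongside the iterate `x`. The function name, the constant values of `momentum` and `learning_rate`, and the ill-conditioned quadratic used as a test objective are assumptions chosen for illustration.

```python
import numpy as np

def momentum_descent(grad_f, x0, momentum=0.9, learning_rate=0.05, num_steps=300):
    """Iterate v_k = m * v_{k-1} - eta * grad_f(x_k), then x_{k+1} = x_k + v_k."""
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)  # no previous update at the first step
    for _ in range(num_steps):
        v = momentum * v - learning_rate * grad_f(x)
        x = x + v
    return x

# Hypothetical objective f(x) = 0.5 * sum(c_i * x_i^2) with very different curvatures.
curvatures = np.array([1.0, 10.0])
x_final = momentum_descent(lambda x: curvatures * x, x0=np.array([5.0, 5.0]))
```

Setting `momentum=0.0` recovers plain gradient descent, which makes it easy to compare the two on the same objective.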
9 Validation Set
10 K-nearest neighbour
12 Nonlinearity
13 Polynomial Curve Fitting
18 Preventing Overfitting
20 Alireza Makhzani ECE521 Tutorial 2: Linear Classification 4 / 24
21 K-nearest neighbour
$$P(C_k) = \frac{K_k}{K}$$
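One way to read the formula: among the $K$ training points nearest to a query, $K_k$ is the number that belong to class $C_k$. The sketch below computes these fractions with NumPy. The function name, the squared Euclidean distance, and the toy data are assumptions for illustration.

```python
import numpy as np

def knn_class_probabilities(train_x, train_labels, query, k, num_classes):
    """Return K_k / K for each class, using the K nearest training points."""
    distances = np.sum((train_x - query) ** 2, axis=1)  # squared Euclidean distances
    nearest = np.argsort(distances)[:k]                 # indices of the K nearest points
    counts = np.bincount(train_labels[nearest], minlength=num_classes)  # K_k per class
    return counts / k

# Hypothetical toy data: three points of class 0 near the origin, two of class 1 near (1, 1).
train_x = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
train_labels = np.array([0, 0, 0, 1, 1])
probs = knn_class_probabilities(train_x, train_labels, np.array([0.8, 0.9]), k=3, num_classes=2)
```

Here the three nearest neighbours of the query are the two class-1 points and one class-0 point, so `probs` is one third for class 0 and two thirds for class 1.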
22 Linear regression for classification
23 Linear Classification
Linear regression for classification:
$$x^\top w + w_0 = w_0 + w_1 x_1 + w_2 x_2$$
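A minimal sketch of using linear regression as a classifier: fit $w_0 + w_1 x_1 + w_2 x_2$ to targets of $-1$ and $+1$ by least squares, then classify by the sign of the output. The $\pm 1$ target encoding, the synthetic two-cluster data, and all variable names are assumptions for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-D data: one Gaussian cluster per class.
class_neg = rng.normal(loc=[-2.0, -2.0], scale=0.5, size=(50, 2))
class_pos = rng.normal(loc=[2.0, 2.0], scale=0.5, size=(50, 2))
inputs = np.vstack([class_neg, class_pos])
targets = np.concatenate([-np.ones(50), np.ones(50)])

# Prepend a column of ones so that the first weight plays the role of the bias w_0.
design = np.hstack([np.ones((100, 1)), inputs])
weights, *_ = np.linalg.lstsq(design, targets, rcond=None)

# The decision boundary is the line w_0 + w_1 x_1 + w_2 x_2 = 0.
predictions = np.sign(design @ weights)
accuracy = np.mean(predictions == targets)
```

Least squares penalises points that are far on the correct side of the boundary, so this approach can behave poorly with outliers. It works here only because the clusters are well separated.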
26 Methods for Regression
We are given $n$ data points $x$ and corresponding regression targets $y$:
$$D = \{(x^{(i)}, y_i)\}_{i=1}^{n}$$
Two approaches:
1. Restrict the class of functions (e.g., only linear)
2. Assign a probability to every possible function
A. G. Schwing & R. Zemel (UofT), CSC 412/2506: Probabilistic Graphical Models
27 1. Restrict the class of functions (e.g., only linear)
Use data to learn the parameters $w$ of a parametric model $p(y \mid x, w)$.
Use parameters and test data $x$ to obtain the prediction $y = \arg\max_y p(y \mid x, w)$.
Examples:
- Logistic regression
- Support vector machine
- Neural networks
Problem: How to find the right model?
28 2. Assign a probability to every possible function
Use data to design a posterior $p(w \mid \mathbf{y}, X)$.
Use the posterior and test data $x$ to obtain the predictive distribution (average over all models):
$$p(y \mid x, X, \mathbf{y}) = \int p(y \mid x, w)\, p(w \mid X, \mathbf{y})\, dw$$
Note the difference from the non-Bayesian approach: a single parameter estimate vs. averaging over all parameter values.
Problem: How to assign a value to every possible function?
Solution: Gaussian Processes
29 Weight-space view: Bayesian treatment of linear model
Dataset:
$$D = \{(x^{(i)}, y_i)\}_{i=1}^{N}, \qquad x^{(i)} \in \mathbb{R}^D, \qquad y_i \in \mathbb{R}$$
$$X = [x^{(1)}, \dots, x^{(N)}], \qquad \mathbf{y} = [y_1, \dots, y_N]^\top$$
Given $D$ we now
- Compute the posterior $p(w \mid \mathbf{y}, X)$
- Compute the predictive distribution $p(y \mid x, X, \mathbf{y})$
30 Posterior for standard linear model
Likelihood:
$$y = f(x) + \epsilon, \qquad f(x) = x^\top w, \qquad \epsilon \sim \mathcal{N}(0, \sigma_n^2)$$
$$p(\mathbf{y} \mid X, w) = \prod_i p(y_i \mid x^{(i)}, w) = \prod_i \frac{1}{\sqrt{2\pi\sigma_n^2}} \exp\left(-\frac{(y_i - w^\top x^{(i)})^2}{2\sigma_n^2}\right) = \frac{1}{(2\pi\sigma_n^2)^{N/2}} \exp\left(-\frac{1}{2\sigma_n^2} \|\mathbf{y} - X^\top w\|_2^2\right) = \mathcal{N}(X^\top w, \sigma_n^2 I)$$
Prior:
$$p(w) = \mathcal{N}(0, \Sigma_p)$$
31 Posterior for standard linear model
Likelihood: $p(\mathbf{y} \mid X, w) = \mathcal{N}(X^\top w, \sigma_n^2 I)$
Prior: $p(w) = \mathcal{N}(0, \Sigma_p)$
Expression for the posterior:
$$p(w \mid \mathbf{y}, X) = \frac{p(\mathbf{y} \mid X, w)\, p(w)}{p(\mathbf{y} \mid X)}$$
Expression for the marginal likelihood:
$$p(\mathbf{y} \mid X) = \int p(\mathbf{y} \mid X, w)\, p(w)\, dw$$
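32 Posterior for standard linear model
Recall: Bayes' theorem for multivariate Gaussians. Given
$$p(x) = \mathcal{N}(\mu, \Lambda^{-1}), \qquad p(y \mid x) = \mathcal{N}(Ax + b, L^{-1})$$
we obtain
$$p(y) = \mathcal{N}(A\mu + b,\; L^{-1} + A\Lambda^{-1}A^\top)$$
$$p(x \mid y) = \mathcal{N}\left(\Gamma\left(A^\top L (y - b) + \Lambda\mu\right),\; \Gamma\right), \qquad \Gamma = \left(\Lambda + A^\top L A\right)^{-1}$$
In our case:
$$p(\mathbf{y} \mid X, w) = \mathcal{N}(X^\top w, \sigma_n^2 I), \qquad p(w) = \mathcal{N}(0, \Sigma_p)$$
Posterior:
$$p(w \mid \mathbf{y}, X) = \mathcal{N}\left(\frac{1}{\sigma_n^2}\Gamma X \mathbf{y},\; \Gamma\right), \qquad \Gamma = \left(\Sigma_p^{-1} + \frac{1}{\sigma_n^2} X X^\top\right)^{-1}$$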
33 Posterior for standard linear model
$$p(w \mid \mathbf{y}, X) = \mathcal{N}\left(\frac{1}{\sigma_n^2}\Gamma X \mathbf{y},\; \Gamma\right), \qquad \Gamma = \left(\Sigma_p^{-1} + \frac{1}{\sigma_n^2} X X^\top\right)^{-1}$$
For Gaussian distributions: mean = mode (MAP estimate).
Note the similarity to ridge regression.
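The similarity to ridge regression can be made explicit with a short worked derivation. It assumes an isotropic prior $\Sigma_p = \tau^2 I$; the symbols $\tau$ and $\lambda$ are introduced here for illustration and do not appear on the slides.

```latex
% Posterior mean under the assumed isotropic prior \Sigma_p = \tau^2 I
\frac{1}{\sigma_n^2}\,\Gamma X \mathbf{y}
  = \frac{1}{\sigma_n^2}\left(\frac{1}{\tau^2} I + \frac{1}{\sigma_n^2} X X^\top\right)^{-1} X \mathbf{y}
  = \left(X X^\top + \frac{\sigma_n^2}{\tau^2} I\right)^{-1} X \mathbf{y}

% Ridge regression with penalty \lambda has the same form
\hat{w}_{\text{ridge}}
  = \arg\min_w \; \|\mathbf{y} - X^\top w\|_2^2 + \lambda \|w\|_2^2
  = \left(X X^\top + \lambda I\right)^{-1} X \mathbf{y}
```

Under this assumption the MAP estimate coincides with the ridge solution for $\lambda = \sigma_n^2 / \tau^2$: a broader prior (larger $\tau$) corresponds to weaker regularization.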
34 Given $D$ we managed to
- Compute the posterior $p(w \mid \mathbf{y}, X)$
We still need to
- Compute the predictive distribution
$$p(y \mid x, X, \mathbf{y}) = \int p(y \mid x, w)\, p(w \mid X, \mathbf{y})\, dw$$
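35 Predictive distribution
Recall: Bayes' theorem for multivariate Gaussians. Given
$$p(x) = \mathcal{N}(\mu, \Lambda^{-1}), \qquad p(y \mid x) = \mathcal{N}(Ax + b, L^{-1})$$
we obtain
$$p(y) = \mathcal{N}(A\mu + b,\; L^{-1} + A\Lambda^{-1}A^\top)$$
$$p(x \mid y) = \mathcal{N}\left(\Gamma\left(A^\top L (y - b) + \Lambda\mu\right),\; \Gamma\right), \qquad \Gamma = \left(\Lambda + A^\top L A\right)^{-1}$$
Prediction: average over all parameter values, weighted by the posterior:
$$p(y \mid x, X, \mathbf{y}) = \int p(y \mid x, w)\, p(w \mid X, \mathbf{y})\, dw = \int \mathcal{N}(x^\top w, \sigma_n^2)\, \mathcal{N}\left(\frac{1}{\sigma_n^2}\Gamma X \mathbf{y}, \Gamma\right) dw = \mathcal{N}\left(\frac{1}{\sigma_n^2} x^\top \Gamma X \mathbf{y},\; \sigma_n^2 + x^\top \Gamma x\right)$$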
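36 Summary
Prior: $p(w) = \mathcal{N}(0, \Sigma_p)$
Likelihood: $p(\mathbf{y} \mid X, w) = \mathcal{N}(X^\top w, \sigma_n^2 I)$
Posterior:
$$p(w \mid X, \mathbf{y}) = \mathcal{N}\left(\frac{1}{\sigma_n^2}\Gamma X \mathbf{y},\; \Gamma\right) \quad \text{with} \quad \Gamma = \left(\Sigma_p^{-1} + \frac{1}{\sigma_n^2} X X^\top\right)^{-1}$$
Predictive distribution:
$$p(y \mid x, X, \mathbf{y}) = \mathcal{N}\left(\frac{1}{\sigma_n^2} x^\top \Gamma X \mathbf{y},\; \sigma_n^2 + x^\top \Gamma x\right)$$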
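The summary formulas translate almost directly into NumPy. The sketch below follows the slides' convention that $X$ stores one data point per column. The function names, the synthetic data, and the choice $\Sigma_p = I$ are assumptions for illustration.

```python
import numpy as np

def blr_posterior(X, y, noise_var, prior_cov):
    """Posterior N(mean, gamma) over w; X has shape (D, N), one column per data point."""
    gamma = np.linalg.inv(np.linalg.inv(prior_cov) + (X @ X.T) / noise_var)
    mean = gamma @ X @ y / noise_var
    return mean, gamma

def blr_predict(x_test, mean, gamma, noise_var):
    """Predictive mean and variance of y at a single test input x_test of shape (D,)."""
    pred_mean = x_test @ mean
    pred_var = noise_var + x_test @ gamma @ x_test
    return pred_mean, pred_var

# Hypothetical synthetic data generated from y = x^T w + noise.
rng = np.random.default_rng(0)
true_w = np.array([0.5, -1.5])
X = rng.normal(size=(2, 200))
noise_var = 0.01
y = X.T @ true_w + rng.normal(scale=np.sqrt(noise_var), size=200)

post_mean, post_cov = blr_posterior(X, y, noise_var, prior_cov=np.eye(2))
pred_mean, pred_var = blr_predict(np.array([1.0, 1.0]), post_mean, post_cov, noise_var)
```

The predictive variance is the noise variance plus a term that reflects the remaining uncertainty about `w`, so it never drops below `noise_var`. For larger problems, solving linear systems with `np.linalg.solve` is usually preferable to forming explicit inverses.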
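37 Multivariate Gaussian Distribution
$$p(x_1, \dots, x_n \mid \mu, \Sigma) = p(x \mid \mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^n \det(\Sigma)}} \exp\left(-\frac{1}{2}(x - \mu)^\top \Sigma^{-1} (x - \mu)\right)$$
$$= \frac{1}{\sqrt{(2\pi)^n \det(\Lambda^{-1})}} \exp\left(-\frac{1}{2}(x - \mu)^\top \Lambda (x - \mu)\right) = p(x \mid \mu, \Lambda^{-1})$$
- $x \in \mathbb{R}^n$
- $\mu \in \mathbb{R}^n$: mean vector
- $\Sigma \in \mathbb{R}^{n \times n}$: covariance matrix
- $\Lambda = \Sigma^{-1} \in \mathbb{R}^{n \times n}$: precision matrix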
38 Shorthand: $x \sim \mathcal{N}(\mu, \Sigma) = \mathcal{N}(\mu, \Lambda^{-1})$
Samples: $\Sigma^{1/2}\,\mathrm{randn}(2, 10000) + \mu$
[Figure: $p(x_1, x_2 \mid \mu, \Sigma)$ plotted over axes $x_1$ and $x_2$]
What do the plots look like if $x_1$ and $x_2$ are independent random variables? What are the entries of the covariance matrix?
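The sampling recipe can be sketched in NumPy as follows. The slide writes $\Sigma^{1/2}$ without saying which square root is meant. This example assumes the Cholesky factor, which works because any matrix $A$ with $A A^\top = \Sigma$ gives samples with the right covariance. The values of `mu` and `sigma` are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical mean (as a column vector) and covariance for a 2-D Gaussian.
mu = np.array([[1.0], [-1.0]])
sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

# Cholesky factor: lower-triangular matrix with sqrt_sigma @ sqrt_sigma.T == sigma.
sqrt_sigma = np.linalg.cholesky(sigma)

# Transform standard normal draws; each column of `samples` is one draw from N(mu, sigma).
samples = sqrt_sigma @ rng.standard_normal((2, 10000)) + mu

sample_mean = samples.mean(axis=1)
sample_cov = np.cov(samples)  # rows are variables, columns are observations
```

Setting the off-diagonal entries of `sigma` to zero makes the two coordinates independent, which is one way to explore the question posed on the slide.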
39 Tools to deal with Gaussian distributions
Completing the square:
$$-\frac{1}{2}(x - \mu)^\top \Sigma^{-1} (x - \mu) = -\frac{1}{2} x^\top \Sigma^{-1} x + x^\top \Sigma^{-1} \mu + \text{const}$$
- express a given quadratic form as shown on the right-hand side
- read off $\Sigma$ and $\mu$
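40 Example to practice completing the square:
$$x = \begin{bmatrix} x_a \\ x_b \end{bmatrix} \sim \mathcal{N}\left(\begin{bmatrix} \mu_a \\ \mu_b \end{bmatrix}, \begin{bmatrix} \Lambda_{aa} & \Lambda_{ab} \\ \Lambda_{ba} & \Lambda_{bb} \end{bmatrix}^{-1}\right)$$
Suppose $x_b$ is given. What is $p(x_a \mid x_b)$?
$$-\frac{1}{2}\left(\begin{bmatrix} x_a \\ x_b \end{bmatrix} - \begin{bmatrix} \mu_a \\ \mu_b \end{bmatrix}\right)^\top \begin{bmatrix} \Lambda_{aa} & \Lambda_{ab} \\ \Lambda_{ba} & \Lambda_{bb} \end{bmatrix} \left(\begin{bmatrix} x_a \\ x_b \end{bmatrix} - \begin{bmatrix} \mu_a \\ \mu_b \end{bmatrix}\right) = -\frac{1}{2} x_a^\top \Lambda_{aa} x_a + x_a^\top \left(\Lambda_{aa}\mu_a - \Lambda_{ab}(x_b - \mu_b)\right) + \text{const}$$
Covariance: $\Lambda_{aa}^{-1}$
Mean: $\mu_a - \Lambda_{aa}^{-1}\Lambda_{ab}(x_b - \mu_b)$
$$p(x_a \mid x_b) = \mathcal{N}\left(\mu_a - \Lambda_{aa}^{-1}\Lambda_{ab}(x_b - \mu_b),\; \Lambda_{aa}^{-1}\right)$$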
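41 Express $p(x_a \mid x_b) = \mathcal{N}\left(\mu_a - \Lambda_{aa}^{-1}\Lambda_{ab}(x_b - \mu_b),\; \Lambda_{aa}^{-1}\right)$ in terms of the covariance matrix
$$\begin{pmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{pmatrix}^{-1} = \begin{pmatrix} \Lambda_{aa} & \Lambda_{ab} \\ \Lambda_{ba} & \Lambda_{bb} \end{pmatrix}$$
Matrix identity:
$$\begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1} = \begin{pmatrix} M & -MBD^{-1} \\ -D^{-1}CM & D^{-1} + D^{-1}CMBD^{-1} \end{pmatrix} \quad \text{with} \quad M = (A - BD^{-1}C)^{-1}$$
$$\Lambda_{aa} = (\Sigma_{aa} - \Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba})^{-1}$$
$$\Lambda_{ab} = -(\Sigma_{aa} - \Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba})^{-1}\Sigma_{ab}\Sigma_{bb}^{-1}$$
$$p(x_a \mid x_b) = \mathcal{N}\left(\mu_a + \Sigma_{ab}\Sigma_{bb}^{-1}(x_b - \mu_b),\; \Sigma_{aa} - \Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba}\right)$$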
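42 Conditioning of Multivariate Gaussian
$$x = \begin{bmatrix} x_a \\ x_b \end{bmatrix} \sim \mathcal{N}\left(\begin{bmatrix} \mu_a \\ \mu_b \end{bmatrix}, \begin{bmatrix} \Lambda_{aa} & \Lambda_{ab} \\ \Lambda_{ba} & \Lambda_{bb} \end{bmatrix}^{-1}\right) = \mathcal{N}\left(\begin{bmatrix} \mu_a \\ \mu_b \end{bmatrix}, \begin{bmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{bmatrix}\right)$$
Conditional:
$$p(x_a \mid x_b) = \mathcal{N}\left(\mu_a - \Lambda_{aa}^{-1}\Lambda_{ab}(x_b - \mu_b),\; \Lambda_{aa}^{-1}\right)$$
$$p(x_a \mid x_b) = \mathcal{N}\left(\mu_a + \Sigma_{ab}\Sigma_{bb}^{-1}(x_b - \mu_b),\; \Sigma_{aa} - \Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba}\right)$$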
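A quick numerical check can confirm that the precision form and the covariance form of the conditional agree. The sketch below uses a made-up 3-D Gaussian with $x_a$ as the first coordinate and $x_b$ as the remaining two. All numbers and variable names are assumptions for illustration.

```python
import numpy as np

# Hypothetical joint Gaussian over three variables.
mu = np.array([0.0, 1.0, -1.0])
sigma = np.array([[2.0, 0.6, 0.3],
                  [0.6, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
a, b = [0], [1, 2]           # index sets for x_a and x_b
x_b = np.array([1.5, 0.0])   # observed value of x_b

# Covariance form: mu_a + S_ab S_bb^{-1} (x_b - mu_b),  S_aa - S_ab S_bb^{-1} S_ba.
sigma_aa = sigma[np.ix_(a, a)]
sigma_ab = sigma[np.ix_(a, b)]
sigma_bb = sigma[np.ix_(b, b)]
gain = sigma_ab @ np.linalg.inv(sigma_bb)
mean_cov_form = mu[a] + gain @ (x_b - mu[b])
cov_cov_form = sigma_aa - gain @ sigma_ab.T  # S_ba equals S_ab^T by symmetry

# Precision form: mu_a - L_aa^{-1} L_ab (x_b - mu_b),  L_aa^{-1}.
lam = np.linalg.inv(sigma)
lam_aa = lam[np.ix_(a, a)]
lam_ab = lam[np.ix_(a, b)]
cov_prec_form = np.linalg.inv(lam_aa)
mean_prec_form = mu[a] - cov_prec_form @ lam_ab @ (x_b - mu[b])
```

Both routes give the same conditional mean and covariance up to floating-point error. Which one is cheaper depends on whether the model is naturally parameterised by a covariance or a precision matrix.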
43 TF Tips
- tf.argmin(input, axis=None, name=None, dimension=None) and tf.argmax(input, axis=None, name=None, dimension=None) find the indices of the smallest and largest elements of a tensor along the given axis.
- tf.nn.embedding_lookup(params, ids, partition_strategy='mod', name=None, validate_indices=True, max_norm=None) looks up ids in a list of tensors.
- matplotlib is a package for generating plots similar to Matlab's. pyplot is a useful subpackage of matplotlib.
More informationAdvanced Introduction to Machine Learning CMU-10715
Advanced Introduction to Machine Learning CMU-10715 Gaussian Processes Barnabás Póczos http://www.gaussianprocess.org/ 2 Some of these slides in the intro are taken from D. Lizotte, R. Parr, C. Guesterin
More informationCOMP 551 Applied Machine Learning Lecture 20: Gaussian processes
COMP 55 Applied Machine Learning Lecture 2: Gaussian processes Instructor: Ryan Lowe (ryan.lowe@cs.mcgill.ca) Slides mostly by: (herke.vanhoof@mcgill.ca) Class web page: www.cs.mcgill.ca/~hvanho2/comp55
More informationMachine Learning. Nonparametric Methods. Space of ML Problems. Todo. Histograms. Instance-Based Learning (aka non-parametric methods)
Machine Learning InstanceBased Learning (aka nonparametric methods) Supervised Learning Unsupervised Learning Reinforcement Learning Parametric Non parametric CSE 446 Machine Learning Daniel Weld March
More informationSTA 414/2104, Spring 2014, Practice Problem Set #1
STA 44/4, Spring 4, Practice Problem Set # Note: these problems are not for credit, and not to be handed in Question : Consider a classification problem in which there are two real-valued inputs, and,
More informationIntroduction to Machine Learning
Outline Introduction to Machine Learning Bayesian Classification Varun Chandola March 8, 017 1. {circular,large,light,smooth,thick}, malignant. {circular,large,light,irregular,thick}, malignant 3. {oval,large,dark,smooth,thin},
More informationσ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =
Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,
More informationNaive Bayes and Gaussian Bayes Classifier
Naive Bayes and Gaussian Bayes Classifier Mengye Ren mren@cs.toronto.edu October 18, 2015 Mengye Ren Naive Bayes and Gaussian Bayes Classifier October 18, 2015 1 / 21 Naive Bayes Bayes Rules: Naive Bayes
More informationMark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.
CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.
More informationADVANCED MACHINE LEARNING ADVANCED MACHINE LEARNING. Non-linear regression techniques Part - II
1 Non-linear regression techniques Part - II Regression Algorithms in this Course Support Vector Machine Relevance Vector Machine Support vector regression Boosting random projections Relevance vector
More informationLinear Models for Regression CS534
Linear Models for Regression CS534 Example Regression Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict
More information1 Machine Learning Concepts (16 points)
CSCI 567 Fall 2018 Midterm Exam DO NOT OPEN EXAM UNTIL INSTRUCTED TO DO SO PLEASE TURN OFF ALL CELL PHONES Problem 1 2 3 4 5 6 Total Max 16 10 16 42 24 12 120 Points Please read the following instructions
More information