CS8803: Statistical Techniques in Robotics (Byron Boots): Hilbert Space Embeddings

Motivation

Overview
- Multinomial Distributions: marginal, joint, conditional; sum, product, and Bayes rules
- Hilbert Space Embeddings: marginal, joint, conditional; sum, product, and Bayes rules
- Gram/Kernel Matrices

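Multinomial Distributions
- Marginal probabilities: $P[Y] \to \mu_Y$, a vector indexed by the values of $Y$.
- Joint probabilities: $P[Y,X] \to \mathcal{C}_{YX}$, a matrix with rows indexed by $Y$ and columns indexed by $X$.
- Conditional probabilities: $P[Y \mid X] \to \mathcal{C}_{Y|X}$, a matrix whose column for each value $x$ is $P[Y \mid X = x]$, so every column sums to 1.
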
Sum Rule
$P[Y] = \sum_X P[Y,X] \quad\Longleftrightarrow\quad \mu_Y = \mathcal{C}_{YX}\mathbf{1}$

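In the discrete case the sum rule is a single matrix-vector product. A minimal NumPy sketch, assuming a hypothetical 3×2 joint table `C_YX` (rows index $Y$, columns index $X$; the numbers are made up for illustration):

```python
import numpy as np

# Hypothetical joint table: C_YX[i, j] = P[Y = i, X = j]
C_YX = np.array([[0.10, 0.20],
                 [0.30, 0.05],
                 [0.10, 0.25]])
assert np.isclose(C_YX.sum(), 1.0)  # a joint distribution sums to 1

ones_X = np.ones(C_YX.shape[1])

# Sum rule: P[Y] = sum_X P[Y, X]  <=>  mu_Y = C_YX 1 (row sums)
mu_Y = C_YX @ ones_X                # approximately [0.3, 0.35, 0.35]
```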
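Product Rule
$P[Y,X] = P[Y \mid X]\,P[X] \quad\Longleftrightarrow\quad \mathcal{C}_{Y|X} = \mathcal{C}_{YX}\mathcal{C}_{XX}^{-1}, \qquad \mathcal{C}_{YX} = \mathcal{C}_{Y|X}\mathcal{C}_{XX}$
where $\mathcal{C}_{XX} = \operatorname{diag}(\mu_X)$ holds the marginal $P[X]$ on its diagonal.
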
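The product rule can be checked numerically in both directions. This sketch assumes the same kind of made-up joint table as a stand-in; `C_Y_given_X` is just an illustrative variable name for $\mathcal{C}_{Y|X}$:

```python
import numpy as np

# Hypothetical joint table; rows index Y, columns index X
C_YX = np.array([[0.10, 0.20],
                 [0.30, 0.05],
                 [0.10, 0.25]])

mu_X = C_YX.sum(axis=0)          # marginal P[X] (column sums): [0.5, 0.5]
C_XX = np.diag(mu_X)             # for indicator vectors, E[x x^T] is diagonal

# Product rule solved for the conditional table: C_{Y|X} = C_YX C_XX^{-1}
C_Y_given_X = C_YX @ np.linalg.inv(C_XX)

# Each column of C_{Y|X} is P[Y | X = j], so each column sums to 1
assert np.allclose(C_Y_given_X.sum(axis=0), 1.0)

# Product rule in the forward direction: C_YX = C_{Y|X} C_XX
assert np.allclose(C_Y_given_X @ C_XX, C_YX)
```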
Sum Rule (Revisited)
$P[Y] = \sum_X P[Y,X] = \sum_X P[Y \mid X]\,P[X] \quad\Longleftrightarrow\quad \mu_Y = \mathcal{C}_{YX}\mathbf{1} = \mathcal{C}_{YX}\mathcal{C}_{XX}^{-1}\mu_X = \mathcal{C}_{Y|X}\,\mu_X$

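Conditioning
$P[Y \mid X = x] = P[Y \mid X]\,\delta(X = x) \quad\Longleftrightarrow\quad \mu_{Y|x} = \mathcal{C}_{Y|X}\,\mu_x = \mathcal{C}_{YX}\mathcal{C}_{XX}^{-1}\mu_x$
where $\mu_x$ is the indicator vector of the observed value $x$.
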
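Both the revisited sum rule and conditioning are products with the conditional table. A sketch assuming a made-up likelihood table and prior (illustrative values only): multiplying by the prior marginalizes $X$, while multiplying by an indicator vector selects one column.

```python
import numpy as np

# Hypothetical conditional table C_{Y|X}: column j is P[Y | X = j]
C_Y_given_X = np.array([[0.2, 0.4],
                        [0.6, 0.1],
                        [0.2, 0.5]])
mu_X = np.array([0.5, 0.5])      # hypothetical prior P[X]

# Sum rule (revisited): mu_Y = C_{Y|X} mu_X
mu_Y = C_Y_given_X @ mu_X

# Same result via the joint table: C_YX = C_{Y|X} C_XX, then mu_Y = C_YX 1
C_YX = C_Y_given_X @ np.diag(mu_X)
assert np.allclose(mu_Y, C_YX @ np.ones(2))

# Conditioning on X = x: mu_x is the indicator vector e_x,
# so C_{Y|X} mu_x picks out column x
x = 1
mu_x = np.eye(2)[x]
mu_Y_given_x = C_Y_given_X @ mu_x
assert np.allclose(mu_Y_given_x, C_Y_given_X[:, x])
```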
Bayes Rule etc.
$P[X \mid Y] = \dfrac{P[Y \mid X]\,P[X]}{P[Y]} \quad\Longleftrightarrow\quad \mathcal{C}_{X|Y} = (\mathcal{C}_{Y|X}\mathcal{C}_{XX})^\top\mathcal{C}_{YY}^{-1} = \mathcal{C}_{XY}\mathcal{C}_{YY}^{-1}$
$P[X \mid Y = y] = \dfrac{P[Y = y \mid X]\,P[X]}{P[Y = y]} \quad\Longleftrightarrow\quad \mu_{X|y} = (\mathcal{C}_{Y|X}\mathcal{C}_{XX})^\top\mathcal{C}_{YY}^{-1}\mu_y = \mathcal{C}_{XY}\mathcal{C}_{YY}^{-1}\mu_y$

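The matrix form of Bayes rule can be compared against the scalar formula $P[y \mid X]\,P[X] / P[y]$. A sketch under the same kind of made-up likelihood and prior; `C_X_given_Y` is an illustrative name for $\mathcal{C}_{X|Y}$:

```python
import numpy as np

C_Y_given_X = np.array([[0.2, 0.4],
                        [0.6, 0.1],
                        [0.2, 0.5]])   # hypothetical likelihood P[Y | X]
mu_X = np.array([0.5, 0.5])          # hypothetical prior P[X]

C_XX = np.diag(mu_X)
C_YX = C_Y_given_X @ C_XX            # joint P[Y, X]
C_XY = C_YX.T
mu_Y = C_YX.sum(axis=1)              # evidence P[Y]
C_YY = np.diag(mu_Y)

# Bayes rule: C_{X|Y} = (C_{Y|X} C_XX)^T C_YY^{-1} = C_XY C_YY^{-1}
C_X_given_Y = C_XY @ np.linalg.inv(C_YY)

# Posterior after observing Y = y: mu_{X|y} = C_XY C_YY^{-1} mu_y
y = 0
mu_y = np.eye(len(mu_Y))[y]
mu_X_given_y = C_X_given_Y @ mu_y

# Scalar form: P[X | y] = P[y | X] P[X] / P[y]
direct = C_Y_given_X[y, :] * mu_X / mu_Y[y]
assert np.allclose(mu_X_given_y, direct)
```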
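Bayes Rule etc.
$P[X \mid Y = y, Z = z] = \dfrac{P[X, Y = y \mid Z]\,\delta(Z = z)}{P[Y = y \mid Z]\,\delta(Z = z)}$
$\mathcal{C}_{XY|z} = \mathcal{C}_{(XY)Z}\,\mathcal{C}_{ZZ}^{-1}\mu_z, \qquad \mathcal{C}_{YY|z} = \mathcal{C}_{(YY)Z}\,\mathcal{C}_{ZZ}^{-1}\mu_z, \qquad \mu_{X|y,z} = \mathcal{C}_{XY|z}\,\mathcal{C}_{YY|z}^{-1}\mu_y$
Here $\mathcal{C}_{(XY)Z}$ is the third-order table $P[X,Y,Z]$ with its $X$ and $Y$ modes flattened into rows, and $\mathcal{C}_{(YY)Z}$ is the analogous table built from $y \otimes y \otimes z$.
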
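The same recipe works with a third variable: first condition the tensors on $Z = z$, then apply Bayes rule in $Y$. A sketch assuming a random (seeded, made-up) joint tensor over small alphabets, checked against the direct ratio $P[X, y, z] / P[y, z]$:

```python
import numpy as np

rng = np.random.default_rng(0)
nX, nY, nZ = 2, 3, 2

# Hypothetical joint table P[X, Y, Z] as a third-order tensor (axes X, Y, Z)
P_XYZ = rng.random((nX, nY, nZ))
P_XYZ /= P_XYZ.sum()

mu_Z = P_XYZ.sum(axis=(0, 1))        # P[Z]
C_ZZ_inv = np.diag(1.0 / mu_Z)

# C_(XY)Z: flatten the (X, Y) modes into rows, keep Z as columns
C_XY_Z = P_XYZ.reshape(nX * nY, nZ)

# C_(YY)Z = E[y (x) y (x) z] is nonzero only where the two Y indices agree
P_YZ = P_XYZ.sum(axis=0)             # P[Y, Z], shape (nY, nZ)
C_YY_Z = np.zeros((nY, nY, nZ))
for i in range(nY):
    C_YY_Z[i, i, :] = P_YZ[i, :]
C_YY_Z = C_YY_Z.reshape(nY * nY, nZ)

z, y = 1, 2
mu_z = np.eye(nZ)[z]
mu_y = np.eye(nY)[y]

# Condition on Z = z: C_{XY|z} = P[X, Y | z], C_{YY|z} = diag(P[Y | z])
C_XY_given_z = (C_XY_Z @ C_ZZ_inv @ mu_z).reshape(nX, nY)
C_YY_given_z = (C_YY_Z @ C_ZZ_inv @ mu_z).reshape(nY, nY)

# Bayes rule under Z = z: mu_{X|y,z} = C_{XY|z} C_{YY|z}^{-1} mu_y
mu_X_given_yz = C_XY_given_z @ np.linalg.inv(C_YY_given_z) @ mu_y

# Direct computation: P[X | y, z] = P[X, y, z] / P[y, z]
direct = P_XYZ[:, y, z] / P_XYZ[:, y, z].sum()
assert np.allclose(mu_X_given_yz, direct)
```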
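Learning
$\mathcal{C}_{XYZ} \approx \widehat{\mathcal{C}}_{XYZ} = \frac{1}{N}\sum_{i=1}^{N} x_i \otimes y_i \otimes z_i, \qquad \mathcal{C}_{YX} \approx \widehat{\mathcal{C}}_{YX} = \frac{1}{N}\sum_{i=1}^{N} y_i x_i^\top, \qquad \mu_X \approx \hat\mu_X = \frac{1}{N}\sum_{i=1}^{N} x_i$
where $x_i$, $y_i$, $z_i$ are indicator vectors.
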
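With indicator vectors, these averages are just normalized counts. A sketch assuming made-up paired samples; `one_hot` is a hypothetical helper that stacks indicator vectors as columns:

```python
import numpy as np

def one_hot(values, n):
    """Stack indicator vectors as columns: shape (n, N)."""
    return np.eye(n)[:, values]

# Hypothetical paired samples (x_i, y_i) with X in {0, 1} and Y in {0, 1, 2}
xs = np.array([0, 0, 1, 1, 1, 0, 1, 1])
ys = np.array([0, 1, 2, 2, 0, 1, 2, 1])
N = len(xs)

Xi = one_hot(xs, 2)                  # columns are x_i
Yi = one_hot(ys, 3)                  # columns are y_i

# hat C_YX = (1/N) sum_i y_i x_i^T: the empirical joint frequency table
C_YX_hat = Yi @ Xi.T / N
# hat mu_X = (1/N) sum_i x_i: the empirical marginal frequencies
mu_X_hat = Xi.mean(axis=1)
```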
Generalization
- How do we make a conditional probability table out of this?
- How do we learn the parameters? (What are the parameters?)
- How do we perform inference?

Could Discretize the Distribution
- Discretizing loses information.
- It is hard to learn when the cardinality (number of bins) is high.

Key Idea: Sufficient Statistics
- $P[Y] \to \mu_Y = \mathbb{E}[Y]$. Problem: lots of distributions have the same mean.
- $P[Y] \to \mu_Y = \begin{pmatrix}\mathbb{E}[Y] \\ \mathbb{E}[Y^2]\end{pmatrix}$. Better, but lots of distributions still have the same mean and variance!
- $P[Y] \to \mu_Y = \begin{pmatrix}\mathbb{E}[Y] \\ \mathbb{E}[Y^2] \\ \mathbb{E}[Y^3]\end{pmatrix}$. Even better, but lots of distributions still share the first three moments!
- $P[Y] \to \mu_Y = \begin{pmatrix}\mathbb{E}[Y] \\ \mathbb{E}[Y^2] \\ \mathbb{E}[Y^3] \\ \vdots\end{pmatrix}$

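A concrete pair illustrates why a finite list of moments is not enough. The two discrete distributions below are hypothetical examples chosen for this purpose: they agree on the first three moments and differ only at the fourth. Exact arithmetic via `fractions.Fraction` avoids rounding noise.

```python
from fractions import Fraction

def moment(dist, k):
    """k-th raw moment E[Y^k] of a discrete distribution {value: probability}."""
    return sum(p * y**k for y, p in dist.items())

# Hypothetical distributions: a fair coin on {-1, +1}, and a
# distribution on {-2, 0, +2} with most of its mass at 0
coin = {-1: Fraction(1, 2), 1: Fraction(1, 2)}
spread = {-2: Fraction(1, 8), 0: Fraction(3, 4), 2: Fraction(1, 8)}

# Pairs (coin moment, spread moment) for k = 1, 2, 3: all equal (0, 1, 0)
first_three = [(moment(coin, k), moment(spread, k)) for k in (1, 2, 3)]
# Fourth moments differ: 1 for the coin, 4 for the spread distribution
fourth = (moment(coin, 4), moment(spread, 4))
```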
David Hilbert

Representation
- Marginal distributions: $P[Y]$
- Joint distributions: $P[Y,X]$
- Conditional distributions: $P[Y \mid X]$
Use kernel representations for distributions.

Embedding Distributions
Summary statistics for a distribution $P[Y]$:
- Mean: $\mathbb{E}[Y]$
- (Uncentered) covariance: $\mathbb{E}[YY^\top]$
- Probability $P[y_0]$: $\mathbb{E}[\delta_{y_0}(Y)]$
- Expected features: $\mathbb{E}[\phi(Y)]$
Pick a kernel $k(y, y') = \langle \phi(y), \phi(y') \rangle$; each choice of kernel generates a different statistic.

Embedding Marginal Distributions
$P[Y] \to \mu_Y = \mathbb{E}[\phi(Y)]$, with feature map $\phi(y) = k(y, \cdot) \in \mathcal{F}$ (an RKHS); from $m$ samples, $\hat\mu_Y = \frac{1}{m}\sum_{i=1}^{m}\phi(y_i)$.
- The mapping from $P[Y]$ to $\mu_Y$ is one-to-one for certain kernels (e.g., Gaussian and Laplacian RBF kernels).
- With the delta kernel, $\mu_Y$ recovers the discrete probabilities.
- The sample average converges to the true mean at rate $O_p(m^{-1/2})$.

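Because $\hat\mu_Y$ lives in an RKHS, it can be evaluated pointwise via the reproducing property: $\hat\mu_Y(t) = \langle \hat\mu_Y, \phi(t) \rangle = \frac{1}{m}\sum_i k(y_i, t)$. The sketch below assumes made-up samples and a Gaussian bandwidth `sigma=1.0` chosen only for illustration; with the delta kernel the same function returns empirical probabilities.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    """Matrix of k(a_i, b_j) = exp(-(a_i - b_j)^2 / (2 sigma^2)) for scalars."""
    a = np.asarray(a, float)[:, None]
    b = np.asarray(b, float)[None, :]
    return np.exp(-(a - b) ** 2 / (2 * sigma ** 2))

def delta_kernel(a, b):
    """Matrix of k(a_i, b_j) = 1 if a_i == b_j else 0."""
    return (np.asarray(a)[:, None] == np.asarray(b)[None, :]).astype(float)

def mean_embedding(samples, kernel):
    """Return t -> hat mu_Y(t) = (1/m) sum_i k(y_i, t) (reproducing property)."""
    return lambda t: kernel(samples, np.atleast_1d(t)).mean(axis=0)

ys = np.array([0, 1, 1, 2, 2, 2])    # hypothetical samples

mu_delta = mean_embedding(ys, delta_kernel)   # evaluates to empirical P[y]
mu_gauss = mean_embedding(ys, gaussian_kernel)
```

For example, `mu_delta([0, 1, 2])` gives the frequencies 1/6, 2/6, 3/6, while `mu_gauss` is a smooth function peaking near the most frequent value.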
Embedding Joint Distributions
Embed the joint distribution $P[Y,X]$ using the outer-product feature map $\phi(Y)\varphi(X)^\top$:
$\mu_{YX} = \mathbb{E}[\phi(Y)\varphi(X)^\top], \qquad \hat\mu_{YX} = \frac{1}{m}\sum_{i=1}^{m}\phi(y_i)\varphi(x_i)^\top$
- $\mu_{YX}$ is also the (uncentered) cross-covariance operator $C_{YX}$.
- Delta kernels recover the discrete joint probabilities.
- The empirical estimate converges at rate $O_p(m^{-1/2})$.

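The joint embedding can also be evaluated through kernels alone: $\langle \phi(s), \hat\mu_{YX}\varphi(t) \rangle = \frac{1}{m}\sum_i l(y_i, s)\,k(x_i, t)$. A sketch assuming delta kernels on both variables and made-up samples, which reproduces the empirical joint probability table:

```python
import numpy as np

def delta_gram(a, b):
    """Delta-kernel Gram matrix: entry (i, j) is 1 if a[i] == b[j] else 0."""
    return (np.asarray(a)[:, None] == np.asarray(b)[None, :]).astype(float)

# Hypothetical paired samples (x_i, y_i)
xs = np.array([0, 0, 1, 1, 1, 0, 1, 1])
ys = np.array([0, 1, 2, 2, 0, 1, 2, 1])
m = len(xs)

L = delta_gram(ys, np.arange(3))   # L[i, s] = l(y_i, s), shape (m, 3)
K = delta_gram(xs, np.arange(2))   # K[i, t] = k(x_i, t), shape (m, 2)

# hat mu_YX(s, t) = (1/m) sum_i l(y_i, s) k(x_i, t): the empirical joint table
mu_YX_hat = L.T @ K / m
```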
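Embedding Conditional Distributions
Each conditional $P[Y \mid X = x]$ is embedded as $\mu_{Y|x} = \mathbb{E}[\phi(Y) \mid x]$, with $\phi(y) = l(y, \cdot) \in \mathcal{G}$ (an RKHS); e.g., $P[Y \mid x_1] \to \mu_{Y|x_1}$ and $P[Y \mid x_2] \to \mu_{Y|x_2}$.
For each value $X = x$, return the summary statistic for $P[Y \mid X = x]$. Problem: some values $X = x$ are never observed.
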
Embedding Conditional Distributions
To avoid partitioning the data by the value of $x$, use a single operator:
$\mu_{Y|x} = \mathcal{U}_{Y|X}\,\varphi(x)$, with $\varphi(x) = k(x, \cdot) \in \mathcal{F}$ (an RKHS),
where $\mathcal{U}_{Y|X}$ is the conditional embedding operator.

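Embedding Conditional Distributions
Estimation via covariance operators:
$\mathcal{U}_{Y|X} := C_{YX}C_{XX}^{-1}, \qquad \widehat{\mathcal{U}}_{Y|X} = \Phi(K + \lambda I)^{-1}\Upsilon^\top$
where $\Phi := (\phi(y_1), \ldots, \phi(y_m))$ with $L = \Phi^\top\Phi$, and $\Upsilon := (\varphi(x_1), \ldots, \varphi(x_m))$ with $K = \Upsilon^\top\Upsilon$.
- Gaussian case: the operators reduce to covariance matrices.
- Discrete case: the joint probability matrix divided by the marginal.
- The empirical estimate converges at rate $O_p\big((\lambda m)^{-1/2} + \lambda^{1/2}\big)$.
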
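Applied to $\varphi(x)$, the estimator gives $\hat\mu_{Y|x} = \sum_i \beta_i(x)\,\phi(y_i)$ with weights $\beta(x) = (K + \lambda I)^{-1}k_x$, where $k_x = \Upsilon^\top\varphi(x)$ has entries $k(x_i, x)$; only Gram entries are needed. This sketch assumes delta kernels on both variables, made-up samples, and a small hypothetical regularizer `lam`, so it should approximately reproduce the "joint divided by marginal" table. `conditional_weights` is a hypothetical helper name.

```python
import numpy as np

def delta_gram(a, b):
    """Delta-kernel Gram matrix: entry (i, j) is 1 if a[i] == b[j] else 0."""
    return (np.asarray(a)[:, None] == np.asarray(b)[None, :]).astype(float)

def conditional_weights(xs, x, kernel, lam):
    """beta(x) = (K + lam I)^{-1} k_x with K[i, j] = k(x_i, x_j), k_x[i] = k(x_i, x)."""
    K = kernel(xs, xs)
    k_x = kernel(xs, [x])[:, 0]
    return np.linalg.solve(K + lam * np.eye(len(xs)), k_x)

# Hypothetical paired samples (x_i, y_i)
xs = np.array([0, 0, 1, 1, 1, 0, 1, 1])
ys = np.array([0, 1, 2, 2, 0, 1, 2, 1])

lam = 1e-6                                   # hypothetical regularization
beta = conditional_weights(xs, 1, delta_gram, lam)

# hat mu_{Y|x}(s) = sum_i beta_i l(y_i, s); with delta kernels this is
# count(x, s) / (count(x) + lam), close to the empirical P[Y = s | X = x]
mu_Y_given_x = delta_gram(ys, np.arange(3)).T @ beta
```

With a smooth kernel on $X$ (e.g., Gaussian), the same weights borrow strength from nearby $x_i$, which is how values of $x$ that were never observed still receive an embedding.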
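Direct Correspondence
- Discrete: $\mathcal{C}_{YX} \approx \widehat{\mathcal{C}}_{YX} = \frac{1}{N}\sum_{i=1}^{N} y_i x_i^\top$ and $\mu_X \approx \hat\mu_X = \frac{1}{N}\sum_{i=1}^{N} x_i$
- HSE: $C_{YX} \approx \widehat{C}_{YX} = \frac{1}{N}\sum_{i=1}^{N} \phi(y_i)\varphi(x_i)^\top$ and $\mu_X \approx \hat\mu_X = \frac{1}{N}\sum_{i=1}^{N} \varphi(x_i)$
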
Key Rules for Inference
- Sum rule: $P[Y] = \int_X P[Y \mid X]\,P[X]$
- Product rule: $P[Y,X] = P[Y \mid X]\,P[X]$
- Bayes rule: $P[X \mid Y] = \dfrac{P[Y \mid X]\,P[X]}{\int_X P[Y \mid X]\,P[X]}$
Do probabilistic inference in feature space.

Product Rule
$P[Y,X] = P[Y \mid X]\,P[X]$
- Discrete: $\mathcal{C}_{Y|X} = \mathcal{C}_{YX}\mathcal{C}_{XX}^{-1}, \quad \mathcal{C}_{YX} = \mathcal{C}_{Y|X}\mathcal{C}_{XX}$
- HSE: $C_{Y|X} = C_{YX}C_{XX}^{-1}, \quad C_{YX} = C_{Y|X}C_{XX}$

Sum Rule
$P[Y] = \sum_X P[Y,X] = \sum_X P[Y \mid X]\,P[X]$
- Discrete: $\mu_Y = \mathcal{C}_{YX}\mathbf{1} = \mathcal{C}_{YX}\mathcal{C}_{XX}^{-1}\mu_X$
- HSE: $\mu_Y = C_{YX}C_{XX}^{-1}\mu_X$

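Bayes Rule
$P[X \mid Y] = \dfrac{P[Y \mid X]\,P[X]}{P[Y]}$
- Discrete: $\mathcal{C}_{X|Y} = (\mathcal{C}_{Y|X}\mathcal{C}_{XX})^\top\mathcal{C}_{YY}^{-1} = \mathcal{C}_{XY}\mathcal{C}_{YY}^{-1}$
- HSE: $C_{X|Y} = (C_{Y|X}C_{XX})^\top C_{YY}^{-1} = C_{XY}C_{YY}^{-1}$
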
Jørgen Gram

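Gram/Kernel Matrices
With $\Phi_Y := (\phi(y_1), \ldots, \phi(y_N))$ and $\Phi_X := (\varphi(x_1), \ldots, \varphi(x_N))$:
$\widehat{C}_{YX} = \frac{1}{N}\sum_{i=1}^{N}\phi(y_i)\varphi(x_i)^\top = \frac{1}{N}\Phi_Y\Phi_X^\top \in \mathbb{R}^{\infty\times\infty}$
$\widehat{C}_{XX} = \frac{1}{N}\sum_{i=1}^{N}\varphi(x_i)\varphi(x_i)^\top = \frac{1}{N}\Phi_X\Phi_X^\top \in \mathbb{R}^{\infty\times\infty}$
$\mu_x = \varphi(x) \in \mathbb{R}^{\infty\times 1}$
Would like to calculate: $\mu_{Y|x} = \widehat{C}_{YX}\widehat{C}_{XX}^{-1}\mu_x$
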
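Gram/Kernel Matrices
$\widehat{C}_{XX}$ has rank at most $N$, so it is not invertible; regularize it with $\lambda I$ and apply the (Woodbury) matrix inversion lemma:
$\hat\mu_{Y|x} = \widehat{C}_{YX}(\widehat{C}_{XX} + \lambda I)^{-1}\mu_x = \Phi_Y\Phi_X^\top(\Phi_X\Phi_X^\top + N\lambda I)^{-1}\varphi(x)$
$\qquad = \Phi_Y(\Phi_X^\top\Phi_X + N\lambda I)^{-1}\Phi_X^\top\varphi(x)$
$\qquad = \Phi_Y(G_{XX} + N\lambda I)^{-1}G_{XX}(:,i)$ for a query $x = x_i$,
where $G_{XX} = \Phi_X^\top\Phi_X \in \mathbb{R}^{N\times N}$ and $G_{XX}(:,i) = \Phi_X^\top\varphi(x_i) \in \mathbb{R}^{N\times 1}$.
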
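The chain of equalities can be checked numerically with a finite-dimensional stand-in for the feature maps. In this sketch the columns of `Phi_X` and `Phi_Y` are random vectors playing the role of $\varphi(x_i)$ and $\phi(y_i)$ (an assumption: real RKHS features may be infinite-dimensional, which is why only the $N \times N$ form is computable in general), `G_XX` is the unnormalized Gram matrix $\Phi_X^\top\Phi_X$, and `lam` is a made-up regularizer.

```python
import numpy as np

rng = np.random.default_rng(0)
N, dX, dY = 5, 8, 4        # hypothetical sizes; note dX > N, so Phi_X Phi_X^T is rank-deficient
lam = 0.1                  # hypothetical regularization

Phi_X = rng.normal(size=(dX, N))   # columns stand in for varphi(x_i)
Phi_Y = rng.normal(size=(dY, N))   # columns stand in for phi(y_i)
i = 2
phi_x = Phi_X[:, i]                # query x = x_i

# Feature-space form: Phi_Y Phi_X^T (Phi_X Phi_X^T + N lam I)^{-1} varphi(x), a dX x dX solve
primal = Phi_Y @ Phi_X.T @ np.linalg.solve(Phi_X @ Phi_X.T + N * lam * np.eye(dX), phi_x)

# Gram-matrix form: Phi_Y (G_XX + N lam I)^{-1} G_XX[:, i], an N x N solve
G_XX = Phi_X.T @ Phi_X
dual = Phi_Y @ np.linalg.solve(G_XX + N * lam * np.eye(N), G_XX[:, i])
```

The two results agree up to floating-point error, but only the second depends on the data through inner products, so it can be computed from kernel evaluations alone.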
Hilbert Space Embeddings of Distributions
- An alternative to, for example, exponential families and Parzen windows (KDE)
- Represent arbitrary distributions in feature spaces, and reason with the Hilbert space sum, product, and Bayes rules
- Linear algebra for learning and inference
- Can extend state space models nonparametrically to domains defined by kernels