Large-scale Collaborative Prediction Using a Nonparametric Random Effects Model


1. Large-scale Collaborative Prediction Using a Nonparametric Random Effects Model

Kai Yu, joint work with John Lafferty and Shenghuo Zhu
NEC Laboratories America / Carnegie Mellon University

2. Learning multiple tasks

For an input vector $z_j$ and its outputs $Y_{ij}$ (one per task), the standard regression model is
$$Y_{ij} = \mu + m_i(z_j) + \epsilon_{ij}, \qquad \epsilon_{ij} \overset{\text{iid}}{\sim} N(0, \sigma^2),$$
for $i = 1, \ldots, M$ and $j = 1, \ldots, N$.
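As a toy illustration of this setup, here is a minimal simulation sketch; the sizes, task functions, and noise level are illustrative assumptions, not taken from the slides:

```python
# Simulate Y_ij = mu + m_i(z_j) + eps_ij for M independent tasks on N shared inputs.
import numpy as np

rng = np.random.default_rng(0)
M, N, mu, sigma = 3, 50, 2.0, 0.1           # illustrative sizes, not the paper's
Z = np.linspace(0.0, 1.0, N)                # shared inputs z_j

# Hypothetical task functions m_i; in the model they are unknown regressands.
m = np.stack([np.sin(2 * np.pi * (k + 1) * Z) for k in range(M)])

Y = mu + m + sigma * rng.standard_normal((M, N))   # Y has shape (M, N)
```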

3. A kernel approach to multi-task learning

To model the dependency between tasks, a hierarchical Bayesian approach may assume
$$m_i \overset{\text{iid}}{\sim} \mathrm{GP}(0, \Sigma),$$
where $\Sigma(z_j, z_{j'}) \succeq 0$ is a covariance function on inputs, shared across tasks. Many multi-task learning approaches are similar to this (ICML-09 tutorial, Tresp & Yu).

4. Using task-specific features

Assuming task-specific features $x_i$ are available, a more flexible approach is to model the data jointly, as
$$Y_{ij} = \mu + m(x_i, z_j) + \epsilon_{ij}, \qquad \epsilon_{ij} \overset{\text{iid}}{\sim} N(0, \sigma^2),$$
where $m_{ij} = m(x_i, z_j)$ is a relational function.

5. A nonparametric kernel-based approach

Assume the relational function follows
$$m \sim \mathrm{GP}(0, \Omega \otimes \Sigma),$$
where
- $\Omega(x_i, x_{i'})$ is a covariance function on tasks;
- $\Sigma(z_j, z_{j'})$ is a covariance function on inputs;
- any finite submatrix follows $m \sim N(0, \Omega \otimes \Sigma)$, i.e. $\mathrm{Cov}(m_{ij}, m_{i'j'}) = \Omega_{ii'} \Sigma_{jj'}$;
- if $\Omega = \delta$, the prior reduces to $m_i \overset{\text{iid}}{\sim} \mathrm{GP}(0, \Sigma)$.

(Yu et al., 2007; Bonilla et al., 2008)
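On a finite grid this prior is easy to sample, since the Kronecker product gives exactly the covariance above. A minimal NumPy sketch; the RBF kernels and sizes are assumptions for illustration:

```python
# With Omega (M x M) over tasks and Sigma (N x N) over inputs,
# vec(m) ~ N(0, kron(Omega, Sigma)), so Cov(m_ij, m_i'j') = Omega[i,i'] * Sigma[j,j'].
import numpy as np

rng = np.random.default_rng(0)
M, N = 4, 6                                 # toy sizes

def rbf(a, ell=0.5):
    d = a[:, None] - a[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

Omega = rbf(rng.standard_normal(M))         # assumed task kernel (illustrative)
Sigma = rbf(np.linspace(0.0, 1.0, N))       # assumed input kernel

K = np.kron(Omega, Sigma) + 1e-9 * np.eye(M * N)    # jitter for stability
m = np.linalg.cholesky(K) @ rng.standard_normal(M * N)
m = m.reshape(M, N)                         # one draw of the relational function
```

With NumPy's row-major `reshape`, entry `m[i, j]` corresponds to index `i*N + j` of `vec(m)`, which is exactly where `np.kron(Omega, Sigma)` places $\Omega_{ii'}\Sigma_{jj'}$.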

6. The collaborative prediction problem

- This is essentially a multi-task learning problem with task features;
- equivalently, matrix factorization using additional row/column attributes;
- the formulation applies to many relational prediction problems.

7. Challenges to the kernel approach

- Computation: the $O(M^3 N^3)$ cost is prohibitive. For the Netflix data, $M = 480{,}189$ and $N = 17{,}770$, so the full kernel matrix has $MN \approx 8.5 \times 10^9$ rows.
- Dependent noise: when $Y_{ij}$ cannot be fully explained by the predictors $x_i$ and $z_j$, the conditional independence assumption is invalid, i.e.
$$p(Y \mid m, x, z) \neq \prod_{i,j} p(Y_{ij} \mid m, x_i, z_j).$$
  User and movie features are weak predictors; the relational observations $Y_{ij}$ alone are informative about each other.

8. This work

- A novel multi-task model using both input and task attributes;
- nonparametric random effects to resolve dependent noise;
- an efficient algorithm for large-scale collaborative prediction problems.

9. Nonparametric random effects

$$Y_{ij} = \mu + m_{ij} + f_{ij} + \epsilon_{ij},$$
- $m(x_i, z_j)$: a function depending on known attributes;
- $f_{ij}$: random effects for dependent noise, modeling the dependency in observations with repeated structures.

Let $f_{ij}$ be nonparametric, so that its dimensionality increases with the data size: nonparametric matrix factorization.

10. Efficiency considerations in modeling

To save computation, we absorb $\epsilon$ into $f$ and obtain
$$Y_{ij} = \mu + m_{ij} + f_{ij}.$$
Next we introduce a special generative process for $m$ and $f$, built from base covariance functions $\Omega_0(x_i, x_{i'})$ and $\Sigma_0(z_j, z_{j'})$.

11–18. The row-wise generative model (a sequence of figure-only slides; the diagrams are not recoverable from the extraction).

19. The column-wise generative model (figure-only slide).

20. Two generative models

$$Y_{ij} = \mu + m_{ij} + f_{ij}$$

Column-wise model:
$$\Omega \sim \mathrm{IWP}(\kappa, \Omega_0 + \tau\delta), \qquad m \sim \mathrm{GP}(0, \Omega \otimes \Sigma_0), \qquad f_j \overset{\text{iid}}{\sim} \mathrm{GP}(0, \lambda\Omega).$$

Row-wise model:
$$\Sigma \sim \mathrm{IWP}(\kappa, \Sigma_0 + \lambda\delta), \qquad m \sim \mathrm{GP}(0, \Omega_0 \otimes \Sigma), \qquad f_i \overset{\text{iid}}{\sim} \mathrm{GP}(0, \tau\Sigma).$$

21. The two generative models are equivalent

Both models lead to the same matrix-variate Student-$t$ process,
$$Y \sim \mathrm{MTP}\big(\kappa, 0, \Omega_0 + \tau\delta, \Sigma_0 + \lambda\delta\big).$$
- The model learns both $\Omega$ and $\Sigma$ simultaneously;
- sometimes one model is computationally cheaper than the other.

22. An idea for large-scale modeling (figure-only slide).

23. Modeling large-scale data

If $\Omega_0(x_i, x_{i'}) = \langle \phi(x_i), \phi(x_{i'}) \rangle$, then $m \sim \mathrm{GP}(0, \Omega_0 \otimes \Sigma)$ implies $m_{ij} = \langle \phi(x_i), \beta_j \rangle$. Without loss of generality, let $\Omega_0(x_i, x_{i'}) = \langle p^{-\frac{1}{2}} x_i, p^{-\frac{1}{2}} x_{i'} \rangle$, with $x_i \in \mathbb{R}^p$.

On a finite observational matrix $Y \in \mathbb{R}^{M \times N}$, $M \gg N$, the row-wise model becomes
$$\Sigma \sim \mathrm{IW}(\kappa, \Sigma_0 + \lambda I_N), \qquad \beta \sim N(0, I_p \otimes \Sigma),$$
where $\beta$ is a $p \times N$ random matrix, and
$$Y_i \sim N(\beta^\top x_i, \tau\Sigma), \qquad i = 1, \ldots, M.$$
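A generative sketch of this finite row-wise model. The mapping from the slides' $(\kappa, \Sigma_0 + \lambda I_N)$ to scipy's inverse-Wishart `df`/`scale` parameters, and all sizes and parameter values, are assumptions made for illustration:

```python
# Sample Sigma ~ IW, beta with rows ~ N(0, Sigma), then Y_i ~ N(beta^T x_i, tau*Sigma).
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(0)
M, N, p = 100, 20, 5                          # toy sizes with M >> N in spirit
tau, lam, kappa = 0.5, 0.1, N + 2             # illustrative hyperparameters

Sigma0 = np.eye(N)                            # illustrative prior scale
Sigma = invwishart.rvs(df=kappa + N,          # assumed df parameterization
                       scale=Sigma0 + lam * np.eye(N), random_state=0)

L = np.linalg.cholesky(Sigma)
beta = rng.standard_normal((p, N)) @ L.T      # rows of beta are iid N(0, Sigma)
X = rng.standard_normal((M, p)) / np.sqrt(p)  # so Omega_0 = <x_i, x_i'>/p

# Row i of Y is beta^T x_i plus noise with covariance tau * Sigma.
Y = X @ beta + rng.standard_normal((M, N)) @ (np.sqrt(tau) * L).T
```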

24. Approximate inference: EM

E-step: compute the sufficient statistics $\{\upsilon_i, C_i\}$ of the posterior of $Y_i$ given the current $\beta$ and $\Sigma$:
$$Q(Y) = \prod_{i=1}^{M} p(Y_i \mid Y_{O_i}, \beta, \Sigma) = \prod_{i=1}^{M} N(Y_i \mid \upsilon_i, C_i).$$

M-step: optimize $\beta$ and $\Sigma$:
$$\beta^*, \Sigma^* = \arg\min_{\beta, \Sigma} \; \mathbb{E}_{Q(Y)}\big[-\log p(Y, \beta, \Sigma \mid \theta)\big],$$
and then set $\beta \leftarrow \beta^*$, $\Sigma \leftarrow \Sigma^*$.

25. Some notation

- Let $J_i \subseteq \{1, \ldots, N\}$ be the index set of the $N_i$ observed elements in row $Y_i$;
- $\Sigma_{[:,J_i]} \in \mathbb{R}^{N \times N_i}$ is the matrix obtained by keeping the columns of $\Sigma$ indexed by $J_i$;
- $\Sigma_{[J_i,J_i]} \in \mathbb{R}^{N_i \times N_i}$ is obtained from $\Sigma_{[:,J_i]}$ by further keeping only the rows indexed by $J_i$;
- similarly, we define $\Sigma_{[J_i,:]}$, $Y_{[i,J_i]}$ and $m_{[i,J_i]}$.

These sub-matrix selections map directly onto array indexing, as in the sketch below.
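A small NumPy illustration of the notation; the matrix and index set $J_i$ here are arbitrary:

```python
import numpy as np

N = 6
Sigma = np.arange(N * N, dtype=float).reshape(N, N)
J_i = np.array([1, 3, 4])              # indices of observed entries in row i

Sigma_cols = Sigma[:, J_i]             # Sigma_[:,J_i],  shape (N, N_i)
Sigma_sub = Sigma[np.ix_(J_i, J_i)]    # Sigma_[J_i,J_i], shape (N_i, N_i)
Sigma_rows = Sigma[J_i, :]             # Sigma_[J_i,:],  shape (N_i, N)
```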

26. The EM algorithm

E-step: for $i = 1, \ldots, M$,
$$m_i = \beta^\top x_i,$$
$$\upsilon_i = m_i + \Sigma_{[:,J_i]} \Sigma_{[J_i,J_i]}^{-1} \big(Y_{[i,J_i]} - m_{[i,J_i]}\big),$$
$$C_i = \tau\Sigma - \tau\Sigma_{[:,J_i]} \Sigma_{[J_i,J_i]}^{-1} \Sigma_{[J_i,:]}.$$

M-step:
$$\beta^* = (x^\top x + \tau I_p)^{-1} x^\top \upsilon,$$
$$\Sigma^* = \frac{\tau^{-1}\Big[\sum_{i=1}^{M} \big(C_i + \upsilon_i \upsilon_i^\top\big) - \upsilon^\top x \, (x^\top x + \tau I_p)^{-1} x^\top \upsilon\Big] + \Sigma_0 + \lambda I_N}{M + 2N + p + \kappa}.$$

On Netflix, a naive implementation of each EM iteration takes several thousand hours. A direct transcription of these updates is sketched below.
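A direct, unoptimized transcription of the updates above, as a sketch. The function signature and data layout ($Y$ dense with observed entries at `Y[i, J[i]]`, $J$ a list of index arrays) are assumptions:

```python
# One EM iteration; names follow the slides. The per-row solves in the E-step
# are what make this the "thousands of hours" version on large data.
import numpy as np

def em_iteration(Y, J, X, beta, Sigma, Sigma0, tau, lam, kappa):
    M, N = len(J), Sigma.shape[0]
    p = X.shape[1]
    Upsilon = np.empty((M, N))             # posterior means upsilon_i
    C_sum = np.zeros((N, N))               # running sum of the C_i

    # E-step: posterior of each full row Y_i given its observed entries.
    for i in range(M):
        Ji = J[i]
        m_i = beta.T @ X[i]                                 # m_i = beta^T x_i
        gain = Sigma[:, Ji] @ np.linalg.inv(Sigma[np.ix_(Ji, Ji)])
        Upsilon[i] = m_i + gain @ (Y[i, Ji] - m_i[Ji])      # upsilon_i
        C_sum += tau * (Sigma - gain @ Sigma[Ji, :])        # C_i, summed

    # M-step: closed-form updates for beta and Sigma.
    beta_new = np.linalg.inv(X.T @ X + tau * np.eye(p)) @ (X.T @ Upsilon)
    S = C_sum + Upsilon.T @ Upsilon - Upsilon.T @ X @ beta_new
    Sigma_new = (S / tau + Sigma0 + lam * np.eye(N)) / (M + 2 * N + p + kappa)
    return beta_new, Sigma_new
```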

27. Fast implementation

Let $U_i \in \mathbb{R}^{N \times N_i}$ be a column selection operator, such that $\Sigma_{[:,J_i]} = \Sigma U_i$ and $\Sigma_{[J_i,J_i]} = U_i^\top \Sigma U_i$.

The M-step only needs $C = \sum_{i=1}^{M} C_i + \upsilon^\top \upsilon$ and $\upsilon^\top x$ from the previous E-step. To obtain them, it is unnecessary to compute each $\upsilon_i$ and $C_i$. For example,
$$\sum_{i=1}^{M} C_i = \sum_{i=1}^{M} \big(\tau\Sigma - \tau\Sigma_{[:,J_i]} \Sigma_{[J_i,J_i]}^{-1} \Sigma_{[J_i,:]}\big) = \sum_{i=1}^{M} \big(\tau\Sigma - \tau\Sigma U_i \Sigma_{[J_i,J_i]}^{-1} U_i^\top \Sigma\big) = \tau M \Sigma - \tau\Sigma \Big(\sum_{i=1}^{M} U_i \Sigma_{[J_i,J_i]}^{-1} U_i^\top\Big) \Sigma.$$
Similar tricks apply to $\upsilon^\top \upsilon$ and $\upsilon^\top x$. The time per iteration is reduced from thousands of hours to only 5 hours.
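A sketch of the trick for $\sum_i C_i$: rather than materializing each $U_i$, scatter-add each small inverse into an $N \times N$ buffer and finish with two $N \times N$ products. The function name and data layout are assumptions:

```python
import numpy as np

def sum_Ci_fast(Sigma, J, tau):
    """Compute sum_i C_i = tau*M*Sigma - tau * Sigma @ A @ Sigma, where
    A = sum_i U_i Sigma_[J_i,J_i]^{-1} U_i^T is built by scatter-addition."""
    N = Sigma.shape[0]
    A = np.zeros((N, N))
    for Ji in J:                          # each J_i: observed column indices
        A[np.ix_(Ji, Ji)] += np.linalg.inv(Sigma[np.ix_(Ji, Ji)])
    return tau * len(J) * Sigma - tau * (Sigma @ A @ Sigma)
```

This replaces $M$ products of $N \times N$ matrices with just two, at the cost of $M$ small $N_i \times N_i$ inverses.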

28. EachMovie data

- 1,648 movies; 2,811,718 numeric ratings $Y_{ij} \in \{1, \ldots, 6\}$;
- 97.17% of the matrix elements are missing;
- 80% of each user's ratings are used for training and the rest for testing;
- this random split is repeated 10 times independently.

29. Compared methods

- User Mean & Movie Mean: prediction by the empirical mean;
- FMMMF: fast max-margin matrix factorization (Rennie & Srebro, 2005);
- PPCA: probabilistic principal component analysis (Tipping & Bishop, 1999);
- BSRM: Bayesian stochastic relational model (Zhu, Yu & Gong, 2009); BSRM-1 uses no additional user/movie attributes (for BSRM, the attributes are the top 20 eigenvectors of the binary matrix indicating whether a rating is observed);
- NREM: nonparametric random effects model (this work); NREM-1 uses no additional attributes.

30. Results on EachMovie

Table: prediction error on EachMovie data, reporting RMSE, standard error, and run time (hours) for User Mean, Movie Mean, FMMMF, PPCA, BSRM-1, BSRM, NREM-1, and NREM (the numeric entries were not recoverable from the extraction).

31. Netflix data

- 100,480,507 ratings from 480,189 users on 17,770 movies, with $Y_{ij} \in \{1, 2, 3, 4, 5\}$;
- a validation set contains 1,408,395 of these ratings;
- thus 98.81% of the elements of the rating matrix are missing;
- the test set includes a further 2,817,131 ratings.

32. Compared methods

In addition to those compared in the EachMovie experiment:
- SVD: almost the same as FMMMF, but optimized with a gradient-based method (Kurucz, Benczur & Csalogany, 2007);
- RBM: restricted Boltzmann machine trained with contrastive divergence (Salakhutdinov, Mnih & Hinton, 2007);
- PMF and BPMF: probabilistic matrix factorization (Salakhutdinov & Mnih, 2008b) and its Bayesian version (Salakhutdinov & Mnih, 2008a);
- PMF-VB: probabilistic matrix factorization using a variational Bayes method for inference (Lim & Teh, 2007).

33. Results on Netflix

Table: performance on Netflix data, reporting RMSE and run time (hours) for Cinematch, SVD, PMF, RBM, PMF-VB, BPMF, BSRM, NREM-1, and NREM (the numeric entries were not recoverable from the extraction).

34. Predictive uncertainty

Figure: standard deviations of prediction residuals vs. standard deviations predicted by our model on EachMovie.

35. Related work

- Multi-task learning using Gaussian processes: approaches that learn the covariance $\Sigma$ shared across tasks (Lawrence & Platt, 2004; Schwaighofer, Tresp & Yu, 2004; Yu, Tresp & Schwaighofer, 2005), and those that additionally consider the covariance $\Omega$ between tasks (Yu, Chu, Yu, Tresp & Xu, 2007; Bonilla, Chai & Williams, 2008);
- applications of GP models to collaborative filtering (Schwaighofer, Tresp & Yu, 2004; Yu & Chu, 2007);
- low-rank matrix factorization, e.g. Salakhutdinov & Mnih (2008b); our model is nonparametric in the sense that no rank constraint is imposed;
- very few matrix factorization methods use known predictors. One such work (Hoff, 2005) introduces low-rank multiplicative random effects in modeling networked observations.

36. Summary

- The model provides a novel way to use random effects and known attributes to explain the complex dependence in data;
- we make the nonparametric model scalable and efficient on very large-scale problems;
- our experiments demonstrate that the algorithm works very well on two challenging collaborative prediction problems;
- in the near future, it will be promising to perform full Bayesian inference via a parallel Gibbs sampling method.
