Large-scale Collaborative Prediction Using a Nonparametric Random Effects Model
1 Large-scale Collaborative Prediction Using a Nonparametric Random Effects Model

Kai Yu, joint work with John Lafferty and Shenghuo Zhu
NEC Laboratories America / Carnegie Mellon University
2 Learning multiple tasks

For an input vector z_j and its outputs Y_ij (one per task i), the standard regression model, under various conditions, is

    Y_ij = µ + m_i(z_j) + ε_ij,

where ε_ij ~iid N(0, σ²), i = 1, ..., M, and j = 1, ..., N.
3 A kernel approach to multi-task learning

To model the dependency between tasks, a hierarchical Bayesian approach may assume

    m_i ~iid GP(0, Σ),

where Σ(z_j, z_j′) ⪰ 0 is a covariance function shared among inputs. Many multi-task learning approaches are similar to this (ICML-09 tutorial, Tresp & Yu).
4 Using task-specific features

Assuming task-specific features x are available, a more flexible approach is to model the data jointly, as

    Y_ij = µ + m(x_i, z_j) + ε_ij,

where ε_ij ~iid N(0, σ²) and m_ij = m(x_i, z_j) is a relational function.
5 A nonparametric kernel-based approach

Assume the relational function follows (Yu et al., 2007; Bonilla et al., 2008)

    m ~ GP(0, Ω ⊗ Σ),

where
- Ω(x_i, x_i′) is a covariance function on tasks;
- Σ(z_j, z_j′) is a covariance function on inputs;
- any finite submatrix follows m ~ N(0, Ω ⊗ Σ), i.e., Cov(m_ij, m_i′j′) = Ω_ii′ Σ_jj′;
- if Ω = δ, the prior reduces to m_i ~iid GP(0, Σ).
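To make the Kronecker-structured prior concrete, here is a minimal numpy sketch (with made-up toy kernels, not the paper's code): a draw of the M × N function values is taken from N(0, Ω ⊗ Σ), and the flattened covariance factorizes entrywise as Ω_ii′ Σ_jj′.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 4, 3
A = rng.standard_normal((M, M)); Omega = A @ A.T + M * np.eye(M)  # toy task covariance
B = rng.standard_normal((N, N)); Sigma = B @ B.T + N * np.eye(N)  # toy input covariance

K = np.kron(Omega, Sigma)          # covariance of vec(m) under row-major flattening
L = np.linalg.cholesky(K)
m = (L @ rng.standard_normal(M * N)).reshape(M, N)  # one draw of the M x N values

# Entry (i*N + j, i2*N + j2) of K is exactly Omega[i, i2] * Sigma[j, j2]:
i, j, i2, j2 = 1, 2, 3, 0
assert np.isclose(K[i * N + j, i2 * N + j2], Omega[i, i2] * Sigma[j, j2])
```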
6 The collaborative prediction problem

- This is essentially a multi-task learning problem with task features;
- equivalently, matrix factorization using additional row/column attributes;
- the formulation applies to many relational prediction problems.
7 Challenges to the kernel approach

- Computation: the O(M³N³) cost is prohibitive. Netflix data: M = 480,189 and N = 17,770.
- Dependent noise: when Y_ij cannot be fully explained by the predictors x_i and z_j, the conditional independence assumption is invalid, which means

    p(Y | m, x, z) ≠ ∏_{i,j} p(Y_ij | m, x_i, z_j).

  User and movie features are weak predictors, while the relational observations Y_ij alone are informative about each other.
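To make the scale concrete, a back-of-envelope count using the Netflix sizes above:

```latex
MN = 480{,}189 \times 17{,}770 \approx 8.5 \times 10^{9},
\qquad
(MN)^2 \approx 7.3 \times 10^{19} .
```

So even storing the joint covariance matrix of a naive GP over all entries is impossible, before any O(M³N³) inversion is attempted.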
8 This work

- A novel multi-task model using both input and task attributes;
- nonparametric random effects to resolve dependent noise;
- an efficient algorithm for large-scale collaborative prediction problems.
9 Nonparametric random effects

    Y_ij = µ + m_ij + f_ij + ε_ij,

- m(x_i, z_j): a function depending on known attributes;
- f_ij: random effects for dependent noise, modeling dependency in observations with repeated structures.

Let f_ij be nonparametric: its dimensionality increases with the data size, giving a nonparametric matrix factorization.
10 Efficiency considerations in modeling

To save computation, we absorb ε into f and obtain

    Y_ij = µ + m_ij + f_ij,

and introduce a special generative process for m and f, built on base kernels Ω_0(x_i, x_i′) and Σ_0(z_j, z_j′).
11-18 The row-wise generative model

[Figure-only slides: a step-by-step illustration of the row-wise generative process.]

19 The column-wise generative model

[Figure-only slide: the analogous column-wise generative process.]
20 Two generative models

    Y_ij = µ + m_ij + f_ij,

Column-wise model:
    Ω ~ IWP(κ, Ω_0 + τδ),   m ~ GP(0, Ω ⊗ Σ_0),   f_j ~iid GP(0, λΩ).

Row-wise model:
    Σ ~ IWP(κ, Σ_0 + λδ),   m ~ GP(0, Ω_0 ⊗ Σ),   f_i ~iid GP(0, τΣ).
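A hedged simulation sketch of the row-wise model on a finite M × N grid follows. The stand-in kernels are illustrative, and the mapping from the paper's κ to scipy's inverse-Wishart degrees-of-freedom parameter is an assumption that depends on the IWP parameterization.

```python
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(1)
M, N = 6, 4
mu, tau, lam, kappa = 0.0, 0.5, 0.5, 3.0
Omega0 = np.eye(M)                      # stand-in base kernel on tasks
Sigma0 = np.eye(N)                      # stand-in base kernel on inputs

# Sigma ~ IW(kappa, Sigma0 + lam*I); df = kappa + N - 1 is an assumed mapping.
Sigma = invwishart.rvs(df=kappa + N - 1, scale=Sigma0 + lam * np.eye(N),
                       random_state=rng)

# m ~ GP(0, Omega0 ⊗ Sigma): matrix normal with row cov Omega0, column cov Sigma.
Lr, Lc = np.linalg.cholesky(Omega0), np.linalg.cholesky(Sigma)
m = Lr @ rng.standard_normal((M, N)) @ Lc.T

# f_i ~iid GP(0, tau*Sigma): each row drawn independently with covariance tau*Sigma.
f = np.sqrt(tau) * rng.standard_normal((M, N)) @ Lc.T

Y = mu + m + f                          # one draw of the observation matrix
```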
21 Two generative models are equivalent

Both models lead to the same matrix-variate Student-t process

    Y ~ MTP(κ, 0, Ω_0 + τδ, Σ_0 + λδ).

- The model learns both Ω and Σ simultaneously;
- sometimes one model is computationally cheaper than the other.
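For reference, marginalizing Σ out of the row-wise model yields a matrix-variate Student-t kernel of the following shape. This is a sketch up to normalization, in a Gupta-Nagar-style parameterization; the exact exponent's relation to κ depends on the inverse-Wishart convention used and is an assumption here.

```latex
p(\mathbf{Y}) \;\propto\;
\left| I_M + (\Omega_0 + \tau\delta)^{-1}\,\mathbf{Y}\,(\Sigma_0 + \lambda\delta)^{-1}\,\mathbf{Y}^{\top} \right|^{-\frac{\kappa + M + N - 1}{2}} .
```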
22 An idea of large-scale modeling

[Figure-only slide.]
23 Modeling large-scale data

If Ω_0(x_i, x_i′) = ⟨φ(x_i), φ(x_i′)⟩, then m ~ GP(0, Ω_0 ⊗ Σ) implies m_ij = ⟨φ(x_i), β_j⟩.

Without loss of generality, let Ω_0(x_i, x_i′) = ⟨p^{-1/2} x_i, p^{-1/2} x_i′⟩ with x_i ∈ R^p. On a finite observation matrix Y ∈ R^{M×N}, M ≫ N, the row-wise model becomes

    Σ ~ IW(κ, Σ_0 + λI_N),
    β ~ N(0, I_p ⊗ Σ),  where β is a p × N random matrix,
    Y_i ~ N(β^T x_i, τΣ),  i = 1, ..., M.
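A small Monte Carlo sanity check of this reduction (synthetic sizes, not from the paper): drawing the rows of β iid from N(0, Σ) makes m_i = β^T x_i satisfy Cov(m_i, m_i′) = ⟨x_i, x_i′⟩ Σ, which is the linear-kernel case of the Kronecker prior.

```python
import numpy as np

rng = np.random.default_rng(2)
p, N, draws = 3, 2, 200_000
Sigma = np.array([[1.0, 0.3], [0.3, 0.5]])
L = np.linalg.cholesky(Sigma)
x1, x2 = rng.standard_normal(p), rng.standard_normal(p)

# beta ~ N(0, I_p ⊗ Sigma): the p rows of beta are iid N(0, Sigma).
betas = rng.standard_normal((draws, p, N)) @ L.T
m1 = betas.transpose(0, 2, 1) @ x1      # m_1 = beta^T x_1, shape (draws, N)
m2 = betas.transpose(0, 2, 1) @ x2      # m_2 = beta^T x_2

emp = (m1[:, :, None] * m2[:, None, :]).mean(axis=0)   # empirical Cov(m_1, m_2)
print(np.allclose(emp, (x1 @ x2) * Sigma, atol=0.05))  # ≈ <x_1, x_2> * Sigma
```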
24 Approximate inference: EM

E-step: compute the sufficient statistics {υ_i, C_i} of the posterior of each row Y_i given the current β and Σ:

    Q(Y) = ∏_{i=1}^M p(Y_i | Y_{O_i}, β, Σ) = ∏_{i=1}^M N(Y_i | υ_i, C_i).

M-step: optimize β and Σ,

    (β*, Σ*) = argmin_{β,Σ} E_{Q(Y)}[ -log p(Y, β, Σ | θ) ],

and then set β ← β*, Σ ← Σ*.
25 Some notation

- Let J_i ⊆ {1, ..., N} be the index set of the N_i observed elements in row Y_i;
- Σ_[:,J_i] ∈ R^{N×N_i} is the matrix obtained by keeping the columns of Σ indexed by J_i;
- Σ_[J_i,J_i] ∈ R^{N_i×N_i} is obtained from Σ_[:,J_i] by further keeping only the rows indexed by J_i;
- similarly, we define Σ_[J_i,:], Y_[i,J_i], and m_[i,J_i].
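In numpy terms, the same indexing looks like this (a minimal sketch with illustrative values; the toy matrix is for indexing only):

```python
import numpy as np

Sigma = np.arange(16.0).reshape(4, 4)   # toy N x N matrix, N = 4
Ji = np.array([0, 2])                   # indices of the N_i observed entries in row i

Sigma_cols = Sigma[:, Ji]               # Sigma_[:,Ji]:  N x N_i, keep columns in J_i
Sigma_sub  = Sigma[np.ix_(Ji, Ji)]      # Sigma_[Ji,Ji]: N_i x N_i, keep rows too
```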
26 The EM algorithm

E-step: for i = 1, ..., M,

    m_i = β^T x_i,
    υ_i = m_i + Σ_[:,J_i] Σ_[J_i,J_i]^{-1} (Y_[i,J_i] - m_[i,J_i]),
    C_i = τΣ - τΣ_[:,J_i] Σ_[J_i,J_i]^{-1} Σ_[J_i,:].

M-step:

    β = (x^T x + τI_p)^{-1} x^T υ,
    Σ = [ τ^{-1}( Σ_{i=1}^M (C_i + υ_i υ_i^T) - υ^T x (x^T x + τI_p)^{-1} x^T υ ) + Σ_0 + λI_N ] / (M + 2N + p + κ).

On Netflix, each EM iteration implemented naively takes several thousand hours.
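A straightforward per-row implementation of these updates, as a minimal dense numpy sketch under the notation above; the function and variable names are illustrative, and the Σ update follows the formula as reconstructed from the slide.

```python
import numpy as np

def em_iteration(Y, masks, X, beta, Sigma, tau, Sigma0, lam, kappa):
    """One EM iteration. Y is M x N with arbitrary values at unobserved
    entries; masks[i] holds the observed column indices J_i of row i."""
    M, p = X.shape
    N = Sigma.shape[0]
    V = np.empty((M, N))                 # posterior means v_i, stacked
    sum_C = np.zeros((N, N))             # running sum of the C_i
    Mhat = X @ beta                      # prior means m_i = beta^T x_i
    for i in range(M):                   # E-step, row by row
        Ji = masks[i]
        Kinv = np.linalg.inv(Sigma[np.ix_(Ji, Ji)])
        gain = Sigma[:, Ji] @ Kinv
        V[i] = Mhat[i] + gain @ (Y[i, Ji] - Mhat[i, Ji])
        sum_C += tau * (Sigma - gain @ Sigma[Ji, :])
    G = np.linalg.inv(X.T @ X + tau * np.eye(p))
    beta = G @ (X.T @ V)                 # M-step: regression coefficients
    Sigma = ((sum_C + V.T @ V - V.T @ X @ G @ X.T @ V) / tau
             + Sigma0 + lam * np.eye(N)) / (M + 2 * N + p + kappa)
    return beta, Sigma
```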
27 Fast implementation

Let U_i ∈ R^{N×N_i} be a column-selection operator, such that Σ_[:,J_i] = ΣU_i and Σ_[J_i,J_i] = U_i^T Σ U_i.

The M-step only needs C = Σ_{i=1}^M C_i + υ^T υ and υ^T x from the preceding E-step. To obtain them, it is unnecessary to compute the individual υ_i and C_i. For example,

    Σ_{i=1}^M C_i = Σ_{i=1}^M ( τΣ - τΣ_[:,J_i] Σ_[J_i,J_i]^{-1} Σ_[J_i,:] )
                  = Σ_{i=1}^M ( τΣ - τΣ U_i Σ_[J_i,J_i]^{-1} U_i^T Σ )
                  = τMΣ - τΣ ( Σ_{i=1}^M U_i Σ_[J_i,J_i]^{-1} U_i^T ) Σ.

Similar tricks apply to υ^T υ and υ^T x. The time per iteration is reduced from thousands of hours to only about 5 hours.
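Following the algebra above, Σ_i C_i can be accumulated without materializing an N × N matrix per row; a sketch of the selection-operator trick in numpy (illustrative, not the paper's code):

```python
import numpy as np

def sum_C_fast(Sigma, masks, tau):
    """Compute sum_i C_i = tau*M*Sigma - tau * Sigma @ S @ Sigma, where S
    accumulates U_i Sigma_[Ji,Ji]^{-1} U_i^T over rows: only small N_i x N_i
    inverses inside the loop, plus two N x N products at the end."""
    N = Sigma.shape[0]
    M = len(masks)
    S = np.zeros((N, N))
    for Ji in masks:
        S[np.ix_(Ji, Ji)] += np.linalg.inv(Sigma[np.ix_(Ji, Ji)])
    return tau * M * Sigma - tau * Sigma @ S @ Sigma
```

This restructuring, applied to all three sufficient statistics, is the kind of saving behind the thousands-of-hours to five-hours reduction reported on the slide.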
28 EachMovie data

- Ratings from EachMovie users on 1,648 movies; 2,811,718 numeric ratings Y_ij ∈ {1, ..., 6};
- 97.17% of the matrix elements are missing;
- 80% of each user's ratings are used for training and the rest for testing;
- this random split is repeated 10 times independently.
29 Compared methods

- User Mean & Movie Mean: prediction by the empirical mean;
- FMMMF: fast max-margin matrix factorization (Rennie & Srebro, 2005);
- PPCA: probabilistic principal component analysis (Tipping & Bishop, 1999);
- BSRM: Bayesian stochastic relational model (Zhu, Yu & Gong, 2009), where the additional attributes are the top 20 eigenvectors of the binary matrix indicating whether a rating is observed; BSRM-1 uses no additional user/movie attributes;
- NREM: nonparametric random effects model (this work); NREM-1 uses no additional attributes.
30 Results on EachMovie

Table: prediction error on EachMovie data, reporting RMSE, its standard error, and run time (hours) for User Mean, Movie Mean, FMMMF, PPCA, BSRM-1, BSRM, NREM-1, and NREM. [The numeric entries of the table were lost in extraction.]
31 Netflix data

- 100,480,507 ratings from 480,189 users on 17,770 movies, Y_ij ∈ {1, 2, 3, 4, 5};
- a validation set contains 1,408,395 ratings, so 98.81% of the elements of the rating matrix are missing;
- the test set contains 2,817,131 ratings.
32 Compared methods

In addition to those compared in the EachMovie experiment:

- SVD: a method almost the same as FMMMF, using gradient-based optimization (Kurucz, Benczur & Csalogany, 2007);
- RBM: restricted Boltzmann machine trained with contrastive divergence (Salakhutdinov, Mnih & Hinton, 2007);
- PMF and BPMF: probabilistic matrix factorization (Salakhutdinov & Mnih, 2008b) and its Bayesian version (Salakhutdinov & Mnih, 2008a);
- PMF-VB: probabilistic matrix factorization using variational Bayes inference (Lim & Teh, 2007).
33 Results on Netflix

Table: performance on Netflix data, reporting RMSE and run time (hours) for Cinematch, SVD, PMF, RBM, PMF-VB, BPMF, BSRM, NREM-1, and NREM. [The numeric entries of the table were lost in extraction.]
34 Predictive uncertainty

[Figure: standard deviations of prediction residuals vs. standard deviations predicted by our model, on EachMovie.]
35 Related work

- Multi-task learning using Gaussian processes: methods that learn the covariance Σ shared across tasks (Lawrence & Platt, 2004; Schwaighofer, Tresp & Yu, 2004; Yu, Tresp & Schwaighofer, 2005), and methods that additionally consider the covariance Ω between tasks (Yu, Chu, Yu, Tresp & Xu, 2007; Bonilla, Chai & Williams, 2008);
- applications of GP models to collaborative filtering (Schwaighofer, Tresp & Yu, 2004; Yu & Chu, 2007);
- low-rank matrix factorization, e.g., Salakhutdinov & Mnih (2008b); our model is nonparametric in the sense that no rank constraint is imposed;
- very few matrix factorization methods use known predictors; one such work (Hoff, 2005) introduces low-rank multiplicative random effects for modeling networked observations.
36 Summary

- The model provides a novel way to use random effects and known attributes to explain complex dependence in the data;
- we make the nonparametric model scalable and efficient on very large-scale problems;
- our experiments demonstrate that the algorithm works very well on two challenging collaborative prediction problems;
- in the near future, it will be promising to perform full Bayesian inference via a parallel Gibbs sampling method.