Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Recommendation. Tobias Scheffer
1 Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Recommendation Tobias Scheffer
2 Recommendation Engines
Recommendation of products, music, contacts, ...
Based on user features, item features, and past transactions: sales, reviews, clicks, ...
User-specific recommendations, no global ranking of items.
Feedback loop: the choice of recommendations influences the available transaction and click data.
3 Netflix Prize
Data analysis challenge; Netflix made rating data available: 500,000 users, 18,000 movies, 100 million ratings.
Challenge: predict ratings that were held back for evaluation; improve by 10% over Netflix's own recommendations.
Award: $1 million.
4 Problem Setting
Users $U = \{1, \dots, m\}$
Items $X = \{1, \dots, m'\}$
Ratings $Y = \{(u_1, x_1, y_1), \dots, (u_n, x_n, y_n)\}$
Rating space $y_i \in \mathcal{Y}$; e.g., $\mathcal{Y} = \{-1, +1\}$ or $\mathcal{Y} = \{1 \text{ star}, \dots, 5 \text{ stars}\}$
Loss function $\ell(y_i, y_j)$; e.g., missing a good movie is bad, but watching a terrible movie is worse (illustrated below).
Find a rating model $f_\theta: (u, x) \mapsto y$.
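To make the asymmetric loss concrete, here is a minimal sketch for $\mathcal{Y} = \{-1, +1\}$; the penalty values 1.0 and 2.0 are illustrative assumptions, not values from the lecture:

```python
def asymmetric_loss(y_true, y_pred):
    """Asymmetric rating loss on labels in {-1, +1} (illustrative values).

    Recommending a terrible movie (y_true = -1, y_pred = +1) is penalized
    more heavily than missing a good one (y_true = +1, y_pred = -1).
    """
    if y_true == y_pred:
        return 0.0
    return 2.0 if (y_true == -1 and y_pred == +1) else 1.0
```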
5 Problem Setting: Matrix Notation
Users $U = \{1, \dots, m\}$, items $X = \{1, \dots, m'\}$.
Ratings form an incomplete matrix (rows: users, columns: items), e.g.
$$Y = \begin{pmatrix} y_{11} & y_{12} & \cdot \\ y_{21} & \cdot & y_{23} \\ \cdot & \cdot & y_{33} \end{pmatrix}$$
Rating space $y_i \in \mathcal{Y}$; e.g., $\mathcal{Y} = \{-1, +1\}$ or star ratings.
Loss function $\ell(y_i, y_j)$.
6 Problem Setting
Model $f_\theta(u, x)$. Find the model parameters that minimize the risk:
$$\theta^* = \arg\min_\theta \int \ell(y, f_\theta(u, x))\, p(u, x, y)\, dx\, du\, dy$$
As usual, $p(u, x, y)$ is unknown; minimize the regularized empirical risk instead:
$$\theta^* = \arg\min_\theta \sum_{i=1}^n \ell(y_i, f_\theta(u_i, x_i)) + \lambda\, \Omega(\theta)$$
7 Content-Based Recommendation
Idea: users may like movies that are similar to other movies they like.
Requirement: attributes of items, e.g., tags, genre, actors, director, ...
8 Content-Based Recommendation
Feature space for items, e.g.,
$$\Phi = (\text{comedy}, \text{action}, \text{year}, \text{dir: tarantino}, \text{dir: cameron})^T$$
$$\phi(\text{avatar}) = (0, 1, 2009, 0, 1)^T$$
9 Content-Based Recommendation
Users $U = \{1, \dots, m\}$, items $X = \{1, \dots, m'\}$, ratings $Y = \{(u_1, x_1, y_1), \dots, (u_n, x_n, y_n)\}$.
Rating space $y_i \in \mathcal{Y}$; e.g., $\mathcal{Y} = \{-1, +1\}$ or star ratings.
Loss function $\ell(y_i, y_j)$; e.g., missing a good movie is bad, but watching a terrible movie is worse.
Feature function for items: $\phi: X \to \mathbb{R}^d$.
Find a rating model $f_\theta: (u, x) \mapsto y$.
10 Independent Learning Problems for Users
Minimize the regularized empirical risk
$$\theta^* = \arg\min_\theta \sum_{i=1}^n \ell(y_i, f_\theta(u_i, x_i)) + \lambda\, \Omega(\theta)$$
One model per user: $f_{\theta_u}(x) \in \mathcal{Y}$. One learning problem per user:
$$\theta_u^* = \arg\min_{\theta_u} \sum_{i: u_i = u} \ell(y_i, f_{\theta_u}(x_i)) + \lambda\, \Omega(\theta_u)$$
11 Independent Learning Problems for Users
One learning problem per user:
$$\forall u: \quad \theta_u^* = \arg\min_{\theta_u} \sum_{i: u_i = u} \ell(y_i, f_{\theta_u}(x_i)) + \lambda\, \Omega(\theta_u)$$
Use any model class and learning mechanism, e.g., $f_{\theta_u}(x_i) = \phi(x_i)^T \theta_u$ with:
logistic loss + $\ell_2$ regularization: logistic regression;
hinge loss + $\ell_2$ regularization: SVM;
squared loss + $\ell_2$ regularization: ridge regression (see the sketch below).
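As a concrete instance, here is a minimal sketch of per-user ridge regression; the data layout ((user, item, rating) triples plus an item feature matrix `phi`) and all names are assumptions made for illustration:

```python
import numpy as np

def fit_user_models(ratings, phi, lam=1.0):
    """Fit one ridge-regression model per user (independent problems).

    ratings: list of (u, x, y) triples (user id, item id, rating)
    phi:     array of shape (num_items, d); row x is phi(x)
    lam:     regularization strength lambda
    Returns a dict mapping user id -> parameter vector theta_u.
    """
    d = phi.shape[1]
    # Group training triples by user: each user is a separate problem.
    by_user = {}
    for u, x, y in ratings:
        by_user.setdefault(u, []).append((x, y))

    models = {}
    for u, pairs in by_user.items():
        X = np.array([phi[x] for x, _ in pairs])  # n_u x d design matrix
        y = np.array([r for _, r in pairs])       # n_u observed ratings
        # Closed-form ridge solution: (X^T X + lam I)^-1 X^T y
        models[u] = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
    return models

def predict(models, phi, u, x):
    """Predicted rating f_theta_u(x) = phi(x)^T theta_u."""
    return phi[x] @ models[u]
```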
12 Independent Learning Problems for Users
Obvious disadvantages of independent problems:
commonalities between users are not exploited;
a user does not benefit from ratings given by other users;
poor recommendations for users who gave few ratings.
Rather, use a joint prediction model: recommendations for each user should benefit from other users' ratings.
13 Independent Learning Problems
[Figure: parameter vectors $\theta_1, \theta_2$ of the independent per-user prediction models; the regularizer is centered at zero.]

14 Joint Learning Problem
[Figure: the same parameter vectors $\theta_1, \theta_2$; the regularizer is now centered at a shared, non-zero mean, pulling the per-user models together.]
15 Joint Learning Problem
Standard $\ell_2$ regularization follows from the assumption that the model parameters are governed by a normal distribution with mean vector zero.
Instead, assume that there is a non-zero population mean vector.
16 Joint Learning Problem
[Figure: graphical model of the hierarchical prior; the population mean $\theta^*$ is drawn from a prior with mean $0$ and covariance $\Sigma$, and the user-specific parameters $\theta_1, \theta_2, \dots, \theta_m$ are drawn around $\theta^*$.]
17 Joint Learning Problem
Population mean vector: $\theta^* \sim N(0, \frac{1}{\lambda^*} I)$.
User-specific parameter vectors: $\theta_u \sim N(\theta^*, \frac{1}{\lambda} I)$.
Substitution: $\theta_u = \theta^* + \theta_u'$; now $\theta^*$ and $\theta_u'$ both have mean vector zero.
Negative log-prior = regularizer:
$$\Omega(\theta^* + \theta_u') = \lambda \|\theta_u'\|^2 + \lambda^* \|\theta^*\|^2$$
18 Joint Learning Problem
Joint optimization problem:
$$\min_{\theta_1', \dots, \theta_m', \theta^*} \sum_u \sum_{i: u_i = u} \ell(y_i, f_{\theta_u' + \theta^*}(x_i)) + \lambda \sum_u \Omega(\theta_u') + \lambda^* \Omega(\theta^*)$$
The parameters $\theta_u'$ are independent; $\theta^*$ is shared. Hence, the $\theta_u = \theta^* + \theta_u'$ are coupled.
$\lambda$ controls the coupling strength; $\lambda^*$ controls the global regularization. (A sketch of this joint problem follows below.)
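A minimal sketch of this coupled problem under squared loss, optimized by full-batch gradient descent; the function name, the data layout, and the step-size defaults are illustrative assumptions:

```python
import numpy as np

def fit_joint(ratings, phi, lam=1.0, lam_star=1.0, alpha=0.01, epochs=100):
    """Jointly fit a shared vector theta_star and per-user offsets theta'_u.

    Gradient descent on the squared loss of
    f(x) = phi(x)^T (theta_star + theta'_u)
    with regularizer lam * sum_u ||theta'_u||^2 + lam_star * ||theta_star||^2.
    """
    d = phi.shape[1]
    users = sorted({u for u, _, _ in ratings})
    theta_star = np.zeros(d)
    offsets = {u: np.zeros(d) for u in users}

    for _ in range(epochs):
        grad_star = 2 * lam_star * theta_star
        grad_off = {u: 2 * lam * offsets[u] for u in users}
        for u, x, y in ratings:
            residual = phi[x] @ (theta_star + offsets[u]) - y
            grad_star += 2 * residual * phi[x]    # theta_star is shared
            grad_off[u] += 2 * residual * phi[x]  # theta'_u is user-specific
        theta_star -= alpha * grad_star
        for u in users:
            offsets[u] -= alpha * grad_off[u]
    return theta_star, offsets
```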
19 Discussion
Each user benefits from other users' ratings.
Does not take into account that users have different tastes: two sci-fi fans may have similar preferences, but a horror-movie fan and a romantic-comedy fan do not.
Idea: look at the ratings to determine how similar users are.
20 Collaborative Filtering
Idea: people like items that are liked by people who have similar preferences.
People who give similar ratings to items probably have similar preferences.
This is independent of item features.
21 Collaborative Filtering
Users $U = \{1, \dots, m\}$, items $X = \{1, \dots, m'\}$, ratings $Y = \{(u_1, x_1, y_1), \dots, (u_n, x_n, y_n)\}$.
Rating space $y_i \in \mathcal{Y}$; e.g., $\mathcal{Y} = \{-1, +1\}$ or star ratings.
Loss function $\ell(y_i, y_j)$.
Find a rating model $f_\theta: (u, x) \mapsto y$.
22 Collaborative Filtering by Nearest Neighbors
Define a distance function on users: $d(u, u')$.
Predicted rating:
$$f_\theta(u, x) = \frac{1}{k} \sum_{u_i \in \text{$k$ nearest neighbors of } u} y_{u_i, x}$$
The predicted rating is the average rating of the $k$ nearest neighbors in terms of $d(u, u')$.
No learning involved; performance hinges on $d(u, u')$.
23 Collaborative Filtering by Nearest Neighbors
Define the distance function on users as the (normalized) Euclidean distance between their ratings over all items:
$$d(u, u') = \sqrt{\frac{1}{m'} \sum_{x=1}^{m'} (y_{u,x} - y_{u',x})^2}$$
Skip items that have not been rated by both users.
24 Extensions
Normalize ratings (subtract the user's mean rating, divide by the user's standard deviation).
Weight the influence of neighbors by the inverse of their distance:
$$f_\theta(u, x) = \frac{\sum_{u_i \in \text{$k$NN of } u} \frac{1}{d(u, u_i)}\, y_{u_i, x}}{\sum_{u_i \in \text{$k$NN of } u} \frac{1}{d(u, u_i)}}$$
Weight the influence of neighbors by the number of jointly rated items.
(A sketch of distance-weighted prediction follows below.)
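A minimal sketch of distance-weighted k-nearest-neighbor prediction on an incomplete rating matrix; the use of np.nan for missing entries and all names are illustrative assumptions:

```python
import numpy as np

def knn_predict(Y, u, x, k=2, eps=1e-9):
    """Distance-weighted k-NN prediction of user u's rating for item x.

    Y: rating matrix with np.nan for missing entries (rows are users).
    Uses the root-mean-square distance over jointly rated items and
    restricts neighbors to users who have rated item x.
    """
    def dist(a, b):
        both = ~np.isnan(Y[a]) & ~np.isnan(Y[b])  # jointly rated items
        if not both.any():
            return np.inf
        return np.sqrt(np.mean((Y[a, both] - Y[b, both]) ** 2))

    # Candidate neighbors: all other users who have rated item x.
    candidates = [v for v in range(Y.shape[0])
                  if v != u and not np.isnan(Y[v, x])]
    candidates.sort(key=lambda v: dist(u, v))
    neighbors = candidates[:k]

    # Weight each neighbor's rating by the inverse of its distance.
    w = np.array([1.0 / (dist(u, v) + eps) for v in neighbors])
    r = np.array([Y[v, x] for v in neighbors])
    return float(w @ r / w.sum())
```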
25 Collaborative Filtering: Example
Items: Matrix (M), Zombiland (Z), Titanic (T), Death Proof (D); users: Alice, Bob, Carol.
$Y$ = [3 x 4 star-rating matrix; the ratings were shown as star graphics and are not recoverable from this transcription].
How much would Alice enjoy Zombiland?

26 Collaborative Filtering: Example
With $d(u, u') = \sqrt{\frac{1}{m'} \sum_{x=1}^{m'} (y_{u,x} - y_{u',x})^2}$:
$d(A, B) = \,?$, $d(A, C) = \,?$, $d(B, C) = \,?$

27 Collaborative Filtering: Example
$d(A, B) = 2.9$, $d(A, C) = 1$, $d(B, C) =$ [value lost in transcription].

28 Collaborative Filtering: Example
$$f_\theta(A, Z) = \frac{\sum_{u_i \in \text{2NN of } A} \frac{1}{d(A, u_i)}\, y_{u_i, Z}}{\sum_{u_i \in \text{2NN of } A} \frac{1}{d(A, u_i)}} = \,?$$

29 Collaborative Filtering: Example
The slide evaluates the expression above; the numerical result did not survive transcription.
30 Collaborative Filtering: Discussion
k-nearest-neighbor and similar methods are called memory-based approaches.
There are no model parameters; no optimization criterion is being optimized.
Each prediction requires an iteration over all training instances: impractical!
Better to train a model by minimizing an appropriate loss function over a space of model parameters, then use the model to make predictions quickly.
31 Latent Features
Idea: instead of an ad-hoc definition of the distance between users, learn features that actually represent preferences.
If, for every user $u$, we had a feature vector $\psi_u$ that describes their preferences,
then we could learn parameters $\theta_x$ for each item $x$ such that $\theta_x^T \psi_u$ quantifies how much $u$ enjoys $x$.
32 Latent Features
Or, turned around: if, for every item $x$, we had a feature vector $\phi_x$ that characterizes its properties,
then we could learn parameters $\theta_u$ such that $\theta_u^T \phi_x$ quantifies how much $u$ enjoys $x$.
In practice, some user attributes $\psi_u$ and item attributes $\phi_x$ are usually available, but they are insufficient to understand $u$'s preferences and $x$'s relevant properties.
33 Latent Features
Idea: construct user attributes $\psi_u$ and item attributes $\phi_x$ such that the ratings in the training data can be predicted accurately.
Decision function: $f_{\Psi,\Phi}(u, x) = \psi_u^T \phi_x$.
The prediction is the product of user preferences and item properties.
Model parameters: matrix $\Psi$ of user features $\psi_u$ for all users; matrix $\Phi$ of item features $\phi_x$ for all items.
34 Latent Features
Optimization criterion:
$$(\Psi^*, \Phi^*) = \arg\min_{\Psi,\Phi} \sum_{u,x} \ell(y_{u,x}, f_{\Psi,\Phi}(u, x)) + \lambda \left( \sum_u \|\psi_u\|^2 + \sum_x \|\phi_x\|^2 \right)$$
The feature vectors of all users and all items are regularized.
35 Latent Features
Both the item and the user features are the solution of an optimization problem.
The number of features $k$ has to be set.
The meaning of the features is not pre-determined; sometimes they turn out to be interpretable.
36 Matrix Factorization
Decision function: $f_{\Psi,\Phi}(u, x) = \psi_u^T \phi_x$. In matrix notation: $Y_{\Psi,\Phi} = \Psi \Phi^T$.
Matrix elements:
$$\begin{pmatrix} y_{11} & \cdots & y_{1m'} \\ \vdots & & \vdots \\ y_{m1} & \cdots & y_{mm'} \end{pmatrix} = \begin{pmatrix} \psi_{11} & \cdots & \psi_{1k} \\ \vdots & & \vdots \\ \psi_{m1} & \cdots & \psi_{mk} \end{pmatrix} \begin{pmatrix} \phi_{11} & \cdots & \phi_{m'1} \\ \vdots & & \vdots \\ \phi_{1k} & \cdots & \phi_{m'k} \end{pmatrix}$$
37 Matrix Factorization
[Figure: the factorization $Y = \Psi \Phi^T$ annotated for the example with users Alice, Bob, Carol and items Matrix, Zombiland, Titanic, Death Proof; the predicted rating of Matrix for Alice is the product of Alice's row of $\Psi$ (the latent features of Alice) and Matrix's column of $\Phi^T$ (the latent features of Matrix).]

38 Matrix Factorization
[Figure: the same factorization, highlighting the latent features of Carol (a row of $\Psi$) and of Death Proof (a column of $\Phi^T$).]
39 Matrix Factorization
Optimization criterion:
$$(\Psi^*, \Phi^*) = \arg\min_{\Psi,\Phi} \sum_{u,x} \ell(y_{u,x}, f_{\Psi,\Phi}(u, x)) + \lambda \left( \|\Psi\|^2 + \|\Phi\|^2 \right)$$
The criterion is not convex: for instance, multiplying all feature vectors by $-1$ gives an equally good solution, $f_{\Psi,\Phi}(u, x) = \psi_u^T \phi_x = (-\psi_u)^T (-\phi_x)$.
Limiting the number of latent features to $k$ restricts the rank of the matrix $Y$.
40 Matrix Factorization
Optimization criterion:
$$(\Psi^*, \Phi^*) = \arg\min_{\Psi,\Phi} \sum_{u,x} \ell(y_{u,x}, f_{\Psi,\Phi}(u, x)) + \lambda \left( \|\Psi\|^2 + \|\Phi\|^2 \right)$$
Optimization by stochastic gradient descent or alternating least squares.
41 Matrix Factorization by Stochastic Gradient Descent
Iterate through the ratings $y_{u,x}$ in the training sample:
$$\psi_u \leftarrow \psi_u - \alpha \frac{\partial\, \ell(y_{u,x}, f_{\Psi,\Phi}(u, x))}{\partial \psi_u}, \qquad \phi_x \leftarrow \phi_x - \alpha \frac{\partial\, \ell(y_{u,x}, f_{\Psi,\Phi}(u, x))}{\partial \phi_x}$$
until convergence. Requires a differentiable loss function, e.g., squared loss. (A sketch follows below.)
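A minimal sketch of this procedure for the regularized squared loss; the initialization scale, the defaults, and all names are illustrative assumptions:

```python
import numpy as np

def mf_sgd(ratings, num_users, num_items, k=10, lam=0.1, alpha=0.01,
           epochs=50, seed=0):
    """Matrix factorization by stochastic gradient descent.

    For each observed rating (u, x, y), takes a gradient step on
    (y - psi_u^T phi_x)^2 + lam * (||psi_u||^2 + ||phi_x||^2).
    """
    rng = np.random.default_rng(seed)
    Psi = 0.1 * rng.standard_normal((num_users, k))  # user features psi_u
    Phi = 0.1 * rng.standard_normal((num_items, k))  # item features phi_x
    for _ in range(epochs):
        for i in rng.permutation(len(ratings)):
            u, x, y = ratings[i]
            err = Psi[u] @ Phi[x] - y
            psi_u = Psi[u].copy()  # update both factors simultaneously
            Psi[u] -= alpha * (2 * err * Phi[x] + 2 * lam * Psi[u])
            Phi[x] -= alpha * (2 * err * psi_u + 2 * lam * Phi[x])
    return Psi, Phi
```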
42 Matrix Factorization by Alternating Least Squares
For squared loss and parallel architectures. Alternate between two optimization processes:
keep $\Phi$ fixed, optimize $\psi_u$ in parallel for all $u$;
keep $\Psi$ fixed, optimize $\phi_x$ in parallel for all $x$.
43 Matrix Factorization by Alternating Least Squares
Optimization criterion for $\Psi$ (keeping $\Phi$ fixed):
$$\psi_u = \arg\min_{\psi_u} \|y_u - \hat{y}_u\|^2 + \lambda \|\psi_u\|^2 = \arg\min_{\psi_u} \|y_u - \psi_u^T \Phi^T\|^2 + \lambda \|\psi_u\|^2$$
where $y_u = (y_{u1}, \dots, y_{um'})$ is the row of user $u$'s ratings:
$$(y_{u1}, \dots, y_{um'}) = (\psi_{u1}, \dots, \psi_{uk}) \begin{pmatrix} \phi_{11} & \cdots & \phi_{m'1} \\ \vdots & & \vdots \\ \phi_{1k} & \cdots & \phi_{m'k} \end{pmatrix}$$
44 Matrix Factorization by Alternating Least Squares
Optimization criterion for $\Psi$:
$$\psi_u = \arg\min_{\psi_u} \|y_u - \psi_u^T \Phi^T\|^2 + \lambda \|\psi_u\|^2$$
Closed-form ridge-regression solution (reading $y_u$ as a column vector):
$$\psi_u = (\Phi^T \Phi + \lambda I)^{-1} \Phi^T y_u$$
45 Matrix Factorization by Alternating Least Squares
Optimization criterion for $\Phi$, where $y_x$ is the column of all users' ratings for item $x$:
$$\phi_x = \arg\min_{\phi_x} \|y_x - \phi_x^T \Psi^T\|^2 + \lambda \|\phi_x\|^2, \qquad \phi_x = (\Psi^T \Psi + \lambda I)^{-1} \Psi^T y_x$$
46 Matrix Factorization by Alternating Least Squares
Initialize $\Psi$, $\Phi$ randomly. Repeat until convergence:
keep $\Phi$ fixed; for all $u$ in parallel: $\psi_u = (\Phi^T \Phi + \lambda I)^{-1} \Phi^T y_u$;
keep $\Psi$ fixed; for all $x$ in parallel: $\phi_x = (\Psi^T \Psi + \lambda I)^{-1} \Psi^T y_x$.
(A sketch of this loop follows below.)
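A minimal sketch of this loop; each least-squares solve is restricted to the observed entries of the incomplete rating matrix, and both inner loops could run in parallel (they are sequential here for brevity). Names and defaults are illustrative assumptions:

```python
import numpy as np

def mf_als(Y, k=10, lam=0.1, iters=20, seed=0):
    """Matrix factorization by alternating least squares.

    Y: (m x m') rating matrix with np.nan marking missing entries.
    """
    m, m_items = Y.shape
    rng = np.random.default_rng(seed)
    Psi = 0.1 * rng.standard_normal((m, k))
    Phi = 0.1 * rng.standard_normal((m_items, k))
    I = np.eye(k)
    for _ in range(iters):
        # Keep Phi fixed; ridge solution for each user (parallelizable).
        for u in range(m):
            rated = ~np.isnan(Y[u])
            F = Phi[rated]  # features of the items user u has rated
            Psi[u] = np.linalg.solve(F.T @ F + lam * I, F.T @ Y[u, rated])
        # Keep Psi fixed; ridge solution for each item (parallelizable).
        for x in range(m_items):
            rated = ~np.isnan(Y[:, x])
            P = Psi[rated]  # features of the users who rated item x
            Phi[x] = np.linalg.solve(P.T @ P + lam * I, P.T @ Y[rated, x])
    return Psi, Phi
```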
47 Extensions: Biases
Some users just give optimistic or pessimistic ratings; some items are hyped.
Decision function:
$$f_{\Psi,\Phi,B_U,B_X}(u, x) = b_u + b_x + \psi_u^T \phi_x$$
Optimization criterion:
$$(\Psi^*, \Phi^*, B_U^*, B_X^*) = \arg\min_{\Psi,\Phi,B_U,B_X} \sum_{u,x} \ell(y_{u,x}, f_{\Psi,\Phi,B_U,B_X}(u, x)) + \lambda \left( \|\Psi\|^2 + \|\Phi\|^2 + \|B_U\|^2 + \|B_X\|^2 \right)$$
(The corresponding SGD updates are sketched below.)
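A minimal sketch of the SGD updates for this model under squared loss, extending the factor updates above with bias terms; names and defaults are illustrative assumptions:

```python
import numpy as np

def mf_sgd_bias(ratings, num_users, num_items, k=10, lam=0.1, alpha=0.01,
                epochs=50, seed=0):
    """Matrix factorization with user and item biases, trained by SGD."""
    rng = np.random.default_rng(seed)
    Psi = 0.1 * rng.standard_normal((num_users, k))
    Phi = 0.1 * rng.standard_normal((num_items, k))
    b_u = np.zeros(num_users)  # user biases: optimistic/pessimistic raters
    b_x = np.zeros(num_items)  # item biases: hyped items
    for _ in range(epochs):
        for i in rng.permutation(len(ratings)):
            u, x, y = ratings[i]
            err = b_u[u] + b_x[x] + Psi[u] @ Phi[x] - y
            psi_u = Psi[u].copy()
            Psi[u] -= alpha * (2 * err * Phi[x] + 2 * lam * Psi[u])
            Phi[x] -= alpha * (2 * err * psi_u + 2 * lam * Phi[x])
            b_u[u] -= alpha * (2 * err + 2 * lam * b_u[u])
            b_x[x] -= alpha * (2 * err + 2 * lam * b_x[x])
    return Psi, Phi, b_u, b_x
```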
48 Extensions: Explicit Features
Often, explicit user and item features are available.
Concatenate them with the vectors $\psi_u$ and $\phi_x$; the explicit features are fixed, the latent features are free parameters.
49 Extensions: Temporal Dynamics
How much a user likes an item depends on the point in time at which the rating takes place:
$$f_{\Psi,\Phi,B_U,B_X,t}(u, x) = b_u(t) + b_x(t) + \psi_u(t)^T \phi_x$$
50 Summary
Purely content-based recommendation: users don't benefit from other users' ratings.
Collaborative filtering by nearest neighbors: fixed definition of the similarity of users; no model parameters, no learning; has to iterate over the data to make a recommendation.
Latent factor models, matrix factorization: user preferences and item properties are free parameters, optimized to minimize the discrepancy between inferred and actual ratings.