Recommender systems, matrix factorization, variable selection and social graph data
1 Recommender systems, matrix factorization, variable selection and social graph data. Julien Delporte & Stéphane Canu, StatLearn, April 2015, Grenoble
2 Road map. 1 Model selection for matrix factorization: recommender systems; factorize to recommend; model selection. 2 Socially enabled preference learning: Tuenti's problem; modeling social preferences; alternating least squares minimization; experimental results on real data.
3 Recommendations
4 Recommend items, movies, lines of code...
5 Recommendation as matrix completion. [diagram: Y (users × items) ≈ F̂ = ⟨U; V⟩ = UV^t, with U and V of rank d] The inner product ⟨U_i, V_j⟩ estimates the desire of user i for product j.
6 Problem. Model: Y = F + R, i.e. observation = information + noise, with F regular and R a noise matrix. Problem: build F̂ to estimate F: min_{F̂} L(Y, F̂) + λ Ω(F̂), where L(Y, F̂) is the data loss, Ω(F̂) the penalization, λ is well chosen, and F̂ = UV^t is a low-rank factorization.
8 How to estimate U and V? Low-rank approximation of Y: min_{F̂} ‖Y − F̂‖²_F such that rank(F̂) = d (Eckart & Young, 1936). Solution: principal component analysis (PCA), F̂ = UV^t. Naive large-scale recipe: Y_n = normalise(Y); C = Y_n^t Y_n: OUT OF MEMORY. [V, Σ] = eig(C, d): d = 3 is not enough, we need more (d ≈ 200); hence a full sparse SVD giving F̂ = UV^t.
10 How to implement PCA (to retrieve U and V)? Singular value decomposition (SVD): F̂ = UΣV^t. Exact solution on sparse matrices (missing entries = 0). MATLAB's svds is too slow: use an adapted Lanczos process, i.e. bidiagonalization Y = WBZ^t, then diagonalize B using Givens rotations (GVL §8.6). Adapted software: PROPACK (…rmunk/propack/).
>> size(Y)
>> d = 200;
>> tic; [U,D,V] = lansvd(Y, d, 'L'); toc
Elapsed time is … seconds (27 sec).
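As a rough, self-contained sketch of the recipe above, here is a minimal MATLAB version, assuming the built-in svds as a stand-in for PROPACK's lansvd and a synthetic Y rather than the talk's data:

    % Truncated SVD of a sparse matrix, then the low-rank estimate F = U*S*V'.
    % svds stands in for PROPACK's lansvd; sizes and density are illustrative.
    n = 5000; p = 2000; d = 20;
    Y = sprandn(n, p, 0.01);        % sparse "ratings": missing entries are 0
    [U, S, V] = svds(Y, d);         % d leading singular triplets (Lanczos-based)
    F = U * S * V';                 % rank-d estimate of Y
    fprintf('relative error: %.3f\n', norm(Y - F, 'fro') / norm(Y, 'fro'));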
11 Factorize for $1 million: the Netflix Challenge (2006-2009). Task: movie recommendation. CineMatch: Netflix's recommendation engine; goal: improve its performance by 10 %. Criterion: squared error. Data: users × movies, 100 million scores (from 1 to 5); sparsity rate: 1.3 %; test on 3 million scores (the most recent ones); data available for download.
12 Baseline: CineMatch. Factorization: SVD with d factors, U_i, V_j ∈ IR^d:
min_{U,V} Σ_{i,j} (Y_ij − U_i^t V_j)² + λ(‖U_i‖² + ‖V_j‖²)
Normalization: global mean µ, user bias b_i, item bias p_j:
min_{U,V} Σ_{i,j} (Y_ij − U_i^t V_j − µ − b_i − p_j)² + λ(‖U_i‖² + ‖V_j‖²)
Weight implicit feedback with a confidence level c_ij:
min_{U,V} Σ_{i,j} c_ij (Y_ij − U_i^t V_j − µ − b_i − p_j)² + λ(‖U_i‖² + ‖V_j‖²)
Time:
min_{U,V} Σ_{i,j} (Y_ij − U_i(t)^t V_j − µ − b_i(t) − p_j(t))² + λ(‖U_i‖² + ‖V_j‖²)
Mixture of models.
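These models are typically fit by stochastic gradient descent (as slide 14 notes: flexible, scalable). A minimal MATLAB sketch of one SGD pass over the normalized model above, with synthetic (user, item, rating) triplets and illustrative step sizes (q plays the role of the item bias p_j):

    % One SGD pass for min sum_ij (Y_ij - mu - b_i - q_j - U_i'V_j)^2 + reg.
    % Synthetic triplets; in practice obs = [i j y] comes from the data.
    n = 1000; p = 500; m = 20000;
    obs = [randi(n, m, 1), randi(p, m, 1), randi(5, m, 1)];
    d = 20; eta = 0.01; lambda = 0.1;
    mu = mean(obs(:,3)); b = zeros(n, 1); q = zeros(p, 1);
    U = 0.1 * randn(n, d); V = 0.1 * randn(p, d);
    for t = randperm(m)
      i = obs(t,1); j = obs(t,2);
      e = obs(t,3) - (mu + b(i) + q(j) + U(i,:) * V(j,:)');  % residual
      b(i) = b(i) + eta * (e - lambda * b(i));               % user bias step
      q(j) = q(j) + eta * (e - lambda * q(j));               % item bias step
      Ui = U(i,:);                                           % old U_i, used in V's step
      U(i,:) = Ui + eta * (e * V(j,:) - lambda * Ui);
      V(j,:) = V(j,:) + eta * (e * Ui - lambda * V(j,:));
    end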
13 Netflix results. Initial error; factorization: −4 %; number of factors: −1.5 %; improvements (about −1 %): normalization, implicit feedback, time; bagging: −3 %, i.e. mixing 100 methods. Koren, Y., Bell, R.M., Volinsky, C.: Matrix factorization techniques for recommender systems. IEEE Computer 42(8), 30-37 (2009).
14 Lessons from Netflix: efforts too specific to be generalized; factorization is the solution; implicit feedback helps; take into account the specificities (time effect); stochastic gradient (thanks Yann): flexible, scalable.
15 Road map. 1 Model selection for matrix factorization: recommender systems; factorize to recommend; model selection. 2 Socially enabled preference learning: Tuenti's problem; modeling social preferences; alternating least squares minimization; experimental results on real data.
16 Model selection for factorization. Objective: how to choose d, the number of latent variables, in F̂ = ⟨U; V⟩ = UV^t? [diagram: Y ≈ UV^t with d factors]
17 Choosing d is choosing λ. Model: Y = F + R, R_ij i.i.d. N(0, s²). Estimator: F̂ = argmin_M ‖Y − M‖² + pen(λ, Y, M). For certain penalties: SVD + shrinkage, F̂ = f_λ(Y) (Candès et al., 2012). If Y = UΣV^t then F̂ = f_λ(Y) = Σ_{k=1}^p f_λ^{(k)}(σ_k) u_k v_k^t. Examples: f_λ^{(k)}(σ_k) = max(0, σ_k − λ) (soft shrinkage); f_λ^{(k)}(σ_k) = max(0, min(σ_k, γ(σ_k − λ))) (firm shrinkage).
18 A criterion to choose λ. Model: Y = F + R, R_ij i.i.d. N(0, s²). Estimator family: F̂ = f_λ(Y). Estimator cost: prediction error C(Y, λ) = ‖F − f_λ(Y)‖² = Σ_{i,j} (F_ij − f_λ(Y)_ij)². Oracle: λ* = argmin_λ ‖F − f_λ(Y)‖². Stein Unbiased Risk Estimator (Stein, 1977): δ_0(Y, λ) = ‖Y − f_λ(Y)‖² + (2 div_Y(f_λ(Y)) − np) ŝ². If R is i.i.d. Gaussian, then δ_0(Y, λ) is an unbiased estimator of ‖F − f_λ(Y)‖² (SURE), and λ̂ = argmin_λ δ_0(Y, λ).
19 SURE: min_λ δ_0(Y, λ) = ‖Y − f_λ(Y)‖² + (2 div_Y(f_λ(Y)) − np) ŝ², where ‖Y − f_λ(Y)‖² is the empirical error, f_λ is the shrinking function and f'_λ its derivative, ŝ² is an unbiased estimator of s² independent of the fit, div_Y(f_λ(Y)) counts the degrees of freedom, and Y has n lines and p columns. The divergence is
div_Y(f_λ(Y)) = Σ_{i=1}^p ( f'_λ(σ_i) + (n − p) f_λ(σ_i)/σ_i ) + 2 Σ_{i=1}^p Σ_{j≠i} σ_i f_λ(σ_i) / (σ_i² − σ_j²);
for small σ_i, f_λ(σ_i) = 0.
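A minimal MATLAB sketch of this criterion for the soft-shrinkage rule, plugging the closed-form divergence above into δ_0 (it assumes n ≥ p and a known noise variance s², as on slide 20; the data and the λ grid are illustrative):

    % SURE curve delta_0(Y, lambda) for soft-thresholded SVD (sketch, n >= p).
    n = 200; p = 100; s2 = 1;
    F0 = randn(n, 5) * randn(5, p);          % rank-5 signal
    Y = F0 + sqrt(s2) * randn(n, p);
    [U, S, V] = svd(Y, 'econ'); s = diag(S);
    lambdas = linspace(0, max(s), 50); sure = zeros(size(lambdas));
    for l = 1:numel(lambdas)
      lam = lambdas(l);
      f  = max(0, s - lam);                  % f_lambda(sigma_i), soft shrinkage
      fp = double(s > lam);                  % its derivative f'_lambda(sigma_i)
      D  = s.^2 - (s.^2)';                   % D(i,j) = sigma_i^2 - sigma_j^2
      D(1:p+1:end) = Inf;                    % drop the i = j terms
      div = sum(fp) + (n - p) * sum(f ./ s) + 2 * sum(sum((s .* f) ./ D));
      sure(l) = norm(Y - U * diag(f) * V', 'fro')^2 + (2 * div - n * p) * s2;
    end
    [~, imin] = min(sure);                   % lambda-hat = lambdas(imin)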
20 Starting point:
min_λ δ_0(Y, λ) = ‖Y − f_λ(Y)‖² + (2 div_Y(f_λ(Y)) − np) ŝ²
with div_Y(f(Y)) = Σ_{i=1}^p ( f'_i(σ_i) + (n − p) f_i(σ_i)/σ_i ) + 2 Σ_{i=1}^p Σ_{j≠i} σ_i f_i(σ_i)/(σ_i² − σ_j²).
Scaling issues: the prior (propose other shrinkage functions f_λ); the divergence doesn't scale (estimate it); the variance ŝ² is supposed to be known.
21 Thresholding: F̂ = f_λ(Y) = Σ_{k=1}^d f_λ^{(k)}(σ_k) u_k v_k^t, with f_λ^{(k)}(σ_k) = max(0, σ_k − λ) (soft), f_λ^{(k)}(σ_k) = MCP, or f_λ^{(k)}(σ_k) = SCAD. [figure: penalty functions (left) and associated thresholding functions (right) for Soft, MCP, SCAD and AdaLasso]
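For concreteness, the three shrinkage rules can be written as one-line functions of the singular values. This is a hedged sketch using the standard MCP and SCAD parametrizations (with γ > 1 and γ > 2 respectively); the talk's exact parametrization may differ:

    % Shrinkage rules applied entrywise to a vector s of singular values.
    soft = @(s, lam) max(0, s - lam);
    mcp  = @(s, lam, gam) min(s, (gam / (gam - 1)) * max(0, s - lam));
    scad = @(s, lam, gam) (s <= 2*lam) .* max(0, s - lam) ...
         + (s > 2*lam & s <= gam*lam) .* ((gam - 1) * s - gam*lam) / (gam - 2) ...
         + (s > gam*lam) .* s;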
22 Other thresholding functions. [figure: estimated risk ("risque estimé") as a function of λ for Soft, MCP, SCAD and AdaLasso]
23 Divergence estimation. Problem: div_Y(f(Y)) = Σ_{i=1}^p ( f'_i(σ_i) + (n − p) f_i(σ_i)/σ_i ) + 2 Σ_{i=1}^p Σ_{j≠i} σ_i f_i(σ_i)/(σ_i² − σ_j²) involves all p singular values. Solution: calculate the k largest singular values and estimate the remaining part of the spectrum.
24 Divergence estimation (II). Theorem (Marchenko-Pastur): let X be a centered random p × n matrix. The eigenvalues λ_i of the matrix (1/n) X X^t are distributed (when considered as random variables) according to a Marchenko-Pastur law of parameter c > 0, with probability density
P_c(x) = (1/(2πxc)) √((x − a)(b − x)) if a ≤ x ≤ b, and 0 otherwise,
where a = (1 − √c)² and b = (1 + √c)².
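A quick empirical check of the theorem in MATLAB, comparing the histogram of the noise eigenvalues with the Marchenko-Pastur density (sizes are illustrative):

    % Eigenvalues of (1/n) X X' versus the Marchenko-Pastur density.
    p = 500; n = 2000; c = p / n;
    X = randn(p, n);                                 % centered noise matrix
    ev = eig(X * X' / n);
    a = (1 - sqrt(c))^2; b = (1 + sqrt(c))^2;
    x = linspace(a, b, 200);
    Pc = sqrt((x - a) .* (b - x)) ./ (2 * pi * c * x);   % MP density on [a, b]
    histogram(ev, 40, 'Normalization', 'pdf'); hold on; plot(x, Pc, 'r');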
25 Divergence estimation (III). [figures: comparison of divergence estimates (true value "réelle", Marchenko-Pastur, linear) as a function of λ, for two signal-to-noise ratios]
26 Correlated case (with D. Fourdrinier): Y = F + R, R ~ N(0, Σ). Elliptically symmetric distributions: R admits a density x ↦ |Σ|^{−n/2} g( tr(x Σ^{−1} x^t) ). Invariant loss: ‖F̂ − F‖²_Σ = tr( (F̂ − F) Σ^{−1} (F̂ − F)^t ). Then δ_0^inv = c ‖Y − f_λ(Y)‖²_S + 2 div_Y(f_λ(Y)) − np is IE*-unbiased (with IE* = IE in the Gaussian case), where S is an estimator of Σ and c = n − k − p.
27 Summary: factorization works; it is adaptable; some more work remains to provide robust & efficient model selection.
28 Road map. 1 Model selection for matrix factorization: recommender systems; factorize to recommend; factorization and singular value decomposition; Netflix and the factorization; model selection; model selection for factorization; scaling. 2 Socially enabled preference learning: Tuenti's problem; modeling social preferences; alternating least squares minimization; experimental results on real data.
29 Tuenti's problem. Size: 3 M people, 100 K places, 200 M social connections. [diagram: user-user friendship graph] Really sparse: 4 places per user, 20 friends per user.
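At this scale the data can only be held as sparse matrices. A hedged MATLAB sketch of the usual construction from triplet lists (array names and sizes are illustrative, not the talk's actual data):

    % Sparse user-place and user-user matrices built from index pairs.
    nUsers = 1e5; nPlaces = 1e4; m = 4e5;
    usr = randi(nUsers, m, 1); plc = randi(nPlaces, m, 1);   % check-ins
    usr2 = randi(nUsers, m, 1); frd = randi(nUsers, m, 1);   % friendships
    Y = sparse(usr, plc, 1, nUsers, nPlaces);   % ~4 nonzeros per user row
    A = sparse(usr2, frd, 1, nUsers, nUsers);   % ~20 nonzeros per user row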
30 Objective. [diagram: Y ≈ F̂ = ⟨U; V⟩ = UV^t with d factors]
31 Objective. [diagram: Y ≈ UV^t with d factors, together with the user × user friendship matrix A]
32 Existing approaches. Cost functions:
min_{U,V} L_1(Y, UV^t) + µ L_2(A, UU^t) + λ Ω(U, V)   (Yang et al., 2011)
min_{U,V} L(Y, UV^t) + µ Σ_{i=1}^n Σ_{k∈F_i} ‖U_i − U_k‖²_F + λ Ω(U, V)   (Ma et al., 2011)
where F_i is the set of i's friends and A_ik = 1 if k ∈ F_i. Decision function:
min_{U,V} Σ_{(i,j)∈Y} c_ij ( U_i^t V_j + (1/|F_i|) Σ_{k∈F_i} A_ik U_k^t V_j − Y_ij )² + Ω_{U,V,A}   (Ma et al., 2009)
where the term in parentheses before subtracting Y_ij is F̂_ij.
33 Decision function:
F̂_ij = U_i^t V_j + (1/|F_i|) Σ_{k∈F_i} A_ik U_k^t V_j   (Ma et al., 2009)   (1)
Loss function:
min_{U,V,A} Σ_{(i,j)∈Y} c_ij ( U_i^t V_j + (1/|F_i|) Σ_{k∈F_i} A_ik U_k^t V_j − Y_ij )² + Ω_{U,V,A}   (2)
with c_ij penalizing the 1s more than the 0s, and Ω_{U,V,A} = λ_1 ‖U‖²_Fro + λ_2 ‖V‖²_Fro + λ_3 ‖A‖²_Fro.
Contribution: learn the friendship influence A.
34 Algorithm: alternating least squares minimization (SECoFi).
Input: Y and F (the friend lists); initialize U, V and A.
Repeat:
for each user i ∈ U, update U_i;
for each item j ∈ V, update V_j;
for each user i ∈ U, for each friend k ∈ F_i, update A_ik;
until convergence.
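A minimal MATLAB sketch of this alternating structure on a simplified model: plain regularized factorization over a dense Y, dropping the social term and the A_ik step, each of which would add an analogous ridge-regression update:

    % ALS skeleton for min ||Y - U*V'||_F^2 + lambda (||U||_F^2 + ||V||_F^2).
    % Y is dense and synthetic here for brevity; d, lambda are illustrative.
    n = 1000; p = 400; d = 10; lambda = 0.1;
    Y = randn(n, d) * randn(d, p);                 % synthetic low-rank "ratings"
    U = 0.1 * randn(n, d); V = 0.1 * randn(p, d);
    for it = 1:20
      U = (Y  * V) / (V' * V + lambda * eye(d));   % closed-form update of all U_i
      V = (Y' * U) / (U' * U + lambda * eye(d));   % closed-form update of all V_j
    end
    F = U * V';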
35 Tuenti and Epinions datasets. Tuenti: 3 M users and 100 k places (20 friends per user). Epinions: 50 k users and 100 k items (5 friends per user).
36 Tuenti and Epinions results. [figures: MAP and RANK as functions of the number of factors d ("nombre de facteurs"), comparing SECoFi, iMF, LLA, RSR, Trust and the average baseline, on Tuenti (3 M users, 100 k places, 20 friends) and Epinions (50 k users, 100 k items, 5 friends); experiments run on 24 cores with 100 GB of RAM]
37 Back to the friends' influence. [figure: histograms of the learned influence values α on Tuenti and Epinions] Affinity principle (homophily effect): no negative influence.
38 Conclusion. Summary: matrix AND graph; large-scale datasets; recommendation improvement; influence measure. Perspectives: take advantage of the framework; take into account more contextual data (GPS, place types, ...); towards friend recommendation (learn α).
39 Questions? [diagram: Y ≈ F̂ = ⟨U; V⟩ = UV^t with d factors] Matrix factorization, recommender systems, personalized with preferences (Julien Delporte's PhD).