Large-scale Information Processing, Summer 2015: Recommender Systems (part 2)

Similar documents
Bandit Algorithms. Zhifeng Wang ... Department of Statistics Florida State University

Administration. CSCI567 Machine Learning (Fall 2018) Outline. HW5 is available, due on 11/18. Practice final will also be available soon.

The Multi-Armed Bandit Problem

Collaborative Filtering. Radek Pelánek

The Multi-Armed Bandit Problem

Bandits and Exploration: How do we (optimally) gather information? Sham M. Kakade

COS 402 Machine Learning and Artificial Intelligence Fall Lecture 22. Exploration & Exploitation in Reinforcement Learning: MAB, UCB, Exp3

Sparse Linear Contextual Bandits via Relevance Vector Machines

New Algorithms for Contextual Bandits

Active Learning and Optimized Information Gathering

1 [15 points] Frequent Itemsets Generation With Map-Reduce

Sequential Recommender Systems

Alireza Shafaei. Machine Learning Reading Group The University of British Columbia Summer 2017

Advanced Machine Learning

Topics we covered. Machine Learning. Statistics. Optimization. Systems! Basics of probability Tail bounds Density Estimation Exponential Families

Multi-Armed Bandits. Credit: David Silver. Google DeepMind. Presenter: Tianlu Wang

Evaluation of multi armed bandit algorithms and empirical algorithm

Bandits for Online Optimization

Ad Placement Strategies

Learning with Exploration

Bayesian Contextual Multi-armed Bandits

Stochastic Contextual Bandits with Known Reward Functions

Lecture 2: Learning from Evaluative Feedback. or Bandit Problems

Tutorial: PART 2. Online Convex Optimization, A Game-Theoretic Approach to Learning

Decoupled Collaborative Ranking

Online Learning with Feedback Graphs

Counterfactual Evaluation and Learning

The Multi-Arm Bandit Framework

Collaborative topic models: motivations cont

Lecture 14 : Online Learning, Stochastic Gradient Descent, Perceptron

Matrix Factorization Techniques for Recommender Systems

The Epoch-Greedy Algorithm for Contextual Multi-armed Bandits John Langford and Tong Zhang

Learning to play K-armed bandit problems

Basics of reinforcement learning

Spectral Bandits for Smooth Graph Functions with Applications in Recommender Systems

An Estimation Based Allocation Rule with Super-linear Regret and Finite Lock-on Time for Time-dependent Multi-armed Bandit Processes

Support Vector Machines. Introduction to Data Mining, 2nd Edition by Tan, Steinbach, Karpatne, Kumar

STA141C: Big Data & High Performance Statistical Computing

SVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning

COMP3702/7702 Artificial Intelligence Lecture 11: Introduction to Machine Learning and Reinforcement Learning. Hanna Kurniawati

Exploration. 2015/10/12 John Schulman

1 MDP Value Iteration Algorithm

\|y - Xw\|_2^2, \quad \|y - Xw\|_2^2 + \lambda \|w\|_2^2

Learning Optimal Online Advertising Portfolios with Periodic Budgets

Outline. Offline Evaluation of Online Metrics Counterfactual Estimation Advanced Estimators. Case Studies & Demo Summary

Stochastic Analogues to Deterministic Optimizers

Hybrid Machine Learning Algorithms

CSE 417T: Introduction to Machine Learning. Lecture 11: Review. Henry Chai 10/02/18

Matrix Factorization and Recommendation Systems

Lecture 10 : Contextual Bandits

Counterfactual Model for Learning Systems

Profile-Based Bandit with Unknown Profiles

Reinforcement Learning

Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, Part I. Sébastien Bubeck Theory Group

An Online Actor Critic Algorithm and a Statistical Decision Procedure for Personalizing Intervention

Support Vector Machines: Training with Stochastic Gradient Descent. Machine Learning Fall 2017

Multi-task Linear Bandits

Ad Placement Strategies

Collaborative Filtering Matrix Completion Alternating Least Squares

Lecture 15: Bandit problems. Markov Processes. Recall: Lotteries and utilities

Contextual Bandits in A Collaborative Environment

Annealing-Pareto Multi-Objective Multi-Armed Bandit Algorithm

ECE521 lecture 4: 19 January Optimization, MLE, regularization

Reducing contextual bandits to supervised learning

Contextual Combinatorial Bandit and its Application on Diversified Online Recommendation

A Gradient-based Adaptive Learning Framework for Efficient Personal Recommendation

Lecture 19: UCB Algorithm and Adversarial Bandit Problem. Announcements Review on stochastic multi-armed bandit problem

Relative Upper Confidence Bound for the K-Armed Dueling Bandit Problem

CSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18

Case Study 1: Estimating Click Probabilities. Kakade Announcements: Project Proposals: due this Friday!

Introduction to Logistic Regression

The Knowledge Gradient for Sequential Decision Making with Stochastic Binary Feedbacks

Sequential and reinforcement learning: Stochastic Optimization I

Big Data Analytics. Special Topics for Computer Science CSE. Feb 24

Crowd-Learning: Improving the Quality of Crowdsourcing Using Sequential Learning

Graphs in Machine Learning

Discover Relevant Sources : A Multi-Armed Bandit Approach

Talk on Bayesian Optimization

Linear Regression (continued)

Online Learning under Full and Bandit Information

LogUCB: An Explore-Exploit Algorithm For Comments Recommendation

An Experimental Evaluation of High-Dimensional Multi-Armed Bandits

Online Learning: Bandit Setting

Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent

Reinforcement Learning

Online Learning and Sequential Decision Making

RL 3: Reinforcement Learning

Linear classifiers: Overfitting and regularization

ECS289: Scalable Machine Learning

CS6375: Machine Learning Gautam Kunapuli. Support Vector Machines

Machine Learning Basics: Stochastic Gradient Descent. Sargur N. Srihari

Multi-armed Bandits in the Presence of Side Observations in Social Networks

COMP 551 Applied Machine Learning Lecture 21: Bayesian optimisation

Machine Learning & Data Mining CMS/CS/CNS/EE 155. Lecture 2: Perceptron & Gradient Descent

Transferable Contextual Bandit for Cross-Domain Recommendation

Contextual Online Learning for Multimedia Content Aggregation

An Adaptive Algorithm for Selecting Profitable Keywords for Search-Based Advertising Services

1 A Support Vector Machine without Support Vectors

Estimation Considerations in Contextual Bandits

Bandit models: a tutorial

Transcription:

Large-scale Information Processing, Summer 2015, 5th Exercise: Recommender Systems (part 2). Emmanouil Tzouridis tzouridis@kma.informatik.tu-darmstadt.de Knowledge Mining & Assessment

SVM question: When a point has ξ > 0, is it a support vector?

SVM question: Use the equations from the dual.

SVM question: From the equations of the dual, ξ > 0 implies α = C, so the point is a support vector.
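
For reference, the relevant soft-margin KKT conditions (the standard ones, not spelled out on the slide):

\alpha_i \left( y_i (w^\top x_i + b) - 1 + \xi_i \right) = 0, \qquad (C - \alpha_i)\, \xi_i = 0

If \xi_i > 0, the second condition forces \alpha_i = C > 0, so x_i has a nonzero dual coefficient and is a support vector.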

Line search: What is line search?

Line search: What is line search? Given an update direction d, we need to find a satisfactory step size η: (approximately) find the minimizer of g(η) = f(x + ηd).

Line search: (Approximately) find the minimizer of g. Exact line search is expensive! We need a good step size without spending too much effort on finding it.

Line search: (Approximately) find the minimizer of g. [Figure: examples of bad steps vs. good steps along g.]

Line search: Wolfe conditions

Line search: Wolfe conditions. Sufficient decrease: the decrease is proportional to the step length and the directional derivative.

Line search: Wolfe conditions. Sufficient decrease: the decrease is proportional to the step length and the directional derivative. But this alone does not rule out very small steps.

Line search: Wolfe conditions. Sufficient decrease + strong curvature condition: ensures that the slope at the new point is sufficiently greater than the earlier slope, and does not allow the slope to become too positive.
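
For completeness, the standard Wolfe conditions with constants 0 < c_1 < c_2 < 1 (the textbook formulation, cf. Nocedal & Wright, not copied from the slides):

f(x + \eta d) \le f(x) + c_1 \eta\, \nabla f(x)^\top d \quad \text{(sufficient decrease)}

\left| \nabla f(x + \eta d)^\top d \right| \le c_2 \left| \nabla f(x)^\top d \right| \quad \text{(strong curvature)}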

Line search: Goldstein conditions

Line search: Goldstein conditions. Sufficient decrease + a lower bound that prevents overly small steps. Drawback: they might exclude the minimizers of g.
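
The Goldstein conditions in their standard form, with 0 < c < 1/2 (again the textbook statement, not taken from the slides):

f(x) + (1 - c)\, \eta\, \nabla f(x)^\top d \le f(x + \eta d) \le f(x) + c\, \eta\, \nabla f(x)^\top d

The right inequality is the sufficient decrease; the left one rules out overly small steps.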

Line Search: Backtracking?

Line Search: Backtracking. Start with a big η and decrease it until the sufficient-decrease condition is met.
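
A minimal backtracking sketch in Python; the function names and constants (c1, the shrink factor) are illustrative, not from the exercise:

import numpy as np

def backtracking(f, grad_f, x, d, eta=1.0, c1=1e-4, shrink=0.5):
    """Shrink eta until the sufficient-decrease (Armijo) condition holds."""
    fx = f(x)
    slope = grad_f(x) @ d    # directional derivative; < 0 for a descent direction
    while f(x + eta * d) > fx + c1 * eta * slope:
        eta *= shrink        # step too big: decrease it
    return eta

# usage: one gradient step on f(x) = ||x||^2
f = lambda x: x @ x
grad_f = lambda x: 2 * x
x = np.array([3.0, -2.0])
d = -grad_f(x)               # descent direction
x = x + backtracking(f, grad_f, x, d) * d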

Multi-armed Bandits: Many slot machines. Which machine to play? Maximize gains/rewards.

Multi-armed Bandits: Many slot machines. Which machine to play? Maximize gains/rewards. This can model recommender systems: the slot machines (arms) are the items; maximize click-through rate (or another metric).

Multi-armed Bandits: The exploration-exploitation dilemma. Should we keep using the current best arm? (Exploitation.) Should we pick arms at random to gather data? (Exploration.) Exploitation only: we commit to assumptions based on only a few data points. Exploration only: we gather data without ever using the knowledge gained from it. Trade-off strategies: ε-greedy, UCB.

ε-greedy: With probability ε, explore; with probability 1-ε, exploit.
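
A minimal ε-greedy sketch in Python; the simulated Bernoulli click rewards are illustrative, not part of the exercise:

import numpy as np

rng = np.random.default_rng(0)
true_ctr = [0.02, 0.05, 0.03]      # hypothetical click-through rate per arm (item)
n_arms, eps = len(true_ctr), 0.1
counts = np.zeros(n_arms)          # number of pulls per arm
means = np.zeros(n_arms)           # running mean reward per arm

for _ in range(10_000):
    if rng.random() < eps:
        a = int(rng.integers(n_arms))        # explore: random arm
    else:
        a = int(np.argmax(means))            # exploit: current best arm
    r = float(rng.random() < true_ctr[a])    # simulated click (Bernoulli reward)
    counts[a] += 1
    means[a] += (r - means[a]) / counts[a]   # incremental mean update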

UCB (Upper Confidence Bound): Use the upper bound of the reward estimate for picking an arm. [Figure: two confidence intervals; pick the arm whose expected reward is smaller but whose upper bound is larger.]

UCB (Upper Confidence Bound): Use the upper bound of the reward estimate for picking an arm. Implications? [Figure: pick the arm whose expected reward is smaller but whose upper bound is larger.]

UCB (Upper Confidence Bound): Use the upper bound of the reward estimate for picking an arm. Implications? If an arm has huge variance, it gets picked (explore); if it has a huge expected reward, it gets picked (exploit).
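
For reference, the standard UCB1 selection rule (Auer et al. 2002; the slide shows the idea only graphically):

a_t = \arg\max_i \left( \hat{\mu}_i + \sqrt{\frac{2 \ln t}{n_i}} \right)

where \hat{\mu}_i is the empirical mean reward of arm i and n_i the number of times it has been played. The confidence term dominates for rarely played arms (exploration), the mean term for well-performing arms (exploitation).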

LinUCB: Contextual, personalized recommendations. The context can vary: user profile, item content, season information. E.g., how likely is it that a Bavarian guy wants a Lederhose just before Oktoberfest?

LinUCB: The reward is linear in the context. Use ridge regression to learn the parameters, model the variance of the estimate, and select arms by their upper confidence bounds.
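
A minimal sketch of disjoint-model LinUCB in Python (following Li et al. 2010; for simplicity one shared context vector is scored against every arm, and the value of alpha is illustrative):

import numpy as np

d, n_arms, alpha = 5, 3, 1.0
A = [np.eye(d) for _ in range(n_arms)]     # per-arm ridge regression matrices (start at identity)
b = [np.zeros(d) for _ in range(n_arms)]   # per-arm response vectors

def select_arm(x):
    """Pick the arm with the highest upper confidence bound for context x."""
    scores = []
    for a in range(n_arms):
        A_inv = np.linalg.inv(A[a])
        theta = A_inv @ b[a]               # ridge estimate of the arm's parameters
        scores.append(theta @ x + alpha * np.sqrt(x @ A_inv @ x))  # mean + confidence width
    return int(np.argmax(scores))

def update(a, x, r):
    """Incorporate the observed reward r for arm a in context x."""
    A[a] += np.outer(x, x)
    b[a] += r * x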

LinUCB

UCB: New items in UCB? Cold-start problem?

UCB: New items in UCB? Cold-start problem? No: a new item means high uncertainty, hence a high upper confidence bound, so it will be tried.

UCB: New items in UCB? Cold-start problem? No: a new item means high uncertainty, hence a high upper confidence bound. UCB also achieves logarithmic regret (regret: the difference between the optimal reward and the reward that we actually got).
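
Written out (the standard definition, not on the slide): the expected regret after T rounds is

R_T = T \mu^* - \mathbb{E}\left[ \sum_{t=1}^{T} \mu_{a_t} \right]

and UCB1 guarantees R_T = O\left( \sum_{i : \Delta_i > 0} \frac{\ln T}{\Delta_i} \right), where \Delta_i = \mu^* - \mu_i is the gap of arm i, i.e. regret logarithmic in T.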

Matrix Factorization: Sophisticated collaborative filtering. Reconstruct the rating matrix using latent factors for user preferences and item attributes: R = QP.

Matrix Factorization: Objective function. Both q and p are unknown, so the objective is non-convex. Assuming q is known, the minimization over p is convex (and vice versa): alternating least squares.
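
The usual regularized objective (standard form, cf. Koren, Bell & Volinsky 2009, assumed to match the formula on the slide):

\min_{q_*,\, p_*} \sum_{(u,i) \in \kappa} \left( r_{ui} - q_i^\top p_u \right)^2 + \lambda \left( \|q_i\|^2 + \|p_u\|^2 \right)

where \kappa is the set of observed (user, item) pairs. Fixing all q_i turns the problem into a ridge regression in the p_u (and vice versa), which is exactly what alternating least squares exploits.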

Matrix Factorization: Deriving an SGD learning rule.

Matrix Factorization: Deriving an SGD learning rule. Work on one specific rating r_ui at a time.

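A minimal SGD sketch for this update in Python; the learning rate, regularization constant, and initialization are illustrative:

import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 100, 50, 10
gamma, lam = 0.01, 0.1                          # learning rate, regularization
P = rng.normal(scale=0.1, size=(n_users, k))    # user factors p_u
Q = rng.normal(scale=0.1, size=(n_items, k))    # item factors q_i

def sgd_step(u, i, r_ui):
    """One stochastic gradient update on a single observed rating r_ui."""
    p_u, q_i = P[u].copy(), Q[i].copy()
    e = r_ui - q_i @ p_u                        # prediction error
    P[u] += gamma * (e * q_i - lam * p_u)
    Q[i] += gamma * (e * p_u - lam * q_i)

# usage: one pass over a few observed ratings (toy data)
for u, i, r in [(0, 3, 4.0), (2, 7, 1.5)]:
    sgd_step(u, i, r)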

Matrix Factorization: Extensions. Bias: different users rate on different scales (and the same holds for items). Implicit feedback: there are no ratings; the number of clicks is a measure of confidence in a user preference. Use this confidence to weight the loss. (Standard formulas are sketched below.)
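
For reference, the standard forms of both extensions (bias terms from Koren, Bell & Volinsky 2009; confidence weighting from Hu, Koren & Volinsky 2008; neither formula is spelled out on the slide):

\hat{r}_{ui} = \mu + b_u + b_i + q_i^\top p_u

\min_{x_*,\, y_*} \sum_{u,i} c_{ui} \left( p_{ui} - x_u^\top y_i \right)^2 + \lambda \left( \sum_u \|x_u\|^2 + \sum_i \|y_i\|^2 \right), \qquad c_{ui} = 1 + \alpha\, n_{ui}

where n_{ui} is e.g. the click count, p_{ui} = 1 if n_{ui} > 0 and 0 otherwise, and the confidence c_{ui} weights the loss.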

Thank you for your attention!