Estimation of the Click Volume by Large Scale Regression Analysis
Yuri Lifshits, Dirk Nowotka [International Computer Science Symposium in Russia '07, Ekaterinburg]
Presented by: Aditya Menon (UCSD), May 15, 2008

Outline
1 Background: Sponsored search
2 Formal description: the history table
3 Estimating click volume with linear regression
4 Efficient sparse linear regression

The setting: online advertising
Most web services rely on advertising as a source of income, so choosing the right ad to display is important: we try to tailor it to the customer.
An advertising engine is used to choose the ads: software that dynamically selects ads based on user requests.
Goal: maximize the number of user clicks.

How an advertising engine operates
The advertising engine keeps a database of ads and, given a request, returns a set of relevant ads.

Measuring usefulness: clickthrough rate
A natural quantity of interest is the clickthrough rate: the fraction of times that an ad was clicked when it was displayed:

Clickthrough rate(a) = (# times ad a is clicked) / (# times ad a is displayed)

A high clickthrough rate signifies a well-chosen ad.
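As a toy numeric example of this formula (made-up counts, not data from the paper):

```python
# Empirical clickthrough rate: clicks divided by displays.
clicks = 30      # times ad a was clicked
displays = 1000  # times ad a was displayed
ctr = clicks / displays
print(ctr)  # 0.03, i.e. a 3% clickthrough rate
```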

Measuring usefulness: clickthrough rate
Theoretically, we can think of the clickthrough rate as being the probability of a click given some information about an ad:

Theoretical clickthrough rate(a) = Pr(Click | f(a))

where f(a) captures properties of the ad, the surrounding pages, the viewer, etc.

Measuring usefulness: click volume
A related theoretical quantity is the click volume: the number of times that an ad would be clicked, assuming it is shown on all requests.
We can think of a request as a potential opportunity for display; the click volume is just a sum of clickthrough rates over all requests r:

Click volume(a) = Σ_r Clickthrough rate(a, r)

Measuring usefulness: click volume
More theoretically, the click volume can be expressed as

Click volume(a) = Σ_r Pr(Click | a, r)

This can be thought of as an unweighted sum: every request counts equally.
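A minimal sketch of this sum, where the hypothetical ctr(a, r) stub stands in for any model of Pr(Click | a, r):

```python
def ctr(a, r):
    # Hypothetical stand-in for Pr(Click | a, r); a real model goes here.
    return 0.01

def click_volume(a, requests):
    # Unweighted sum of click probabilities over all requests.
    return sum(ctr(a, r) for r in requests)

print(click_volume("ad-1", range(100_000)))  # 1000.0 expected clicks
```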

How would knowing the click volume help us?
Intuitively, the click volume helps us answer many relevant questions:
- The maintainer of the engine can estimate how many clicks an ad can expect to receive, and hence set the optimal price
- The volume can help determine the target audience for an ad
- We can use the volume to predict the purchase market for an ad, etc.
In short: it is a useful thing to estimate! So how do we do this?

Estimating click volume
A natural question to ask is how we can estimate the click volume for an ad.
Specifically: given an ad a, assuming it is displayed at all requests, how many of those displays will result in a click?

Aside: is the assumption justified?
We said that we assume an ad is displayed on all requests; is such an assumption justified?
If "all" means a random sample of requests, then

Click volume(a) = Σ_r Clickthrough rate(a, r) ∝ Pr(Click | a)

so the estimate still reflects the ad's overall click probability.

Where this paper fits in
This paper looks at how we can estimate click volume for a new ad, given the past history of clicks on different ads.
The key idea is to use linear regression on this history to predict the new click volume.
The computational constraints are that the data here is large-scale and sparse.
An algorithm is proposed that solves the regression problem efficiently, with guaranteed convergence.
Critique #1: there is no experimental testing of the algorithm.
Critique #2: there is no mention of other work on linear regression.

Outline
1 Background: Sponsored search
2 Formal description: the history table
3 Estimating click volume with linear regression
4 Efficient sparse linear regression

Representing ads and requests
The two important components of the formal model are ads and requests.
We represent an ad as a vector a ∈ A, each component indicating a particular feature (e.g. text, link, phone number, ...):

a = (a_1, ..., a_m)

We also represent a request as a vector r ∈ R, similarly representing a collection of features (e.g. IP address, referrer, query, ...):

r = (r_1, ..., r_n)

Note that these vectors may be sparse, e.g. under a bag-of-words model.

Introducing events
We define an event to be the result of the interaction between an ad and a request.
We can think of an event e as a function of (a, r), which we also represent by a vector:

e(a, r) = (e_1, ..., e_p)

Creating a history table
We need some structure to help study the behaviour of ads and their requests.
A history table can be thought of as a collection of events, together with whether or not each resulted in a click.
If we let b_i ∈ {0, 1} denote a click, then

HT = {(e(a_1, r_1), b_1), ..., (e(a_n, r_n), b_n)}
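A sketch of what such a table might look like in code; the concrete encoding (an event as the union of prefixed ad and request features) is an illustrative assumption, since the slides leave e(a, r) abstract:

```python
def event(ad_features, request_features):
    # e(a, r): here, simply the union of prefixed ad and request features.
    return {**{"ad:" + k: v for k, v in ad_features.items()},
            **{"req:" + k: v for k, v in request_features.items()}}

history_table = [
    (event({"word:pizza": 1}, {"query:food": 1}), 1),  # b = 1: clicked
    (event({"word:pizza": 1}, {"query:cars": 1}), 0),  # b = 0: not clicked
]
```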

Intuitive representation
As the name suggests, we can represent a history table in tabular form: rows denote ads, columns denote ad requests.
[table sketch: an ads × requests grid, each filled cell marked + (click) or - (no click)]
Note that each cell holds the b_i's for all events formed by the pair (a, r).

Sparsity of table
An important property of the history table, when viewed in matrix form, is that it is sparse along the columns.
The reason is that each column usually covers only a few cells: the ads for a given request are generated by the advertising engine.
It is assumed that if the size of the table is m × n, there are O(n) nonzero entries.
This is an important assumption used later to ease computation.

Problem: estimating the click volume
We can now specify our problem more formally.
Problem: Given an ad a, we wish to estimate Click volume(a), assuming:
- the distribution of requests follows that of the history table
- the ad a would be shown on all the requests

Outline
1 Background: Sponsored search
2 Formal description: the history table
3 Estimating click volume with linear regression
4 Efficient sparse linear regression

Approaches that don't work: nearest neighbour
Nearest-neighbour solutions: we could try to find nearby ads and use their click volumes to estimate the new volume.
Problem #1: similarity of ads does not tell us anything about similarity of their volumes.
Problem #2: they are too slow!

Approaches that don't work: collaborative filtering
Collaborative filtering: if we view the requests as users and the ads as movies, this is similar to a prediction problem!
Problem #1: we have absolutely no data about the new movie, or ad in this case (the cold-start problem).
Problem #2: this ignores extra information, such as term similarity between ads and requests.

Suggested approach
The approach followed in the paper is three-fold:
1 Perform dimensionality reduction on the history table
2 Aggregate the values in the reduced table
3 Perform a least-squares regression
We look at the steps in turn.

Reduction of table
We can convert the table into a matrix by storing clickthrough rates for events, rather than storing separate click values for each event.
We can also create generalized events ē = D(e) that reduce the dimensionality of events by some function D, e.g. D might merge events with similar words.
Transform

{(e_1, b_1), (e_2, b_2), ..., (e_k, b_k)} → (ē, (Σ_i b_i) / k)

where ē = D(e_1) = D(e_2) = ... = D(e_k) represents the equivalence class of e_1, ..., e_k.
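A sketch of this aggregation step, assuming a hypothetical D that crudely merges events sharing the same feature names; the real choice of D is left open by the paper:

```python
from collections import defaultdict

def D(e):
    # Hypothetical reduction: keep only the feature names, dropping values,
    # so events with the same feature set fall into one equivalence class.
    return frozenset(e)

def reduce_table(history_table):
    sums, counts = defaultdict(float), defaultdict(int)
    for e, b in history_table:
        g = D(e)                    # generalized event (equivalence class)
        sums[g] += b
        counts[g] += 1
    # Each class gets its empirical clickthrough rate (sum of b_i) / k.
    return {g: sums[g] / counts[g] for g in sums}
```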

Reduced history table
Dimensionality reduction and aggregation give us a reduced history table.
We can think of it as being represented by a vector T of generalized events and a vector β of clickthrough rates, one for each event:

T = (ē_1, ē_2, ..., ē_n)ᵀ,  β = (β_1, β_2, ..., β_n)ᵀ

Estimating clickthrough rate
Problem: given a new generalized event ē_m, we want to estimate its clickthrough rate β_m.
But this is equivalent to asking: is there a function f so that f(T) ≈ β? Here T serves as our training set.
In our case, we assume that there is a linear relation between T and β.
This suggests a specific choice for f...

Estimating click volume
Aim: we'd like to find a vector α so that Tα ≈ β.
This is a problem of linear regression.
There are more equations than variables, hence this is an overconstrained problem: no exact solution in general, so we want one with minimal discrepancy.

Formulating the problem as logistic regression
We need to address a technical point: regression works over (−∞, ∞), whereas our β takes values in [0, 1].
Solution: we perform a logit transform to get a new vector γ = logit(β), which takes values in (−∞, ∞).
Recall: the logit function is logit : x ↦ log(x / (1 − x)).
The equivalent problem is min_α ||Tα − γ||.
Note: this is really logistic regression!
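A minimal sketch of the transform and its inverse (note that empirical rates of exactly 0 or 1 would need smoothing before the logit; the slides don't address this):

```python
import math

def logit(x):
    # Maps (0, 1) to (-inf, inf); x must be strictly between 0 and 1.
    return math.log(x / (1 - x))

def inv_logit(y):
    # Inverse map back to (0, 1).
    return 1 / (1 + math.exp(-y))

gamma = [logit(b) for b in (0.01, 0.5, 0.9)]
print(gamma)                 # approx [-4.595, 0.0, 2.197]
print(inv_logit(gamma[2]))   # approx 0.9
```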

Solution summary
1 Find the equivalence classes of events, giving a matrix T
2 Aggregate click values for equivalent events, giving a vector β of clickthrough proportions
3 Find the vector α that minimizes ||Tα − γ||, where γ = logit(β)
4 The predicted click volume is then simply

CV(a) = Σ_i logit⁻¹(α · ē(a, r_i))
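A sketch of step 4 as code, assuming the generalized events ē(a, r_i) are stacked into a NumPy matrix with one row per request; the names and shapes here are illustrative:

```python
import numpy as np

def predict_click_volume(alpha, E):
    # CV(a) = sum_i inv_logit(alpha . e(a, r_i)); E has one row per request.
    scores = E @ alpha
    return float(np.sum(1.0 / (1.0 + np.exp(-scores))))

# Toy usage: 4 requests, 3 event features.
alpha = np.array([0.2, -1.0, 0.5])
E = np.ones((4, 3))
print(predict_click_volume(alpha, E))  # 4 * inv_logit(-0.3), approx 1.70
```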

Are we done?
Intuitively, this method will work. Question: but is it efficient?
Answer: No! Standard regression techniques involve inverting a large, non-sparse matrix.
We need a new technique to solve this problem efficiently.

Note on sampling requests
Notice that we need to sum over all requests to estimate the click volume; this is potentially an intractable operation.
Solution: consider only a sample of requests, e.g. over a fixed time window.

Outline
1 Background: Sponsored search
2 Formal description: the history table
3 Estimating click volume with linear regression
4 Efficient sparse linear regression

A geometric interpretation of Tα
If we write out the matrix product Tα, it is clear that

Tα = (T_11 α_1 + ... + T_1m α_m, ..., T_n1 α_1 + ... + T_nm α_m)ᵀ = α_1 t_1 + ... + α_m t_m

So Tα is nothing but a linear combination of the columns of T:

Tα ∈ span(t_1, ..., t_m)
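A quick numeric check of this column-combination identity, with illustrative values:

```python
import numpy as np

# T @ alpha equals the corresponding linear combination of T's columns.
T = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
alpha = np.array([0.5, -1.0])
assert np.allclose(T @ alpha, alpha[0] * T[:, 0] + alpha[1] * T[:, 1])
```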

Geometric framework for our problem
When we want ||Tα − γ|| to be minimal, we really want to find the optimal projection of γ onto the span of T's columns.
[figure: γ above the subspace span(t_1, ..., t_m), with its orthogonal projection γ* = Tα* lying in the subspace]

Sequential solution
We will look at a sequence of vectors {α^(k)}_{k ∈ N}, starting from the guess α^(0) = 0.
Idea: solve the problem by focussing on one column at a time.

Geometrically inspired update
Update rule: choose any j, and fix all columns but j; now seek the best projection of γ onto the jth column of T.
To do this, we project the current error of our solution onto the column vector t_j at each step.
By linearity, this is the same as projecting γ itself.

Visualizing the update
Find the current error in our solution. [figure]

Visualizing the update
Project the error onto the vector t_j. [figure]

Visualizing the update
Now repeat this operation. [figure]

Recap: vector projection
Recall that the optimal projection of a vector b onto a is

proj_a b = (a · b / ||a||²) a

[figure: vectors a and b at angle θ; the projection along a has length (a · b) / ||a||]
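The projection formula as a small helper (a sketch; any vectors work provided a ≠ 0):

```python
import numpy as np

def project(b, a):
    # Orthogonal projection of b onto a: (a . b / ||a||^2) a.
    return (a @ b) / (a @ a) * a

print(project(np.array([2.0, 1.0]), np.array([1.0, 0.0])))  # [2. 0.]
```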

Measuring how close we are to optimal
Denote the discrepancy at step k by φ^(k) := γ − Tα^(k).
Suppose also that the columns we choose are J = {j_1, j_2, ...}.
We can derive the discrepancy at step k + 1:

φ^(k) − φ^(k+1) = (φ^(k) · t_{j_k} / ||t_{j_k}||²) t_{j_k}

This means we project the previous discrepancy onto the currently chosen column.

Update rule
Dual to the change in φ, there is an update to our α vector, acting only on its jth component:

α_j^(k+1) = α_j^(k) + (φ^(k) · t_j) / ||t_j||²

The algorithm
while change in φ > ε:
    choose j ∈ [1, m] arbitrarily
    α_j ← α_j + (φ · t_j) / ||t_j||²
    φ ← φ − ((φ · t_j) / ||t_j||²) t_j
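A minimal NumPy sketch of this loop, assuming φ = γ − Tα as above, uniformly random column choices, and the pseudocode's "change in φ" stopping rule:

```python
import numpy as np

def fit(T, gamma, eps=1e-12, max_iters=1_000_000, seed=0):
    """Column-at-a-time least squares: minimize ||T @ alpha - gamma||."""
    rng = np.random.default_rng(seed)
    n, m = T.shape
    alpha = np.zeros(m)
    phi = gamma.astype(float)          # alpha^(0) = 0, so phi^(0) = gamma
    norms = (T * T).sum(axis=0)        # precomputed ||t_j||^2 per column
    for _ in range(max_iters):
        j = rng.integers(m)            # "choose j arbitrarily"
        c = (phi @ T[:, j]) / norms[j] # projection coefficient onto t_j
        alpha[j] += c
        phi -= c * T[:, j]             # project the discrepancy onto t_j
        if abs(c) * np.sqrt(norms[j]) < eps:  # change in phi is ||c t_j||
            break
    return alpha, phi
```

As a sanity check, T @ alpha from fit(T, gamma) should approach the least-squares fit returned by np.linalg.lstsq(T, gamma, rcond=None).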

Convergence of the updates
Our updates are intuitive, but how well do they work? We have the following guarantee.
[Convergence theorem] Suppose the sequence of chosen columns J = {j_1, j_2, ...} is such that every column appears infinitely many times. Then the sequence φ^(k), as defined above, converges to φ*, the minimal orthogonal projection error of γ onto the span of T.

Convergence of the updates
Intuitively, our updates should converge, e.g. in the case where we have just two columns. [figure]

Proof outline
The proof is in three steps:
1 Show that ||φ^(k)|| is bounded and monotonically decreasing in k
2 Assume that φ^(k) does not converge to φ*, but instead converges to some ψ
3 Show that φ^(k) in fact cannot converge to ψ: a contradiction

Bounded nature of φ^(k)
First, note that the vectors φ^(k+1) and t_{j_k} are always orthogonal:

φ^(k+1) · t_{j_k} = φ^(k) · t_{j_k} − (φ^(k) · t_{j_k} / ||t_{j_k}||²) (t_{j_k} · t_{j_k}) = 0

Then Pythagoras' theorem tells us that

0 ≤ ||φ^(k+1)|| ≤ ||φ^(k)|| ≤ ||φ^(0)||

[figure: right triangle with hypotenuse φ^(k) and legs φ^(k+1) and λ t_{j_k}]
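A quick numeric check of this orthogonality, with illustrative random vectors:

```python
import numpy as np

rng = np.random.default_rng(1)
t = rng.normal(size=5)      # stands in for t_{j_k}
phi = rng.normal(size=5)    # stands in for phi^(k)
phi_next = phi - (phi @ t) / (t @ t) * t
print(abs(phi_next @ t) < 1e-12)  # True: phi^(k+1) is orthogonal to t_{j_k}
```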

Towards a contradiction
Since we know that φ^(k) is bounded and monotone, it must converge [monotone convergence theorem].
Say that φ^(k) converges to some ψ ≠ φ*.

Towards a contradiction
We now show that φ^(k) cannot in fact stay near ψ.
At some stage, we know that φ^(k) must come within distance c/2 of ψ, for any constant c > 0.
Now look at the potential updates:
1 t_{j_k} ⊥ ψ: this means we only get closer to ψ
2 t_{j_k} ̸⊥ ψ: this means the size of φ^(k) dramatically reduces

Case 1: t_{j_k} ⊥ ψ
If t_{j_k} ⊥ ψ, then t_{j_k} · ψ = 0. We have

φ^(k) − ψ = (φ^(k+1) − ψ) + (φ^(k) · t_{j_k} / ||t_{j_k}||²) t_{j_k}

But the terms on the RHS are orthogonal, which means that

||φ^(k) − ψ|| ≥ ||φ^(k+1) − ψ||

This means we cannot escape ψ, as we only get closer to it.

Case 2: t_{j_k} ̸⊥ ψ
If t_{j_k} ̸⊥ ψ, take c > 0 with |ψ · t_{j_k}| ≥ c ||t_{j_k}||; once ||φ^(k) − ψ|| ≤ c/2, we can lower bound the shift:

|φ^(k) · t_{j_k}| / ||t_{j_k}|| = |ψ · t_{j_k} + (φ^(k) − ψ) · t_{j_k}| / ||t_{j_k}|| ≥ c − c/2 = c/2

But φ^(k+1) ⊥ (φ^(k) − φ^(k+1)), so by Pythagoras

||φ^(k+1)||² ≤ ||φ^(k)||² − c²/4

The contradiction
But these results lead to a contradiction. We have said that:
1 φ^(k) visits the c/2-neighbourhood of ψ infinitely often
2 Once it's inside the neighbourhood, t_{j_k} ⊥ ψ updates cannot let it escape
3 Once it's inside the neighbourhood, t_{j_k} ̸⊥ ψ updates cause a substantial decrease in the size of φ^(k)
But ||φ^(k)|| is nonnegative and monotone, so it cannot decrease by a fixed amount infinitely often. Contradiction!

Where are we now?
We have shown that the given approach does converge to the optimal solution.
This is good, but there is another natural question to ask...

Runtime analysis
How efficient is this approach? We show that we can exploit the sparsity of the history table.
[Runtime bound] Each update on column j can be performed in time linear in the number of nonzero entries of that column.

How to do the update
Suppose there are q_j non-zero elements in t_j. We do the update as follows:
1 Precompute all norms ||t_j||² (once, up front)
2 Update α_j^(k) in O(q_j) time: we just need the dot product of t_j and φ^(k)
3 Update φ^(k) in O(q_j) time: a dot product plus a subtraction that touches only the nonzero positions of t_j
A sketch of this sparse update follows below.
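A sketch of one O(q_j) update using SciPy's compressed-sparse-column format; the choice of data structure is an assumption, as the paper only requires column-wise access to nonzeros:

```python
import numpy as np
from scipy.sparse import csc_matrix

def sparse_step(T, phi, alpha, norms, j):
    # One update touching only the q_j nonzeros of column t_j; T is CSC.
    start, end = T.indptr[j], T.indptr[j + 1]
    rows, vals = T.indices[start:end], T.data[start:end]
    c = (phi[rows] @ vals) / norms[j]  # dot product phi . t_j in O(q_j)
    alpha[j] += c
    phi[rows] -= c * vals              # discrepancy update in O(q_j)

# Toy usage with a 3 x 2 sparse table.
T = csc_matrix(np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 0.0]]))
norms = np.asarray(T.multiply(T).sum(axis=0)).ravel()  # precomputed ||t_j||^2
phi, alpha = np.array([1.0, 1.0, 1.0]), np.zeros(2)
sparse_step(T, phi, alpha, norms, 0)
print(alpha, phi)  # [0.4 0. ] [ 0.6  1.  -0.2]
```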

Note on convergence
Important caveat: we don't have any guarantee on the convergence rate; all we know is that the algorithm does converge.
The authors state this as an open problem; they are more concerned with introducing a new idea into the literature.

Outline
1 Background: Sponsored search
2 Formal description: the history table
3 Estimating click volume with linear regression
4 Efficient sparse linear regression

Conclusion
The click volume is a quantity of interest in web advertising.
Given a past history of clicks, we can estimate the click volume of a new ad using linear regression.
When the data is sparse, we need a new technique to compute the regression efficiently.
The paper provides an algorithm for efficiently solving this problem, with a proof of correctness (but no convergence-rate bound).
