Learning Topical Transition Probabilities in Click Through Data with Regression Models

Xiao Zhang, Prasenjit Mitra
Department of Computer Science and Engineering
College of Information Sciences and Technology
The Pennsylvania State University

ABSTRACT

The transition of search engine users' intents has been studied for a long time. The knowledge of intent transition, once discovered, can yield a better understanding of how different topics are related, and can be used in many applications, such as building recommender systems and ranking. In this paper, we study the problem of finding the transition probabilities of digital library users' intents among different topics. We use the click-through data from CiteSeerX and extract the click chains. Each document in a click chain is represented by a topical vector generated by an LDA model. We then model the task of finding the topical transition probabilities as a multiple-output linear regression problem, in which the input and output are two consecutive topical vectors in the click chain and the elements of the weight matrix correspond to the transition probabilities. Given the constraints of our task, we propose a new algorithm based on the exponentiated gradient. Our algorithm provides good interpretability as well as a sum-of-squares error comparable to existing regression methods. We are particularly interested in the off-diagonal elements of the learned weight matrix, since they represent the transition probabilities between different topics. The authors' interpretation of these transitions is given at the end of the paper.

1 INTRODUCTION

The search intent of a search engine user may switch to a different topic even within the same search session. According to previous studies, 10-30% of search engine users perform multitasking [15] [12], and multiple topics exist in over 80% of search sessions which have two or more queries [16]. Thus, detecting the transition of users' intents has become a crucial task for search engines. The knowledge of users' intent transitions, once detected, can help researchers understand how different topics are related, and can further be used to improve query suggestion, ranking, recommendation and the overall search quality.

Previous studies on user intent transition have focused mainly on Web search. However, more and more vertical search engines have emerged and gained attention from both industry and academia, such as digital libraries, chemical search engines and travel search engines. Discovering the knowledge of intent transition in such search engines is equally important as in Web search engines. Vertical search engines have characteristics different from Web search engines. The digital library search engine CiteSeerX, for example, has a closed corpus: most of its documents are research papers in the computer science domain. Therefore the number of topics in its dataset is limited, while documents on the Web cover a much larger range of topics. This difference makes it possible to study the intent transition between each pair of topics.

Users' information needs are expressed implicitly through sequences of querying, clicking and reformulating behaviors. Existing work focused on the analysis of such behaviors [9] [6]. Various features, such as the time interval between queries, time thresholding, and shared common terms between adjacent queries, are used to determine whether a transition happens. However, these features are not suitable for vertical search engines. In CiteSeerX, for example, documents are connected through citation links; users often start with one search query to visit one document, then follow the citation links to visit other documents. Intent transition may happen as users traverse the citation graph. The cosine similarity between the documents retrieved in response to queries has also been used as a feature [9] [13] to determine the boundary of a search task. However, it can only measure the similarity between two documents, not the topical distributions of the documents; besides, it is used to determine whether an intent transition happens, not to discover the knowledge of intent transition between a pair of topics. If we can instead find how intents shift among all pairs of topics (in other words, the transition probabilities), we can get a clearer picture of the connections between topics.

In this paper, we study the problem of user intent transition among topics, discover the transition probabilities and find the relatedness among the topics. We focus on the CiteSeerX vertical search engine, whose documents are mainly in the computer science domain. We train Latent Dirichlet Allocation (LDA) [3] models on all the documents in CiteSeerX. The trained LDA model is then applied to the documents in the click chains extracted from the query log, to generate the topical distribution (in the form of a topical vector) for each of these documents.

We find the transition probabilities by solving a multiple-output linear regression problem, in which the inputs and outputs are the topical vectors of the two documents in each click pair, and the weight matrix to be learned is the transition probability matrix. Since each weight bears the meaning of a transition probability, we place two constraints on the regression problem: (1) all the weights must be non-negative, and (2) the transition probabilities from one topic to all other topics must sum up to 1. Based on these two constraints, we propose a new regression algorithm based on the exponentiated gradient; it minimizes the sum-of-squares error under the constraints. The result leads to the discovery of connections between topics: for example, people who are interested in graph problems may switch their intent to the techniques and fundamental issues involved in solving graph problems, such as optimization techniques, computational cost and efficiency. Such knowledge can be used to recommend related topics to people who show interest in some topic.

2 RELATED WORK

2.1 User Intent Transition

Existing work on user intent transition focused on finding a timeout to cut off between queries. Different threshold values for timeouts were proposed, ranging from 5 to 120 minutes [4] [11] [14] [6] [1]. However, Jones and Klinkner [9] examined the timeout thresholds on real search engine data and claimed that no time threshold is effective at identifying task boundaries. Another line of work used the query contents. Jansen et al. [8] used common words in adjacent queries as a feature to segment sessions. He et al. [6] studied the adding and deleting of terms in queries to detect topic shifts in user query streams. Ozmutlu et al. [15] [12] used various features to study user intent shifts, including (1) the change of terms between consecutive queries within a session, (2) the time interval, i.e., the difference between the arrival times of two consecutive queries, and (3) the order of each query in the user session. They evaluated different statistical models and used them to predict user intent transitions in a Web search session. However, these works rely on the search queries. In digital libraries such as CiteSeerX, users frequently follow the citation links, not the search results, to visit other documents. Therefore, these approaches do not fit well in our application.

2.2 Linear Regression

Regression models predict the value of one or more continuous target variables y given the value of a d-dimensional input vector x. Linear regression models learn the weights from a set of observations by minimizing a loss function. If the sum-of-squares error

    L = Σ_{i=1}^N (y_i - ŷ_i)^2

is chosen as the loss function, the linear regression problem can be solved by the ordinary least squares (OLS) technique. Although the OLS technique minimizes the sum-of-squares error, it does not fit our scenario because it does not satisfy the constraints set forth in our application. As will be discussed in detail in the Problem Formulation section, our problem is a multiple-output linear regression problem with two constraints: (1) all the weights represent probabilities and thus should be non-negative, and (2) each row of the learned weight matrix should sum up to 1. OLS can satisfy neither of them. Regularized linear regression places constraints on the weights by adding a penalty term to the loss function; two very important regularized linear regression models are Ridge [7] and Lasso [17] regression. Lasso regression constrains the sum of the absolute values of the weights, and Efron et al. recently gave an efficient solution [5]. However, neither of them satisfies both constraints of our application simultaneously.

The Exponentiated Gradient (EG) algorithm was proposed by Kivinen and Warmuth [10]. This online algorithm takes a single observation at a time and updates the weight vector; it uses the components of the gradient in the exponents of factors that update the weight vector multiplicatively. It guarantees that all the weights are positive and sum up to 1, which is very close to our constraints. However, as will be discussed in Section 4, our application requires normalization in a different direction in the weight matrix (along rows, not columns). Our proposed algorithm is based on the Exponentiated Gradient method.

3 PROBLEM FORMULATION

Topical Intent Representation: Researchers have created topic hierarchies to define topics for papers. However, such manually assigned topics are not suitable in our scenario, for the following reasons: (1) one research paper can cover multiple topics, while human-assigned topics only cover the primary topic of the paper; (2) new topics emerge over time, which may not be captured by human-generated topic hierarchies; (3) people use different terminologies for papers on similar topics; and (4) not all papers have an assigned topic. Taking these reasons into consideration, we use LDA models to find the topical distribution of each document over k topics automatically. A topical vector is generated for each document to represent, with probabilities, the topics covered by the paper. In CiteSeerX, a user is given various information about a document, such as its title, snippet, authors and publication year, before actually visiting it. We assume the user can judge well the relevance of the document to his interests. Therefore we use the topical vector of the document visited by the user to represent the user's intent at that moment.

Topical Intent Transition: Since the transition of intents happens between successive visits to two documents connected by a citation link, we pulled the query log from CiteSeerX and extracted all pairs of documents visited successively by each user in each search session, and we use these pairs as our observations. We call such a pair of documents a click pair. We assume a user's interest in each topic switches to any one of all topics in the next click with some probability. The transition probability is defined below:

Definition 1 (Topical Intent Transition Probability). The topical intent transition probability p_ij = prob(z_{t+1} = j | z_t = i) (1 <= i, j <= k) is the transition probability of a user's intent from Topic i to Topic j in successive visits to two documents.

The change in the topical vectors of the two documents in a click pair is the result of such transitions. Putting all the transition probabilities together, we have a transition probability matrix P:

    P = [ p_11  p_12  ...  p_1k ]
        [ p_21  p_22  ...  p_2k ]
        [  ...   ...  ...   ... ]
        [ p_k1  p_k2  ...  p_kk ]

Note that in each row, the probabilities are from one topic to all topics. Therefore, we have the constraint that each row should sum up to 1, i.e., Σ_{j=1}^k p_ij = 1 for all i.

The goal of our work: given the query log and the documents visited by search engine users, find the topical vectors (x, y) for each click pair; then find the transition probability matrix P such that the predicted topical vector ŷ = P^T x is close to y, where closeness is measured by the sum-of-squares error. We model the problem as a multiple-output linear regression problem, in which the first vector x is the input, y is a multi-dimensional output vector, and P is the weight matrix to learn. Given N observed pairs of x and y, the learned weight matrix should minimize the total sum-of-squares error.
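To make the formulation concrete, the following is a minimal NumPy sketch (hypothetical toy data, not the paper's corpus) showing how a row-stochastic matrix P maps one topical vector to a predicted successor and how the sum-of-squares objective is evaluated over a set of click pairs:

```python
import numpy as np

k = 4                      # number of topics (toy value; the paper uses 20-150)
rng = np.random.default_rng(0)

# A toy row-stochastic transition matrix P: each row sums to 1.
P = rng.random((k, k))
P /= P.sum(axis=1, keepdims=True)

# Toy topical vectors for N click pairs (rows are distributions over topics).
N = 5
X = rng.dirichlet(np.ones(k), size=N)   # input vectors x
Y = rng.dirichlet(np.ones(k), size=N)   # observed successor vectors y

Y_hat = X @ P                            # row form of y_hat = P^T x, for all pairs at once
sse = np.sum((Y - Y_hat) ** 2)           # total sum-of-squares error to be minimized
```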

4 MODEL DESCRIPTION

4.1 Notations

Given an input vector x = (x_1, x_2, ..., x_k)^T, a linear regression model generates a single-value output ŷ by linearly combining the input values: ŷ = Σ_{i=1}^k w_i x_i = w^T x. The weights w = (w_1, w_2, ..., w_k)^T of the linear combination are learned from the training data. Note that in our application we do not introduce the constant term x_0 = 1 into the model, since the weights bear the meaning of probabilities. We use X and Y to denote the matrices whose rows are the observed input and output vectors, respectively:

    X = [ x_1^T ]        Y = [ y_1^T ]
        [ x_2^T ]            [ y_2^T ]
        [  ...  ]            [  ...  ]
        [ x_N^T ]            [ y_N^T ]

The weight matrix W and the transition probability matrix P are equivalent and will be used interchangeably; note that w_ij = p_ji for all i, j in [1, k]:

    W = (w_1, w_2, ..., w_k) = [ w_11  w_21  ...  w_k1 ]
                               [ w_12  w_22  ...  w_k2 ]
                               [  ...   ...  ...   ... ]
                               [ w_1k  w_2k  ...  w_kk ]  = P

Table 1 summarizes the notation.

Table 1: Notations
    k     dimension of the input/output vectors x, y
    N     number of observations
    x, y  input and output vectors (k x 1 column vectors)
    w     weight vector (k x 1 column vector)
    W     weight matrix (k x k matrix)
    η     learning rate

4.2 Linear Regression Model

We first consider the linear regression model that generates a single-value output. Given X and Y (where each y_i, i in [1, N], is a single value), the linear regression model finds the weight vector that minimizes the sum-of-squares error over the training set D:

    L_D = SSE = Σ_{i=1}^N (y_i - ŷ_i)^2 = Σ_{i=1}^N (y_i - w^T x_i)^2

Minimizing this error with respect to w by setting the gradient of the error function to zero, we obtain the OLS solution:

    w = (X^T X)^{-1} X^T Y

In our application, the output is a k-dimensional vector instead of a single value, and the weights form a matrix W instead of a vector; in this case ŷ = W^T x. Given the training data X and Y (each y_i in Y, i in [1, N], is a k-dimensional vector, so Y is an N x k matrix), the multiple-output linear regression model computes the weight matrix as

    W = (X^T X)^{-1} X^T Y

This is equivalent to taking each column of Y as the output of an individual single-output linear regression problem and putting the solutions together [2]. Although the OLS solution minimizes the sum-of-squares error for both the single- and multiple-output linear regression problems, it does not satisfy the constraints on the weights in our application, and thus the weights cannot be interpreted as probabilities. We modify the OLS solution by setting the negative values to zero and normalizing the rows of the matrix so that each row sums up to 1. We call this modified method normalized linear regression (nlr).
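The following is a minimal NumPy sketch of the two baselines above, blr and nlr; np.linalg.lstsq is used in place of explicitly forming (X^T X)^{-1} X^T Y, which yields the same least-squares solution but is numerically safer:

```python
import numpy as np

def fit_blr(X, Y):
    """Basic multiple-output linear regression: W = (X^T X)^{-1} X^T Y.
    Each column of Y is solved as an independent single-output problem."""
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W

def fit_nlr(X, Y):
    """Normalized linear regression: clip negative weights to zero,
    then rescale each row of W so it sums to 1."""
    W = np.clip(fit_blr(X, Y), 0.0, None)
    return W / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)
```

A row whose entries were all clipped to zero stays (numerically) zero here; the guard term only avoids division by zero in that degenerate case.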
4.3 Exponentiated Gradient

The exponentiated gradient (EG) algorithm for solving the linear regression problem was first proposed by Kivinen and Warmuth [10]. It starts from an initial guess of the weight vector and, after receiving each observation, updates the weight vector w. EG guarantees that each weight in the weight vector is positive and that the weights sum up to 1. This is very close to our constraints, but slightly different, as will be shown later. The updating rule for the EG algorithm is given by Kivinen and Warmuth [10]; we present it here for clarity. Algorithm 1 gives the updating rule applied when a new observation is received at the t-th iteration.

Algorithm 1: updating rule of the Exponentiated Gradient algorithm
    Input: weight vector w_t = (w_{t,1}, w_{t,2}, ..., w_{t,p});
           learning rate η > 0;
           an observation (x_t, y_t), where x_t = (x_{t,1}, x_{t,2}, ..., x_{t,p})
    Output: weight vector w_{t+1}
    1: ŷ_t ← w_t^T x_t
    2: for i ← 1 to p do
    3:     w_{t+1,i} ← w_{t,i} r_i / Σ_{j=1}^p w_{t,j} r_j, where r_i = e^{-2η(ŷ_t - y_t) x_{t,i}}
    4: end for
    5: return w_{t+1}

The EG algorithm starts with an initial guess w_1 = (w_{1,1}, w_{1,2}, ..., w_{1,p}) which satisfies Σ_i w_{1,i} = 1 and w_{1,i} > 0 for all i. The normalization in the updating rule guarantees that the weights in the new weight vector sum up to 1 (i.e., w_{t+1,1} + w_{t+1,2} + ... + w_{t+1,p} = 1). The usual choice for w_1 is the uniform probability vector (1/p, 1/p, ..., 1/p) (i.e., w_{1,i} = 1/p for all i). A typical learning rate is η = 2/(3R^2), where R = max_t (max_i x_{t,i} - min_i x_{t,i}) is an upper bound on the maximum difference between the components x_{t,i} of an instance x_t.
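A minimal sketch of Algorithm 1 under the conventions above, assuming the observations are available as a list of (x, y) pairs of NumPy arrays:

```python
import numpy as np

def eg_update(w, x, y, eta):
    """One Exponentiated Gradient step (Kivinen & Warmuth): a multiplicative
    update followed by renormalization, keeping w positive with sum 1."""
    y_hat = w @ x
    r = np.exp(-2.0 * eta * (y_hat - y) * x)
    w_new = w * r
    return w_new / w_new.sum()

def eg_regression(pairs, p):
    """Run EG over a list of (x, y) observations from the uniform start."""
    w = np.full(p, 1.0 / p)                       # w_1 = (1/p, ..., 1/p)
    R = max(x.max() - x.min() for x, _ in pairs)  # range bound over instances
    eta = 2.0 / (3.0 * R ** 2)                    # typical learning rate
    for x, y in pairs:
        w = eg_update(w, x, y, eta)
    return w
```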

We follow the same framework as in the previous section to solve the multiple-output linear regression problem with exponentiated gradient. Given the observed output matrix Y, we extract each of the k columns of Y, solve k single-output linear regression problems using the basic EG algorithm, and put the results together to form the weight matrix. Basic EG guarantees that each weight column vector sums up to 1. However, each row of the weight matrix consists of the transition probabilities from one topic to all topics; therefore, each row of the weight matrix should sum up to 1, instead of each column. We modify basic EG according to our constraints: we remove the normalization step when updating each column weight vector, and add a normalization step along each row after the entire weight matrix has been updated. Algorithm 2 gives the details. In the algorithm description, w_{t,i,j} denotes the element at the i-th row and j-th column of the matrix W_t.

Algorithm 2: updating rule of the multiple-output, normalized Exponentiated Gradient algorithm
    Input: weight matrix W_t = (w_{t,1}, w_{t,2}, ..., w_{t,k});
           learning rate η > 0;
           an observation (x_t, y_t), where x_t = (x_{t,1}, ..., x_{t,p}) and y_t = (y_{t,1}, ..., y_{t,k})
    Output: weight matrix W_{t+1}
    1: for j ← 1 to k do
    2:     y ← y_{t,j}    {the j-th component of the vector y_t}
    3:     ŷ ← w_{t,j}^T x_t
    4:     for i ← 1 to p do
    5:         r_i ← e^{-2η(ŷ - y) x_{t,i}}
    6:         w_{t+1,i,j} ← w_{t,i,j} r_i
    7:     end for
    8:     w_{t+1,j} ← (w_{t+1,1,j}, w_{t+1,2,j}, ..., w_{t+1,p,j})^T
    9: end for
    10: W_{t+1} ← (w_{t+1,1}, w_{t+1,2}, ..., w_{t+1,k})
    11: normalize each row of W_{t+1}
    12: return W_{t+1}
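A vectorized sketch of Algorithm 2 (NumPy): the loop over output components is collapsed into an outer product that computes the same per-column multiplicative factors, and the rows are normalized once per step:

```python
import numpy as np

def seg_update(W, x, y, eta):
    """One step of the step-wise normalized multi-output EG (Algorithm 2).
    Column j of W holds the weights for output component j."""
    y_hat = W.T @ x                                   # ŷ_j = w_j^T x for all j
    R = np.exp(-2.0 * eta * np.outer(x, y_hat - y))   # R[i, j] = r_i for column j
    W_new = W * R                                     # multiplicative update, no column normalization
    return W_new / W_new.sum(axis=1, keepdims=True)   # normalize each ROW to sum to 1
```

Starting from the uniform matrix W = np.full((k, k), 1.0 / k) and iterating seg_update over all click pairs keeps every row non-negative and summing to 1 by construction, so the result can be read directly as a transition probability matrix.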
5 EXPERIMENT RESULTS

5.1 Data Set

We used the documents and query logs of the CiteSeerX search engine. We pulled out 1,143,971 documents with their titles, abstracts and keywords, and trained LDA models on this data set. We also extracted 10 months of the query log (Feb 2008 to Nov 2008) from the CiteSeerX search engine to extract the click pairs. The entries in the query log record the users' behaviors; they are grouped by users and sessions (sessions are automatically determined by the search engine). Users often start by searching keywords, then visit documents in the result list and follow the citation links to visit other documents. We only focus on 8 interesting user behaviors, because they indicate that the user is interested in a document: downloading a paper, viewing the summary of a paper, adding a paper to a collection, viewing related papers, correcting mistakes in a paper, monitoring changes to a paper, and viewing different versions of a paper. Each of these behaviors corresponds to a document in which the user is interested.

5.2 Data Processing

We used Phan and Nguyen's implementation of LDA on the documents from CiteSeerX. We removed the traditional stop words such as "a" and "the", as well as additional stop words that are very common in research papers but do not bear topical meaning, such as "author", "abstract" and "copyright"; we identified a total of 222 such stop words. In the query log, there are 25,410 sessions which contain visits to multiple documents. In these sessions, 63,890 unique documents were visited, and a total of 97,280 click pairs were extracted.
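The paper trained its topic models with Phan and Nguyen's LDA implementation; purely as an illustrative sketch (the library choice here is an assumption, not the original toolchain), topical vectors of the kind described above can be produced with the gensim library:

```python
from gensim import corpora, models

# Toy corpus: each document is a list of tokens after stop-word removal.
docs = [["graph", "tree", "vertex", "distance"],
        ["algorithm", "optimization", "cost", "efficient"],
        ["language", "semantics", "type", "logic"]]

dictionary = corpora.Dictionary(docs)
bows = [dictionary.doc2bow(d) for d in docs]
lda = models.LdaModel(bows, id2word=dictionary, num_topics=20, random_state=0)

# Dense k-dimensional topical vector (per-topic probabilities) for one document.
vec = [prob for _, prob in lda.get_document_topics(bows[0], minimum_probability=0.0)]
```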

5.3 Model Comparison

We implemented and compared 5 models: (1) basic linear regression with multiple output (blr); (2) normalized linear regression with multiple output (nlr), where normalization means resetting the negative weights to zero and normalizing each row; (3) basic Exponentiated Gradient linear regression with multiple output (beg); (4) basic Exponentiated Gradient normalized at the final step (feg), which takes the weight matrix generated by beg as input and normalizes each row; and (5) step-wise normalized Exponentiated Gradient (seg), which is our proposed algorithm, described in Algorithm 2.

We set 6 different values for the number of topics when training the LDA model: 20, 30, 40, 50, 100 and 150. After the training, we generated the topical vector for each document in the click pairs. Since we have 97,280 click pairs, we have 97,280 observations. We used 10-fold cross validation to evaluate and compare the different regression models.

Figure 1 shows the comparison of the mean squared error of each model; each panel shows the results for a different setting of the number of topics. For each model, we show the mean, max and min MSE obtained from the 10-fold cross validation: the cross in the middle of each line represents the mean of the errors, while the points at the top and bottom of the lines correspond to the max and min errors, respectively. Table 2 gives the averaged testing MSE from 10-fold cross validation for each model under each setting of the number of topics.

[Figure 1: Comparisons of mean squared errors of the different regression models; panels (a)-(f) correspond to 20, 30, 40, 50, 100 and 150 topics. The plots are not recoverable from this transcription.]

[Table 2: Averaged MSE from 10-fold cross validation; the numeric values are not recoverable from this transcription.]

As can be seen from the figure and the table, the performances of the different models are very close. blr, which is based on OLS, always gives the minimum MSE, since it does not have any constraints; nlr comes second. Among the three models based on EG, seg gives the best performance when the number of topics n is 100 or 150, feg gives the best performance when n = 50 and n = 30, and beg beats the other two when n = 20. Compared with the best model, blr, the performance of the seg model drops by 1.95%, 0.99%, 0.94%, 0.78%, 0.58% and 0.62% for n = 20, 30, 40, 50, 100 and 150, respectively. Although the seg model cannot beat the blr model in terms of mean squared error, the decrease in performance is small, and seg provides good interpretability, as discussed before.

5.4 Discussion

We give some of our observations in this subsection. We counted the number of click pairs in which the largest elements of the topical vectors of the two documents belong to different topics: of the 97,280 click pairs, 62,646 have their largest elements on different topics.

Based on our observations, in the learned weight matrices the largest element in each row and each column lies on the diagonal, meaning that the transition probability of user intent from one particular topic to itself is always the largest compared to the transition probabilities to other topics. The values of the diagonal elements range from 14.24% to 69.11%, while initially all the weights were set to 5%. Beyond the diagonal, we found the 10 largest off-diagonal probabilities. Table 3 gives these 10 largest off-diagonal transition probabilities and their corresponding topic transitions; the number of topics is set to 20. (Due to space limitations, the list of the top representative terms of each topic is not presented here but is available in the full version of this paper.)

[Table 3: Top 10 largest off-diagonal transition probabilities; the numeric values are not recoverable from this transcription.]
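A small sketch of the off-diagonal inspection described above, assuming a learned row-stochastic matrix W with W[i, j] interpreted as the probability of moving from topic i to topic j:

```python
import numpy as np

def top_off_diagonal(W, top_n=10):
    """Return the top_n largest off-diagonal entries as (from, to, prob)."""
    A = W.copy()
    np.fill_diagonal(A, -np.inf)                    # exclude self-transitions
    order = np.argsort(A, axis=None)[::-1][:top_n]  # flat indices, descending
    rows, cols = np.unravel_index(order, W.shape)
    return [(int(i), int(j), float(W[i, j])) for i, j in zip(rows, cols)]
```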

5.5 Interpretation of Intent Transitions

In this section we provide some interpretations of the knowledge discovered in our experiments. The learned weights are obtained by the seg model with the number of topics set to 20.

The first transition in the top-10 list is from Topic 19 to Topic 13. Topic 19 is on problems and applications of graphs and trees; its representative terms include: graph(s), tree(s), degree, vertex (vertices), point(s), distance, etc. Topic 13 is on optimization techniques, algorithms, computational costs and efficiency; its representative terms include: algorithm(s), optimization/optimal, constraints, computational, cost, efficient, etc. This transition shows that people who are interested in problems and applications of graphs and trees may switch to the techniques and methods for solving these problems: it is a transition from problems to solutions.

The second transition is from Topic 12 to Topic 3. Topic 12 is on programming and the design of languages; its representative terms include: software, programming/program, language, design, object, oriented, java, etc. Topic 3 is on issues involved in designing a language; its representative terms include: type(s), logic, semantics, rules, terms, calculus, reasoning, etc. It shows that people's interest in programming languages may shift to more fundamental issues, such as typing, rules for induction, and checking the semantics of the language.

The third transition is from Topic 16 to Topic 15. Again, it shows that people who are interested in the problems will also be interested in the techniques to solve these problems. The representative terms for Topic 16 include: energy, field, flow, high, phase, mass, temperature, density, surface, etc.; the representative terms for Topic 15 include: linear, space(s), finite, matrix, equation(s), numerical, differential, etc.

One interesting observation is that in the 9th and 10th transitions (13 → 19 and 15 → 16), the reversed transitions are the first and third in the list, respectively. This shows the strong relatedness of these two pairs of topics: the transition runs not only in one direction but also the other way around. The blr model produced 8 negative elements, and thus the nlr model produced 8 zero elements, which makes its results harder to interpret.

6 CONCLUSIONS AND FUTURE WORK

In this paper, we studied the problem of finding user intent transitions among different topics. We used the topical vectors of documents generated by a trained LDA model to represent users' intents. Given the pairs of documents visited successively by users, together with their topical vectors, we proposed to model the problem as a multiple-output linear regression, in which the transition probabilities between all pairs of topics form the weight matrix. We then proposed a new algorithm based on the exponentiated gradient to solve the regression problem efficiently. Our proposed method satisfies the constraints we set forth in the regression problem; it performs well in terms of sum-of-squares error compared with the ordinary least squares technique, and it provides good interpretability compared with the other regression models. The effectiveness of our method was demonstrated in the experiments we conducted.

In the future, we would like to extend our work by considering longer click chains instead of click pairs. We can also incorporate personal information, so that the learned transition probability matrix is optimized for individuals or groups to satisfy their information needs. Finally, we would like to consider temporal information, since new topics may emerge over time and people's interest in a topic may change over time. We will continue to report our progress in future work.

7 REFERENCES

[1] P. Anick. Using terminological feedback for web search refinement: a log-based study. In SIGIR, 2003.
[2] C. M. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer, 2007.
[3] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. J. Mach. Learn. Res., 2003.
[4] L. D. Catledge and J. E. Pitkow. Characterizing browsing strategies in the World-Wide Web. Comput. Netw. ISDN Syst., 1995.
[5] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani. Least angle regression. Annals of Statistics, 2004.
[6] D. He, A. Göker, and D. J. Harper. Combining evidence for automatic web session identification. Inf. Process. Manage., 2002.
[7] A. E. Hoerl and R. W. Kennard. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 1970.
[8] B. J. Jansen, A. Spink, C. Blakely, and S. Koshman. Defining a session on web search engines. J. Am. Soc. Inf. Sci. Technol., 2007.
[9] R. Jones and K. L. Klinkner. Beyond the session timeout: automatic hierarchical segmentation of search topics in query logs. In CIKM, 2008.
[10] J. Kivinen and M. K. Warmuth. Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 1997.
[11] A. L. Montgomery and C. Faloutsos. Identifying web browsing trends and patterns. Computer, 2001.
[12] S. Ozmutlu, H. C. Ozmutlu, and A. Spink. Multitasking web searching and implications for design. Proceedings of the Annual Meeting of the American Society for Information Science and Technology, 2003.
[13] F. Radlinski and T. Joachims. Query chains: learning to rank from implicit feedback. In KDD, 2005.
[14] C. Silverstein, H. Marais, M. Henzinger, and M. Moricz. Analysis of a very large web search engine query log. SIGIR Forum, pages 6-12, 1999.
[15] A. Spink, H. C. Ozmutlu, and S. Ozmutlu. Multitasking information seeking and searching processes. J. Am. Soc. Inf. Sci. Technol., 2002.
[16] A. Spink, M. Park, B. J. Jansen, and J. Pedersen. Multitasking during web search sessions. Inf. Process. Manage., 2006.
[17] R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 1994.


Fast Logistic Regression for Text Categorization with Variable-Length N-grams

Fast Logistic Regression for Text Categorization with Variable-Length N-grams Fast Logistic Regression for Text Categorization with Variable-Length N-grams Georgiana Ifrim *, Gökhan Bakır +, Gerhard Weikum * * Max-Planck Institute for Informatics Saarbrücken, Germany + Google Switzerland

More information

Artificial Neural Networks

Artificial Neural Networks Artificial Neural Networks 鮑興國 Ph.D. National Taiwan University of Science and Technology Outline Perceptrons Gradient descent Multi-layer networks Backpropagation Hidden layer representations Examples

More information

Sparse vectors recap. ANLP Lecture 22 Lexical Semantics with Dense Vectors. Before density, another approach to normalisation.

Sparse vectors recap. ANLP Lecture 22 Lexical Semantics with Dense Vectors. Before density, another approach to normalisation. ANLP Lecture 22 Lexical Semantics with Dense Vectors Henry S. Thompson Based on slides by Jurafsky & Martin, some via Dorota Glowacka 5 November 2018 Previous lectures: Sparse vectors recap How to represent

More information

ANLP Lecture 22 Lexical Semantics with Dense Vectors

ANLP Lecture 22 Lexical Semantics with Dense Vectors ANLP Lecture 22 Lexical Semantics with Dense Vectors Henry S. Thompson Based on slides by Jurafsky & Martin, some via Dorota Glowacka 5 November 2018 Henry S. Thompson ANLP Lecture 22 5 November 2018 Previous

More information

Linear Regression. Volker Tresp 2014

Linear Regression. Volker Tresp 2014 Linear Regression Volker Tresp 2014 1 Learning Machine: The Linear Model / ADALINE As with the Perceptron we start with an activation functions that is a linearly weighted sum of the inputs h i = M 1 j=0

More information

Convex envelopes, cardinality constrained optimization and LASSO. An application in supervised learning: support vector machines (SVMs)

Convex envelopes, cardinality constrained optimization and LASSO. An application in supervised learning: support vector machines (SVMs) ORF 523 Lecture 8 Princeton University Instructor: A.A. Ahmadi Scribe: G. Hall Any typos should be emailed to a a a@princeton.edu. 1 Outline Convexity-preserving operations Convex envelopes, cardinality

More information

Notes on Markov Networks

Notes on Markov Networks Notes on Markov Networks Lili Mou moull12@sei.pku.edu.cn December, 2014 This note covers basic topics in Markov networks. We mainly talk about the formal definition, Gibbs sampling for inference, and maximum

More information

Scaling Neighbourhood Methods

Scaling Neighbourhood Methods Quick Recap Scaling Neighbourhood Methods Collaborative Filtering m = #items n = #users Complexity : m * m * n Comparative Scale of Signals ~50 M users ~25 M items Explicit Ratings ~ O(1M) (1 per billion)

More information

Regularization Path Algorithms for Detecting Gene Interactions

Regularization Path Algorithms for Detecting Gene Interactions Regularization Path Algorithms for Detecting Gene Interactions Mee Young Park Trevor Hastie July 16, 2006 Abstract In this study, we consider several regularization path algorithms with grouped variable

More information

1 Overview. 2 Learning from Experts. 2.1 Defining a meaningful benchmark. AM 221: Advanced Optimization Spring 2016

1 Overview. 2 Learning from Experts. 2.1 Defining a meaningful benchmark. AM 221: Advanced Optimization Spring 2016 AM 1: Advanced Optimization Spring 016 Prof. Yaron Singer Lecture 11 March 3rd 1 Overview In this lecture we will introduce the notion of online convex optimization. This is an extremely useful framework

More information

Math 350: An exploration of HMMs through doodles.

Math 350: An exploration of HMMs through doodles. Math 350: An exploration of HMMs through doodles. Joshua Little (407673) 19 December 2012 1 Background 1.1 Hidden Markov models. Markov chains (MCs) work well for modelling discrete-time processes, or

More information