Prediction of Citations for Academic Papers

Utkarsh Simha, Sindhura Raghavan, Sai Spoorthy Padigi
University of California, San Diego

Abstract

We approach the problem of literature search for an unpublished academic paper through citation prediction. Given a pair of academic papers, we predict whether one paper will cite the other. We explore the characteristics of a subset of the DBLP dataset, a computer science bibliography that provides a comprehensive list of research papers in computer science. We examine how features of a pair of papers, such as author similarity, popularity of the publication conference, and co-citation, correlate with the classification label. We experiment with several classification models, including Logistic Regression and a Multilayer Perceptron.

1 Introduction

Finding research literature and related work can be simplified by predicting the citations of a paper from its abstract and content. To achieve this, we model paper citation as the task of predicting whether a one-directional citation exists between a pair of academic papers.

We first analyze the data in an exploratory fashion to understand the features available for each paper, the scope of each feature, and how the features relate to each other. We then use this information to predict whether a paper A would cite a paper B, given the pair, and we explore further features that could affect the probability of citation between the two papers.
We investigate the correlation of the label with features such as the years of publication of both papers, the author similarity of the two papers, citation counts for the conference venues, and the co-citation count of the two papers. For prediction, we begin with logistic regression on a simple feature set. We then extend the same model with a feature set built after exploring and analyzing the dataset for correlations between two papers. Lastly, we draw conclusions from our experiments and explain why our improvements and feature selection worked.

1.1 Notation

We write $p_i \to p_j$ when a paper $p_i$ cites a paper $p_j$. Henceforth, each pair of papers is denoted $(p_i, p_j)$ unless mentioned otherwise. We use set notation for in-cite and out-cite sets. The in-cite set of a paper $p_j$ is $\{\, p_k \mid p_k \to p_j \,\}$, that is, the set of papers $p_k$ that cite $p_j$. Similarly, the out-cite set of a paper $p_i$ is $\{\, p_k \mid p_i \to p_k \,\}$, that is, the set of papers $p_k$ cited by $p_i$.
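The in-cite and out-cite sets defined in the notation can be built in one pass over a list of citations. The following is a minimal sketch, assuming citations are available as (citing ID, cited ID) pairs; the function name `build_cite_sets` and the toy data are hypothetical and not taken from the paper.

```python
from collections import defaultdict

def build_cite_sets(citations):
    """Return (in_cite, out_cite) dicts mapping a paper ID to a set of paper IDs.

    `citations` is an iterable of (citing_id, cited_id) pairs, i.e. p_i -> p_j.
    """
    in_cite = defaultdict(set)   # in_cite[p_j]  = {p_k | p_k -> p_j}
    out_cite = defaultdict(set)  # out_cite[p_i] = {p_k | p_i -> p_k}
    for citing, cited in citations:
        out_cite[citing].add(cited)
        in_cite[cited].add(citing)
    return in_cite, out_cite

# Hypothetical toy citation list: paper 1 cites 2 and 3, paper 2 cites 3.
edges = [(1, 2), (1, 3), (2, 3)]
in_cite, out_cite = build_cite_sets(edges)
```

Using a `defaultdict` means a paper that is never cited simply has an empty in-cite set when looked up.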

2 Exploratory Analysis

In this section, we examine the characteristics of the data and the features of the dataset as a whole.

2.1 The Dataset

The DBLP computer science bibliography contains the metadata of over 3.6 million publications, written by over 1 million authors in several thousand journals or conference proceedings series. The dataset [1] covers all important computer science journals. Each paper contains the following metadata fields:

1. ID: A unique integer identifier for the paper
2. Title: The title of the paper
3. Abstract: The abstract of the paper
4. Authors: The list of authors of the paper
5. Conference: The conference venue at which the paper was presented
6. References: The list of papers that the paper cites as references
7. Year: The year the paper was published

We observed that many papers did not have an abstract. Hence, we kept only the papers that had an abstract. Next, we built the dataset for our prediction task by creating 600K pairs of papers of the form $(p_i, p_j)$ from the existing dataset. For each pair, we assigned the label 1 if $p_i$ cites $p_j$ and 0 if it does not.

2.2 Features

We gather features from the metadata of each paper pair. A good feature is one that helps us predict whether $p_i \to p_j$. Features that can be obtained from the metadata fall broadly into text-based and citation-statistics-based features. While semantic similarity between abstracts and the TF-IDF representations of the abstracts model text-based correlations, the citation-statistics-based features capture statistics of citations with respect to the authors of both papers and the conferences in which they are published. Neither text-based nor citation-based features perform well in isolation, as explored by Strohman et al. [2]. The following features were modeled for a pair of papers:

1. Author similarity: The Jaccard similarity between the author sets of papers $p_i$ and $p_j$:

   $$\frac{|a_i \cap a_j|}{|a_i \cup a_j|}$$

   where $a_i$ and $a_j$ are the sets of authors of papers $p_i$ and $p_j$, respectively.
2. Author history: The number of times the authors of paper $p_i$ have cited paper $p_j$ through some other paper $p_k$, that is, $|\{\, p_k \mid p_k \to p_j \,\}|$ where $p_i$ and $p_k$ share an author.
3. Venue pair citations: The number of times a paper at venue $v_i$ cites a paper at venue $v_j$, where $p_i \in v_i$ and $p_j \in v_j$, that is, $|\{\, p_i \mid p_i \to p_j,\ p_i \in v_i,\ p_j \in v_j \,\}|$.
4. Co-citation score: The Jaccard similarity between the set of papers that cite $p_i$ and the set of papers that cite $p_j$. Let $r_i = \{\, p_k \mid p_k \to p_i \,\}$ be the in-cite set of $p_i$, and similarly $r_j = \{\, p_k \mid p_k \to p_j \,\}$. The co-citation score is then

   $$\frac{|r_i \cap r_j|}{|r_i \cup r_j|}$$

5. Abstract vector: 200-dimensional TF-IDF vectors of the abstracts of the papers.

6. Abstract similarity: The cosine similarity between the TF-IDF vectors of the abstracts of the two papers. The intuition is that if two papers belong to a similar topic, the chances are higher that paper $p_i$ cites paper $p_j$.
7. Year 1: Year of publication of paper $p_i$.
8. Year 2: Year of publication of paper $p_j$.

2.3 Feature Selection

Table 1: Statistics of Features (columns: Feature, Min, Max, Mean; rows: Abstract similarity, Author similarity, Co-citation score, Venue pair citations, Author history, Year 1, Year 2).

Figure 1: Histogram plots of Features.

Table 2 provides the Pearson correlation between each chosen feature and the label, while Table 3 provides the Pearson correlation between the features and the co-citation count. The higher the absolute value of the Pearson correlation coefficient, the stronger the linear association between a feature and the output. From the Pearson correlation coefficients and the plots, we observe that the author, venue and abstract related features are highly correlated with the co-citation count as well as with the final labels, which helps in selecting features for citation prediction.

The co-citation count can be used to populate a matrix of ground-truth values for all pairs of papers. This matrix can be used to train a Matrix Factorization model, while the features of the paper pairs can be used in context-aware recommender systems such as Factorization Machines [4].
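The set-based features and the correlation analysis described above can be sketched in a few lines. This is an illustrative sketch only: the helper names `jaccard` and `pearson` and the toy author sets and labels are assumptions, not the authors' code or data. The same `jaccard` helper applies to the co-citation score when given two in-cite sets.

```python
import math

def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b|; defined as 0.0 when both sets are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy data: author sets for three paper pairs and their labels.
pairs = [
    ({"A", "B"}, {"B", "C"}),  # one shared author out of three distinct authors
    ({"A"}, {"A"}),            # identical author sets
    ({"A"}, {"D"}),            # no shared author
]
labels = [1, 1, 0]

author_sim = [jaccard(a_i, a_j) for a_i, a_j in pairs]
corr = pearson(author_sim, labels)
```

Note that `pearson` divides by zero if either input is constant, so a real pipeline would need to guard against features with no variance.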

Figure 2: Plot of number of citations across venues.

Table 2: Pearson Correlation between features and labels (columns: Feature, Pearson correlation coefficient; rows: Abstract similarity, Author similarity, Co-citation score, Venue pair citations, Author history, Year 1, Year 2 (…e-06)).

3 Identifying the Predictive Task

The task of citation prediction for literature search can be modeled in several ways: predicting the citation count of a paper, predicting the co-citation count of a pair of papers, or predicting whether one paper cites another and using this information to rank the papers that could be cited.

3.1 Co-citation prediction

We initially explored the task of predicting the co-citation count for a pair of papers $\{\, (p_i, p_j) \mid p_i \to p_j \,\}$, using the selected features as the feature space and the co-citation count as the ground truth. We experimented with regression models such as Gradient Boosted Regression and Linear Regression, and with a recommendation algorithm, the Factorization Machine.

Prediction models for co-citation prediction

1. Linear Regression: Linear regression models the relationship between a scalar dependent variable $y$ and one or more explanatory variables (features) denoted by $X$. For this task, the features presented in Section 2.2 form $X$, and the co-citation score between papers $p_i$ and $p_j$ is the variable $y$. The model equation is:

   $$y = \theta^{T} X \qquad (1)$$

2. Factorization Machines: FMs learn pairwise feature interactions along with a global bias and feature-wise parameters.

Table 3: Pearson Correlation between features and co-citation count (columns: Feature, Pearson correlation coefficient; rows: Author similarity, Author history, Venue pair citations, Abstract similarity, Year 1, Year 2).

Figure 3: Scatter plots of Features vs. Co-citation count.

The model equation is:

$$\hat{y} = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle \, x_i x_j \qquad (2)$$

where $w_0 \in \mathbb{R}$ is the global bias, $w \in \mathbb{R}^{n}$ holds the weight $w_i$ of the $i$-th feature, and $V \in \mathbb{R}^{n \times k}$ is the matrix whose row $v_i$ describes the $i$-th variable with $k$ factors. L2 regularization hyper-parameters for $w$ and $V$ can be included and tuned, along with the standard deviation used to initialize the parameters. The best MSE was obtained at init_stdev = 0.1, l2_reg_w = 0.01 and l2_reg_V = …
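To make Equation (2) concrete, the following sketch evaluates the factorization machine prediction for a single feature vector by computing the double sum directly. It is a plain-Python illustration under assumed toy parameters, not the library implementation used in the experiments; the function name `fm_predict` is hypothetical.

```python
def fm_predict(x, w0, w, V):
    """Second-order factorization machine prediction for one feature vector.

    x: list of n feature values; w0: global bias; w: list of n weights;
    V: n rows of k latent factors (row V[i] is v_i in Equation (2)).
    """
    n = len(x)
    linear = sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(vi * vj for vi, vj in zip(V[i], V[j]))  # <v_i, v_j>
            pairwise += dot * x[i] * x[j]
    return w0 + linear + pairwise

# Hypothetical parameters for n = 3 features and k = 2 factors.
x = [1.0, 2.0, 0.5]
w0 = 0.1
w = [0.2, -0.1, 0.4]
V = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]]
y_hat = fm_predict(x, w0, w, V)
```

The direct double sum costs O(k n^2) per prediction; practical implementations usually rewrite the pairwise term so that it can be computed in time linear in n.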

Evaluation Metric

The evaluation metric used for this task was Mean Squared Error (MSE), since this is a regression task.

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \qquad (3)$$

Table 4: Evaluation of Co-citation prediction (columns: Model, Train MSE, Test MSE; rows: Linear Regression, Gradient Boosted Regression, FM).

3.2 Citation prediction

In this prediction task, we aim to predict whether a paper $p_i$ cites a paper $p_j$, given a pair of papers $(p_i, p_j)$. This task is described in Section 5.

4 Literature Survey

4.1 Datasets

1. CiteSeer: One of the most widely used datasets for academic citation metadata is CiteSeer. Although the dataset is free to use, accessing it requires permission from Penn State University, so we could not obtain it. We tried to write a scraper to collect data from the CiteSeer website. This did not work, as the site limits scraping to 50 pages, which was insufficient for us.
2. Microsoft Academic Graph: The MAG dataset is another widely used dataset for citation prediction and recommendation. The data can be accessed through an API that provides 10,000 free calls per month. As the query format was very complex and would not cater to our task, we decided not to use it.
3. arXiv: The arXiv dataset consists of papers in High Energy Physics and was released as part of a KDD Cup contest. Unfortunately, the dataset is limited to 30,000 papers and was thus insufficient for our task.
4. NIPS: One of the datasets on Kaggle contains NIPS papers, mostly in the fields of Machine Learning and Deep Learning. The dataset contains only 8,000 samples and was thus insufficient for our task.

4.2 Features

1. h-index: The h-index measures both the productivity and the impact of an author's published work. It is the largest value of h such that h papers of the author have each received at least h citations.
2. Content based: The novelty, diversity, popularity and quality of a paper are assessed using the topic distribution of its contents.
3. Temporal statistics: Statistics collected about the author and the paper over a recent period of time can be used to quantify trends over time.
4. Citation graph: Citation graph clustering and mining, along with measures such as the Katz distance (to determine the relevance of a publication within a group of clustered papers), can be used to infer characteristics of a similar set of publications.
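Two definitions from the sections above are easy to state in code: the MSE of Equation (3) and the h-index. The sketch below is illustrative only; the function names and the sample citation counts are hypothetical.

```python
def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences, as in Equation (3)."""
    n = len(y_true)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / n

def h_index(citation_counts):
    """Largest h such that at least h papers have at least h citations each."""
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for rank, count in enumerate(counts, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h

# Hypothetical author with five papers and these citation counts.
author_h = h_index([10, 8, 5, 4, 3])
error = mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Sorting the counts in descending order makes the h-index a single linear scan: the answer is the last rank at which the count is still at least the rank.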

Other predictive tasks

1. h-index: Predicting the h-index is indirectly a predictive measure of the impact of a publication.
2. Citation count: Predicting the citation count of a paper and its change over time is another predictive task.
3. Citation recommendation: Recommending papers as citations given a paper's title, abstract, venue of publication and possibly its contents.

5 Model

5.1 Training

We divided the dataset of 600K samples into train data and test data with a 60%-40% split¹. The data was shuffled. The label to predict is 1 if the paper $p_i$ cites the paper $p_j$, and 0 otherwise. At first, we used all features other than the ones pertaining to author similarity and author history. This yielded a test accuracy of 86%. Upon adding the author features, the Logistic Regression model yielded an accuracy of 99.8%. Although this accuracy may seem suspiciously high, we verified the results by checking the R-score, precision and recall, and we confirmed that the test predictions match the test labels. To further confirm this, we trained different models on the data, whose performance is listed below:

Model                           Test accuracy
Logistic Regression             99.8%
Deep Neural Network             99.7%
Gradient Boosting Regression    99.9%

5.2 Model Description

We used three models for evaluation.

1. Logistic Regression: A simple logistic regression model was used. The predictions were converted to 1 or 0 by thresholding.
2. Deep Neural Network: A two-layer deep neural network with 256 hidden neurons and a dropout factor of 0.5 was trained using the Adagrad optimizer.
3. Gradient Boosting Regressor: A simple Gradient Boosting Regressor with decision tree stumps was used, with 300 estimators and a max depth of 2 per tree stump.

Each of these models performed equally well on the test set. Logistic Regression took the least time to train, while the Gradient Boosting Regressor took the longest.
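The training procedure above (shuffle, 60%-40% split, logistic regression, thresholded predictions) can be sketched end to end in plain Python. This is a minimal sketch under loud assumptions: the data is synthetic and stands in for the pair features, the gradient-descent trainer and all names (`train_logistic_regression`, `predict`, the learning rate and epoch count) are illustrative choices, and nothing here reproduces the reported accuracies.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train_logistic_regression(X, y, lr=0.5, epochs=200):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(X, w, b, threshold=0.5):
    """Threshold the predicted probability to obtain a 0/1 label."""
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= threshold else 0
            for xi in X]

# Synthetic stand-in for the pair features (NOT the DBLP data): two features in
# [-1, 1], with label 1 when their sum is positive.
rng = random.Random(0)
data = [[rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for _ in range(500)]
data = [(features, 1 if features[0] + features[1] > 0 else 0) for features in data]

# Shuffle, then apply a 60%-40% train/test split as in Section 5.1.
rng.shuffle(data)
split = int(0.6 * len(data))
X_train = [features for features, _ in data[:split]]
y_train = [label for _, label in data[:split]]
X_test = [features for features, _ in data[split:]]
y_test = [label for _, label in data[split:]]

weights, bias = train_logistic_regression(X_train, y_train)
preds = predict(X_test, weights, bias)
accuracy = sum(p == t for p, t in zip(preds, y_test)) / len(y_test)
```

Because the synthetic labels are linearly separable, a linear model fits them almost perfectly; this mirrors the general point that a single highly informative feature can push test accuracy very high.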
6 Scope for future work

This model can be extended to recommendation. A candidate set of top-k papers can be selected using a naive algorithm such as k-means clustering or title similarity. The above model can then predict a score to rank these candidates, and the top-n can be recommended, where $n \le k$. The approach can also be extended to predict the number of citations a paper would receive in the next n years.

¹ Train size: 360,000; test size: 240,000.
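The ranking step proposed as future work amounts to scoring k candidates and keeping the n best. A minimal sketch, assuming a scoring function is already available; the name `recommend_top_n` and the example scores are hypothetical.

```python
import heapq

def recommend_top_n(candidates, score, n):
    """Return the n highest-scoring candidates, best first.

    candidates: iterable of k candidate paper IDs; score: callable mapping an ID
    to a predicted citation score; n: number of recommendations (n <= k).
    """
    return heapq.nlargest(n, candidates, key=score)

# Hypothetical scores for k = 4 candidates, e.g. predicted citation probabilities.
scores = {"p1": 0.20, "p2": 0.95, "p3": 0.60, "p4": 0.05}
top = recommend_top_n(scores.keys(), scores.get, n=2)
```

`heapq.nlargest` avoids sorting the full candidate list, which matters when k is much larger than n.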

References

[1] The entire DBLP dataset of bibliography entries in XML format.
[2] T. Strohman, W. B. Croft, D. Jensen. Recommending Citations for Academic Papers. In Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '07).
[3] R. Yan, J. Tang, X. Liu, D. Shan. Citation Count Prediction: Learning to Estimate Future Citations for Literature.
[4] S. Rendle. Factorization Machines.

Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.

Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Bayesian learning: Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Let y be the true label and y be the predicted

More information

Introduction to Machine Learning Midterm Exam

Introduction to Machine Learning Midterm Exam 10-701 Introduction to Machine Learning Midterm Exam Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes, but

More information

CSE 258, Winter 2017: Midterm

CSE 258, Winter 2017: Midterm CSE 258, Winter 2017: Midterm Name: Student ID: Instructions The test will start at 6:40pm. Hand in your solution at or before 7:40pm. Answers should be written directly in the spaces provided. Do not

More information

ECE521 Lecture 7/8. Logistic Regression

ECE521 Lecture 7/8. Logistic Regression ECE521 Lecture 7/8 Logistic Regression Outline Logistic regression (Continue) A single neuron Learning neural networks Multi-class classification 2 Logistic regression The output of a logistic regression

More information

CSC321 Lecture 5: Multilayer Perceptrons

CSC321 Lecture 5: Multilayer Perceptrons CSC321 Lecture 5: Multilayer Perceptrons Roger Grosse Roger Grosse CSC321 Lecture 5: Multilayer Perceptrons 1 / 21 Overview Recall the simple neuron-like unit: y output output bias i'th weight w 1 w2 w3

More information

Statistical Machine Learning from Data

Statistical Machine Learning from Data Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Ensembles Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique Fédérale de Lausanne

More information


MIDTERM: CS 6375 INSTRUCTOR: VIBHAV GOGATE October, MIDTERM: CS 6375 INSTRUCTOR: VIBHAV GOGATE October, 23 2013 The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run

More information

Evaluation. Andrea Passerini Machine Learning. Evaluation

Evaluation. Andrea Passerini Machine Learning. Evaluation Andrea Passerini Machine Learning Basic concepts requires to define performance measures to be optimized Performance of learning algorithms cannot be evaluated on entire domain

More information

Collaborative topic models: motivations cont

Collaborative topic models: motivations cont Collaborative topic models: motivations cont Two topics: machine learning social network analysis Two people: " boy Two articles: article A! girl article B Preferences: The boy likes A and B --- no problem.

More information


CS145: INTRODUCTION TO DATA MINING CS145: INTRODUCTION TO DATA MINING 5: Vector Data: Support Vector Machine Instructor: Yizhou Sun October 18, 2017 Homework 1 Announcements Due end of the day of this Thursday (11:59pm)

More information

Evaluation requires to define performance measures to be optimized

Evaluation requires to define performance measures to be optimized Evaluation Basic concepts Evaluation requires to define performance measures to be optimized Performance of learning algorithms cannot be evaluated on entire domain (generalization error) approximation

More information

Motivation. User. Retrieval Model Result: Query. Document Collection. Information Need. Information Retrieval / Chapter 3: Retrieval Models

Motivation. User. Retrieval Model Result: Query. Document Collection. Information Need. Information Retrieval / Chapter 3: Retrieval Models 3. Retrieval Models Motivation Information Need User Retrieval Model Result: Query 1. 2. 3. Document Collection 2 Agenda 3.1 Boolean Retrieval 3.2 Vector Space Model 3.3 Probabilistic IR 3.4 Statistical

More information

Collaborative Filtering. Radek Pelánek

Collaborative Filtering. Radek Pelánek Collaborative Filtering Radek Pelánek 2017 Notes on Lecture the most technical lecture of the course includes some scary looking math, but typically with intuitive interpretation use of standard machine

More information

Sparse vectors recap. ANLP Lecture 22 Lexical Semantics with Dense Vectors. Before density, another approach to normalisation.

Sparse vectors recap. ANLP Lecture 22 Lexical Semantics with Dense Vectors. Before density, another approach to normalisation. ANLP Lecture 22 Lexical Semantics with Dense Vectors Henry S. Thompson Based on slides by Jurafsky & Martin, some via Dorota Glowacka 5 November 2018 Previous lectures: Sparse vectors recap How to represent

More information

ANLP Lecture 22 Lexical Semantics with Dense Vectors

ANLP Lecture 22 Lexical Semantics with Dense Vectors ANLP Lecture 22 Lexical Semantics with Dense Vectors Henry S. Thompson Based on slides by Jurafsky & Martin, some via Dorota Glowacka 5 November 2018 Henry S. Thompson ANLP Lecture 22 5 November 2018 Previous

More information

Classification and Regression Trees

Classification and Regression Trees Classification and Regression Trees Ryan P Adams So far, we have primarily examined linear classifiers and regressors, and considered several different ways to train them When we ve found the linearity

More information

Collaborative Filtering Applied to Educational Data Mining

Collaborative Filtering Applied to Educational Data Mining Journal of Machine Learning Research (200) Submitted ; Published Collaborative Filtering Applied to Educational Data Mining Andreas Töscher commendo research 8580 Köflach, Austria

More information

Nonlinear Classification

Nonlinear Classification Nonlinear Classification INFO-4604, Applied Machine Learning University of Colorado Boulder October 5-10, 2017 Prof. Michael Paul Linear Classification Most classifiers we ve seen use linear functions

More information

Classification of Higgs Boson Tau-Tau decays using GPU accelerated Neural Networks

Classification of Higgs Boson Tau-Tau decays using GPU accelerated Neural Networks Classification of Higgs Boson Tau-Tau decays using GPU accelerated Neural Networks Mohit Shridhar Stanford University, Abstract In particle physics, Higgs Boson to tau-tau

More information

9/26/17. Ridge regression. What our model needs to do. Ridge Regression: L2 penalty. Ridge coefficients. Ridge coefficients

9/26/17. Ridge regression. What our model needs to do. Ridge Regression: L2 penalty. Ridge coefficients. Ridge coefficients What our model needs to do regression Usually, we are not just trying to explain observed data We want to uncover meaningful trends And predict future observations Our questions then are Is β" a good estimate

More information

Introduction to Machine Learning Midterm Exam Solutions

Introduction to Machine Learning Midterm Exam Solutions 10-701 Introduction to Machine Learning Midterm Exam Solutions Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes,

More information

Click Prediction and Preference Ranking of RSS Feeds

Click Prediction and Preference Ranking of RSS Feeds Click Prediction and Preference Ranking of RSS Feeds 1 Introduction December 11, 2009 Steven Wu RSS (Really Simple Syndication) is a family of data formats used to publish frequently updated works. RSS

More information

Caesar s Taxi Prediction Services

Caesar s Taxi Prediction Services 1 Caesar s Taxi Prediction Services Predicting NYC Taxi Fares, Trip Distance, and Activity Paul Jolly, Boxiao Pan, Varun Nambiar Abstract In this paper, we propose three models each predicting either taxi

More information


SUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION SUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION 1 Outline Basic terminology Features Training and validation Model selection Error and loss measures Statistical comparison Evaluation measures 2 Terminology

More information


CS249: ADVANCED DATA MINING CS249: ADVANCED DATA MINING Support Vector Machine and Neural Network Instructor: Yizhou Sun April 24, 2017 Homework 1 Announcements Due end of the day of this Friday (11:59pm) Reminder

More information

Neural Networks. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington

Neural Networks. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington Neural Networks CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 Perceptrons x 0 = 1 x 1 x 2 z = h w T x Output: z x D A perceptron

More information

Introduction to Logistic Regression

Introduction to Logistic Regression Introduction to Logistic Regression Guy Lebanon Binary Classification Binary classification is the most basic task in machine learning, and yet the most frequent. Binary classifiers often serve as the

More information

ECE521 Lectures 9 Fully Connected Neural Networks

ECE521 Lectures 9 Fully Connected Neural Networks ECE521 Lectures 9 Fully Connected Neural Networks Outline Multi-class classification Learning multi-layer neural networks 2 Measuring distance in probability space We learnt that the squared L2 distance

More information

Machine Learning. Neural Networks. (slides from Domingos, Pardo, others)

Machine Learning. Neural Networks. (slides from Domingos, Pardo, others) Machine Learning Neural Networks (slides from Domingos, Pardo, others) For this week, Reading Chapter 4: Neural Networks (Mitchell, 1997) See Canvas For subsequent weeks: Scaling Learning Algorithms toward

More information

Supervised Learning. George Konidaris

Supervised Learning. George Konidaris Supervised Learning George Konidaris Fall 2017 Machine Learning Subfield of AI concerned with learning from data. Broadly, using: Experience To Improve Performance On Some Task (Tom Mitchell,

More information

Neural Networks and Ensemble Methods for Classification

Neural Networks and Ensemble Methods for Classification Neural Networks and Ensemble Methods for Classification NEURAL NETWORKS 2 Neural Networks A neural network is a set of connected input/output units (neurons) where each connection has a weight associated

More information

Midterm: CS 6375 Spring 2015 Solutions

Midterm: CS 6375 Spring 2015 Solutions Midterm: CS 6375 Spring 2015 Solutions The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run out of room for an

More information

Course 10. Kernel methods. Classical and deep neural networks.

Course 10. Kernel methods. Classical and deep neural networks. Course 10 Kernel methods. Classical and deep neural networks. Kernel methods in similarity-based learning Following (Ionescu, 2018) The Vector Space Model ò The representation of a set of objects as vectors

More information

Ensembles of Classifiers.

Ensembles of Classifiers. Ensembles of Classifiers 1 Goals for the lecture you should understand the following concepts ensemble bootstrap sample bagging boosting random forests error correcting

More information


SUPPORT VECTOR MACHINE SUPPORT VECTOR MACHINE Mainly based on 1 Overview SVM is a huge topic Integration of MMDS, IIR, and Andrew Moore s slides here Our foci: Geometric intuition

More information

Style-aware Mid-level Representation for Discovering Visual Connections in Space and Time

Style-aware Mid-level Representation for Discovering Visual Connections in Space and Time Style-aware Mid-level Representation for Discovering Visual Connections in Space and Time Experiment presentation for CS3710:Visual Recognition Presenter: Zitao Liu University of Pittsburgh

More information

Machine Learning. Neural Networks. (slides from Domingos, Pardo, others)

Machine Learning. Neural Networks. (slides from Domingos, Pardo, others) Machine Learning Neural Networks (slides from Domingos, Pardo, others) For this week, Reading Chapter 4: Neural Networks (Mitchell, 1997) See Canvas For subsequent weeks: Scaling Learning Algorithms toward

More information



More information

Large-scale Image Annotation by Efficient and Robust Kernel Metric Learning

Large-scale Image Annotation by Efficient and Robust Kernel Metric Learning Large-scale Image Annotation by Efficient and Robust Kernel Metric Learning Supplementary Material Zheyun Feng Rong Jin Anil Jain Department of Computer Science and Engineering, Michigan State University,

More information

Practicals 5 : Perceptron

Practicals 5 : Perceptron Université Paul Sabatier M2 SE Data Mining Practicals 5 : Perceptron Framework The aim of this last session is to introduce the basics of neural networks theory through the special case of the perceptron.

More information

Topic 3: Neural Networks

Topic 3: Neural Networks CS 4850/6850: Introduction to Machine Learning Fall 2018 Topic 3: Neural Networks Instructor: Daniel L. Pimentel-Alarcón c Copyright 2018 3.1 Introduction Neural networks are arguably the main reason why

More information

Neural Networks, Computation Graphs. CMSC 470 Marine Carpuat

Neural Networks, Computation Graphs. CMSC 470 Marine Carpuat Neural Networks, Computation Graphs CMSC 470 Marine Carpuat Binary Classification with a Multi-layer Perceptron φ A = 1 φ site = 1 φ located = 1 φ Maizuru = 1 φ, = 2 φ in = 1 φ Kyoto = 1 φ priest = 0 φ

More information

Learning from Examples

Learning from Examples Learning from Examples Data fitting Decision trees Cross validation Computational learning theory Linear classifiers Neural networks Nonparametric methods: nearest neighbor Support vector machines Ensemble

More information

Online Videos FERPA. Sign waiver or sit on the sides or in the back. Off camera question time before and after lecture. Questions?

Online Videos FERPA. Sign waiver or sit on the sides or in the back. Off camera question time before and after lecture. Questions? Online Videos FERPA Sign waiver or sit on the sides or in the back Off camera question time before and after lecture Questions? Lecture 1, Slide 1 CS224d Deep NLP Lecture 4: Word Window Classification

More information

Neural Networks. Single-layer neural network. CSE 446: Machine Learning Emily Fox University of Washington March 10, /9/17

Neural Networks. Single-layer neural network. CSE 446: Machine Learning Emily Fox University of Washington March 10, /9/17 3/9/7 Neural Networks Emily Fox University of Washington March 0, 207 Slides adapted from Ali Farhadi (via Carlos Guestrin and Luke Zettlemoyer) Single-layer neural network 3/9/7 Perceptron as a neural

More information

Neural Networks DWML, /25
