Stance Classification and Diffusion Modelling

Dr. Srijith P. K. CSE, IIT Hyderabad

Outline
1. Stance Classification
2. Diffusion Modelling

Stance Classification in Twitter: Rumour Stance Classification
- Classify tweets as supporting, denying, questioning, or commenting on an event.
- Useful for rumour truthfulness classification.

Stance Classification in Twitter: Time-Sensitive Sequence Classification
- Tweets form a conversation structure through reply tweets.
- Tweet classification can be cast as a sequence labelling problem.
- Each tweet is associated with its time of occurrence.

Point Processes in Twitter Stance Classification
- Twitter data contain the tweet time, text, meme category and user: $\{d_n = (t_n, W_n, m_n, i_n)\}_{n=1}^{N}$.
- Point processes model the spiky behaviour typically observed in social networks.
- Self-exciting point process with per-user intensity
  $\lambda_i(t) = \mu_i + \sum_{t_l < t} \mathbb{I}(i_l = i)\,\alpha_i\,\kappa(t - t_l)$,
  i.e. a base intensity plus the influence of past tweets.
- Past tweets influence future tweets, but the influence decays exponentially over time: $\kappa(t - t_l) = \omega \exp(-\omega (t - t_l))$.
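As a minimal illustration (my own sketch, not code from the slides), the per-user intensity with the exponential kernel can be evaluated directly from the event history; the array names and toy parameters below are assumptions.

```python
import numpy as np

def hawkes_intensity(t, event_times, event_users, user, mu, alpha, omega):
    """lambda_i(t) = mu_i + sum_{t_l < t} I(i_l == i) * alpha_i * omega * exp(-omega (t - t_l))."""
    past = (event_times < t) & (event_users == user)   # indicator I(i_l == i), restricted to t_l < t
    kernel = omega * np.exp(-omega * (t - event_times[past]))
    return mu[user] + alpha[user] * kernel.sum()

# Toy usage: two users, three past tweets.
times = np.array([0.5, 1.0, 2.0])
users = np.array([0, 1, 0])
mu = np.array([0.2, 0.1])      # base intensities mu_i
alpha = np.array([0.8, 0.5])   # self-excitation weights alpha_i
print(hawkes_intensity(3.0, times, users, user=0, mu=mu, alpha=alpha, omega=1.0))
```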

Point Processes in Twitter Stance Classification
- Multivariate Hawkes processes model the influence of other users:
  $\lambda_i(t) = \mu_i + \sum_{t_l < t} \alpha_{i_l,i}\,\kappa(t - t_l)$
- Joint modelling of users and memes with a multivariate Hawkes process:
  $\lambda_{i,m}(t) = \mu_i \gamma_m + \sum_{t_l < t} \mathbb{I}(m_l = m)\,\alpha_{i_l,i}\,\kappa(t - t_l)$

Time-Sensitive Sequence Classification [Lukasik et al., 2016]
- Twitter data contain the tweet time, text, meme category and label: $\{d_n = (t_n, W_n, m_n, y_n)\}_{n=1}^{N}$.
- Multivariate Hawkes process for stance classification, with the intensity modelled over labels:
  $\lambda_{y,m}(t) = \mu_y + \sum_{t_l < t} \mathbb{I}(m_l = m)\,\alpha_{y_l,y}\,\kappa(t - t_l)$
- Classification of a tweet depends on its textual content: $p(W_n \mid y_n) = \prod_{v=1}^{V} \beta_{y_n v}^{W_{nv}}$.
- The likelihood of a tweet belonging to a class is proportional to the product of the emission likelihood and the transition likelihood: $p(W_n \mid y_n)\,\lambda_{y_n,m_n}(t_n)$.
- Generative non-Markovian model: past tweet labels influence future tweet labels.
- Generalizes multinomial Naive Bayes and hidden Markov models.
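The classification rule can be sketched as follows (illustrative code, not the authors' implementation): each candidate label is scored in log space by the sum of the emission and transition log-likelihoods. Here `log_beta` and `label_intensities` stand in for quantities a fitted model would supply.

```python
import numpy as np

def classify_tweet(word_counts, label_intensities, log_beta):
    """Return argmax_y p(W_n | y) * lambda_{y,m}(t_n), computed in log space.

    word_counts:       length-V bag-of-words vector W_n
    label_intensities: lambda_{y,m}(t_n) for each label y (from the Hawkes model)
    log_beta:          (Y, V) matrix of log word probabilities log beta_{yv}
    """
    log_emission = log_beta @ word_counts         # log p(W_n | y) for every label y
    log_transition = np.log(label_intensities)    # log lambda_{y,m}(t_n)
    return int(np.argmax(log_emission + log_transition))

# Toy usage: 4 stance labels, 3-word vocabulary.
log_beta = np.log(np.array([[0.6, 0.3, 0.1],
                            [0.1, 0.6, 0.3],
                            [0.3, 0.1, 0.6],
                            [0.3, 0.3, 0.4]]))
print(classify_tweet(np.array([2, 0, 1]), np.array([0.5, 0.2, 0.1, 0.9]), log_beta))
```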

Time-Sensitive Sequence Classification: Learning and Prediction
- Likelihood: $\Big[\prod_{n=1}^{N} p(W_n \mid y_n)\Big]\Big[\prod_{n=1}^{N} \lambda_{y_n,m_n}(t_n)\Big]\, p(E_T)$
- Learning by a maximum likelihood approach; the log-likelihood is
  $-\sum_{y=1}^{Y}\sum_{m=1}^{M}\int_0^T \lambda_{y,m}(s)\,ds + \sum_{n=1}^{N}\log\lambda_{y_n,m_n}(t_n) + \sum_{n=1}^{N}\sum_{v=1}^{V} W_{nv}\log\beta_{y_n v}$
- Prediction of labels uses a greedy approach (see the sketch after the table below).

Datasets
Dataset          Tweets  Supporting  Denying  Questioning  Commenting
Ottawa shooting     782         161       76           64         481
Ferguson riots     1017         161       82           94         680
Charlie Hebdo      1053         236       56           51         710
Sydney siege       1124          89      223           99         713
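A hypothetical sketch of the greedy prediction loop: labels are assigned in time order, so labels already fixed can feed the Hawkes intensity of later tweets. The `intensity_fn` callable is a stand-in for the fitted model.

```python
import numpy as np

def greedy_predict(tweets, log_beta, intensity_fn):
    """Assign labels in time order; each tweet's label is fixed before moving on,
    so earlier decisions influence the Hawkes intensity of later tweets."""
    labels = []
    for t_n, w_n, m_n in tweets:                  # tweets sorted by time t_n
        lam = intensity_fn(t_n, m_n, labels)      # lambda_{y,m}(t_n) for each label y
        y_hat = int(np.argmax(log_beta @ w_n + np.log(lam)))
        labels.append(y_hat)
    return labels

# Toy usage: constant intensities as a stand-in for the fitted Hawkes model.
toy_intensity = lambda t, m, labels: np.array([0.5, 0.2, 0.1, 0.9])
log_beta = np.log(np.full((4, 3), 1.0 / 3.0))
tweets = [(0.5, np.array([1, 0, 2]), 0), (1.0, np.array([0, 1, 0]), 0)]
print(greedy_predict(tweets, log_beta, toy_intensity))
```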

Time-Sensitive Sequence Classification (experimental results)

Deep Learning for Stance Classification: Convolutional Neural Networks for Text
- A document/sentence is represented as a matrix; each row is the vector representation of a word (word embeddings such as word2vec/GloVe).
- Filters slide over full rows of the matrix; the filter width equals the embedding dimension.
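A compact PyTorch sketch of this style of text CNN (an illustration of the architecture described above, not the paper's code); filter sizes and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    """CNN over a sentence matrix: each filter spans the full embedding width,
    so convolutions slide only along the word (time) axis."""
    def __init__(self, vocab_size, embed_dim=100, num_filters=128,
                 filter_sizes=(3, 4, 5), num_classes=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.convs = nn.ModuleList(
            [nn.Conv2d(1, num_filters, (k, embed_dim)) for k in filter_sizes])
        self.fc = nn.Linear(num_filters * len(filter_sizes), num_classes)

    def forward(self, tokens):                     # tokens: (batch, seq_len)
        x = self.embed(tokens).unsqueeze(1)        # (batch, 1, seq_len, embed_dim)
        pooled = [torch.relu(c(x)).squeeze(3).max(dim=2).values  # max over time
                  for c in self.convs]
        return self.fc(torch.cat(pooled, dim=1))   # logits over stance classes

model = TextCNN(vocab_size=5000)
logits = model(torch.randint(0, 5000, (2, 30)))    # 2 tweets, 30 tokens each
print(logits.shape)                                # torch.Size([2, 4])
```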

Deep Learning for Stance Classification: CNN for Stance Classification [Chen et al., 2017]
- Learns word embeddings in addition to using the GloVe representation.
- Uses filters of different sizes, with 128 filters of each size.

Deep Learning for Stance Classification: Recurrent Neural Networks
- Uses word embeddings from the GloVe representation.
- Unrolls the network over the words in a tweet and predicts the label.
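A corresponding PyTorch sketch of the recurrent variant, assuming GloVe-initialized embeddings; dimensions are illustrative.

```python
import torch
import torch.nn as nn

class TweetLSTM(nn.Module):
    """Unroll an LSTM over the word embeddings of a tweet and predict the
    stance label from the final hidden state."""
    def __init__(self, vocab_size, embed_dim=100, hidden_dim=64, num_classes=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)  # load GloVe weights here
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, tokens):             # tokens: (batch, seq_len)
        _, (h, _) = self.lstm(self.embed(tokens))
        return self.fc(h[-1])              # logits over stance classes

model = TweetLSTM(vocab_size=5000)
print(model(torch.randint(0, 5000, (2, 20))).shape)   # torch.Size([2, 4])
```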

Deep Learning for Stance Classification: Neural Network Results
- The deep learning model is the best method in terms of F1 score.

Deep Learning for Stance Classification: CNN and LSTM Combination [Veyseh et al., 2017]
- The deep learning model is the best method in terms of F1 score.

Diffusion Modelling
- Model the evolution of memes over time.
- Model the behaviours of users and how they influence each other.
- Predict the popularity of memes.

Predicting Rumour Popularity
- Predict rumour popularity, measured as the number of tweets in future time intervals.
- Motivation: assist officials and journalists in debunking rumours.

Hawkes Processes for Twitter [Srijith et al., 2017]
- Twitter contains information on conversation structure, users, and the network.
- Model the Hawkes process so that it takes this Twitter information into account.

Hawkes Processes for Twitter: HPconversation, a model considering conversational structure
- HPconversation models the conversational structure (spontaneous vs. reply-to tweets):
  $\lambda_{i_n,m_n}(t) = \mu_{i_n}\gamma_{m_n} Z_{nn} + \sum_{l=1}^{n-1} \mathbb{I}(m_l = m_n)\,Z_{ln}\,\alpha_{i_l,i_n}\,\kappa(t - t_l)$,
  where $Z$ encodes the conversation structure ($Z_{nn}$ marks a spontaneous tweet, $Z_{ln}$ a reply of tweet $n$ to tweet $l$).
- Avoids performing the summation over all previous terms.

Hawkes Processes for Twitter: HPdecomposition
- HPdecomposition decomposes the influence matrix into lower-rank non-negative components:
  $\lambda_{i_n,m_n}(t) = \mu_{i_n}\gamma_{m_n} Z_{nn} + \sum_{l=1}^{n-1} \mathbb{I}(m_l = m_n)\,Z_{ln}\,[\mathbf{I}\mathbf{S}^{\top}]_{i_l,i_n}\,\kappa(t - t_l)$
- Prevents overfitting by learning a reduced number of parameters, as illustrated below.
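A small NumPy illustration of the parameter saving from the low-rank factorization; the sizes are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
R, K = 100, 5                  # R users, rank-K decomposition (assumed sizes)
I = rng.random((R, K))         # non-negative influence factors
S = rng.random((R, K))         # non-negative susceptibility factors
alpha = I @ S.T                # full R x R influence matrix, rank K
# R*R free parameters are replaced by 2*R*K (here 10,000 -> 1,000),
# which is what curbs overfitting.
print(alpha.shape, np.linalg.matrix_rank(alpha))
```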

Hawkes Processes for Twitter: HPuserfeatures, a model considering user features
- HPuserfeatures parameterizes the influence and susceptibility matrices with user features:
  $\lambda_{i_n,m_n}(t) = \mu_{i_n}\gamma_{m_n} Z_{nn} + \sum_{l=1}^{n-1} \mathbb{I}(m_l = m_n)\,Z_{ln}\,\sigma\big([x_{i_l}^{\top}\mathbf{I}][x_{i_n}^{\top}\mathbf{S}]^{\top}\big)\,\kappa(t - t_l)$
- Learns the features determining influence and susceptibility; see the sketch below.
- A links-ratio feature depends on the number of followers and followees: $\log\big(\tfrac{(\varphi_{lis}+1)(\varphi_{in}+1)^2}{\varphi_{out}+1}\big)$.
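A sketch, under one reading of the formula above, of how a single influence entry could be computed from user feature vectors; the projection matrices `I` and `S` and all dimensions are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def influence(x_src, x_dst, I, S):
    """alpha(i_l -> i_n) = sigma((x_{i_l}^T I)(x_{i_n}^T S)^T) for feature vectors x."""
    return sigmoid((x_src @ I) @ (x_dst @ S))

rng = np.random.default_rng(0)
D, K = 6, 3                    # D user features, K latent dimensions (assumed sizes)
I, S = rng.normal(size=(D, K)), rng.normal(size=(D, K))
print(influence(rng.normal(size=D), rng.normal(size=D), I, S))
```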

Hawkes Processes for Twitter: HPconnection, a model considering connection information
- HPconnection considers network information by selectively regularizing the influence matrix: users that are not connected are assumed to be less influential.
- HPregularization regularizes the matrix entries using the $\ell_2$ norm.

Hawkes Processes for Twitter: Learning
- The complete likelihood is
  $\prod_{n=1}^{N} \lambda_{i_n,m_n}(t_n)\;\exp\Big(-\sum_{i=1}^{R}\sum_{m=1}^{M}\int_0^T \lambda_{i,m}(s)\,ds\Big)$,
  i.e. the instantaneous probabilities over all tweets times the survival probability.
- Parameters are learnt by maximizing the regularized log-likelihood
  $l(\mu, \gamma, \alpha, \omega) = \sum_{n=1}^{N}\log\lambda_{i_n,m_n}(t_n) - \sum_{i=1}^{R}\sum_{m=1}^{M}\int_0^T \lambda_{i,m}(s)\,ds$
  $= \sum_{n=1}^{N}\Big[Z_{n,n}\log(\mu_{i_n}\gamma_{m_n}) + \sum_{l=1}^{n-1}\mathbb{I}(m_l = m_n)\,Z_{l,n}\log\big(\alpha_{i_l,i_n}\kappa(t_n - t_l)\big)\Big] - T\sum_{i=1}^{R}\sum_{m=1}^{M}\mu_i\gamma_m - \sum_{l=1}^{N}\sum_{i=1}^{R}\alpha_{i_l,i}\,K(T - t_l)$
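A simplified sketch (mine, omitting the conversation matrix Z and the regularizer) of evaluating this log-likelihood with the exponential kernel, whose compensator integral has the closed form $K(T - t_l) = 1 - e^{-\omega(T - t_l)}$.

```python
import numpy as np

def hawkes_log_lik(times, users, memes, mu, gamma, alpha, omega, T):
    """log L = sum_n log lambda_{i_n,m_n}(t_n) - sum_i sum_m int_0^T lambda_{i,m}(s) ds
    for lambda_{i,m}(t) = mu_i gamma_m + sum_{t_l < t} I(m_l == m) alpha_{i_l,i} kappa(t - t_l)."""
    N = len(times)
    log_lik = 0.0
    for n in range(N):
        lam = mu[users[n]] * gamma[memes[n]]
        for l in range(n):
            if memes[l] == memes[n]:
                lam += alpha[users[l], users[n]] * omega * np.exp(-omega * (times[n] - times[l]))
        log_lik += np.log(lam)
    # Compensator: base rates integrate to mu_i * gamma_m * T; each event l adds
    # alpha_{i_l,i} * (1 - exp(-omega (T - t_l))) summed over receiving users i.
    log_lik -= mu.sum() * gamma.sum() * T
    for l in range(N):
        log_lik -= alpha[users[l], :].sum() * (1.0 - np.exp(-omega * (T - times[l])))
    return log_lik

rng = np.random.default_rng(0)
N, R, M = 5, 3, 2
times = np.sort(rng.uniform(0, 10, N))
users, memes = rng.integers(0, R, N), rng.integers(0, M, N)
print(hawkes_log_lik(times, users, memes,
                     mu=np.full(R, 0.1), gamma=np.full(M, 1.0),
                     alpha=np.full((R, R), 0.2), omega=1.0, T=10.0))
```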

Hawkes Processes for Twitter: Prediction and Evaluation
- Prediction by a modified thinning algorithm, similar to rejection sampling:
  - Consider an upper bound $\bar{\lambda} \ge \lambda(t)$ on $[s, u]$.
  - Generate a sample from a homogeneous Poisson process with rate $\bar{\lambda}$.
  - Accept based on the ratio $\lambda(t)/\bar{\lambda}$.
- The upper bound $\bar{\lambda}$ is easy to obtain for Hawkes processes.
- Evaluation using aligned mean squared error and the number of predictions.
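For reference, a generic Ogata-style thinning sampler (a sketch of the standard algorithm the slide's modified version builds on, not the authors' code):

```python
import numpy as np

def thinning_sample(lam, lam_bar, s, u, seed=0):
    """Sample event times of an inhomogeneous point process on [s, u] by thinning.

    lam: callable intensity lambda(t); lam_bar: constant upper bound with
    lam_bar >= lam(t) on [s, u]. Candidates come from a homogeneous Poisson
    process with rate lam_bar; each is kept with probability lam(t) / lam_bar.
    """
    rng = np.random.default_rng(seed)
    events, t = [], s
    while True:
        t += rng.exponential(1.0 / lam_bar)     # next candidate arrival
        if t > u:
            return np.array(events)
        if rng.uniform() < lam(t) / lam_bar:    # thinning (rejection) step
            events.append(t)

# Toy usage: an exponentially decaying intensity bounded by its value at s = 0.
decaying = lambda t: 2.0 * np.exp(-0.5 * t)
print(thinning_sample(decaying, lam_bar=2.0, s=0.0, u=5.0))
```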

Hawkes Processes for Twitter: Experimental Results (Synthetic)

Hawkes Processes for Twitter: Experimental Results (Ferguson)

Recurrent Point Process [Du et al., 2016]
- The Hawkes process assumes that the influences from past events are linearly additive, but the true relationship is not known.
- Recurrent neural networks help to learn non-linear relationships.
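A hedged PyTorch sketch of this idea: an RNN embeds the event history into a hidden state, and the conditional intensity is an exponential function of that state and the elapsed time, in the spirit of RMTPP [Du et al., 2016]; treat the exact parameterization here as an assumption.

```python
import torch
import torch.nn as nn

class RMTPPIntensity(nn.Module):
    """RMTPP-style conditional intensity: lambda*(t) = exp(v^T h_j + w (t - t_j) + b),
    where h_j is an RNN summary of the event history. The RNN replaces the
    Hawkes process's linear superposition of kernel terms."""
    def __init__(self, input_dim, hidden_dim=32):
        super().__init__()
        self.rnn = nn.GRU(input_dim, hidden_dim, batch_first=True)
        self.v = nn.Linear(hidden_dim, 1)
        self.w = nn.Parameter(torch.tensor(-0.1))   # time decay/growth weight
        self.b = nn.Parameter(torch.tensor(0.0))    # base log-intensity

    def forward(self, history, dt):        # history: (batch, seq, input_dim); dt: (batch,)
        h, _ = self.rnn(history)
        h_last = h[:, -1, :]               # embedding of the event history
        return torch.exp(self.v(h_last).squeeze(-1) + self.w * dt + self.b)

model = RMTPPIntensity(input_dim=4)
hist = torch.randn(2, 7, 4)               # batch of 2 histories, 7 events each
print(model(hist, dt=torch.tensor([0.5, 1.2])))
```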


References
- Srijith, P. K., Lukasik, M., Cohn, T., and Bontcheva, K. (2017). Longitudinal modeling of social media with Hawkes process based on users and networks. In International Conference on Advances in Social Networks Analysis and Mining (ASONAM).
- Lukasik, M., Srijith, P. K., Cohn, T., and Bontcheva, K. (2016). Hawkes processes for continuous time sequence classification: an application to rumour stance classification in Twitter. In Association for Computational Linguistics (ACL).
- Chen, Y.-C., Liu, Z.-Y., and Kao, H.-Y. (2017). Convolutional neural networks for stance detection and rumor verification. In Proceedings of SemEval. ACL.
- Du, N., Dai, H., Trivedi, R., Upadhyay, U., Gomez-Rodriguez, M., and Song, L. (2016). Recurrent marked temporal point processes: embedding event history to vector. In KDD.
- Veyseh, A. P. B., Ebrahimi, J., Dou, D., and Lowd, D. (2017). A temporal attentional model for rumor stance classification. In CIKM.

Thank you!

Dr. P. K. Srijith
Assistant Professor, Computer Science and Engineering
Indian Institute of Technology Hyderabad
srijith@iith.ac.in