Distributed ML for DOSNs: giving power back to users


Amira Soliman, KTH
iSocial Marie Curie Initial Training Networks

Agenda

Part 1:
- DOSNs and Machine Learning
- DIVa: Decentralized Identity Validation for Social Networks

Part 2:
- Topic Models
- Latent Dirichlet Allocation (LDA)
- LDA for DOSNs

DOSNs and Shifting Roles

(Figure: apps, SNoT, trust, and data shifting from providers to users.)

Benefits of DML in DOSNs:
- Self-adaptive components
- Personalized services

Distributed ML for DOSNs

Challenges:
1. Heterogeneity:
   - Different behavioral patterns
   - Different data generation rates
   - Different connectivity and roles
   - Availability
2. Incremental updates:
   - Social feeds are streams
   - Lifetime of models

DIVa: Decentralized Identity Validation

1. DIVa is a decentralized identity validation model.
2. DIVa provides users with community-aware validation rules that conceptualize users' identities better than the centralized approach.

DIVa: Main Steps

(Figure: an example ego network partitioned into three communities. Each community carries its own locally correlated attribute sets (LCAS) over profile attributes such as School, University, City, Degree, Job, Employer, and Interests, e.g. {City, University}, {University, School}, {Employer, Degree}, {Degree, Interests}, and {City, Interests}.)

DIVa: Main Steps (cont.)

1. Association rule mining, e.g. the rule Degree("Eng") -> Employer("X") with its support s (0.75 in the figure's example), yielding decision rules.
2. Community detection.
3. Community-level aggregation:
   - Dominant CID among direct friends
   - Max CID among direct friends

(Figure: a small social graph whose numbered nodes are grouped into communities.)
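The first step above mines association rules between profile attributes. A minimal sketch of the support and confidence computations behind a rule like Degree("Eng") -> Employer("X"), using small hypothetical profiles (the data and function names are illustrative, not DIVa's actual implementation):

```python
# Hypothetical profiles: attribute -> value pairs seen in one community.
profiles = [
    {"Degree": "Eng", "Employer": "X"},
    {"Degree": "Eng", "Employer": "X"},
    {"Degree": "Eng", "Employer": "Y"},
    {"Degree": "Sci", "Employer": "X"},
]

def support(itemset, profiles):
    """Fraction of profiles containing every (attribute, value) pair."""
    hits = sum(all(p.get(a) == v for a, v in itemset) for p in profiles)
    return hits / len(profiles)

def confidence(antecedent, consequent, profiles):
    """Support of the full rule divided by support of its antecedent."""
    return support(antecedent + consequent, profiles) / support(antecedent, profiles)

# The rule Degree("Eng") -> Employer("X"):
s = support([("Degree", "Eng"), ("Employer", "X")], profiles)
c = confidence([("Degree", "Eng")], [("Employer", "X")], profiles)
print(s, c)  # 0.5 0.666...
```

Rules whose support and confidence clear a threshold become the community's decision rules.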

Results

DIVa achieved improvements over the centralized approach (measured as the loss ratio if centralized validation is applied).

WWW'15 feedback on DIVa:
- Deeper analysis of attributes (PCA)
- Overlapping communities (soft clustering)
- Incremental updates (community detection, community-level aggregation)

TOPIC MODELING

iSocial Marie Curie Initial Training Networks
iSocial meeting, 27-28/1/2015, Crete
http://isocial-itn.eu/

Topic Models

Document clustering:
1. Uncover hidden topics,
2. Annotate documents according to those topics,
3. Use the annotations to organize, summarize, and understand the documents.

Document Clustering

Each document is a bag of words. How many clusters? Bayesian nonparametric methods (e.g., Dirichlet processes) automatically detect how many clusters there are.
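The bag-of-words representation above discards word order and keeps only counts. A minimal sketch with three toy documents (the sentences are illustrative):

```python
from collections import Counter

docs = [
    "users share posts with friends",
    "friends comment on posts",
    "topic models uncover hidden topics",
]

# Bag of words: each document becomes an unordered multiset of word counts.
vocab = sorted({w for d in docs for w in d.split()})
bows = [Counter(d.split()) for d in docs]

# Fixed-length count vectors over the shared vocabulary.
vectors = [[b[w] for w in vocab] for b in bows]
print(vectors[0])
```

These count vectors are exactly the input a topic model such as LDA consumes.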

Latent Dirichlet Allocation (LDA)

LDA Generative Model

LDA model from Blei (2011):
- Each document is a random mixture of corpus-wide topics.
- Each word is drawn from one of those topics.

LDA Graphical Model

For each document d = 1, ..., M:
  generate theta_d ~ Dir(alpha)
  For each position n = 1, ..., N_d:
    generate z_n ~ Mult(theta_d)
    generate w_n ~ Mult(beta_{z_n})
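The generative process above can be run forward to sample a synthetic corpus. A minimal sketch with hypothetical toy dimensions (K topics, V word types, M documents, N_d words per document):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (all hypothetical).
K, V, M, N_d = 3, 8, 2, 10
alpha = np.full(K, 0.1)                         # document-topic Dirichlet prior
beta = rng.dirichlet(np.full(V, 0.1), size=K)   # K topic-word distributions

corpus = []
for d in range(M):
    theta = rng.dirichlet(alpha)                # theta_d ~ Dir(alpha)
    doc = []
    for n in range(N_d):
        z = rng.choice(K, p=theta)              # z_n ~ Mult(theta_d)
        w = rng.choice(V, p=beta[z])            # w_n ~ Mult(beta_{z_n})
        doc.append(int(w))
    corpus.append(doc)
print(corpus[0])
```

Inference is this process in reverse: given only the words, recover the per-document mixtures theta_d and the topics beta.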

Dirichlet Distribution

The Dirichlet distribution is parameterised by a vector of concentration parameters alpha = (alpha_1, ..., alpha_k) with alpha_i > 0 for i in {1, ..., k}, and is defined over the k-simplex, i.e. over multinomial probability vectors theta = (theta_1, ..., theta_k) with theta_i >= 0 and sum_i theta_i = 1:

  p(theta | alpha) = ( Gamma(sum_i alpha_i) / prod_i Gamma(alpha_i) ) * prod_i theta_i^(alpha_i - 1)

(Figure: example points on the 3-simplex, e.g. theta = (0.2, 0.5, 0.3) and the degenerate corners (1, 0, 0) and (0, 0, 1).)
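To make the "defined over the k-simplex" point concrete, every draw from a Dirichlet is itself a valid multinomial probability vector. A minimal sketch (the alpha values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Each draw from Dir(alpha) lies on the simplex: nonnegative components
# summing to one, i.e. a multinomial probability vector such as a
# document's topic mixture theta_d in LDA.
alpha = np.array([0.5, 0.5, 0.5])   # alpha_i < 1 favours sparse, corner-like draws
draws = rng.dirichlet(alpha, size=1000)

assert np.allclose(draws.sum(axis=1), 1.0)
print(draws.min() >= 0)  # True
```

Larger concentration values push draws toward the centre of the simplex; values below one push mass toward its corners, which is why LDA typically uses small alpha for sparse topic mixtures.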

LDA Inference

Treat data as observations that arise from a generative probabilistic process that includes hidden variables. For documents, the hidden variables reflect the thematic structure of the collection. Infer the hidden structure using posterior inference.

LDA Inference

We want to calculate the posterior over the hidden variables given the observed documents. Two main ways to get the posterior:
- Sampling methods: time consuming, with lots of black magic in sampling tricks.
- Variational methods: an approximation, but faster.
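The sampling route above is usually done with collapsed Gibbs sampling: repeatedly resample each token's topic assignment from its conditional given all other assignments. A minimal sketch on a tiny hypothetical corpus (hyperparameters and data are illustrative; real use needs convergence checks and far larger corpora):

```python
import numpy as np

def gibbs_lda(docs, K, V, alpha=0.1, eta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (a sketch, not a tuned implementation)."""
    rng = np.random.default_rng(seed)
    ndk = np.zeros((len(docs), K))   # document-topic counts
    nkv = np.zeros((K, V))           # topic-word counts
    nk = np.zeros(K)                 # per-topic totals
    z = [rng.integers(K, size=len(d)) for d in docs]
    for d, doc in enumerate(docs):   # initialise counts from random assignments
        for n, w in enumerate(doc):
            k = z[d][n]; ndk[d, k] += 1; nkv[k, w] += 1; nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                k = z[d][n]          # remove the current assignment
                ndk[d, k] -= 1; nkv[k, w] -= 1; nk[k] -= 1
                # p(z_n = k | rest) is proportional to
                # (ndk + alpha) * (nkv + eta) / (nk + V * eta)
                p = (ndk[d] + alpha) * (nkv[:, w] + eta) / (nk + V * eta)
                k = rng.choice(K, p=p / p.sum())
                z[d][n] = k
                ndk[d, k] += 1; nkv[k, w] += 1; nk[k] += 1
    return ndk, nkv

# Two toy documents over a 4-word vocabulary (word ids 0..3, hypothetical).
docs = [[0, 0, 1, 0, 1], [2, 3, 2, 2, 3]]
ndk, nkv = gibbs_lda(docs, K=2, V=4)
print(ndk)
```

The "black magic" the slide mentions lives in choices like burn-in length, thinning, and hyperparameter tuning, none of which this sketch attempts.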

Topic Models Based on LDA

- Variants: unsupervised LDA, supervised LDA, LDA for OSNs
- Training: batch, incremental
- Inference: Gibbs sampling, variational methods
- Scaling: parallel, online

Topic Models for DOSNs

Issues to be addressed:
1. Short and noisy text,
2. Linked words instead of bag-of-words,
3. Global vs. localized models (community-based models, models at the most influential nodes),
4. Dynamic models.

Applications of Topic Models

- Topically-based community detection (LDA)
- Context-aware individual recommendation system (LDA, DL)
- Context-aware group recommendation system (LDA, DL)

Conclusion

- DOSNs create the possibility of applying distributed and online learning in a highly dynamic and heterogeneous environment.
- DIVa is a practical example of empowering users with customizable services.
- We aim to implement topic models in DOSNs and to provide services such as ranking, summarization, and recommendation.