A Framework of Detecting Burst Events from Micro-blogging Streams
Kaifang Yang, Yongbo Yu, Lizhou Zheng, Peiquan Jin
School of Computer Science and Technology, University of Science and Technology of China, Hefei, China
jpq@ustc.edu.cn

Abstract. Micro-blogs greatly accelerate the flow of information on the Internet. Recent studies showed that burst events spread much faster on micro-blogging platforms than on any other medium. It is therefore very useful to detect burst events from micro-blogging streams, so that real-time news events can be acquired and used for government and business decision making. Traditional event detection methods mainly focus on public news media and are not well suited to micro-blogging platforms. In this paper, we propose a framework for burst event detection that emphasizes the characteristics of micro-blogging streams. Experimental results on a real data set crawled from a commercial micro-blogging platform demonstrate the effectiveness of our method.

Keywords: Micro-blog, Event Detection, Burst Event.

1 Introduction

Micro-blogging platforms have become very popular in people's daily lives. For example, Sina Weibo is a popular social networking platform in China, where users share and discuss news and events. Compared with traditional Web pages, micro-blogs have the following features: (a) the length of a micro-blog is restricted to 140 characters, which makes it short and brief in describing information; (b) micro-blogs are more colloquial and informal; (c) micro-blogs usually report more real-time information. Since micro-blogging services have become one of the major information sharing platforms, both governments and businesses want to extract valuable information, such as burst events, from micro-blogs. However, traditional event detection methods are designed for long texts such as Web pages and are not suitable for micro-blogs.
Therefore, it is urgent to study effective methods for burst event detection on micro-blogging platforms. In this paper we introduce a new method to solve this problem. Compared with previous work, our contributions can be summarized as follows: (1) Based on the characteristics of micro-blogs, we propose a framework to detect burst events from micro-blogging streams. (2) We conduct experiments on a real data set crawled from Sina Weibo, and the experimental results demonstrate the effectiveness of our method.

The rest of the paper is organized as follows. Section 2 describes related work. Section 3 discusses the framework for detecting burst events from micro-blogs. Section 4 gives the experimental results, and Section 5 presents the conclusion and future work.

ISSN: ASTL Copyright 2013 SERSC

2 Related Work

Burst event detection targets events tied to a specific time and place, a notion that extends the definition of topics. The prevailing view is that a topic represents a relatively specific central event or activity together with directly related events and the ensuing discussion, while an event usually refers to something that happened at a particular time and place and involves specific subjects (people, institutions, organizations and so on). In this paper we draw no clear distinction between topic detection and event detection.

Existing methods for burst event detection in micro-blogs can be divided into the following categories: (a) applying traditional topic detection and tracking (TDT) combined with widely used supervised machine learning methods to obtain candidate events [1, 2]; (b) monitoring abrupt changes in word frequency: [3] determines whether a word is a burst term by detecting its appearances in recent time slices, and [4] proposes using wavelets to analyze micro-blogging information; (c) combining sentiment analysis methods to detect burst events [5]; (d) an improved clustering method based on micro-blog features, which applies latent semantic analysis (LSA) to the TF-IDF matrix of the vocabulary and combines it with micro-blog-specific features such as tags and emoticons to cluster micro-blogs, detect events and extract event summaries.
Most existing micro-blogging event detection methods either do not fully consider the features of micro-blogging data or consider them too simply [3, 6]. Following the method in [6], we propose two measurement criteria, burst hot and burst factor. We filter the micro-blogging keywords according to these criteria and then combine the keywords to obtain the relevant event summary.

3 A Framework to Detect Burst Events from Micro-blogs

The framework of our method is shown in Fig. 1.
[Figure: Preprocessing (micro-blogging message acquisition, data cleaning, segmentation and POS tagging); Model (feature trajectory, discrete Fourier transform, word filtering); Summary (word merger, summary)]

Fig. 1. The framework of detecting burst events from micro-blogs

First, we preprocess the original data set to obtain clean text from the raw data. Then we define a model to identify the burst areas and burst words. Finally, we obtain the burst event summary and other parameters.

3.1 Preprocessing

Micro-blogging message acquisition. We use the API provided by Sina Weibo to acquire public micro-blogging texts.

Data cleaning. The acquired messages contain a lot of noise that is not useful for detecting burst events. In preprocessing, we remove this noise from the message contents, including emoticons, mentioned user names, URLs and other non-text content.

Segmentation and POS tagging. We use the widely used ICTCLAS [7] software, which segments text with hidden Markov models. After segmentation, we discard the stop-words.

3.2 Feature Trajectory Model

Lexical item features. To measure lexical items, we adopt the definition of feature trajectories in [6]. The trajectory (DF-IDF) of word $w$ over the event window $T$ is

$$y_w = [y_w(1), y_w(2), \ldots, y_w(T)], \qquad y_w(t) = \frac{DF_w(t)}{N(t)} \times \log\frac{N}{DF_w},$$

where $DF_w(t)$ is the number of micro-blogs that contain word $w$ in slice $t$, $N(t)$ is the total number of micro-blogs in slice $t$, $N$ is the total number of micro-blogs in window $T$, and $DF_w$ is the number of micro-blogs that contain word $w$ in window $T$.

Discrete Fourier Transform (DFT). Applying the discrete Fourier transform to a feature trajectory yields its spectrogram. From the spectrogram we define the burst hot, $S_w = \max\{|X_k|^2\}$ for $k = 2, 3, \ldots, T$, and the burst factor, $\mathit{burstness} = S_w / |X_1|^2$, where $|X_1|^2$ corresponds to the average DF-IDF value over the interval $T$. The spectrogram can be used for frequency-domain analysis and word filtering.

Word filtering. The spectrogram shows the burst degree of each word, so we can filter hot words according to their spectrogram features. First, periodic signals are filtered out: words whose primary cycle is less than one are removed. By setting different thresholds we can detect hot words and burst words at different levels of heat. It is also worth mentioning that many words in the experiments appear only once or a few times in a time slice; these are considered noise.

3.3 Words Merger and Summary

Word merger. A burst event or topic may have multiple burst words, so we use a clustering method to merge them. We define the distance between words as their degree of co-occurrence. The co-occurrence of $w_1$ and $w_2$ is defined as

$$d(w_1, w_2) = \frac{|M_{w_1} \cap M_{w_2}|}{\min\{|M_{w_1}|, |M_{w_2}|\}}, \qquad d(R_1, R_2) = \min\{d(w_1, w_2)\},\ w_1 \in R_1,\ w_2 \in R_2,$$

where $M_w$ denotes the micro-blogs that contain word $w$ and $|M_w|$ is their number. We use hierarchical clustering to cluster the burst words and obtain the summary of each burst event.

Summary. After the hot words are merged, we can generate the summary of hot events. An event summary has the following format: event = {burst hot word set, burst time interval, heat, burst factor}.

4 Experimental Results

4.1 Data Set

We acquired a collection of 1,728,000 micro-blogs from Sina Weibo; the sampling rate is less than 0.57%. In our experiment, the observation window T is 3*24 hours. The burst factor threshold is the mean value of the burst factor.
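The scoring and merging steps of Sections 3.2 and 3.3 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the DFT is a naive standard-library loop rather than an FFT library, the co-occurrence numerator is taken to be the size of the intersection of the two posting sets, the merge loop is one simple agglomerative reading of "hierarchical clustering" that stops once no pair of clusters reaches the threshold, and the periodic-signal filter is left out. All function names, words, counts and micro-blog ids are hypothetical.

```python
import cmath
import math
from itertools import combinations


def dfidf_trajectory(df_per_slice, n_per_slice):
    """y_w(t) = DF_w(t) / N(t) * log(N / DF_w) for one word over the window."""
    n_total = sum(n_per_slice)    # N: all micro-blogs in the window
    df_total = sum(df_per_slice)  # DF_w: micro-blogs in the window containing w
    if df_total == 0:
        return [0.0] * len(df_per_slice)
    idf = math.log(n_total / df_total)
    return [df / n * idf if n else 0.0
            for df, n in zip(df_per_slice, n_per_slice)]


def burst_scores(trajectory):
    """Return (burst hot S_w, burst factor S_w / |X_1|^2); needs >= 2 slices."""
    size = len(trajectory)
    power = []
    for k in range(size):  # k = 0 here is X_1 in the paper's 1-based indexing
        x_k = sum(y * cmath.exp(-2j * cmath.pi * k * t / size)
                  for t, y in enumerate(trajectory))
        power.append(abs(x_k) ** 2)
    burst_hot = max(power[1:])
    return burst_hot, (burst_hot / power[0] if power[0] > 0 else 0.0)


def filter_burst_words(trajectories):
    """Keep words whose burst factor reaches the mean burst factor."""
    factors = {w: burst_scores(y)[1] for w, y in trajectories.items()}
    threshold = sum(factors.values()) / len(factors)
    return sorted(w for w, f in factors.items() if f >= threshold)


def cooccurrence(posts_a, posts_b):
    """d(w1, w2): shared micro-blogs divided by the smaller posting set."""
    smaller = min(len(posts_a), len(posts_b))
    return len(posts_a & posts_b) / smaller if smaller else 0.0


def merge_burst_words(postings, threshold=0.5):
    """Merge clusters while the minimum pairwise d between two clusters
    is at least `threshold` (assumed stopping rule)."""
    clusters = [{w} for w in sorted(postings)]
    while True:
        best_score, best_pair = 0.0, None
        for i, j in combinations(range(len(clusters)), 2):
            score = min(cooccurrence(postings[a], postings[b])
                        for a in clusters[i] for b in clusters[j])
            if score >= threshold and (best_pair is None or score > best_score):
                best_score, best_pair = score, (i, j)
        if best_pair is None:
            return clusters
        i, j = best_pair
        merged = clusters.pop(j)  # j > i, so index i stays valid
        clusters[i] |= merged


# Hypothetical toy data: 8 slices with 100 micro-blogs each.
n_per_slice = [100] * 8
trajectories = {
    "weather": dfidf_trajectory([5] * 8, n_per_slice),  # steady usage
    "earthquake": dfidf_trajectory([0, 0, 0, 40, 0, 0, 0, 0], n_per_slice),
    "rescue": dfidf_trajectory([0, 0, 0, 30, 10, 0, 0, 0], n_per_slice),
}
burst_words = filter_burst_words(trajectories)

# Hypothetical ids of the micro-blogs that contain each burst word.
postings = {"earthquake": {1, 2, 3, 4}, "rescue": {2, 3, 4, 5}}
events = merge_burst_words({w: postings[w] for w in burst_words})
print(burst_words, events)
```

In this toy window the steady word has almost no power outside the constant component, so its burst factor is close to zero and it falls below the mean threshold, while the two spiking words pass and are merged because they share three of four micro-blogs. For real windows an FFT routine would replace the quadratic loop.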
The co-occurrence threshold is 0.5. Because there is neither a standard data set nor a standard evaluation criterion for Chinese micro-blogging burst event detection, we refer to precision and recall from traditional information retrieval as evaluation criteria. However, since the sampling rate of the experimental data is less than 0.57%, it is impossible to find all hot events. This paper therefore uses only the accuracy rate: the ratio of the number of detected hot events that correspond to real events to the number of all detected burst hot events.
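The accuracy rate defined above is a simple ratio, and it can be written down together with the search-based check the paper uses to decide whether a detected event is real (at least three of the top five search results related to the event). The sketch below is a hypothetical illustration: the function names, the event labels and the relevance flags are made up, and the relevance judgements themselves would come from a person inspecting the search results.

```python
def is_real_event(related_flags, required=3, considered=5):
    """True if at least `required` of the top `considered` results are related."""
    return sum(related_flags[:considered]) >= required


def accuracy_rate(judgements):
    """Detected events judged real divided by all detected events."""
    return sum(judgements) / len(judgements) if judgements else 0.0


# Hypothetical relevance flags for the ranked search results of three events.
search_checks = {
    "event A": [True, True, False, True, False],
    "event B": [False, True, False, False, True],
    "event C": [True, True, True, True, True],
}
judgements = [is_real_event(flags) for flags in search_checks.values()]
print(accuracy_rate(judgements))
```

Only the first five results are counted, so a long result list with many late hits does not turn a rejected event into an accepted one.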
In order to verify whether the burst events detected in the experiment are real events, we use the Baidu advanced search tool to check each of them. We use the summary of each detected event as the search keywords and limit the search period to the time period of the event. If three of the top five search results are related to the detected event, we regard it as a real one.

4.2 Experimental Results

In total we detected 40 burst events. According to the judgment method described above, 30 of the 40 are real news events, so the method achieves a correct detection rate of about 75%. Manual checking of the remaining ten shows that four are commercial promotions and advertisements, and the other six are noise data containing hot burst words. The accuracy is affected by the following factors. First, Sina Weibo contains many commercial promotions and much noise data, and some zombie fans use hashtags that include burst tags to attract user attention; all of these affect the purity of the experimental data. Second, the rate limits imposed on the crawler keep the sampling rate low, which also affects the results. Furthermore, since micro-blogging texts are informal and include many out-of-vocabulary new words, traditional word segmentation tools cannot recognize new words in micro-blogs, and as a result the segmentation error rate on micro-blog text is higher than on traditional text.

5 Conclusion

In this paper, we proposed a framework for detecting burst events in micro-blogging streams. The experimental results on a real micro-blogging data set showed that the proposed method is effective in burst event detection. However, our method does not consider the noise in micro-blogging streams. Future work will therefore focus on detecting and eliminating noise data in micro-blogging streams. Another direction is to take social network properties into account in burst event detection.
Acknowledgement. This paper is supported by the National Science Foundation of China (No and No ), and the National Science Foundation of Anhui Province (no MG117).

References

1. Ritter, Alan, Oren Etzioni, and Sam Clark. Open domain event extraction from Twitter. In Proc. of SIGKDD, ACM Press.
2. Popescu, Ana-Maria, Marco Pennacchiotti, and Deepa Paranjpe. Extracting events and event descriptions from Twitter. In Proc. of WWW, ACM Press.
3. Mathioudakis, Michael, and Nick Koudas. Twitter monitor: trend detection over the Twitter stream. In Proc. of SIGMOD, ACM Press.
4. Weng, Jianshu, and Bu-Sung Lee. Event detection in Twitter. In Proc. of the 5th International AAAI Conference on Weblogs and Social Media.
5. Thelwall, M., Buckley, K., and Paltoglou, G. Sentiment in Twitter events. Journal of the American Society for Information Science and Technology, 62(2).
6. Qi He, Kuiyu Chang, and Ee-Peng Lim. Analyzing feature trajectories for event detection. In Proc. of SIGIR, ACM Press.
7. ICTCLAS. Available at
More informationSpatial Data Science. Soumya K Ghosh
Workshop on Data Science and Machine Learning (DSML 17) ISI Kolkata, March 28-31, 2017 Spatial Data Science Soumya K Ghosh Professor Department of Computer Science and Engineering Indian Institute of Technology,
More informationA Bivariate Point Process Model with Application to Social Media User Content Generation
1 / 33 A Bivariate Point Process Model with Application to Social Media User Content Generation Emma Jingfei Zhang ezhang@bus.miami.edu Yongtao Guan yguan@bus.miami.edu Department of Management Science
More informationNon-Parametric Bayes
Non-Parametric Bayes Mark Schmidt UBC Machine Learning Reading Group January 2016 Current Hot Topics in Machine Learning Bayesian learning includes: Gaussian processes. Approximate inference. Bayesian
More informationOn the equivalence between Non-negative Matrix Factorization and Probabilistic Latent Semantic Indexing
Computational Statistics and Data Analysis 52 (2008) 3913 3927 www.elsevier.com/locate/csda On the equivalence between Non-negative Matrix Factorization and Probabilistic Latent Semantic Indexing Chris
More informationarxiv: v2 [cs.si] 6 Feb 2015
Noname manuscript No. (will be inserted by the editor) Multiscale Event Detection in Social Media Xiaowen Dong Dimitrios Mavroeidis Francesco Calabrese Pascal Frossard Received: date / Accepted: date arxiv:1404.7048v2
More informationConjoint Modeling of Temporal Dependencies in Event Streams. Ankur Parikh Asela Gunawardana Chris Meek
Conjoint Modeling of Temporal Dependencies in Event Streams Ankur Parikh Asela Gunawardana Chris Meek APPLICATION 1: MAKING ADVERTISING MORE RELEVANT Display Advertising Ads are targeted to page content
More informationCrowd-sourced Cartography: Measuring Socio-cognitive Distance for Urban Areas based on Crowd s Movement
Crowd-sourced Cartography: Measuring Socio-cognitive Distance for Urban Areas based on Crowd s Movement Shoko Wakamiya University of Hyogo Japan ne11n002@stshse.u-hyogo.ac.jp Ryong Lee National Institute
More informationUsing Social Media for Geodemographic Applications
Using Social Media for Geodemographic Applications Muhammad Adnan and Guy Lansley Department of Geography, University College London @gisandtech @GuyLansley Web: http://www.uncertaintyofidentity.com Outline
More informationPrediction of Citations for Academic Papers
000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050
More informationAPPLYING BIG DATA TOOLS TO ACQUIRE AND PROCESS DATA ON CITIES
APPLYING BIG DATA TOOLS TO ACQUIRE AND PROCESS DATA ON CITIES JACEK MAŚLANKOWSKI, Ph.D. DEPARTMENT OF BUSINESS INFORMATICS FACULTY OF MANAGEMENT UNIVERSITY OF GDAŃSK, POLAND 1 AGENDA 2 Prerequisites Possible
More informationText Analytics (Text Mining)
http://poloclub.gatech.edu/cse6242 CSE6242 / CX4242: Data & Visual Analytics Text Analytics (Text Mining) Concepts, Algorithms, LSI/SVD Duen Horng (Polo) Chau Assistant Professor Associate Director, MS
More informationAn Application of Link Prediction in Bipartite Graphs: Personalized Blog Feedback Prediction
An Application of Link Prediction in Bipartite Graphs: Personalized Blog Feedback Prediction Krisztian Buza Dpt. of Computer Science and Inf. Theory Budapest University of Techn. and Economics 1117 Budapest,
More informationSupplementary Information for Emotional persistence in online chatting communities
Supplementary Information for Emotional persistence in online chatting communities Antonios Garas, David Garcia, and Frank Schweitzer Chair of Systems Design, ETH Zurich, Kreuzplatz 5, 8032 Zurich, Switzerland
More informationP leiades: Subspace Clustering and Evaluation
P leiades: Subspace Clustering and Evaluation Ira Assent, Emmanuel Müller, Ralph Krieger, Timm Jansen, and Thomas Seidl Data management and exploration group, RWTH Aachen University, Germany {assent,mueller,krieger,jansen,seidl}@cs.rwth-aachen.de
More informationRational Spamming. Xinyu Cao MIT John R. Hauser MIT T. Tony Ke MIT Juanjuan Zhang MIT
Rational Spamming Xinyu Cao MIT xinyucao@mit.edu John R. Hauser MIT hauser@mit.edu T. Tony Ke MIT kete@mit.edu Juanjuan Zhang MIT jjzhang@mit.edu January 19, 017 Rational Spamming Abstract Advertising
More informationClustering. CSL465/603 - Fall 2016 Narayanan C Krishnan
Clustering CSL465/603 - Fall 2016 Narayanan C Krishnan ckn@iitrpr.ac.in Supervised vs Unsupervised Learning Supervised learning Given x ", y " "%& ', learn a function f: X Y Categorical output classification
More informationStatistics for Engineering, 4C3/6C3 Assignment 2
Statistics for Engineering, 4C3/6C3 Assignment 2 Kevin Dunn, kevin.dunn@mcmaster.ca Due date: 23 January 2014 Assignment objectives: interpreting data visualizations; univariate data analysis Question
More informationStatistical Substring Reduction in Linear Time
Statistical Substring Reduction in Linear Time Xueqiang Lü Institute of Computational Linguistics Peking University, Beijing lxq@pku.edu.cn Le Zhang Institute of Computer Software & Theory Northeastern
More informationLatent Semantic Analysis. Hongning Wang
Latent Semantic Analysis Hongning Wang CS@UVa Recap: vector space model Represent both doc and query by concept vectors Each concept defines one dimension K concepts define a high-dimensional space Element
More informationRanked Retrieval (2)
Text Technologies for Data Science INFR11145 Ranked Retrieval (2) Instructor: Walid Magdy 31-Oct-2017 Lecture Objectives Learn about Probabilistic models BM25 Learn about LM for IR 2 1 Recall: VSM & TFIDF
More informationVirtual Beach Making Nowcast Predictions
Virtual Beach 3.0.6 Making Nowcast Predictions In this module you will learn how to: A. Create a real-time connection to Web data services through EnDDaT B. Download real-time data to make a Nowcast prediction
More informationExploring Urban Areas of Interest. Yingjie Hu and Sathya Prasad
Exploring Urban Areas of Interest Yingjie Hu and Sathya Prasad What is Urban Areas of Interest (AOIs)? What is Urban Areas of Interest (AOIs)? Urban AOIs exist in people s minds and defined by people s
More informationGLOBAL NEEDS TO BE ADDRESSED IN A STRATEGIC PLAN FOR SPACE WEATHER. Developing products and services in space weather: Space Weather Channel in China
WORLD METEOROLOGICAL ORGANIZATION COMMISSION FOR BASIC SYSTEMS COMMISSION FOR AERONAUTICAL METEOROLOGY INTER-PROGRAMME COORDINATION TEAM ON SPACE WEATHER ICTSW-4/Doc. 10.2(2) (25.XI.2013) ITEM: 10.2 FOURTH
More informationIdentification of Bursts in a Document Stream
Identification of Bursts in a Document Stream Toshiaki FUJIKI 1, Tomoyuki NANNO 1, Yasuhiro SUZUKI 1 and Manabu OKUMURA 2 1 Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute
More information