A Bivariate Point Process Model with Application to Social Media User Content Generation


Emma Jingfei Zhang (ezhang@bus.miami.edu)
Yongtao Guan (yguan@bus.miami.edu)
Department of Management Science, The Miami Business School, University of Miami

Data Description: Sina Weibo Data
Source: Sina Weibo, the largest Twitter-type online social media platform in China.
The dataset contains posts from 5,913 followers of the official Beijing University Guanghua MBA Weibo account.
For each user, all posts made from Jan 1st to Jan 30th, 2014 were collected, including the time stamp of each post.
Each post is either a post with original content or a repost.

Data Description: Trump's Twitter Data
Source: Twitter data collected from Donald Trump (@realdonaldtrump) from Jan 2013 to Apr […].
The Twitter archive of Donald Trump can be downloaded from […].
Twitter shows the device used for each tweet; devices may be Android, Web Client, iPhone, and others.
We consider the tweets posted from an Android device before the election and from an iPhone after it.
This results in a total of 17,518 tweets; the average number of monthly tweets is 278.
Each tweet is either an original tweet or a retweet.

Data Description: Sina Weibo Data
Figure: The posting times of three users (Users 1, 2 and 3; horizontal axis: date, 01/01 to 01/30).

Data Description: Sina Weibo Data
Figure: Average empirical pair correlation function (horizontal axis: hour).

Observations from Data
A user's posting activity may alternate between active and inactive states.
During an active state, the user may publish one or more posts, often with short inter-post times.
During an inactive state, no post is produced until the next active state starts.
There may be daily patterns in posting times.
The data form a bivariate point process (i.e., posts and reposts).

Graphical Illustration: Univariate Process
Episodes: clusters of posting time locations.
Adjacent episodes are nonoverlapping and separated by the inactive period in between.

Graphical Illustration: Bivariate Process
Diagram labels: post segment, repost segment, post segment; episode, inactive, episode.
Each episode contains subepisodes of posts and reposts.
Posts (reposts) tend to be followed by posts (reposts).
Reposts may be more clustered than posts.
The number of reposts may be related to the number of followees.

Clustered Point Process
Goal: Model the clustered posting times in social media data (without distinguishing between posts and reposts for now).
Existing methods:
Hawkes process
Neyman-Scott process
Bartlett-Lewis process
Interrupted Poisson process
We propose a new class of clustered temporal point processes that is easy to interpret and can be easily generalized to the bivariate case.

Model Formulation
For each episode, the parent event generates a Poisson number of offspring events with mean $\mu$.
Each offspring's location, relative to the location of the previous event in the same cluster, follows an exponential distribution with rate $\rho$.
Once all the events in an episode have been observed, the parent event of the following episode is generated according to a hazard function $\lambda(t; \beta)$.
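The generative steps on this slide can be sketched as a small simulator. This is an illustrative sketch, not the authors' code: the function names, the use of thinning to draw the next parent time, and the upper bound `hazard_max` on the hazard are all assumptions made here, and events falling after `T` are simply discarded.

```python
import random

def poisson_draw(rng, mean):
    """Draw a Poisson(mean) count by counting unit-rate arrivals before `mean`."""
    count, clock = 0, rng.expovariate(1.0)
    while clock < mean:
        count += 1
        clock += rng.expovariate(1.0)
    return count

def simulate_episodes(T, mu, rho, hazard, hazard_max, seed=0):
    """Simulate event times on [0, T]; label 1 marks a parent, 0 an offspring.

    `hazard` is any callable t -> lambda(t). `hazard_max` must bound it from
    above on [0, T]; this bound is an assumption needed by the thinning step.
    """
    rng = random.Random(seed)
    times, labels = [], []
    t = 0.0
    while True:
        # Next parent: thin a rate-`hazard_max` Poisson stream that starts at
        # the last event of the previous episode (or at 0 for the first one).
        while True:
            t += rng.expovariate(hazard_max)
            if t > T:
                return times, labels
            if rng.random() * hazard_max <= hazard(t):
                break
        times.append(t)
        labels.append(1)
        # Offspring: Poisson(mu) many, each an Exp(rho) gap after the previous event.
        for _ in range(poisson_draw(rng, mu)):
            t += rng.expovariate(rho)
            if t > T:
                return times, labels
            times.append(t)
            labels.append(0)

# Hypothetical parameter values, with a constant hazard for simplicity.
times, labels = simulate_episodes(T=30.0, mu=1.5, rho=20.0,
                                  hazard=lambda t: 2.0, hazard_max=2.0, seed=1)
```

Because a new parent is only drawn after the current episode has finished, episodes never overlap, which matches the nonoverlapping-episode picture described earlier in the deck.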

Model Formulation
Given the daily cyclic pattern observed in the average pair correlation function, we may assume that
\[ \lambda(t; \beta) = \exp\Big\{ \beta_0 + \sum_{j=1}^{p} \big[ \beta_{j1} \cos(\omega_j t) + \beta_{j2} \sin(\omega_j t) \big] \Big\}, \]
where $\omega_j = 2j\pi$ and $\beta = \{\beta_0, \beta_{j1}, \beta_{j2} : j = 1, \ldots, p\}$.
Nonparametric models can also be used.
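As a quick illustration of the harmonic hazard above, the following sketch evaluates $\lambda(t; \beta)$ for a given coefficient set. The function name and argument layout are assumptions of this example; since $\omega_j = 2j\pi$, the function has period 1 in the units of $t$, which is read here as one day.

```python
import math

def parent_hazard(t, beta0, harmonics):
    """Evaluate exp(beta0 + sum_j [b_j1 cos(2 j pi t) + b_j2 sin(2 j pi t)]).

    `harmonics` is a list of (b_j1, b_j2) pairs for j = 1, ..., p.
    """
    exponent = beta0
    for j, (b_j1, b_j2) in enumerate(harmonics, start=1):
        omega_j = 2.0 * j * math.pi
        exponent += b_j1 * math.cos(omega_j * t) + b_j2 * math.sin(omega_j * t)
    return math.exp(exponent)

# Hypothetical coefficients with p = 2 harmonics.
beta0, harmonics = -2.0, [(1.0, 0.5), (0.2, -0.1)]
value_now = parent_hazard(0.3, beta0, harmonics)
value_one_period_later = parent_hazard(1.3, beta0, harmonics)  # equal up to rounding
```

The exponential link keeps the hazard positive for any real coefficients, so no constraint on $\beta$ is needed when optimizing.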

Model Formulation
Define event time locations $\{T_l : l = 1, \ldots, N\}$ and indicator variables $\{Y_l : l = 1, \ldots, N\}$, where $Y_l = 1$ denotes a parent event and $Y_l = 0$ an offspring event. Let $T_0 = 0$.
Define the gap time $D_l = T_l - T_{l-1}$, $l = 1, \ldots, N$.
Let $f_{l0}(x)$ and $f_{l1}(x)$ be the probability density functions of $D_l$ given that $Y_l = 0$ and $Y_l = 1$, respectively.
Assume
\[ f_{l0}(x) = \rho \exp(-\rho x), \qquad f_{l1}(x) = \lambda(t_{l-1} + x; \beta) \exp\Big[ -\int_{t_{l-1}}^{t_{l-1} + x} \lambda(t; \beta)\, dt \Big]. \]
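The two gap-time densities can be evaluated numerically as in the sketch below. The trapezoidal rule for the cumulative hazard and all function names are choices made for this illustration only; any quadrature (or a closed form, when one exists) would do.

```python
import math

def cumulative_hazard(hazard, a, b, n=2000):
    """Trapezoidal approximation of the integral of `hazard` over [a, b]."""
    if b <= a:
        return 0.0
    h = (b - a) / n
    total = 0.5 * (hazard(a) + hazard(b))
    for i in range(1, n):
        total += hazard(a + i * h)
    return total * h

def f_offspring(x, rho):
    """Density f_l0 of an offspring gap: exponential with rate rho."""
    return rho * math.exp(-rho * x)

def f_parent(x, t_prev, hazard):
    """Density f_l1 of a parent gap that starts at the previous event time t_prev."""
    return hazard(t_prev + x) * math.exp(-cumulative_hazard(hazard, t_prev, t_prev + x))

# Sanity check with a constant hazard c, where f_l1 reduces to c * exp(-c * x).
constant = lambda t: 3.0
parent_density = f_parent(0.5, 2.0, constant)
```

With a constant hazard the parent gap is itself exponential, which gives a convenient check on the numerical integration.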

Model Formulation
Assume the first event is a parent event and all events in the last episode are contained in $[0, T]$.
The complete-data likelihood can then be written as
\[ L(\theta; t, y) = \prod_{l=1}^{n} \prod_{m=0}^{1} \big[ f_{lm}(d_l; \theta) \big]^{I(y_l = m)} \, \Big[ \prod_{i=1}^{k} P(N_i = n_i) \Big] \, P(D_{n+1} > T - t_n), \]
where $D_{n+1}$ is the gap time between $t_n$ and the next parent event,
\[ P(N_i = n_i) = \frac{\exp(-\mu)\, \mu^{n_i}}{n_i!}, \qquad P(D_{n+1} > T - t_n) = \exp\Big[ -\int_{t_n}^{T} \lambda(t; \beta)\, dt \Big]. \]
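The complete-data likelihood can be evaluated on the log scale as in the following sketch. It reads $N_i$ as the number of offspring in episode $i$, which is an interpretation made here rather than something the slide states; the function names and the trapezoidal integration are likewise assumptions of this example.

```python
import math

def cumulative_hazard(hazard, a, b, n=2000):
    """Trapezoidal approximation of the integral of `hazard` over [a, b]."""
    if b <= a:
        return 0.0
    h = (b - a) / n
    total = 0.5 * (hazard(a) + hazard(b))
    for i in range(1, n):
        total += hazard(a + i * h)
    return total * h

def complete_loglik(times, labels, T, mu, rho, hazard):
    """log L(theta; t, y) for sorted event times in [0, T] with known labels."""
    if labels and labels[0] != 1:
        raise ValueError("the first event must be a parent")
    loglik = 0.0
    prev = 0.0              # T_0 = 0
    offspring_counts = []   # n_i, one entry per episode
    for t, y in zip(times, labels):
        if y == 1:
            # Parent gap: log f_l1(d_l) = log lambda(t_l) - integral of lambda.
            loglik += math.log(hazard(t)) - cumulative_hazard(hazard, prev, t)
            offspring_counts.append(0)
        else:
            # Offspring gap: log f_l0(d_l) = log rho - rho * d_l.
            loglik += math.log(rho) - rho * (t - prev)
            offspring_counts[-1] += 1
        prev = t
    # Poisson term for the number of offspring in each episode.
    for n_i in offspring_counts:
        loglik += -mu + n_i * math.log(mu) - math.lgamma(n_i + 1)
    # No new parent between the last event and T.
    loglik -= cumulative_hazard(hazard, prev, T)
    return loglik

# Toy data: parent, offspring, parent; constant hazard 2 (hypothetical values).
value = complete_loglik([0.5, 0.6, 1.5], [1, 0, 1], T=2.0, mu=1.0, rho=10.0,
                        hazard=lambda t: 2.0)
```

For this toy input the terms can be added by hand: two parent gaps, one offspring gap, two Poisson terms, and the final survival term give $2\log 2 + \log 10 - 6.8$.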

Composite Likelihood Estimation
The observed-data likelihood is $\sum_{y} L(\theta; t, y)$, where the summation is over all $2^n$ possible values of $y$.
Divide $W = [0, T]$ into $J$ non-overlapping unit windows of length $s$, i.e., $W = \bigcup_{j=1}^{J} W_j$, where $W_j = [(j-1)s, js)$.
As before, we assume that
the first event in $W_j$ is a parent event, and
all events in the last episode of $W_j$ are contained in $W_j$.
Define $t_j = \{t_i : t_i \in W_j\}$ and $y_j = \{y_i : t_i \in W_j\}$. The observed-data likelihood on $W_j$ is then $\sum_{y_j} L(\theta; t_j, y_j)$.
We estimate $\theta$ by maximizing the composite likelihood
\[ L(\theta; t) = \prod_{j=1}^{J} \sum_{y_j} L(\theta; t_j, y_j). \]
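For small windows, the composite likelihood can be computed by brute-force enumeration of the labels, as in the sketch below. Several choices here are assumptions of the example rather than the authors' implementation: the first label in each window is pinned to 1 (so labelings that violate the window assumption are skipped), an empty window contributes only the no-parent survival term, and the helper names are hypothetical.

```python
import itertools
import math

def cumulative_hazard(hazard, a, b, n=2000):
    """Trapezoidal approximation of the integral of `hazard` over [a, b]."""
    if b <= a:
        return 0.0
    h = (b - a) / n
    total = 0.5 * (hazard(a) + hazard(b))
    for i in range(1, n):
        total += hazard(a + i * h)
    return total * h

def window_loglik(times, labels, start, end, mu, rho, hazard):
    """Complete-data log-likelihood on one window [start, end)."""
    loglik, prev, counts = 0.0, start, []
    for t, y in zip(times, labels):
        if y == 1:   # parent gap
            loglik += math.log(hazard(t)) - cumulative_hazard(hazard, prev, t)
            counts.append(0)
        else:        # offspring gap
            loglik += math.log(rho) - rho * (t - prev)
            counts[-1] += 1
        prev = t
    for n_i in counts:  # Poisson number of offspring per episode
        loglik += -mu + n_i * math.log(mu) - math.lgamma(n_i + 1)
    return loglik - cumulative_hazard(hazard, prev, end)

def composite_loglik(times, T, s, mu, rho, hazard):
    """Sum over windows of log sum_{y_j} L(theta; t_j, y_j)."""
    total = 0.0
    for j in range(int(round(T / s))):
        start, end = j * s, (j + 1) * s
        t_j = [t for t in times if start <= t < end]
        if not t_j:
            total += window_loglik([], [], start, end, mu, rho, hazard)
            continue
        # Enumerate labels of events 2..n_j; the first event is a parent.
        terms = [window_loglik(t_j, (1,) + rest, start, end, mu, rho, hazard)
                 for rest in itertools.product((0, 1), repeat=len(t_j) - 1)]
        top = max(terms)  # log-sum-exp for numerical stability
        total += top + math.log(sum(math.exp(v - top) for v in terms))
    return total

value = composite_loglik([0.2, 0.3, 1.4], T=2.0, s=1.0, mu=1.0, rho=10.0,
                         hazard=lambda t: 2.0)
```

Enumeration costs grow exponentially in the number of events per window, so this is only practical when each $n_j$ is small.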

Composite Likelihood Estimation
Each summation in the CLE is over $2^{n_j}$ terms, where $n_j$ is the number of events in $W_j$.
Note that $\sum_{j=1}^{J} 2^{n_j} \ll 2^n$, so significant computational gains can be achieved.
There is a potential bias problem, since
the first event in $W_j$ may not be a parent event, and
not all events in the last episode of $W_j$ may be contained in $W_j$.
The bias problem can be mitigated if we choose the blocks wisely.
Convergence can be a problem, since multiple parameters need to be estimated simultaneously and the likelihood surface is often quite flat.
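The size of the computational saving is easy to check with a few lines of arithmetic. The window counts below are made-up numbers chosen only to illustrate the comparison.

```python
def labeling_counts(window_event_counts):
    """Return (sum_j 2**n_j, 2**n) for per-window event counts n_j."""
    composite_terms = sum(2 ** n_j for n_j in window_event_counts)
    full_terms = 2 ** sum(window_event_counts)
    return composite_terms, full_terms

# Four hypothetical windows holding 3, 5, 2 and 4 events (n = 14 in total).
composite_terms, full_terms = labeling_counts([3, 5, 2, 4])
print(composite_terms, full_terms)  # 60 16384
```

Even with only 14 events, splitting into four windows reduces the number of label configurations from 16384 to 60.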

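A Composite Likelihood EM Algorithm
Let $T_j$ and $Y_j$ be the random versions of $t_j$ and $y_j$.
In the E-step, we take the expectation of the log-likelihood $l(\theta; t_j, Y_j)$ with respect to the conditional distribution of $Y_j \mid T_j = t_j, \hat\theta_{\mathrm{prev}}$, i.e.,
\[ Q_j(\theta \mid \hat\theta_{\mathrm{prev}}) = E_{Y_j \mid T_j = t_j, \hat\theta_{\mathrm{prev}}} \, l(\theta; t_j, Y_j). \]
Define
\[ Q(\theta \mid \hat\theta_{\mathrm{prev}}) = \sum_{j=1}^{J} Q_j(\theta \mid \hat\theta_{\mathrm{prev}}). \]
In the M-step, $Q(\theta \mid \hat\theta_{\mathrm{prev}})$ is maximized with respect to $\theta$.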

A Composite Likelihood EM Algorithm
For the expectation, we need to calculate, for $t_l \in W_j$, the probability $P_\theta(Y_l = m \mid T_j = t_j)$, which is
\[ P_\theta(Y_l = m \mid T_j = t_j) = \frac{\sum_{y_j : y_l = m} L(\theta; t_j, y_j)}{\sum_{y_j} L(\theta; t_j, y_j)}. \]
If there are a large number of events in $W_j$, we employ a standard Metropolis-Hastings algorithm to sample from the conditional distribution of $Y_j \mid T_j = t_j, \theta$ for the E-step.
Closed-form expressions can be obtained for $\hat\theta$ (except for $\hat\beta$) in the M-step.
Convergence is not an issue.
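A single-flip Metropolis-Hastings sampler for the label vector might look like the sketch below. The slide only says that a standard Metropolis-Hastings algorithm is used, so the proposal (flip one uniformly chosen label), the pinned first label, the iteration counts, and the function name are all assumptions of this illustration. The sampler takes any callable `loglik(y)` returning the complete-data log-likelihood of a label tuple.

```python
import math
import random

def mh_parent_probabilities(loglik, n_events, n_iter=30000, burn_in=2000, seed=0):
    """Estimate P(Y_l = 1 | t_j) for each event by single-flip Metropolis-Hastings."""
    rng = random.Random(seed)
    y = [1] * n_events                 # start from the all-parent labeling
    current = loglik(tuple(y))
    parent_hits = [0] * n_events
    kept = 0
    for iteration in range(n_iter):
        if n_events > 1:
            l = rng.randrange(1, n_events)   # never flip the first label
            y[l] = 1 - y[l]                  # symmetric proposal
            proposed = loglik(tuple(y))
            if math.log(1.0 - rng.random()) < proposed - current:
                current = proposed           # accept
            else:
                y[l] = 1 - y[l]              # reject: undo the flip
        if iteration >= burn_in:
            kept += 1
            for i, label in enumerate(y):
                parent_hits[i] += label
    return [hits / kept for hits in parent_hits]

# Toy target with independent labels, so the exact answer is known:
# P(Y_l = 1) = 1 / (1 + exp(-w_l)) for l >= 2.
weights = [0.0, 0.0, 1.0, -1.0]
toy_loglik = lambda y: sum(w * v for w, v in zip(weights, y))
probs = mh_parent_probabilities(toy_loglik, n_events=4, seed=3)
```

Because the proposal is symmetric, the acceptance probability depends only on the likelihood ratio, so the normalizing sum in the denominator of the formula above never has to be computed.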

A Composite Likelihood EM Algorithm
Theorem. The log-composite likelihood $l(\theta; t) = \log L(\theta; t)$ satisfies
\[ l(\theta_p; t) \ge l(\theta_{p-1}; t), \qquad p = 1, 2, \ldots, \]
where $\theta_p$ is the $p$th update from the EM algorithm.
The theorem guarantees that the log-composite likelihood is nondecreasing at each EM iteration.
The convergence of $\hat\theta_p$ to a stationary point as $p \to \infty$ is guaranteed by Theorem 2 in Wu (1983).
Standard techniques, such as running the EM algorithm from multiple starting points, can help locate the global maximum.
Consistency and asymptotic normality can be established for the global maximum (assuming the model is correctly specified).

Extension to Bivariate Case
For each episode, there are a Poisson number of subepisodes with mean $\gamma$.
Post and repost subepisodes alternate. The first subepisode is a post subepisode with probability $\alpha$.
There are a Poisson number of offspring in each post (repost) subepisode, with mean $\mu_1$ ($\mu_0$).
For each offspring in a post (repost) subepisode, its location relative to that of the previous event in the same episode follows an exponential distribution with rate $\rho_1$ ($\rho_0$).
Once all the events in an episode have been observed, the parent event of the following episode is generated according to a hazard function $\lambda(t; \beta)$.
The composite likelihood EM algorithm can be modified to fit the model.

Application to Trump's Twitter Data
Figure: Parameters estimated from Donald Trump's monthly Twitter data (panels: $\alpha$, $\gamma$, $\mu_1$, $\mu_0$, $\rho_1$, $\rho_0$, number of tweets per episode, and episode length in hours). The two red dashed lines mark June 2015 (candidacy announcement) and Jan 2017 (taking office), respectively.

Figure: Estimated parent event hazard functions from Donald Trump's monthly Twitter data. The two red dashed lines mark June 2015 (candidacy announcement) and Jan 2017 (taking office), respectively.

Figure: Goodness-of-fit plots of the model fitted for Jan […]. From left to right: the envelope plot (first plot), with the upper and lower envelopes marked by red dashed lines, and goodness-of-fit plots for the original offspring post (second plot), offspring repost (third plot) and parent (last plot) inter-event distances. Red solid lines are calculated from the cdf of exponential distributions. The grey bands are the 95% confidence intervals.

Application to Sina Weibo Data
Figure: The posting times of three users (Users 1, 2 and 3; horizontal axis: date, 01/01 to 01/30).

Application to Sina Weibo Data
        α        γ        µ_1      µ_0      ρ_1       ρ_0
User    (0.008)  (0.004)  (0.010)  (0.014)  (7.166)   (6.124)
User    (0.009)  (0.006)  (0.010)  (0.010)  (13.013)  (21.749)
User    (0.006)  (0.008)  (0.013)  (0.012)  (5.882)   (7.477)
Table: Estimated α, γ, µ_1, µ_0, ρ_1, ρ_0 of Users 1, 2 and 3.

Application to Sina Weibo Data
Figure: Parent hazard functions of Users 1, 2 and 3 (vertical axis: intensity; horizontal axis: time of day, 12 am to 12 am).

Application to Sina Weibo Data
Figure: Plots of the mean and first three eigenfunctions of the estimated daily parent hazard functions (panels: mean function, first eigenfunction, second eigenfunction, third eigenfunction; horizontal axis: time of day, 12 am to 12 am).

Characterize Sina Weibo User Behavior
Figure: Groups in the average daily parent hazard (left plot), average number of posts per episode (middle plot) and average length (in hours) of an episode (right plot). The percentages at the bottom of the boxplots show the percentage of users in each group.
Percentages beneath the boxplots, as extracted: […]%, 26.05%, 66.6%; 4.2%, 20.4%, 75.4%; 3.2%, 15.6%, 81.2%.

Social Effect on Users of Sina Weibo
For each Sina Weibo user, we were also able to collect the number of accounts the user was following and the number of accounts that were following this user.
We find that there is a stronger correlation between the number of accounts a user follows and $\mu_0$ ($r = 0.205$).
These observations indicate that users who follow more accounts are more likely to have more reposts.
One explanation could be that the more accounts a user follows, the more content they can repost from.
Another plausible explanation is that followers on social media tend to repost more.

Social Effect on Users of Sina Weibo
We find that popular users, i.e., those whose accounts have many followers, tend to post more original content. They are also more likely to initiate their Weibo engagement by posting original content.
We find that users who have strong social ties, i.e., who have many followers or follow many others, are more likely to use Weibo more often.
We find that users with many followers are more likely to spend more time on Weibo once they start an episode of engagement.

Simulation Study
We set the observation window length $T = 100$ and $\alpha = 0.6$.
With each parameter configuration, we simulate 100 event trajectories.
We set the parent event hazard function as
\[ \lambda(t; \beta) = \exp\big[ \beta_{01} + \beta_{11} \cos(2\pi t) + \beta_{12} \sin(2\pi t) \big]. \]
For estimation, we use unit window length $s = 1$ or $5$.
To model $\lambda(t; \beta)$, we consider both the true model and the nonparametric cyclic B-spline model. For the latter, we use the knot vector $(0, 0.2, 0.4, 0.6, 0.8, 1)$.

Simulation Study
(γ, µ_1, µ_0, ρ_1, ρ_0)   (β_01, β_11, β_12; s)   α        γ        µ_1      µ_0      ρ_1      ρ_0
(0.5,0.5,0.5,10,15)       (-2,-2,2; 5)            (0.010)  (0.013)  (0.014)  (0.014)  (0.261)  (0.365)
(0.5,0.5,0.5,10,15)       (-3,-3,3; 5)            (0.007)  (0.011)  (0.012)  (0.014)  (0.188)  (0.284)
(1.0,0.5,0.5,10,15)       (-2,-2,2; 5)            (0.009)  (0.017)  (0.011)  (0.012)  (0.176)  (0.257)
(0.5,1.0,1.0,10,15)       (-2,-2,2; 5)            (0.008)  (0.010)  (0.016)  (0.017)  (0.171)  (0.309)
(0.5,0.5,0.5,20,30)       (-2,-2,2; 5)            (0.008)  (0.012)  (0.012)  (0.013)  (0.460)  (0.717)
(0.5,0.5,0.5,10,15)       (-2,-2,2; 1)            (0.008)  (0.010)  (0.014)  (0.014)  (0.271)  (0.309)

Summary
We propose a new clustered temporal point process model for user-generated posts on social media.
The proposed model captures both the inhomogeneity in the initial posting time and the clustering pattern in the subsequent posts that follow the initial post.
The proposed goodness-of-fit procedure shows that the proposed model fits the data reasonably well.
The fitted models provide valuable insights into a user's content-generating behavior.


More information

A Conditional Approach to Modeling Multivariate Extremes

A Conditional Approach to Modeling Multivariate Extremes A Approach to ing Multivariate Extremes By Heffernan & Tawn Department of Statistics Purdue University s April 30, 2014 Outline s s Multivariate Extremes s A central aim of multivariate extremes is trying

More information

Adjusted Empirical Likelihood for Long-memory Time Series Models

Adjusted Empirical Likelihood for Long-memory Time Series Models Adjusted Empirical Likelihood for Long-memory Time Series Models arxiv:1604.06170v1 [stat.me] 21 Apr 2016 Ramadha D. Piyadi Gamage, Wei Ning and Arjun K. Gupta Department of Mathematics and Statistics

More information

Temporal Point Processes the Conditional Intensity Function

Temporal Point Processes the Conditional Intensity Function Temporal Point Processes the Conditional Intensity Function Jakob G. Rasmussen Department of Mathematics Aalborg University Denmark February 8, 2010 1/10 Temporal point processes A temporal point process

More information

Control Variates for Markov Chain Monte Carlo

Control Variates for Markov Chain Monte Carlo Control Variates for Markov Chain Monte Carlo Dellaportas, P., Kontoyiannis, I., and Tsourti, Z. Dept of Statistics, AUEB Dept of Informatics, AUEB 1st Greek Stochastics Meeting Monte Carlo: Probability

More information

Modeling Recurrent Events in Panel Data Using Mixed Poisson Models

Modeling Recurrent Events in Panel Data Using Mixed Poisson Models Modeling Recurrent Events in Panel Data Using Mixed Poisson Models V. Savani and A. Zhigljavsky Abstract This paper reviews the applicability of the mixed Poisson process as a model for recurrent events

More information

Point process models for earthquakes with applications to Groningen and Kashmir data

Point process models for earthquakes with applications to Groningen and Kashmir data Point process models for earthquakes with applications to Groningen and Kashmir data Marie-Colette van Lieshout colette@cwi.nl CWI & Twente The Netherlands Point process models for earthquakes with applications

More information

Biostat 2065 Analysis of Incomplete Data

Biostat 2065 Analysis of Incomplete Data Biostat 2065 Analysis of Incomplete Data Gong Tang Dept of Biostatistics University of Pittsburgh October 20, 2005 1. Large-sample inference based on ML Let θ is the MLE, then the large-sample theory implies

More information

Burstiness Scale: A Parsimonious Model for Characterizing Random Series of Events

Burstiness Scale: A Parsimonious Model for Characterizing Random Series of Events Burstiness Scale: A Parsimonious Model for Characterizing Random Series of Events Rodrigo A S Alves Departament of Applied Social Sciences CEFET-MG rodrigo@dcsa.cefetmg.br Renato Assunção Department of

More information

The Expectation-Maximization Algorithm

The Expectation-Maximization Algorithm 1/29 EM & Latent Variable Models Gaussian Mixture Models EM Theory The Expectation-Maximization Algorithm Mihaela van der Schaar Department of Engineering Science University of Oxford MLE for Latent Variable

More information

Multivariate Capability Analysis Using Statgraphics. Presented by Dr. Neil W. Polhemus

Multivariate Capability Analysis Using Statgraphics. Presented by Dr. Neil W. Polhemus Multivariate Capability Analysis Using Statgraphics Presented by Dr. Neil W. Polhemus Multivariate Capability Analysis Used to demonstrate conformance of a process to requirements or specifications that

More information

BUSI 460 Suggested Answers to Selected Review and Discussion Questions Lesson 7

BUSI 460 Suggested Answers to Selected Review and Discussion Questions Lesson 7 BUSI 460 Suggested Answers to Selected Review and Discussion Questions Lesson 7 1. The definitions follow: (a) Time series: Time series data, also known as a data series, consists of observations on a

More information

Jesper Møller ) and Kateřina Helisová )

Jesper Møller ) and Kateřina Helisová ) Jesper Møller ) and ) ) Aalborg University (Denmark) ) Czech Technical University/Charles University in Prague 5 th May 2008 Outline 1. Describing model 2. Simulation 3. Power tessellation of a union of

More information

MCMC algorithms for fitting Bayesian models

MCMC algorithms for fitting Bayesian models MCMC algorithms for fitting Bayesian models p. 1/1 MCMC algorithms for fitting Bayesian models Sudipto Banerjee sudiptob@biostat.umn.edu University of Minnesota MCMC algorithms for fitting Bayesian models

More information

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Jae-Kwang Kim Department of Statistics, Iowa State University Outline 1 Introduction 2 Observed likelihood 3 Mean Score

More information

Empirical Likelihood Methods for Two-sample Problems with Data Missing-by-Design

Empirical Likelihood Methods for Two-sample Problems with Data Missing-by-Design 1 / 32 Empirical Likelihood Methods for Two-sample Problems with Data Missing-by-Design Changbao Wu Department of Statistics and Actuarial Science University of Waterloo (Joint work with Min Chen and Mary

More information

Mathematical statistics

Mathematical statistics October 1 st, 2018 Lecture 11: Sufficient statistic Where are we? Week 1 Week 2 Week 4 Week 7 Week 10 Week 14 Probability reviews Chapter 6: Statistics and Sampling Distributions Chapter 7: Point Estimation

More information

Chapter 2 Inference on Mean Residual Life-Overview

Chapter 2 Inference on Mean Residual Life-Overview Chapter 2 Inference on Mean Residual Life-Overview Statistical inference based on the remaining lifetimes would be intuitively more appealing than the popular hazard function defined as the risk of immediate

More information

BAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA

BAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA BAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA Intro: Course Outline and Brief Intro to Marina Vannucci Rice University, USA PASI-CIMAT 04/28-30/2010 Marina Vannucci

More information

an introduction to bayesian inference

an introduction to bayesian inference with an application to network analysis http://jakehofman.com january 13, 2010 motivation would like models that: provide predictive and explanatory power are complex enough to describe observed phenomena

More information

Interactive GIS in Veterinary Epidemiology Technology & Application in a Veterinary Diagnostic Lab

Interactive GIS in Veterinary Epidemiology Technology & Application in a Veterinary Diagnostic Lab Interactive GIS in Veterinary Epidemiology Technology & Application in a Veterinary Diagnostic Lab Basics GIS = Geographic Information System A GIS integrates hardware, software and data for capturing,

More information

Chapter 4. Theory of Tests. 4.1 Introduction

Chapter 4. Theory of Tests. 4.1 Introduction Chapter 4 Theory of Tests 4.1 Introduction Parametric model: (X, B X, P θ ), P θ P = {P θ θ Θ} where Θ = H 0 +H 1 X = K +A : K: critical region = rejection region / A: acceptance region A decision rule

More information

Web-based Supplementary Material for A Two-Part Joint. Model for the Analysis of Survival and Longitudinal Binary. Data with excess Zeros

Web-based Supplementary Material for A Two-Part Joint. Model for the Analysis of Survival and Longitudinal Binary. Data with excess Zeros Web-based Supplementary Material for A Two-Part Joint Model for the Analysis of Survival and Longitudinal Binary Data with excess Zeros Dimitris Rizopoulos, 1 Geert Verbeke, 1 Emmanuel Lesaffre 1 and Yves

More information

Information geometry for bivariate distribution control

Information geometry for bivariate distribution control Information geometry for bivariate distribution control C.T.J.Dodson + Hong Wang Mathematics + Control Systems Centre, University of Manchester Institute of Science and Technology Optimal control of stochastic

More information

Discovering Topical Interactions in Text-based Cascades using Hidden Markov Hawkes Processes

Discovering Topical Interactions in Text-based Cascades using Hidden Markov Hawkes Processes Discovering Topical Interactions in Text-based Cascades using Hidden Markov Hawkes Processes Jayesh Choudhari, Anirban Dasgupta IIT Gandhinagar, India Email: {choudhari.jayesh, anirbandg}@iitgn.ac.in Indrajit

More information

Threshold estimation in marginal modelling of spatially-dependent non-stationary extremes

Threshold estimation in marginal modelling of spatially-dependent non-stationary extremes Threshold estimation in marginal modelling of spatially-dependent non-stationary extremes Philip Jonathan Shell Technology Centre Thornton, Chester philip.jonathan@shell.com Paul Northrop University College

More information

Exploring spatial decay effect in mass media and social media: a case study of China

Exploring spatial decay effect in mass media and social media: a case study of China Exploring spatial decay effect in mass media and social media: a case study of China 1. Introduction Yihong Yuan Department of Geography, Texas State University, San Marcos, TX, USA, 78666. Tel: +1(512)-245-3208

More information

Nonparametric Bayesian Methods - Lecture I

Nonparametric Bayesian Methods - Lecture I Nonparametric Bayesian Methods - Lecture I Harry van Zanten Korteweg-de Vries Institute for Mathematics CRiSM Masterclass, April 4-6, 2016 Overview of the lectures I Intro to nonparametric Bayesian statistics

More information

Approximation of Survival Function by Taylor Series for General Partly Interval Censored Data

Approximation of Survival Function by Taylor Series for General Partly Interval Censored Data Malaysian Journal of Mathematical Sciences 11(3): 33 315 (217) MALAYSIAN JOURNAL OF MATHEMATICAL SCIENCES Journal homepage: http://einspem.upm.edu.my/journal Approximation of Survival Function by Taylor

More information

Semi-parametric estimation of non-stationary Pickands functions

Semi-parametric estimation of non-stationary Pickands functions Semi-parametric estimation of non-stationary Pickands functions Linda Mhalla 1 Joint work with: Valérie Chavez-Demoulin 2 and Philippe Naveau 3 1 Geneva School of Economics and Management, University of

More information

A Framework of Detecting Burst Events from Micro-blogging Streams

A Framework of Detecting Burst Events from Micro-blogging Streams , pp.379-384 http://dx.doi.org/10.14257/astl.2013.29.78 A Framework of Detecting Burst Events from Micro-blogging Streams Kaifang Yang, Yongbo Yu, Lizhou Zheng, Peiquan Jin School of Computer Science and

More information

Introduction to Maximum Likelihood Estimation

Introduction to Maximum Likelihood Estimation Introduction to Maximum Likelihood Estimation Eric Zivot July 26, 2012 The Likelihood Function Let 1 be an iid sample with pdf ( ; ) where is a ( 1) vector of parameters that characterize ( ; ) Example:

More information

Empirical likelihood and self-weighting approach for hypothesis testing of infinite variance processes and its applications

Empirical likelihood and self-weighting approach for hypothesis testing of infinite variance processes and its applications Empirical likelihood and self-weighting approach for hypothesis testing of infinite variance processes and its applications Fumiya Akashi Research Associate Department of Applied Mathematics Waseda University

More information

Sparse Graph Learning via Markov Random Fields

Sparse Graph Learning via Markov Random Fields Sparse Graph Learning via Markov Random Fields Xin Sui, Shao Tang Sep 23, 2016 Xin Sui, Shao Tang Sparse Graph Learning via Markov Random Fields Sep 23, 2016 1 / 36 Outline 1 Introduction to graph learning

More information

Statistical Models for Defective Count Data

Statistical Models for Defective Count Data Statistical Models for Defective Count Data Gerhard Neubauer a, Gordana -Duraš a, and Herwig Friedl b a Statistical Applications, Joanneum Research, Graz, Austria b Institute of Statistics, University

More information

Mathematical statistics

Mathematical statistics October 18 th, 2018 Lecture 16: Midterm review Countdown to mid-term exam: 7 days Week 1 Chapter 1: Probability review Week 2 Week 4 Week 7 Chapter 6: Statistics Chapter 7: Point Estimation Chapter 8:

More information

Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates

Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates Anastasios (Butch) Tsiatis Department of Statistics North Carolina State University http://www.stat.ncsu.edu/

More information

Statistics. Lecture 2 August 7, 2000 Frank Porter Caltech. The Fundamentals; Point Estimation. Maximum Likelihood, Least Squares and All That

Statistics. Lecture 2 August 7, 2000 Frank Porter Caltech. The Fundamentals; Point Estimation. Maximum Likelihood, Least Squares and All That Statistics Lecture 2 August 7, 2000 Frank Porter Caltech The plan for these lectures: The Fundamentals; Point Estimation Maximum Likelihood, Least Squares and All That What is a Confidence Interval? Interval

More information

Expectation Maximization

Expectation Maximization Expectation Maximization Aaron C. Courville Université de Montréal Note: Material for the slides is taken directly from a presentation prepared by Christopher M. Bishop Learning in DAGs Two things could

More information

Mixture Models & EM. Nicholas Ruozzi University of Texas at Dallas. based on the slides of Vibhav Gogate

Mixture Models & EM. Nicholas Ruozzi University of Texas at Dallas. based on the slides of Vibhav Gogate Mixture Models & EM icholas Ruozzi University of Texas at Dallas based on the slides of Vibhav Gogate Previously We looed at -means and hierarchical clustering as mechanisms for unsupervised learning -means

More information

Modeling population growth in online social networks

Modeling population growth in online social networks Zhu et al. Complex Adaptive Systems Modeling 3, :4 RESEARCH Open Access Modeling population growth in online social networks Konglin Zhu *,WenzhongLi, and Xiaoming Fu *Correspondence: zhu@cs.uni-goettingen.de

More information