AN EM ALGORITHM FOR HAWKES PROCESS


Peter F. Halpin
New York University

December 17, 2012

Correspondence should be sent to Dr. Peter F. Halpin, 246 Greene Street, Office 316E, New York, NY.

Abstract

This manuscript addresses the EM algorithm developed in Halpin & De Boeck (in press). The runtime of the algorithm grows quadratically in the number of observations, making its application to large data sets impractical. A strategy for improving efficiency is introduced, and this results in linear growth for many applications. The performance of the modified algorithm is assessed using data simulation.

Key words: Hawkes process; EM algorithm; maximum likelihood; runtime

Introduction

Halpin & De Boeck (in press) considered the time series analysis of bivariate event data in the context of dyadic interaction. They proposed the use of point processes, and in particular Hawkes process, as a way to capture the temporal dependence between the actions of two individuals. Estimation was based on the so-called branching structure representation of Hawkes process, which they showed to be amenable to estimation via the EM algorithm (see also Veen & Schoenberg, 2008). Unfortunately, the runtime of the algorithm grows quadratically in the number of observations, making its application to large data sets impractical. The present paper provides a modification of the original algorithm that substantially improves its runtime. The modification reduces the number of computations in the algorithm by tolerating a specified degree of rounding error, and this results in linear growth for many applications.

The next section outlines Hawkes process in sufficient detail for this paper to be self-contained and gives an intuitive description of the problem to be addressed. The subsequent section presents the modification to the EM algorithm and illustrates some cases where this yields linear growth. The final section uses data simulation to arrive at a magnitude of rounding error that has a negligible effect on parameter recovery.

Hawkes Process

Under mild conditions, a point process can be uniquely defined in terms of its conditional intensity function (CIF). The main reason for specifying a point process in terms of its CIF is that this leads directly to an expression for its likelihood. A general form for the CIF is

λ(t) = lim_{Δ→0} E( M{(t, t + Δ)} | H_t ) / Δ    (1)

where M{(a, b)} is a random counting measure representing the number of events (i.e., isolated points) falling in the interval (a, b), E(M{(a, b)}) is its expected value, and H_t is the σ-algebra generated by the time points t_k, k ∈ N, occurring before time t ∈ R+ (see Daley & Vere-Jones, 2003). In this paper it is assumed that the probability of multiple events occurring simultaneously is negligible, in which case M is said to be orderly. Then for fixed t and sufficiently small values of Δ, λ(t)Δ is an approximation to the Bernoulli probability of an event occurring in the interval (t, t + Δ), conditional on all of the events happening before time t. In applications, this means we are concerned with how the probability of an event changes over continuous time as a function of previous events.

Point processes extend immediately to the multivariate case. M{(a, b)} is then vector-valued and each univariate margin gives the number of a different type of event occurring in the time period (a, b). Although Halpin and De Boeck (in press) considered a bivariate model, this paper focuses on the univariate case since the problem to be addressed can be most simply explained in that situation.

The CIF of Hawkes process can be specified as a linear causal filter:

λ(t) = µ + ∫_0^t φ(t − s) dM(s).    (2)

The interpretation of equation (2) is unpacked in the following three points.

1. µ > 0 is a baseline, which can be a function of time but is here treated as a constant.

2. φ(u) is a response function that governs how the process depends on its past. Hawkes process requires the following three assumptions: φ(u) ≥ 0, u ≥ 0; φ(u) = 0, u < 0; and ∫_0^∞ φ(u) du ≤ 1. Together these assumptions imply that φ can be written as

φ(u) = α f(u; ξ)    (3)

where 0 ≤ α ≤ 1 and f(u; ξ) is a probability density function on R+ with parameter ξ. Equation (3) presents a convenient method for parametrizing φ, with some common choices for f(u; ξ) being the exponential (e.g., Ogata, 1988; Truccolo, Eden, Fellows, Donoghue, & Brown, 2005), the two-parameter gamma (Halpin & De Boeck, in press), and the power law distribution (Barabási, 2005; Crane & Sornette, 2008). Under this parameterization, α is referred to as the intensity parameter and f(u; ξ) as the response kernel.

3. In the case that M is orderly, dM(u) is representable as a series of right-shifted Dirac delta functions and the integral reduces to a sum over all events in [0, t), yielding

∫_0^t φ(t − s) dM(s) = Σ_{t_j < t} φ(t − t_j).    (4)

Thus each new time point is associated with a response function describing how that time point affects the future of the process. Under the assumptions of Hawkes process, each new time point increases the probability of further events occurring in the immediate future (i.e., φ(u) is non-negative). The summation shows that the effect of multiple time points on the probability of further events is cumulative. For these reasons, Hawkes process is often referred to as self-exciting: the occurrence of one event increases the probability of further events, whose occurrence in turn increases the probability of even more events. In terms of applications this means that Hawkes process is appropriate for modelling clustering, which occurs when periods of high event frequency are separated by periods of relative inactivity.

As noted, the CIF leads directly to an expression for the log-likelihood (see Daley & Vere-Jones, 2003):

l(θ | X) = Σ_k ln(λ(t_k)) − ∫_0^T λ(s) ds    (5)

where [0, T] is the observation period, X = {t_1, t_2, ...} denotes the observed event times, and θ contains the parameters of the model.
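To make equations (2) through (5) concrete, the following minimal sketch evaluates the log-likelihood of a univariate Hawkes process with an exponential response kernel, f(u; ξ) = ξ exp(−ξu), by direct summation. The sketch is not taken from the manuscript and is not the author's implementation; the function name, parameter values, and the brute-force evaluation are illustrative assumptions.

```python
import numpy as np

def hawkes_loglik_exp(times, mu, alpha, xi, T):
    """Direct evaluation of equation (5) for a univariate Hawkes process
    with exponential response kernel f(u; xi) = xi * exp(-xi * u).

    times : increasing array of event times in [0, T]
    mu    : baseline rate; alpha : intensity parameter; xi : kernel rate
    """
    times = np.asarray(times, dtype=float)
    loglik = 0.0
    for k, t_k in enumerate(times):
        # Equations (2)-(4): lambda(t_k) = mu + sum over past events of alpha * f(t_k - t_j)
        past = times[:k]
        lam_k = mu + np.sum(alpha * xi * np.exp(-xi * (t_k - past)))
        loglik += np.log(lam_k)
    # Compensator term of equation (5): integral of lambda over [0, T].
    # Each event contributes alpha * (1 - exp(-xi * (T - t_j))).
    loglik -= mu * T + np.sum(alpha * (1.0 - np.exp(-xi * (T - times))))
    return loglik

# Example call with arbitrary illustrative values:
# hawkes_loglik_exp([0.4, 1.1, 1.3, 4.2, 7.9], mu=0.3, alpha=0.4, xi=1.0, T=10.0)
```

The inner sum over past events in this sketch is the source of the quadratic growth discussed next.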

Substitution of equations (2) through (4) into equation (5) shows that the log-likelihood of Hawkes process contains the logarithm of a weighted sum of density functions. A similar situation occurs in finite mixture modelling (e.g., McLachlan & Peel, 2000) and nonlinear regression (e.g., Seber & Wild, 2003), where it is known to lead to numerical optimization problems related to ill-conditioning of the likelihood function and multiple roots. In the present case the problem is aggravated by the fact that the number of densities appearing in the likelihood increases with the number of observations, as shown in equation (4). It is important to note that the number of model parameters does not grow with the number of time points; the densities are simply right-shifted. In general, if there are a total of n observed events, then there are a total of n(n − 1)/2 response functions appearing in the log-likelihood of a univariate Hawkes process, not including the duplicated response functions appearing in the integral. This is the source of the quadratic growth of the optimization problem, which is the issue to be dealt with in this paper.

The quadratic growth is especially problematic because the EM algorithm proposed by Halpin and De Boeck (in press) requires the use of multiple starting values. This means that even moderately sized data sets cannot be estimated in a reasonable amount of time. For example, an actual runtime of over 24 hours was recorded for a problem with N ≈ 1500 events and 50 starting values (implemented in the C language on a machine with a 2 GHz processor). Because one of the most exciting potential applications of Hawkes process is to big data collected via computer-mediated communication (e.g., databases, Twitter), it is important to have an estimation approach that is feasible for large samples. The following section outlines how that can be accomplished.

Reducing Runtime by Introducing Rounding Error

This section outlines the original EM algorithm suggested by Halpin and De Boeck (in press) and then considers how to reduce its runtime. The algorithm is based on an alternative representation of Hawkes process, which is referred to as its branching structure.

In terms of the EM algorithm, the branching structure provides the complete data representation of the model, whereas the causal filter in equation (2) is the incomplete data representation. Taking this approach, the logarithm of the sum of densities in equation (5) is replaced by the sum of their logarithms, which results in better conditioning of the numerical optimization problem and was shown to perform satisfactorily with relatively small data sets (N ≈ 400). Although the considerations of this section could also be made for equation (5), the focus is on the EM approach.

The branching structure representation of Hawkes process is in terms of a cluster Poisson process. It was first proposed by Hawkes and Oakes (1974), who proved it to be equivalent to the representation given in the foregoing section. Their argument was very technical and it served to establish the existence and uniqueness of the process. The branching structure has also found more intuitive applications. For example, in ecology it is used to describe the growth of wildlife populations in terms of subsequent generations of offspring due to each immigrant (e.g., Rasmussen, 2011). In the context of disease control, it is interpreted as the number of people contaminated by each subsequent carrier (e.g., Daley & Vere-Jones, 2003). Veen and Schoenberg (2008) were the first to consider the branching structure as a strategy for obtaining maximum likelihood estimates (MLEs) of Hawkes process.

For the present purpose, the effect of the branching structure is to decompose Hawkes process into n independent Poisson processes whose rate functions are given by the response functions in equation (3). These processes govern the number of offspring of each event. There is also an additional Poisson process governing the number of immigrant events; this process has a rate function given by the baseline parameter µ. Importantly, each event t_k is assumed to be due to one and only one of these independent Poisson processes: either one centered at its parent, t_j, with t_j < t_k, or the baseline process. Consequently, if we knew which process each event belonged to, estimation of Hawkes process would reduce to that for a collection of independent Poisson processes.

It is therefore natural to introduce a missing variable that describes the specific process to which each event t_k belongs, and to proceed by means of the EM algorithm. As with other applications of the EM algorithm, the missing data need not correspond to the hypothesized data generating process; it can be treated merely as a tool for obtaining MLEs.

The following notation is employed to set up the algorithm. Let Z = (Z_1, Z_2, ..., Z_n) denote the missing data. If an event t_k is an offspring of event t_j, t_j < t_k, this is denoted by setting Z_k = j. If an event t_k is an immigrant then Z_k = 0. Also let φ_j(u) denote the response functions governing each Poisson process, where it is understood that φ_0(u) = µ. For j > 0, these response functions are identical to those introduced in equation (3) above, except that the subscript serves to make explicit the centering event t_j. Letting l(θ | X, Z) denote the complete data log-likelihood, Halpin and De Boeck (in press) showed that

Q(θ) = E_{Z | X, θ} l(θ | X, Z) = Σ_{j=0}^{n} ( Σ_{k>j} ln(φ_j(t_k − t_j)) Prob(Z_k = j | X, θ) − ∫_0^{T − t_j} φ_j(s) ds )    (6)

where

Prob(Z_k = j | X, θ) = φ_j(t_k − t_j) / Σ_{r<k} φ_r(t_k − t_r).    (7)

Equations (6) and (7) provide the necessary components of an EM algorithm for Hawkes process. Equation (7) is readily computed on the E step. On the M step these probabilities are treated as fixed and entered into equation (6). Using this approach, Halpin and De Boeck (in press) provided closed form solutions for the baseline parameter µ and the intensity parameter α. However, in order to obtain the parameters of the response kernel, it is necessary to numerically optimize the Q function. This is the computationally expensive part of the algorithm.
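As a concrete illustration of equation (7), the following sketch computes the E step for an exponential response kernel, together with closed-form updates for µ and α. It is a minimal Python sketch, not the author's implementation; the update formulas shown for µ and α are the standard branching-structure estimators (expected number of immigrants divided by T, and expected number of offspring divided by n, in the spirit of Veen & Schoenberg, 2008) and are included here as assumed forms rather than as quotations of Halpin and De Boeck's expressions.

```python
import numpy as np

def e_step_exp(times, mu, alpha, xi):
    """Equation (7): branching probabilities Prob(Z_k = j | X, theta) for an
    exponential kernel. Returns a list P where P[k][0] is the probability that
    event k is an immigrant and P[k][j] (j = 1, ..., k) is the probability that
    it is an offspring of the j-th earlier event."""
    times = np.asarray(times, dtype=float)
    P = []
    for k, t_k in enumerate(times):
        past = times[:k]
        rates = np.concatenate(([mu], alpha * xi * np.exp(-xi * (t_k - past))))
        P.append(rates / rates.sum())
    return P

def m_step_mu_alpha(P, n, T):
    """Assumed closed-form M-step updates for the baseline and intensity
    parameters (standard branching-structure estimators)."""
    expected_immigrants = sum(p[0] for p in P)
    expected_offspring = sum(p[1:].sum() for p in P)
    return expected_immigrants / T, expected_offspring / n
```

The parameters of the response kernel (here ξ) would still be obtained by numerically maximizing the Q function with these probabilities held fixed, which is the expensive step targeted below.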

Since the sum over k > j is the source of the quadratic growth of the Q function, let us first consider how this can be reduced. Recall that for j > 0, φ_j(u) = α f(u; ξ) is just a weighted density on R+. For the usual choices of the response kernel, f(u; ξ) → 0 as u becomes large (i.e., response functions typically have a right tail that asymptotes at zero). Intuitively, this means that when t_k − t_j is large, the contribution of φ_j(t_k − t_j) to equation (6) will be negligible.

In order to make this idea more formal, consider the sets

W_j = {k : k > j, f(t_k − t_j; ξ) ≥ w}

and let W̄ denote the average of the cardinalities of the W_j. Replacing the sum over k > j with the sum over k ∈ W_j in equation (6) results in W̄ n densities appearing in the double summation. This substitution will be referred to as the modified Q function and denoted Q*. W̄ is the linear growth factor of Q*. The relative efficiency of Q* over Q is

R = W̄ n / (n(n − 1)/2) = 2 W̄ / (n − 1).

The value of W̄ depends on (a) ξ, which is updated throughout the optimization process, (b) w, which can be determined by the researcher, and (c) the actual observations t_k, which are fixed. This makes it difficult to obtain analytical results on W̄. However, Table 1 provides evidence that it does not grow with n and that it can be much smaller than (n − 1)/2.

=========================
Insert Table 1 about here
=========================

The table was produced by simulating data using the inverse method (see Daley & Vere-Jones, 2003). The causal filter in equation (2) was used for simulation, not the branching structure.
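For completeness, a simulated data set of this kind can be generated in a few lines. The sketch below uses Ogata's thinning algorithm rather than the inverse method cited above, simply because thinning is shorter to state; it is an illustrative Python sketch, not the procedure used for Table 1, and the parameter values in the example call are arbitrary.

```python
import numpy as np

def simulate_hawkes_exp(mu, alpha, xi, T, seed=None):
    """Simulate a univariate Hawkes process with exponential response kernel
    on [0, T] using Ogata's thinning algorithm."""
    rng = np.random.default_rng(seed)
    times = []
    t = 0.0
    while True:
        # The intensity is non-increasing between events, so its value just
        # after t is a valid upper bound until the next event.
        lam_bar = mu + alpha * xi * sum(np.exp(-xi * (t - tj)) for tj in times)
        t += rng.exponential(1.0 / lam_bar)
        if t >= T:
            break
        lam_t = mu + alpha * xi * sum(np.exp(-xi * (t - tj)) for tj in times)
        if rng.uniform() <= lam_t / lam_bar:  # accept the candidate time
            times.append(t)
    return np.array(times)

# Example: moderate intensity, comparable in spirit to Model 1 below
# events = simulate_hawkes_exp(mu=0.5, alpha=0.4, xi=1.0, T=1000.0, seed=1)
```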

Three different sample sizes (N = 500, 1500, and 5000) were simulated from each of three different models. Model 1 and Model 2 used exponential response functions, with Model 1 having moderate intensity (α = .4) and Model 2 having high intensity (α = .8). This means that the data from Model 2 showed a much higher degree of clustering (i.e., a larger number of events occurring in close proximity to one another). Model 3 is also high intensity (α = .8) but used a two-parameter gamma kernel with shape parameter set to .5. The result is heavier-tailed response functions, which have been reported in various applications to human communication data (e.g., Barabási, 2005; Crane & Sornette, 2008; Halpin & De Boeck, in press). The choices of intensity parameter are intended to reflect its possible range rather than realistic values; I have not seen intensity estimates greater than .5 in real data applications. For each simulated data set, Q* was computed using the true parameter values and w = 1e-10.

The main point to be taken from Table 1 is that the values of W̄ did not increase with n and therefore the rate of growth of Q* was linear. The exact rate of linear growth depended on the parameters of the data generating model, with more clustered data showing faster growth. However, even at extraordinarily high intensities and even at the smallest sample size, the growth rate was much smaller than (n − 1)/2. Based on these results, it is reasonable to conclude that Q* is more efficient to compute than Q. It should be emphasized that this depends on the type of response kernel; the approach outlined here will not work unless the response kernel has a right tail that asymptotes at zero.

Table 1 does not address how the rounding error w affects the MLEs produced by the EM algorithm. That is the topic of the next section. Although this section has only focused on the computation of the Q function, entirely similar remarks can be made about the computation of equation (7) on the E step, and about the computation of equation (5).
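To make the truncation concrete before moving on, the following sketch constructs the sets W_j for an exponential response kernel. Because the exponential density is monotone, the condition f(t_k − t_j; ξ) ≥ w is equivalent to a maximum lag, so each set can be found by a binary search over the sorted event times. This is an illustrative Python sketch under those assumptions, not the author's C implementation.

```python
import numpy as np

def truncation_sets_exp(times, xi, w):
    """Return the sets W_j = {k : k > j, f(t_k - t_j; xi) >= w} for the
    exponential kernel f(u; xi) = xi * exp(-xi * u), together with the
    implied maximum lag u_max."""
    times = np.asarray(times, dtype=float)
    # f(u; xi) >= w  <=>  u <= -log(w / xi) / xi  (no lag qualifies if w > xi)
    u_max = -np.log(w / xi) / xi if w < xi else 0.0
    W = []
    for j, t_j in enumerate(times):
        # Events are sorted, so a binary search finds the last index inside the window.
        hi = np.searchsorted(times, t_j + u_max, side="right")
        W.append(np.arange(j + 1, hi))
    return W, u_max

# Average cardinality, i.e. the linear growth factor of the modified Q function:
# W, _ = truncation_sets_exp(events, xi=1.0, w=1e-10)
# w_bar = np.mean([len(idx) for idx in W])
```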

Effect of Rounding Error on the EM Algorithm

This section considers how the rounding error w affects convergence and parameter recovery. Data were again simulated using the inverse method with the incomplete data model (equation (2)). The data-generating model used a two-parameter gamma density as the response kernel. The parameters of the data generating model are stated in Table 3 and were based on the real data example reported in Halpin and De Boeck (in press). N = 250 data sets, each consisting of n = 500 time points, were generated from the model. For each data set, the EM algorithm described in Halpin and De Boeck (in press) was implemented using Q* in place of Q. The starting values for the estimation algorithm were obtained by randomly disturbing the data generating values, which avoided the need for multiple starting values. Convergence was evaluated using the incomplete data log-likelihood (equation (5)). The convergence criterion was an absolute difference of less than 1e-5 on subsequent M steps.

The simulation compared the rounding errors w = 0, 1e-10, 1e-5, and 1e-3. Because a rounding error of exactly 0 is not possible in practice, this condition was implemented using w = 2.22e-16, which is the double-precision machine epsilon on most modern computers. Therefore the value w = 0 represents the amount of error that is intrinsic to the specific realization of the estimation process (i.e., with the given sample size, convergence criterion, etc.). The remaining values of w represent the introduction of rounding error for computational efficiency.

Let us first consider the role of rounding error in the convergence of the algorithm. Figure 1 shows the relationship between the log-likelihoods evaluated at the MLEs and the log-likelihoods evaluated at the data generating parameters. The relation is quite similar for the three smallest values of w, but is appreciably worse for the largest value. It is important to note that even for w = 0, the relationship is not perfect. The amount of additional error introduced by the two middle values of w is not perceptible in the figure.

=========================
Insert Figure 1 about here
=========================

Table 2 provides a closer look at the log-likelihoods. It reports the mean and standard deviation of the differences between the log-likelihoods of the estimated models and the log-likelihoods computed using the true values. The table entries are reported as percentages of the difference between the log-likelihoods of w = 0 and of the true values (i.e., as percentages of the intrinsic estimation error). If w > 0 did not affect the convergence of the EM algorithm, all values in the table would be 100. Based on the table we can conclude that all values of w > 0 introduced additional error into the convergence of the EM algorithm. For w = 1e-10 this was less than .1 percent of the intrinsic estimation error.

=========================
Insert Table 2 about here
=========================

Turning now to parameter recovery, Table 3 reports the bias and error of the MLEs for each level of rounding error. The entries are reported as percentages of the data generating parameters. It can be seen that bias and error were very similar for the lowest two values of w, but for larger values of w there is increased bias and reduced error. Figure 2 shows the distribution of estimates of the gamma response kernel parameters for w = 0 and w = 1e-10.

=========================
Insert Table 3 about here
=========================

=========================
Insert Figure 2 about here
=========================

Based on this simulation it may be concluded that there is little to distinguish the results obtained using a rounding error of w = 1e-10 from the intrinsic error in the algorithm (i.e., w = 0). On the other hand, w ≥ 1e-5 has a relatively large influence both on the convergence of the algorithm and on the bias and error of the resulting parameter estimates.

Conclusions

The number of computations required by the EM algorithm proposed by Halpin and De Boeck (in press) grows quadratically in the number of observed events, making its application to large data sets infeasible. This paper has shown that the runtime of the algorithm can be reduced by introducing rounding error into the computation of the Q function (i.e., the objective function of the M step of the EM algorithm). In three simulated examples involving response functions with right tails asymptoting at zero, this was shown to result in linear growth. The consequences for convergence of the algorithm and for parameter recovery were also considered. A rounding error of 1e-10 was found to have negligible effects, but larger values did not. While more research can be done to optimize the rounding error for specific applications of the algorithm, it can be concluded that the approach presented here provides an acceptable compromise between runtime and computational accuracy.

References

Barabási, A. L. (2005). The origin of bursts and heavy tails in human dynamics. Nature, 435.

Crane, R., & Sornette, D. (2008). Robust dynamic classes revealed by measuring the response function of a social system. Proceedings of the National Academy of Sciences, 105.

Daley, D. J., & Vere-Jones, D. (2003). An introduction to the theory of point processes: Elementary theory and methods (2nd ed., Vol. 1). New York: Springer.

Halpin, P. F., & De Boeck, P. (in press). Modeling dyadic interaction using Hawkes process. Psychometrika.

Hawkes, A. G., & Oakes, D. (1974). A cluster representation of a self-exciting process. Journal of Applied Probability, 11.

McLachlan, G. J., & Peel, D. (2000). Finite mixture models. New York: John Wiley and Sons.

Ogata, Y. (1988). Statistical models for earthquake occurrences and residual analysis for point processes. Journal of the American Statistical Association, 83.

Rasmussen, J. G. (2011). Bayesian inference for Hawkes processes. Methodology and Computing in Applied Probability.

Seber, G. A. F., & Wild, C. J. (2003). Nonlinear regression (2nd ed.). Hoboken, NJ: John Wiley & Sons.

Truccolo, W., Eden, U. T., Fellows, M. R., Donoghue, J. P., & Brown, E. N. (2005). A point process framework for relating neural spiking activity to spiking history, neural ensemble, and extrinsic covariate effects. Journal of Neurophysiology, 93.

Veen, A., & Schoenberg, F. P. (2008). Estimation of space-time branching process models in seismology using an EM-type algorithm. Journal of the American Statistical Association, 103.

Tables

Table 1.
Growth of the Q* Function in Number of Time Points (Simulated Data)

          n = 500    n = 1500    n = 5000
Model 1
Model 2
Model 3

Note: n is the number of simulated time points and the table entries are the linear growth factor, W̄, of the modified Q function, Q*, computed using the true parameter values. W̄ n gives the number of computations required for Q*, and 2 W̄ / (n − 1) gives the efficiency of Q* relative to the original Q function proposed by Halpin and De Boeck (in press). The models are described in the text.

Table 2.
Effect of Rounding Error on Log-likelihoods (Simulated Data)

        w = 0    w = 1e-10    w = 1e-5    w = 1e-3
Mean
SD

Note: Table entries are means (Mean) and standard deviations (SD) of the differences between the log-likelihoods of the estimated models and the log-likelihoods computed using the true values. The means and standard deviations are reported as percentages of the values for w = 0 (i.e., percentages of the intrinsic estimation error). The MLEs were obtained using the EM algorithm described by Halpin and De Boeck (in press) with the modified Q function, Q*, and the indicated levels of rounding error, w.

Table 3.
Effect of Rounding Error on Parameter Recovery (Simulated Data)

              µ           α           κ           β
True values
w = 0         (12.707)    (14.282)    (11.812)    (49.986)
w = 1e-10     (12.725)    (14.315)    (11.824)    (50.664)
w = 1e-5      (10.857)    (11.592)    (11.215)    (22.937)
w = 1e-3      (9.969)     (8.786)     (17.114)    (3.618)

Note: Table entries are bias (error) of maximum likelihood estimates (MLEs) as percentages of the true values. µ denotes the baseline parameter of Hawkes process, α the intensity parameter, κ the shape parameter of the two-parameter gamma response kernel, and β its scale parameter. MLEs were obtained using the EM algorithm described by Halpin and De Boeck (in press) with the modified Q function, Q*, and the indicated levels of rounding error, w.

[Figure 1: four scatter panels, one per rounding error (w = 0, w = 1e-10, w = 1e-5, w = 1e-3), plotting log-likelihoods at the MLEs (x-axis) against log-likelihoods at the true values (y-axis); annotated correlations include r = .998.]

Figure 1. Relation of log-likelihoods at convergence with log-likelihoods computed using the data generating values (simulated data). The model was estimated using the EM algorithm described by Halpin and De Boeck (in press) with the modified Q function presented in this paper and the indicated levels of rounding error, w.

[Figure 2: histograms of the MLEs of the gamma shape parameter (left column) and scale parameter (right column), for w = 0 (top row) and w = 1e-10 (bottom row).]

Figure 2. Histograms of maximum likelihood estimates (MLEs) of the two-parameter gamma density kernel (simulated data). The bold vertical line indicates the value of the data generating parameter. MLEs were obtained using the EM algorithm described by Halpin and De Boeck (in press) with the modified Q function presented in this paper and the indicated levels of rounding error, w.
