Fast Bayesian scan statistics for multivariate event detection and visualization


Research Article

Received 22 January 2010, Accepted 22 January 2010
Published online in Wiley Online Library (wileyonlinelibrary.com) DOI: 10.1002/sim.3881

Fast Bayesian scan statistics for multivariate event detection and visualization

Daniel B. Neill

The multivariate Bayesian scan statistic (MBSS) is a recently proposed, general framework for event detection and characterization in multivariate space-time data. MBSS integrates prior information and observations from multiple data streams in a Bayesian framework, computing the posterior probability of each type of event in each space-time region. MBSS has been shown to have many advantages over previous event detection approaches, including improved timeliness and accuracy of detection, easy interpretation and visualization of results, and the ability to model and accurately differentiate between multiple event types. This work extends the MBSS framework to enable detection and visualization of irregularly shaped clusters in multivariate data, by defining a hierarchical prior over all subsets of locations. While a naive search over the exponentially many subsets would be computationally infeasible, we demonstrate that the total posterior probability that each location has been affected can be efficiently computed, enabling rapid detection and visualization of irregular clusters. We compare the run time and detection power of this Fast Subset Sums method to our original MBSS approach (assuming a uniform prior over circular regions) on semi-synthetic outbreaks injected into real-world Emergency Department data from Allegheny County, Pennsylvania. We demonstrate substantial improvements in spatial accuracy and timeliness of detection, while maintaining the scalability and fast run time of the original MBSS method. Copyright 2011 John Wiley & Sons, Ltd.

Keywords: event detection; disease surveillance; scan statistics

1. Introduction

This work focuses on the task of disease surveillance, in which we monitor electronically available public health data sources, such as hospital visits and medication sales, to automatically detect emerging outbreaks of disease. Our goal is to develop disease surveillance systems which can identify outbreaks in the very early stages (typically, before a definitive diagnosis of the outbreak disease can be obtained), enabling a timely and effective public health response. The multivariate Bayesian scan statistic (MBSS) [1] is a recently proposed Bayesian framework for detection and characterization of emerging outbreaks. MBSS integrates information from multiple data streams, enabling more timely and more accurate event detection, and can model and differentiate between multiple event types. MBSS can be used to detect emerging outbreaks of disease, pinpoint the affected spatial region, distinguish between different outbreak diseases, and reduce the number of false-positive alerts due to irrelevant anomalies in the data [1]. By detecting and characterizing outbreaks in their early stages, MBSS can provide public health users with sufficient situational awareness to enable a rapid and informed response. MBSS has been shown to have numerous advantages over the frequentist spatial scan statistics approach [1]. By integrating information from multiple data streams and incorporating prior knowledge of an outbreak's effects on the monitored data streams, MBSS can achieve more timely and more accurate detection, detecting an average of 1.3 days faster than Kulldorff's multivariate scan statistic [2]. MBSS can model and accurately differentiate between multiple event types, thus distinguishing between patterns due to different outbreak diseases and those due to other irrelevant events in the data. Outbreak models can be specified by a domain expert or learned automatically from a small number of labeled training examples.
Randomization testing is not necessary in the Bayesian framework: since 999 or more Monte Carlo replications must typically be performed to obtain accurate p-values in the frequentist approach, avoiding the need for randomization yields an approximately 1000x speedup.

[Author footnote: H.J. Heinz III College, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, U.S.A. Correspondence to: Daniel B. Neill, H.J. Heinz III College, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, U.S.A. neill@cs.cmu.edu. Statist. Med. 2011.]

Finally, the results produced by MBSS (the posterior probability that each event type has affected each spatial region) are easy to interpret, visualize, and use for decision-making. MBSS computes the posterior probability Pr(H_1(S, E_k) | D) that each outbreak type E_k has affected each spatial region S, and these results can be visualized as a posterior probability map, showing the total probability

Pr(H_1(s_i) | D) = Sum_{E_k} Sum_{S: s_i in S} Pr(H_1(S, E_k) | D)

that each location s_i has been affected. The primary limitation of MBSS is a computational issue common to many spatial scan statistic methods: to compute the posterior probabilities, we must evaluate a huge number of spatial regions S. If we place no constraints on the search region, all 2^N subsets of the N locations must be considered, which is computationally infeasible for even moderately large N. MBSS, like other typical scan statistic approaches, solves this problem by only considering a reduced set of regions, placing restrictions on the region shape. As in Kulldorff's original spatial scan statistic approach [3], MBSS restricts the search space to the set of O(N^2) distinct circular regions centered at one of the N spatial locations. While this limitation of the search space makes MBSS computationally feasible, the assumption of circular outbreaks causes it to lose detection power for elongated or irregular outbreaks, as well as reducing its ability to accurately pinpoint the affected region. Here we propose a new extension of the MBSS framework, Fast Subset Sums (FSS), which enables timely and accurate detection and visualization of irregularly shaped clusters in multivariate data. FSS can efficiently compute the posterior probability map, as well as the total posterior probability of an outbreak, without computing the individual posterior probabilities of each spatial region.
It defines a hierarchical prior distribution over regions which assigns nonzero prior probability to each of the 2^N subsets of locations, and efficiently computes the summed posterior probabilities using this distribution.

2. Methods

We briefly review the recently proposed MBSS framework, discuss its limitations, and then present our new FSS method.

2.1. Multivariate Bayesian scan statistics

In the multivariate disease surveillance problem, our goal is to detect and characterize outbreaks based on their effects on the monitored data sources. We typically monitor aggregated count data for multiple spatial locations, time steps, and data streams. For example, to detect an influenza outbreak, we could monitor hospital Emergency Department (ED) visits, with each data stream representing the number of ED visits with a different symptom type (e.g. cough or fever). The MBSS framework [1] assumes a dataset D consisting of multiple data streams D_m, for m = 1...M. Each data stream consists of spatial time series data collected at a set of spatial locations s_i, for i = 1...N. For each stream D_m and location s_i, we have a time series of observed counts c^t_{i,m} and a time series of expected counts b^t_{i,m}, where t = 0 represents the current time step and t = 1...T represent the counts from 1 to T time steps ago, respectively. For example, a given observed count c^t_{i,m} might represent the total number of ED visits with fever symptoms, for a given zip code on a given day. The corresponding expected count b^t_{i,m} would be the expected number of ED visits with fever symptoms in that zip code on that day, estimated from the time series of historical fever counts for that zip code. Here we calculate expected counts using a simple, 28-day moving average, but other time series analysis methods can also be applied, in order to adjust for seasonal and day-of-week trends. MBSS is designed to detect emerging outbreaks, identify the type of outbreak (e.g.
distinguishing between anthrax and influenza), and pinpoint the affected locations. To do so, it compares the set of alternative hypotheses H_1(S, E_k), each representing the occurrence of some outbreak type E_k in some subset of locations S, against the null hypothesis H_0 that no outbreaks have occurred. These hypotheses are assumed to be mutually exclusive, and thus compound events (where multiple outbreaks are occurring simultaneously) must be modeled as separate hypotheses. The posterior probability of each hypothesis (given the dataset D) is computed using Bayes' theorem:

Pr(H_1(S, E_k) | D) = Pr(D | H_1(S, E_k)) Pr(H_1(S, E_k)) / Pr(D)

Pr(H_0 | D) = Pr(D | H_0) Pr(H_0) / Pr(D)
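The expected counts b^t_{i,m} described above are estimated by time series analysis of historical data; this paper uses a simple 28-day moving average. A minimal sketch in Python, with hypothetical count values (the paper's actual pipeline may differ in details such as seasonal adjustment):

```python
import numpy as np

def expected_counts(history: np.ndarray, window: int = 28) -> float:
    """Expected count for the current day: mean of the previous `window` daily counts."""
    return float(np.mean(history[-window:]))

# toy history of daily fever counts for one zip code (hypothetical values)
history = np.array([3, 5, 4, 6, 2] * 6)  # 30 days of counts
b = expected_counts(history)             # baseline for the current day
```

In practice one baseline b^t_{i,m} is computed per location, stream, and day, so the full baseline table has the same shape as the count data.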

Figure 1. Bayesian network representation of the MBSS method. Solid ovals represent observed quantities, and dashed ovals represent hidden quantities that are modeled. The counts c^t_{i,m} are directly observed, whereas the baselines b^t_{i,m} and the parameter priors for each stream (alpha_m, beta_m) are estimated from historical data.

In this expression, the posterior probability of each hypothesis is normalized by the total probability of the data:

Pr(D) = Pr(D | H_0) Pr(H_0) + Sum_{S, E_k} Pr(D | H_1(S, E_k)) Pr(H_1(S, E_k)).

The standard MBSS method [1] assumes a uniform prior Pr(H_1(S, E_k)) over all event types E_k and all circular spatial regions S. As discussed below, nonuniform priors can also be incorporated. In some cases, the prior distribution can be learned from a small number of labeled training examples [4]. The likelihood of the data given each hypothesis, Pr(D | H), is computed using the hierarchical Gamma-Poisson model shown in Figure 1. Each count c^t_{i,m} is assumed to be Poisson distributed: c^t_{i,m} ~ Poisson(q^t_{i,m} b^t_{i,m}), where b^t_{i,m} is the expected count (computed by time series analysis of historical data) and q^t_{i,m} is the relative risk. Under the null hypothesis of no outbreaks, each relative risk q^t_{i,m} is assumed to be generated from a Gamma distribution: q^t_{i,m} ~ Gamma(alpha_m, beta_m), where alpha_m and beta_m are estimated from the historical data for stream D_m, as described in [1]. Under the alternative hypothesis H_1(S, E_k), the expected value of each relative risk is increased by a multiplicative factor x^t_{i,m}, the impact of the outbreak for the given spatial location s_i, data stream D_m, and time step t: q^t_{i,m} ~ Gamma(x^t_{i,m} alpha_m, beta_m).
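The generative process just described is easy to simulate, which is useful for building intuition. A sketch in Python, with hypothetical parameter values; note that NumPy's gamma sampler is parameterized by shape and scale, so the paper's beta (interpreted here as a rate, giving E[q] = alpha/beta) enters as scale = 1/beta:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_count(b, alpha, beta, x=1.0):
    """Draw one count from the Gamma-Poisson model:
    q ~ Gamma(x * alpha, rate=beta), then c ~ Poisson(q * b).
    Under H0, x = 1; under H1(S, E_k), x > 1 for affected locations and time steps."""
    q = rng.gamma(shape=x * alpha, scale=1.0 / beta)  # relative risk
    return rng.poisson(q * b)                          # observed count

# hypothetical values: baseline of 20 visits, alpha = beta = 50 so E[q] = 1 under H0
c_null = sample_count(b=20.0, alpha=50.0, beta=50.0)
c_outbreak = sample_count(b=20.0, alpha=50.0, beta=50.0, x=1.4)  # 40 per cent higher in expectation
```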
For a given set of impacts X = {x^t_{i,m}}, the likelihood of the data can be computed from the Gamma-Poisson model as follows [1]:

Pr(D | X) = Prod_{i,m,t} Pr(c^t_{i,m} | b^t_{i,m}, x^t_{i,m}, alpha_m, beta_m) propto Prod_{i,m,t} (beta_m / (beta_m + b^t_{i,m}))^{x^t_{i,m} alpha_m} Gamma(x^t_{i,m} alpha_m + c^t_{i,m}) / Gamma(x^t_{i,m} alpha_m).

In this expression, terms not dependent on the x^t_{i,m} have been removed, since these are constant for all hypotheses under consideration. For the null hypothesis H_0, no events have occurred, and thus we assume x^t_{i,m} = 1 everywhere:

Pr(D | H_0) propto Prod_{i,m,t} (beta_m / (beta_m + b^t_{i,m}))^{alpha_m} Gamma(alpha_m + c^t_{i,m}) / Gamma(alpha_m).

For the alternative hypothesis H_1(S, E_k), we must marginalize over the values of x^t_{i,m}:

Pr(D | H_1(S, E_k)) = Sum_X Pr(D | X) Pr(X | H_1(S, E_k)).

The distribution of the impacts x^t_{i,m} is conditional on the outbreak type E_k, the affected region S, and two additional parameters: the outbreak severity theta and the temporal window W. The outbreak is assumed to affect those and only those locations in region S, for the most recent W time steps (t = 0...W-1). Thus we assume x^t_{i,m} = 1 for unaffected locations (s_i not in S) and for time steps before the start of the outbreak (t >= W). For affected locations and time steps (s_i in S and t < W), the impact x^t_{i,m} is computed as a function of the average impact x_{km,avg} of outbreak type E_k on data stream D_m, and the outbreak severity theta: x^t_{i,m} = 1 + theta (x_{km,avg} - 1). For example, if x_{km,avg} was equal to 1.4, an outbreak of typical severity (theta = 1) would increase counts by 40 per cent (x^t_{i,m} = 1.4), and an outbreak with severity theta = 2 would increase counts by 80 per cent (x^t_{i,m} = 1.8).
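The marginal likelihood term above is numerically delicate because of the Gamma functions, so it is usually computed in log space. A minimal sketch, assuming hypothetical parameter values and working only with the terms that depend on the impact x (the hypothesis-independent constants cancel in the ratio):

```python
import math

def log_marginal_term(c, b, alpha, beta, x=1.0):
    """Log of the Gamma-Poisson marginal likelihood term, up to constants shared
    by all hypotheses: (beta/(beta+b))^(x*alpha) * Gamma(x*alpha + c) / Gamma(x*alpha)."""
    a = x * alpha
    return a * (math.log(beta) - math.log(beta + b)) + math.lgamma(a + c) - math.lgamma(a)

def log_likelihood_ratio(c, b, alpha, beta, x):
    """log LR comparing impact x against the null (x = 1) for one count."""
    return log_marginal_term(c, b, alpha, beta, x) - log_marginal_term(c, b, alpha, beta, 1.0)

# hypothetical values: a count well above its baseline favours the outbreak hypothesis
llr = log_likelihood_ratio(c=35, b=20.0, alpha=50.0, beta=50.0, x=1.4)
```

A count well below baseline yields a negative log-likelihood ratio, i.e. evidence against the outbreak hypothesis.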

The average effects of each outbreak type on each stream, x_{km,avg}, can either be specified by a domain expert or learned from training data, as described in [1]. For the experiments below, we assume a simple, fixed set of three outbreak models E_1...E_3 for two data streams D_1 and D_2. E_1 assumes that only stream D_1 is affected (x_11 = 1.5 and x_12 = 1), E_2 assumes that only stream D_2 is affected (x_21 = 1 and x_22 = 1.5), and E_3 assumes that the outbreak has equal impact on both data streams (x_31 = x_32 = 1.5). We assume a discrete uniform distribution Theta for theta, and assume that W is drawn uniformly between 1 and W_max, where W_max is the maximum temporal window size. W_max is an important parameter for detection: larger values of W_max improve detection power for more gradually emerging outbreaks, whereas smaller values of W_max improve detection power for more rapidly emerging outbreaks [5]. We consider two typical values, W_max = 3 and W_max = 7, in our experiments below. The total likelihood of the data given the hypothesis H_1(S, E_k) can be computed by marginalizing over theta and W:

Pr(D | H_1(S, E_k)) = (1 / (W_max |Theta|)) Sum_{theta in Theta} Sum_{W = 1...W_max} Pr(D | H_1(S, E_k), theta, W).

As described in [1], the posterior probability map can be computed by MBSS in seven steps: loading the count data, computing baselines, computing parameter priors, computing location likelihood ratios, computing region likelihood ratios, computing region posterior probabilities, and computing the posterior probability map. The first three steps can be performed in time proportional to the size of the dataset, O(NMT), where N, M, and T are the numbers of locations, streams, and time steps, respectively. The fourth step is to pre-compute the likelihood ratios

LR_i = Prod_{m = 1...M} Prod_{t = 0...W-1} Pr(c^t_{i,m} | b^t_{i,m}, x_m alpha_m, beta_m) / Pr(c^t_{i,m} | b^t_{i,m}, alpha_m, beta_m)

for each location s_i, for each combination of the outbreak type E_k, temporal window W, and severity theta.
Then the likelihood ratio for a given region S (given E_k, W, and theta) can be computed by multiplying the likelihood ratios LR_i for all s_i in S. This formulation has the benefit that the expensive likelihood ratio computations are only performed a number of times proportional to the number of locations, rather than the much larger number of regions. The fifth step of the MBSS framework is to multiply the location likelihood ratios (or add log-likelihood ratios) to obtain the likelihood ratio for each spatial region S, for each combination of E_k, W, and theta. The sixth step requires computation of posterior probabilities for each combination of outbreak type E_k and spatial region S, marginalizing over W and theta, and the seventh step computes the posterior probability map by summing the posterior probabilities of all regions containing each spatial location. Each of the last three steps requires computation time proportional to the number of regions S, and thus MBSS is computationally expensive when the number of regions is large. The standard MBSS method deals with this computational issue by considering only a small fraction of the O(2^N) possible subsets of the N spatial locations, limiting its search to circular regions S and assuming that all non-circular regions have zero prior probability. As in Kulldorff's original spatial scan statistic [3], MBSS examines the set of O(N^2) circular regions S, each containing a center location s_c and its k - 1 nearest neighbors (as measured by distance between the zip code centroids), for k = 1...N.
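The circular search space just described is simply the set of k-nearest-neighbor sets of each center. A minimal sketch, with hypothetical centroid coordinates (real inputs would be the zip code centroids):

```python
import numpy as np

def circular_regions(coords: np.ndarray, k_max: int):
    """Enumerate the distinct circular regions: for each center location, the set
    containing the center and its k - 1 nearest neighbors, for k = 1...k_max."""
    n = len(coords)
    regions = []
    for c in range(n):
        d = np.linalg.norm(coords - coords[c], axis=1)  # distances to all centroids
        order = np.argsort(d)                           # order[0] is the center itself
        for k in range(1, min(k_max, n) + 1):
            regions.append(frozenset(order[:k].tolist()))
    return set(regions)  # duplicates removed, so at most N * k_max distinct regions

# five hypothetical zip-code centroids
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
regions = circular_regions(coords, k_max=5)
```

With k_max = N this yields the O(N^2) regions searched by the standard MBSS method; FSS reuses the same center/neighborhood structure but sums over all subsets of each neighborhood.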
As noted above, MBSS typically assumes a uniform region prior Pr(H_1(S, E_k) | E_k) = 1/N_S, where N_S is the total number of space-time regions, but nonuniform priors can also be incorporated, and various methods for learning these priors from data [1, 4] have been explored.

2.2. Fast subset sums

As noted above, the primary limitation of the standard MBSS method is its exhaustive computation over spatial regions S, requiring us to restrict the set of spatial regions considered to only a small fraction of the O(2^N) possible subsets of locations, e.g. only considering circular regions. This restriction prevents MBSS from searching over the huge number of irregularly shaped regions, reducing its detection power and spatial detection accuracy for highly elongated or irregular regions. However, two key insights enable us to circumvent this limitation. First, both the total posterior probability of an outbreak and the posterior probability map are sums of the posterior region probabilities Pr(H_1(S, E_k) | D): the total posterior probability Pr(H_1 | D) is a sum over all spatial regions, and the posterior probability of each spatial location s_i is a sum over all spatial regions which contain s_i. We show that each of these sums can be calculated efficiently without computing the individual posterior probabilities of each spatial region S. Second, we define a specific, nonuniform prior distribution Pr(H_1(S, E_k) | E_k) over all 2^N subsets of the data. This prior distribution assigns a non-zero prior probability to any given subset of the data S, but more compact clusters have larger priors, thus enforcing a soft constraint on spatial proximity. Most importantly, as we demonstrate below, the prior distribution has a hierarchical structure which enables us to efficiently compute sums of posterior probabilities over exponentially many regions in linear time.

We consider a hierarchical region prior conditioned on two latent variables: the center location s_c and the neighborhood size k. We assume a two-stage process in which s_c and k are drawn from their respective distributions, and then the subset S is chosen conditional on s_c and k. As in the original MBSS method (searching over circular regions), we can assume that s_c is drawn uniformly at random from the set of spatial locations, and that the neighborhood size k is drawn uniformly at random between 1 and some constant k_max (the maximum neighborhood size). We typically use k_max = N, the number of spatial locations, but smaller values of k_max can be used to reduce computation time, as shown below. Nonuniform distributions for the center s_c and neighborhood size k can easily be incorporated into our method, but the experiments described below assume uniform distributions. Once we have drawn the center location s_c and neighborhood size k, we then consider location s_c and its k - 1 nearest neighbors, and choose a subset of locations uniformly at random from the 2^k possible subsets of these k locations. Thus the probability of a given center s_c, neighborhood size k, and region S is Pr(s_c, k, S) = Pr(s_c) Pr(k) / 2^k if all locations s_i in S are contained in the k-neighborhood of s_c, and 0 otherwise. The total prior probability of a given region S can be computed by marginalizing over s_c and k. For uniform distributions of s_c and k, with k_max = N, we can compute

Pr(S) = Sum_{s_c} (2^{N - k_c + 1} - 1) / (N^2 2^N),

where k_c is the minimum neighborhood size such that the k_c-neighborhood of s_c contains all elements of S. This expression is roughly proportional to 2^{-k_min}, where k_min = min_{s_c} k_c is a measure of the compactness of region S. The advantage of this formulation is that, for a given center location s_c and a given neighborhood size k, we can compute the total posterior probability of the 2^k spatial regions in O(k) time.
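The two-stage prior can be computed directly for small N by marginalizing over centers and neighborhood sizes; this also confirms that it is a proper distribution and that compact subsets receive more mass. A sketch with four hypothetical centroids:

```python
import itertools
import numpy as np

def subset_prior(S, coords):
    """Prior Pr(S) under the hierarchical model: draw a center s_c and a neighborhood
    size k uniformly (with k_max = N), then a uniform subset of the k-neighborhood."""
    n = len(coords)
    total = 0.0
    for c in range(n):
        d = np.linalg.norm(coords - coords[c], axis=1)
        order = list(np.argsort(d))  # order[0] is the center itself
        # minimum neighborhood size k_c such that the k_c-neighborhood of s_c contains S
        k_c = 1 if not S else 1 + max(order.index(i) for i in S)
        # Pr(s_c) * Pr(k) = 1/n^2; subset probability is 2^-k for each feasible k
        total += sum(2.0 ** (-k) for k in range(k_c, n + 1)) / n ** 2
    return total

coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.1], [2.0, 2.0]])
n = len(coords)
priors = {S: subset_prior(S, coords)
          for r in range(n + 1)
          for S in map(frozenset, itertools.combinations(range(n), r))}
```

Summing `priors` over all 2^N subsets gives 1, and a spatially compact pair such as {0, 1} receives a larger prior than the spread-out pair {0, 3}, illustrating the soft proximity constraint.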
As we must consider O(N) center locations and O(k_max) neighborhood sizes, this approach enables us to compute the total posterior probability in time O(N k_max^2). To do so, we define S_ck as the k-neighborhood of center location s_c (i.e. s_c and its k - 1 nearest neighbors). Then, conditioning on the outbreak type E_k, severity theta, temporal window W, center location s_c, and neighborhood size k, we can compute the total posterior probability that any region S contained in S_ck has been affected. First, we know

Sum_{S in S_ck} Pr(S | D) propto Sum_{S in S_ck} Pr(S) LR(S),

where Pr(S) is the prior probability of region S, conditioned on E_k, s_c, and k, and LR(S) is the likelihood ratio of region S, Pr(D | H_1(S, E_k)) / Pr(D | H_0), conditioned on E_k, theta, and W. Second, we know that

Sum_{S in S_ck} Pr(S) LR(S) = (1/2^k) Sum_{S in S_ck} LR(S),

as we are assuming a uniform prior over the 2^k subsets of S_ck. Third, we know that

Sum_{S in S_ck} LR(S) = Sum_{S in S_ck} Prod_{s_i in S} LR_i,

where LR_i is the likelihood ratio of location s_i, conditioned on E_k, theta, and W. Fourth, as we are summing over all 2^k subsets of S_ck, we can write the sum of 2^k products as a product of k sums:

Sum_{S in S_ck} Prod_{s_i in S} LR_i = Prod_{s_i in S_ck} (1 + LR_i).

Fifth, we obtain the final result:

Sum_{S in S_ck} Pr(S | D) propto (1/2^k) Prod_{s_i in S_ck} (1 + LR_i) = Prod_{s_i in S_ck} ((1 + LR_i) / 2).

Thus the posterior probability of an outbreak, conditioned on the outbreak type E_k, temporal window W, severity theta, center location s_c, and neighborhood size k, is proportional to the product of the smoothed likelihood ratios (1 + LR_i) / 2 for all locations s_i in S_ck. This is very similar to the case of circular regions, where the posterior probability of an outbreak (again, conditioned on E_k, W, theta, s_c, and k) was proportional to the product of the unsmoothed likelihood ratios LR_i. In each case, we must compute the total posterior probability of an outbreak by marginalizing over E_k, W, theta, s_c, and k. Finally, we consider how the posterior probability map can be efficiently computed.
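Before turning to the map, the total-posterior computation just derived can be sketched directly: for fixed event type, window, and severity, average the product of smoothed likelihood ratios over all centers and neighborhood sizes, extending each neighborhood one location at a time with a running product. Likelihood ratios and coordinates here are hypothetical:

```python
import numpy as np

def total_posterior_score(lr: np.ndarray, coords: np.ndarray, k_max: int) -> float:
    """Unnormalized total posterior probability of an outbreak, conditioned on event
    type, temporal window, and severity: the average over centers s_c and neighborhood
    sizes k of prod_{i in S_ck} (1 + LR_i) / 2, using a running product per center."""
    n = len(coords)
    km = min(k_max, n)
    score = 0.0
    for c in range(n):
        order = np.argsort(np.linalg.norm(coords - coords[c], axis=1))
        smoothed = (1.0 + lr[order]) / 2.0
        running = 1.0
        for k in range(1, km + 1):
            running *= smoothed[k - 1]   # extend the neighborhood by one location
            score += running / (n * km)  # uniform Pr(s_c) * Pr(k)
    return score

# hypothetical per-location likelihood ratios: location 2 looks anomalous
lr = np.array([0.9, 1.1, 8.0, 1.0, 0.8])
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
score = total_posterior_score(lr, coords, k_max=5)
```

As a sanity check, if every LR_i equals 1 (no evidence either way), every smoothed ratio is 1 and the score is exactly 1; evidence concentrated near one center raises it above that.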
To do so, we can efficiently compute the posterior probability of an outbreak affecting a given location s_j, conditioned on the outbreak type E_k, temporal window W, severity theta, center location s_c, and neighborhood size k, using a procedure very similar to the above. The only difference is that we must compute

Sum_{S in S_ck: s_j in S} Pr(S | D) propto (1/2^k) Sum_{S in S_ck: s_j in S} Prod_{s_i in S} LR_i.

In this case, we are summing over all 2^{k-1} subsets of S_ck that contain s_j, and can write the sum of 2^{k-1} products as LR_j times a product of k - 1 sums:

Sum_{S in S_ck: s_j in S} Prod_{s_i in S} LR_i = LR_j Prod_{s_i in S_ck \ {s_j}} (1 + LR_i),

and thus

Sum_{S in S_ck: s_j in S} Pr(S | D) = (LR_j / (1 + LR_j)) Sum_{S in S_ck} Pr(S | D).

We can compute the total posterior probability of an outbreak containing each location s_j by marginalizing over E_k, W, theta, s_c, and k, thus enabling efficient computation of the posterior probability map.

2.3. Related work

This work describes FSS, an extension of our recently proposed MBSS framework [1] which enables computationally efficient detection of irregularly shaped outbreaks. MBSS extends our Bayesian event detection framework [6] to integrate information from multiple data streams, improving detection power and reducing false positives, and also enables us to accurately model and distinguish between multiple outbreak types. The Bayesian scan statistic is a variant of the more traditional, hypothesis testing approach to spatial scan statistics, developed by Kulldorff and Nagarwalla [3, 7]. While Kulldorff's original spatial scan statistic [3] did not take the time dimension into account, later work generalized this method to the space-time scan statistic by scanning over variable-size temporal windows [8, 9]. Recent extensions, such as the expectation-based scan statistic [10] and model-based scan statistic [11], use historical data to model the expected distribution of counts in each spatial location, and the expectation-based scan statistic approach is also applied in the present work to obtain the expected counts b^t_{i,m}. Many other variants of the spatial and space-time scan statistics have been proposed, differing in both the set of regions to be searched and the underlying statistical models. Various statistical models have been proposed for the spatial scan, ranging from simple Poisson and Gaussian statistics [10, 12] to robust and nonparametric models [13, 14].
While Kulldorff's original method [3] assumed circular search regions, other methods have searched over rectangles [15], ellipses [16], and various sets of irregularly shaped regions [17-19]. We note that previous spatial scan methods for detecting irregularly shaped regions either restrict the search space (reducing power for any outbreak regions which do not fit the assumed distribution) or perform a heuristic search, in which case they are not guaranteed to find an optimal or near-optimal region; our FSS method, on the other hand, performs an exact computation over all O(2^N) subsets of locations, though it computes the posterior probability map rather than identifying the specific subset of greatest interest. Two multivariate extensions of the frequentist spatial scan have recently been proposed: Kulldorff's parametric scan [2], which directly extends the original spatial scan statistic to multiple data streams by assuming that all data streams are independent, and the nonparametric scan [14], which combines empirical p-values from multiple data streams without relying on an underlying parametric model. Unlike MBSS and FSS, neither of these two methods can differentiate between multiple outbreak types. Several other multivariate disease surveillance methods have been proposed, including multivariate extensions of traditional time series analysis methods [20-22] and network-based methods [23] that detect anomalous ratios of counts between streams. These purely temporal methods do not take spatial information into account: they may be used to detect anomalous increases in the aggregate time series of the entire area being monitored, rather than detecting and pinpointing a spatial cluster of affected locations. Additionally, these methods cannot model and differentiate between multiple event types. PANDA [24, 25] uses Bayesian network models to differentiate between multiple outbreak types (e.g.
the Centers for Disease Control and Prevention (CDC) Category A diseases), assuming an underlying entity-based model of ED visits. We have recently developed a multivariate model that incorporates spatial information into PANDA, using ED chief complaint data as evidence [26]. We have also recently developed other efficient search methods based on linear-time subset scanning (LTSS), which allow us to efficiently perform unconstrained maximization of certain univariate likelihood ratio statistics over the 2^N subsets of the data [27]. These methods cannot be easily applied to MBSS because of the added complications of non-uniform prior probabilities, multivariate data, and (most of all) the need to calculate the denominator of the posterior probability, which requires evaluation and summation of the posterior probabilities of all hypotheses under consideration. Additionally, LTSS methods do not guarantee a solution to constrained maximization problems, and thus are difficult to apply in spatial detection scenarios where we wish to enforce spatial proximity constraints on the detected regions.
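The two subset-sum identities at the heart of Fast Subset Sums, the closed form for the total posterior over a neighborhood and the LR_j / (1 + LR_j) factor for the location posterior, can be checked by brute force on a small neighborhood. The likelihood-ratio values below are hypothetical:

```python
import itertools

# hypothetical likelihood ratios for a k-neighborhood S_ck of 4 locations
lr = [0.5, 2.0, 8.0, 1.2]
k = len(lr)

def prod(xs):
    """Product of an iterable (1.0 for the empty product)."""
    p = 1.0
    for x in xs:
        p *= x
    return p

# brute force: sum of prod(LR_i) over all 2^k subsets, under the uniform 1/2^k prior
subsets = [s for r in range(k + 1) for s in itertools.combinations(range(k), r)]
total_bf = sum(prod(lr[i] for i in s) for s in subsets) / 2 ** k

# closed form: product of the smoothed likelihood ratios (1 + LR_i) / 2
total_cf = prod((1.0 + x) / 2.0 for x in lr)

# location posterior: subsets containing location j, vs. LR_j / (1 + LR_j) * total
j = 2
loc_bf = sum(prod(lr[i] for i in s) for s in subsets if j in s) / 2 ** k
loc_cf = lr[j] / (1.0 + lr[j]) * total_cf
```

For this neighborhood the 2^4 = 16-term brute-force sums agree exactly with the O(k) closed forms, which is the computational shortcut FSS exploits at scale.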

3. Evaluation

We now compare the run time, detection power, and spatial accuracy of the FSS method to the original MBSS. While FSS and MBSS both rely on the Bayesian probabilistic framework [1] described briefly above, MBSS assumes a uniform prior over the set of O(N^2) circular regions, whereas FSS assumes a nonuniform, hierarchical prior which is nonzero for all O(2^N) subsets of the data. Thus we expect FSS to improve timeliness and accuracy of detection for elongated or irregular outbreak regions, as these are not modeled well by the assumption of circular regions. However, since a much larger set of regions must be considered, we expect the run time of FSS to be slower than that of the original MBSS method, though much faster than a naive search over all O(2^N) subsets, which would be computationally infeasible for even moderate values of N. The following subsections describe our test dataset (ED data from Allegheny County), compare the run time of FSS to the MBSS and naive subset sums methods, describe our outbreak simulations, and compare the detection power and spatial accuracy of FSS and MBSS for these simulated outbreaks.

3.1. Description of emergency department data

We obtained a dataset of de-identified ED visit records collected from 10 Allegheny County hospitals from January 1, 2004 to December 31, 2005. Each record contains fields for the patient's date of admission to the ED, home zip code, chief complaint (free text), and International Classification of Diseases (ICD9) code (numeric). We removed records where the home zip code or admission date was missing, or where the home zip code was outside Allegheny County, leaving 64.8 per cent of the original records. The free-text chief complaint was present for all remaining records, and the ICD9 code was present for 84.7 per cent of the remaining records.
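The record-cleaning step above is a straightforward filter on required fields. A minimal sketch; the field names and zip codes are hypothetical (the illustrative zip set stands in for the full list of 97 Allegheny County zip codes):

```python
# illustrative subset of in-county zip codes, not the full list of 97
ALLEGHENY_ZIPS = {"15213", "15217", "15232"}

records = [
    {"date": "2004-01-01", "zip": "15213", "complaint": "cough x 3 days"},
    {"date": "2004-01-01", "zip": None,    "complaint": "nausea"},   # missing zip: dropped
    {"date": "2004-01-02", "zip": "10001", "complaint": "fever"},    # outside county: dropped
    {"date": None,         "zip": "15217", "complaint": "sob"},      # missing date: dropped
]

# keep records with a valid admission date and an in-county home zip code
kept = [r for r in records
        if r["date"] is not None and r["zip"] in ALLEGHENY_ZIPS]
```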
From this data, we created two distinct streams of count data, cough/dyspnea (CD) and nausea/vomiting (NV), by recording the number of patient records matching the given symptoms in each zip code for each day. A patient record was determined to match a given set of symptoms if its chief complaint string contained certain substrings, or if its ICD9 code matched certain values: for the CD stream, we included records with chief complaints containing the substrings "cough", "dyspnea", "shortness", or "sob", or with ICD9 codes corresponding to cough or shortness of breath. For the NV stream, we included records with chief complaints containing the substrings "naus", "vom", or "n/v", or with ICD9 codes corresponding to nausea and/or vomiting or persistent vomiting. Each set of records was manually refined to remove spurious substring matches. The time series of daily counts (aggregated over all 97 Allegheny County zip codes) for each data stream is shown in Figure 2. The CD stream had a mean daily count of 44.0 cases, with a standard deviation of 12.1, and the NV stream had a mean daily count of 25.9 cases, with a standard deviation of 7.0. As these cases were spread over the 97 Allegheny County zip codes, many zip codes had zero counts on any given day. Both streams were weakly overdispersed, with an index of dispersion (ratio of variance to mean) of 3.31 for the CD data and 1.88 for the NV data, and the daily counts of the two streams were positively correlated (r = 0.338). Both streams exhibited slight (but statistically significant) day-of-week trends, with counts peaking on Mondays, and clear seasonal trends, with counts peaking in February and March for the CD and NV datasets, respectively.

3.2. Comparison of run times

Our first experiment compared the run times of FSS, Naive Subset Sums (searching exhaustively over all subsets of locations), and the original MBSS method (searching over circular regions), as a function of the maximum neighborhood

Figure 2.
Daily counts for two streams of Allegheny County Emergency Department data (cough/dyspnea and nausea/vomiting) from January 1, 2004 to December 31, 2005: (a) CD and (b) NV.

Figure 3. Total run times for 100 days of data, for Fast Subset Sums (FSS), naive subset sums, and the original MBSS approach, as a function of the maximum neighborhood size k_max.

size k_max. Increasing the neighborhood size requires a larger number of regions to be searched: the number of search regions with neighborhood size k in {1...k_max} scales linearly with k_max if only circular regions are considered, and exponentially with k_max if we consider all subsets of the center and its k - 1 nearest neighbors. We computed the time required by each method to compute the posterior probability maps for the first 100 days of Allegheny County ED data, for each value of k_max. For each method, we used the same two data streams (CD and NV), the same three outbreak models (assuming one outbreak type that affects only CD, one outbreak type that affects only NV, and one that affects both streams equally), and the same maximum temporal window size of three days (W_max = 3). As shown in Figure 3, the run time of the original MBSS method increased gradually with increasing k_max, up to its maximum (to process 100 days of data) at k_max = 90. The run time of the Naive Subset Sums method increased exponentially, making it computationally infeasible for k_max >= 25. Run times were 5.02 h for 100 days of data for k_max = 15, and 161 h for 100 days of data for k_max = 20; for k_max = 30, the naive method would have required an estimated 68.5 days to process a single day of data, and for k_max = 90, its estimated run time would be measured in years. Finally, we observe that the run time of FSS scales quadratically with increasing k_max, up to its maximum (to process 100 days of data) at k_max = 90. Thus, while FSS is approximately 7.5x slower than the original MBSS method, it is still extremely fast, computing the posterior probability map for each day of data in under 9 s.
We also considered a fixed k_max = 10, and examined the effects of doubling the number of data streams (from 2 to 4), doubling the number of event models (from 3 to 6), and doubling the maximum temporal window size (from 3 to 6), respectively. Each of these three changes increased each method's run time proportionally, demonstrating the expected linear dependence of run time on these three parameters.

Simulation of outbreaks

Next we used a semi-synthetic testing framework (injecting simulated multivariate outbreaks into the real-world ED data) to compare the detection power and spatial accuracy of the FSS and MBSS methods. We considered a simple class of simulated outbreaks with a linear increase in the expected number of cases over the duration of the outbreak. More precisely, our outbreak simulator takes three parameters: the outbreak duration T, the outbreak severity Δ, and the subset of affected zip codes S_inject. For each injected outbreak, the simulator chooses the start date t_start uniformly at random. On each day t of the outbreak, t = 1...T, the simulator injects Poisson(t · w_{i,m} · Δ_m) cases into each stream m of each affected zip code s_i, where w_{i,m} is the weight of that zip code for that stream. We considered 10 differently shaped outbreak regions S_inject, as shown in Figures 5–14 (left panels). All outbreaks were assumed to be two weeks in duration (T = 14), and we assumed Δ_CD = Δ_NV = 1. For each outbreak region, we considered 200 different, randomly generated outbreaks, giving a total of 2000 outbreaks for evaluation.

We note that simulation of outbreaks is an active area of ongoing research in biosurveillance. The creation of realistic outbreak scenarios is important because of the difficulty of obtaining sufficient labeled data from real outbreaks, but is also very challenging. State-of-the-art outbreak simulations, such as those of Buckeridge et al. [28] and Wallstrom et al. [29], combine disease trends observed from past outbreaks with information about the current background data into which the outbreak is being injected, and also allow the user to adjust parameters such as outbreak duration and severity. While the simple linear outbreak model that we use here is not a realistic model of the temporal progression of an outbreak, it enables precise comparison of the detection power of different methods, gradually ramping up the severity of the outbreak until it is detected.

Table I. Comparison of MBSS and FSS methods, using maximum temporal window size W_max = 3.

Outbreak region   MBSS (per cent detected)   FSS (per cent detected)
1                 96.0                       98.0
2                 96.0                       96.5
3                 71.5                       85.0
4                 83.5                       90.5
5                 68.5                       95.0
6                 97.5                       99.5
7                 100                        100
8                 100                        100
9                 100                        100
10                99.0                       100
Average           91.2                       96.5

Average days to detection, and proportion of outbreaks detected, at a fixed false-positive rate of 1/month.

Table II. Comparison of MBSS and FSS methods, using maximum temporal window size W_max = 7.

Outbreak region   MBSS (per cent detected)   FSS (per cent detected)
1                 97.0                       97.0
2                 99.0                       99.0
3                 65.0                       79.0
4                 78.5                       83.5
5                 62.5                       94.5
6                 99.0                       100
7                 100                        100
8                 100                        100
9                 100                        100
10                100                        100
Average           90.1                       95.3

Average days to detection, and proportion of outbreaks detected, at a fixed false-positive rate of 1/month.

Comparison of detection power

We compared the detection power of the FSS and MBSS methods for each of the 10 outbreak regions, considering two values of the maximum temporal window size (W_max = 3 and W_max = 7) for each method. We fixed the value of k_max = N, i.e.
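The linear-ramp injection procedure described above can be sketched in a few lines. This is an illustrative sketch, not the paper's simulator: parameter names (T, delta, weights) follow the text, and the Poisson sampler uses Knuth's method so the sketch stays dependency-free; the zip codes and weights below are made up.

```python
import math
import random

def poisson_sample(lam, rng):
    # Knuth's method: adequate for the small means used here.
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def inject_outbreak(T, delta, weights, rng=None):
    """For each outbreak day t = 1..T, inject Poisson(t * w[i][m] * delta[m])
    cases into stream m of each affected zip code i (a linear ramp in t)."""
    rng = rng or random.Random(0)
    injected = []
    for t in range(1, T + 1):
        day = {i: [poisson_sample(t * w_m * delta[m], rng)
                   for m, w_m in enumerate(w_i)]
               for i, w_i in weights.items()}
        injected.append(day)
    return injected

# Two affected zip codes, two streams (e.g. CD and NV), severity 1 for each:
counts = inject_outbreak(T=14, delta=[1.0, 1.0],
                         weights={"15213": [0.5, 0.5], "15217": [1.0, 0.2]})
```

The expected injected count grows linearly in t, so a method's detection day directly reflects the severity level it needs before triggering.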
the MBSS method considers all O(N²) circular regions and the FSS method considers all O(2^N) subsets of the data. For each combination of method and outbreak region, we computed the method's proportion of outbreaks detected and average number of days to detect as a function of the allowable false-positive rate. To do this, we first computed the maximum region score F* = max_S F(S) for each day of the original dataset with no outbreaks injected (the first 84 days of data are excluded, since these are used to calculate baseline estimates for our methods). Then for each injected outbreak, we computed the maximum region score for each outbreak day, and determined what proportion of the days of the original dataset have higher scores. Assuming that the original dataset contains no outbreaks, this is the proportion of false positives that we would have to accept in order to have detected the outbreak on day t. For a fixed false-positive rate r, the days to detect for a given outbreak is computed as the first outbreak day (t = 1...14) with proportion of false positives less than r. If no day of the outbreak has a proportion of false positives less than r, the method has failed to detect that outbreak: for the purposes of our days-to-detect calculation, these are counted as 14 days to detect, but could also be penalized further. The detection performance of the MBSS and FSS methods is presented in Tables I and II. For each of the 10 outbreak regions, we present each method's average detection time and percentage of outbreaks detected at a fixed false-positive rate of 1/month. We also show the AMOC curves (average detection time as a function of false-positive rate) for MBSS and FSS, averaged over all 10 outbreak regions, in Figure 4.
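The evaluation procedure above can be sketched directly. This is a hedged illustration of the described calculation, not the paper's code: thresholds come from the outbreak-free baseline days, and an outbreak counts as detected on the first day whose maximum region score exceeds all but a fraction r of the baseline scores. The score values below are invented.

```python
# Days-to-detect at a fixed false-positive rate, as described in the text.

def false_positive_proportion(score, baseline_scores):
    # Fraction of outbreak-free days whose maximum region score is higher.
    higher = sum(1 for s in baseline_scores if s > score)
    return higher / len(baseline_scores)

def days_to_detect(outbreak_scores, baseline_scores, r):
    # First outbreak day (1-based) detectable at false-positive rate r;
    # undetected outbreaks are penalized as 14 days, as in the text.
    for t, score in enumerate(outbreak_scores, start=1):
        if false_positive_proportion(score, baseline_scores) < r:
            return t
    return 14

baseline = [0.1, 0.4, 0.2, 0.9, 0.3, 0.5, 0.2, 0.6, 0.1, 0.7]
outbreak = [0.2, 0.3, 0.5, 0.8, 1.0] + [1.2] * 9
print(days_to_detect(outbreak, baseline, r=0.1))  # → 5
```

Sweeping r over a grid and averaging days-to-detect across outbreaks yields the AMOC curves discussed below.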

Figure 4. AMOC curves for FSS and MBSS, for maximum temporal window sizes 3 and 7. Average days to detection as a function of false-positive rate.

Figure 5. Outbreak region #1 (compact cluster, downtown Pittsburgh). Left: true outbreak region. Center: average posterior probability map for MBSS. Right: average posterior probability map for FSS.

From the AMOC curves, we see that FSS consistently achieves more timely detection than MBSS across a range of false-positive rates. For a fixed false-positive rate of 1/month, FSS detected an average of one day earlier than MBSS for a maximum temporal window size W_max = 3, and 0.54 days earlier for W_max = 7, with fewer than half as many missed outbreaks in each case. Comparing the detection performance of FSS and MBSS for each of the 10 outbreak regions, we observe that both methods achieve very similar detection times for compact outbreak regions (#1, #2, and #8). For highly elongated outbreak regions (#5 and #10), FSS detected between 1.3 and 2.2 days earlier. For moderately elongated and irregularly shaped regions, the difference between the methods was smaller, for both W_max = 3 and W_max = 7. Both methods achieved more timely detection when W_max = 3, suggesting that the given outbreaks did not emerge gradually enough for the increased temporal window size to improve performance.

Comparison of spatial accuracy

For each of the 10 outbreak regions, we compared the spatial accuracy of the MBSS and FSS methods. The center panels of Figures 5–14 show the average posterior probability maps for the original MBSS method for each of the 10 outbreak regions at the midpoint of the outbreak, averaged over the 200 randomly generated outbreaks affecting that region. The right panels of Figures 5–14 show the average posterior probability maps for FSS for each outbreak region at the midpoint of the outbreak. We used a maximum temporal window size W_max = 7 for both methods.
Comparing each averaged map to the true affected regions (left panels of Figures 5–14), we observe that FSS more accurately estimates the spatial extent of the outbreak region. The detected regions were very similar for the compact outbreaks (#1, #2, and #8). For the elongated outbreaks (#4, #5, #9, and #10), FSS closely approximated the true outbreak region, whereas MBSS was not able to accurately distinguish the true region. Finally, for the irregular and disjoint outbreaks (#3, #6, and #7), FSS was better able to fit the irregular shape, whereas MBSS typically reported a compact cluster containing the irregular shape and several additional (incorrect) zip codes.

To quantify these differences in spatial detection accuracy, we computed the average overlap coefficient, precision, and recall for each method for each of the 10 outbreak types, as a function of the outbreak day. To compute the overlap coefficient for a given day of a given outbreak, we first compute the set of detected zip codes: we find the zip code with the largest posterior outbreak probability p, and consider any zip code with posterior probability greater than p/2 to have been detected. We compare the detected zip codes to the set of true zip codes containing injected cases, and count the number of hits (zip codes which are both true and detected). The overlap coefficient is then defined as N_hits / (N_true + N_detected − N_hits), the precision as N_hits / N_detected, and the recall as N_hits / N_true. Finally, we compute the average overlap coefficient, precision, and recall for each outbreak region (#1–10) and each outbreak day (1–14), averaging over all 200 outbreaks of the given type.

Figure 6. Outbreak region #2 (compact cluster, southeast of Pittsburgh). Left: true outbreak region. Center: average posterior probability map for MBSS. Right: average posterior probability map for FSS.

Figure 7. Outbreak region #3 (two disjoint clusters). Left: true outbreak region. Center: average posterior probability map for MBSS. Right: average posterior probability map for FSS.

Figure 8. Outbreak region #4 (elongated cluster, downtown Pittsburgh). Left: true outbreak region. Center: average posterior probability map for MBSS. Right: average posterior probability map for FSS.

Figure 9. Outbreak region #5 (highly elongated cluster, downtown Pittsburgh). Left: true outbreak region. Center: average posterior probability map for MBSS. Right: average posterior probability map for FSS.

Figure 10. Outbreak region #6 (irregular cluster, Pittsburgh North and South Sides). Left: true outbreak region. Center: average posterior probability map for MBSS. Right: average posterior probability map for FSS.
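The three spatial-accuracy metrics defined above can be computed as in this short sketch (illustrative code, not the paper's; the zip codes and posterior values below are made up). A zip code counts as detected if its posterior probability exceeds half the maximum posterior; hits are zip codes that are both detected and truly affected.

```python
# Overlap coefficient, precision, and recall for one day of one outbreak.

def detected_set(posteriors):
    # Detected = posterior greater than half the maximum posterior p.
    p_max = max(posteriors.values())
    return {z for z, p in posteriors.items() if p > p_max / 2}

def spatial_accuracy(posteriors, true_zips):
    detected = detected_set(posteriors)
    hits = len(detected & true_zips)
    precision = hits / len(detected)
    recall = hits / len(true_zips)
    # Overlap coefficient = |true ∩ detected| / |true ∪ detected| (Jaccard).
    overlap = hits / (len(true_zips) + len(detected) - hits)
    return precision, recall, overlap

post = {"15213": 0.9, "15217": 0.6, "15232": 0.2, "15206": 0.5}
prec, rec, ov = spatial_accuracy(post, true_zips={"15213", "15217", "15232"})
```

Here the detection threshold is 0.45, so "15232" is missed and "15206" is a false detection, giving precision 2/3, recall 2/3, and overlap 1/2.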

Figure 11. Outbreak region #7 (irregular cluster, Monroeville area). Left: true outbreak region. Center: average posterior probability map for MBSS. Right: average posterior probability map for FSS.

Figure 12. Outbreak region #8 (compact cluster, west of Pittsburgh). Left: true outbreak region. Center: average posterior probability map for MBSS. Right: average posterior probability map for FSS.

Figure 13. Outbreak region #9 (elongated cluster, south of Pittsburgh). Left: true outbreak region. Center: average posterior probability map for MBSS. Right: average posterior probability map for FSS.

Figure 14. Outbreak region #10 (highly elongated cluster, north Allegheny County). Left: true outbreak region. Center: average posterior probability map for MBSS. Right: average posterior probability map for FSS.

Figure 15 shows the average overlap coefficients (averaged over all 10 outbreak regions) for each method as a function of the outbreak day. Tables III and IV show the average overlap coefficient, precision, and recall for each outbreak region for each method, at the midpoint of the outbreak. From the figure, we observe that FSS increased the average overlap coefficient by 6–15 per cent when compared with MBSS, for a given outbreak day and a given value of the maximum temporal window size W_max. Both methods had higher overlap coefficients for W_max = 7: as many of the affected locations may have had no counts injected on a given time step, a longer temporal window enabled the methods to observe a larger number of injected counts, thus improving their ability to pinpoint the affected region. At the midpoint of the outbreak, FSS increased the average overlap coefficient by 7.3 per cent for W_max = 3 and 10.0 per cent for W_max = 7. However, we observe from the tables that the overlap coefficients were similar for the more compact outbreaks (#1, #2, and #7), while FSS substantially improved the overlap coefficients for the more elongated outbreaks (#5, #9, and #10).
FSS obtained substantially higher precision than MBSS for all 10 outbreak regions, whereas MBSS obtained substantially higher recall for 8 of the 10 outbreak regions. Increasing W_max from 3 to 7 improved both precision and recall for FSS, while improving only precision for MBSS. FSS tended to miss affected zip codes with very small numbers of injected counts, whereas MBSS tended to include these zip codes if they were spatially proximate to other affected zip codes; on the other hand, MBSS often detected additional spatially proximate zip codes which were not part of the affected region. The most likely reason for these differences is that FSS picks out the (possibly irregularly shaped) subset of locations with sufficiently many injected cases without including the surrounding locations, whereas MBSS typically detects a larger, circular region containing these locations. When the true outbreak region is compact (circular or nearly circular), the circular region chosen by MBSS may better approximate its true extent; for elongated or irregular regions, the circular region chosen by MBSS is a poor approximation, and the greater flexibility of FSS is needed to improve spatial accuracy.

Figure 15. Spatial detection accuracy for FSS and MBSS, for maximum temporal window sizes 3 and 7. Average overlap coefficient between true and estimated regions for each outbreak day.

Table III. Comparison of MBSS and FSS methods, using maximum temporal window size W_max = 3. Average overlap coefficient, precision, and recall (per cent) for MBSS and FSS at the midpoint of the outbreak, for each outbreak region and averaged over all regions.

Table IV. Comparison of MBSS and FSS methods, using maximum temporal window size W_max = 7. Average overlap coefficient, precision, and recall (per cent) for MBSS and FSS at the midpoint of the outbreak, for each outbreak region and averaged over all regions.

4. Conclusions

The FSS method is an extension of our MBSS framework [1] which allows the computationally efficient detection of irregular clusters. By defining a hierarchical prior over all O(2^N) subsets of the N monitored locations, rather than considering only the O(N²) circular regions, FSS can achieve more timely detection of elongated or irregularly shaped clusters, and can more accurately pinpoint the affected subset of locations. Our experiments demonstrate substantial improvements in detection time (between 1.3 and 2.2 days) and spatial accuracy (more than doubling the overlap coefficient between true and detected regions) for highly elongated clusters, while achieving similar detection time and spatial accuracy for compact clusters. Most importantly, while a naive approach to computing the posterior probability map (requiring likelihood computations for each of the exponentially many subsets of the data) would be computationally infeasible, our computationally efficient FSS method scales up to the approximately 100 zip codes in Allegheny County, requiring less than 9 s to compute the posterior probability map for a given day of data.

Our continuing exploration of the FSS approach will extend the present work in two main directions. First, while our evaluation of detection power and spatial accuracy assumed a maximum neighborhood size k_max equal to the number of locations N, k_max can also be set to less than N if we wish to rule out highly elongated clusters, improving detection power for compact and nearly compact clusters and reducing computation time.
Similarly, while we assumed that each subset of the chosen neighborhood (a center location s_c and its k − 1 nearest neighbors) was equally likely, we can also generalize to the case where we independently choose (with some probability p) whether each location in the neighborhood is affected. Our current method is equivalent to assuming p = 1/2, and searching over circles is equivalent to assuming p = 1, but we can also choose a value of p between 1/2 and 1. This generalization still allows us to consider elongated and irregular regions, but gives higher weight to more compact regions. Our future work will perform a detailed analysis of the detection power of the generalized FSS method as a function of k_max, p, and the compactness of the outbreak region. Finally, we propose to incorporate incremental model learning into the FSS approach. As in our original MBSS method, the average effects of each outbreak type on each data stream can be learned from a small number of labeled training examples using a smoothed maximum likelihood approach [1]. Learning the hierarchical prior over subsets of the data is more challenging, as we must infer the distributions over center locations s_c and neighborhood sizes k (and optionally, the location probability p) from partially labeled training examples. Typically, the subset of affected locations S is labeled, but the values of s_c, k, and p are unknown. We will apply a generalized expectation-maximization (GEM) approach to estimate the distributions of these values, assuming multinomial distributions for s_c and k, and a fixed (but unknown) parameter p. This model is very similar to the latent center model described in [4], but conditions on the neighborhood size k rather than the radius r, and assumes that the conditional probability that a location is affected is uniform (within the chosen neighborhood) rather than decreasing as a function of distance from the center.
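The generalized hierarchical prior can be made concrete with a small sketch. This is an assumed form for illustration, not the paper's exact prior: the center and neighborhood size are chosen uniformly, each neighborhood location is then included independently with probability p, and (for simplicity) the sketch does not force the center itself to be affected. With p = 1/2 every subset of the chosen neighborhood is equally likely; with p = 1 only the full circular neighborhood gets mass.

```python
# Prior probability of a subset S under the generalized center/size/p model.

def subset_prior(S, neighborhoods, p):
    """P(S), summed over centers c and neighborhood sizes k.
    neighborhoods[c][k] is the set containing c and its k - 1 nearest
    neighbors; centers and sizes are taken to be uniformly distributed."""
    n_centers = len(neighborhoods)
    total = 0.0
    for c, sizes in neighborhoods.items():
        k_max = len(sizes)
        for nbhd in sizes.values():
            if S <= nbhd:  # S must lie inside the chosen neighborhood
                inside = len(S)
                total += (p ** inside) * ((1 - p) ** (len(nbhd) - inside)) \
                         / (n_centers * k_max)
    return total

# Toy example with two locations A and B:
nb = {"A": {1: {"A"}, 2: {"A", "B"}},
      "B": {1: {"B"}, 2: {"A", "B"}}}
```

Summing `subset_prior` over all subsets returns 1 for any p, and values of p between 1/2 and 1 smoothly shift mass toward larger (more circle-like) subsets of each neighborhood.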
We believe that incorporating learning into the FSS approach will enable efficient optimization of the hierarchical prior from a small number of training examples, further improving the timeliness and accuracy of outbreak detection.

Acknowledgements

This work was partially supported by National Science Foundation grants.

References

1. Neill DB, Cooper GF. A multivariate Bayesian scan statistic for early event detection and characterization. Machine Learning 2009.
2. Kulldorff M, Mostashari F, Duczmal L, Yih WK, Kleinman K, Platt R. Multivariate scan statistics for disease surveillance. Statistics in Medicine 2007; 26.
3. Kulldorff M. A spatial scan statistic. Communications in Statistics: Theory and Methods 1997; 26(6).
4. Makatchev M, Neill DB. Learning outbreak regions in Bayesian spatial scan statistics. Proceedings of the ICML/UAI/COLT Workshop on Machine Learning for Health Care Applications, Helsinki, Finland.
5. Neill DB. Expectation-based scan statistics for monitoring spatial time series data. International Journal of Forecasting 2009; 25:498–517.
6. Neill DB, Moore AW, Cooper GF. A Bayesian spatial scan statistic. Advances in Neural Information Processing Systems 18, 2006.
7. Kulldorff M, Nagarwalla N. Spatial disease clusters: detection and inference. Statistics in Medicine 1995; 14.


More information

Algorithmisches Lernen/Machine Learning

Algorithmisches Lernen/Machine Learning Algorithmisches Lernen/Machine Learning Part 1: Stefan Wermter Introduction Connectionist Learning (e.g. Neural Networks) Decision-Trees, Genetic Algorithms Part 2: Norman Hendrich Support-Vector Machines

More information

Probabilistic modeling. The slides are closely adapted from Subhransu Maji s slides

Probabilistic modeling. The slides are closely adapted from Subhransu Maji s slides Probabilistic modeling The slides are closely adapted from Subhransu Maji s slides Overview So far the models and algorithms you have learned about are relatively disconnected Probabilistic modeling framework

More information

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 8. Chapter 8. Classification: Basic Concepts

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 8. Chapter 8. Classification: Basic Concepts Data Mining: Concepts and Techniques (3 rd ed.) Chapter 8 Chapter 8. Classification: Basic Concepts Classification: Basic Concepts Decision Tree Induction Bayes Classification Methods Rule-Based Classification

More information

Support Vector Machines

Support Vector Machines Support Vector Machines Le Song Machine Learning I CSE 6740, Fall 2013 Naïve Bayes classifier Still use Bayes decision rule for classification P y x = P x y P y P x But assume p x y = 1 is fully factorized

More information

27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling

27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling 10-708: Probabilistic Graphical Models 10-708, Spring 2014 27 : Distributed Monte Carlo Markov Chain Lecturer: Eric P. Xing Scribes: Pengtao Xie, Khoa Luu In this scribe, we are going to review the Parallel

More information

L11: Pattern recognition principles

L11: Pattern recognition principles L11: Pattern recognition principles Bayesian decision theory Statistical classifiers Dimensionality reduction Clustering This lecture is partly based on [Huang, Acero and Hon, 2001, ch. 4] Introduction

More information

Chapter 9. Non-Parametric Density Function Estimation

Chapter 9. Non-Parametric Density Function Estimation 9-1 Density Estimation Version 1.2 Chapter 9 Non-Parametric Density Function Estimation 9.1. Introduction We have discussed several estimation techniques: method of moments, maximum likelihood, and least

More information

Quasi-likelihood Scan Statistics for Detection of

Quasi-likelihood Scan Statistics for Detection of for Quasi-likelihood for Division of Biostatistics and Bioinformatics, National Health Research Institutes & Department of Mathematics, National Chung Cheng University 17 December 2011 1 / 25 Outline for

More information

INTRODUCTION TO PATTERN RECOGNITION

INTRODUCTION TO PATTERN RECOGNITION INTRODUCTION TO PATTERN RECOGNITION INSTRUCTOR: WEI DING 1 Pattern Recognition Automatic discovery of regularities in data through the use of computer algorithms With the use of these regularities to take

More information

Machine Learning Overview

Machine Learning Overview Machine Learning Overview Sargur N. Srihari University at Buffalo, State University of New York USA 1 Outline 1. What is Machine Learning (ML)? 2. Types of Information Processing Problems Solved 1. Regression

More information

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2014

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2014 Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2014 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several

More information

Unsupervised Learning with Permuted Data

Unsupervised Learning with Permuted Data Unsupervised Learning with Permuted Data Sergey Kirshner skirshne@ics.uci.edu Sridevi Parise sparise@ics.uci.edu Padhraic Smyth smyth@ics.uci.edu School of Information and Computer Science, University

More information

Pattern Recognition and Machine Learning

Pattern Recognition and Machine Learning Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability

More information

Chapter 6 Classification and Prediction (2)

Chapter 6 Classification and Prediction (2) Chapter 6 Classification and Prediction (2) Outline Classification and Prediction Decision Tree Naïve Bayes Classifier Support Vector Machines (SVM) K-nearest Neighbors Accuracy and Error Measures Feature

More information

Midterm exam CS 189/289, Fall 2015

Midterm exam CS 189/289, Fall 2015 Midterm exam CS 189/289, Fall 2015 You have 80 minutes for the exam. Total 100 points: 1. True/False: 36 points (18 questions, 2 points each). 2. Multiple-choice questions: 24 points (8 questions, 3 points

More information

Automated Discovery of Novel Anomalous Patterns

Automated Discovery of Novel Anomalous Patterns Automated Discovery of Novel Anomalous Patterns Edward McFowland III Machine Learning Department School of Computer Science Carnegie Mellon University mcfowland@cmu.edu DAP Committee: Daniel B. Neill Jeff

More information

Introduction: MLE, MAP, Bayesian reasoning (28/8/13)

Introduction: MLE, MAP, Bayesian reasoning (28/8/13) STA561: Probabilistic machine learning Introduction: MLE, MAP, Bayesian reasoning (28/8/13) Lecturer: Barbara Engelhardt Scribes: K. Ulrich, J. Subramanian, N. Raval, J. O Hollaren 1 Classifiers In this

More information

Relationship between Least Squares Approximation and Maximum Likelihood Hypotheses

Relationship between Least Squares Approximation and Maximum Likelihood Hypotheses Relationship between Least Squares Approximation and Maximum Likelihood Hypotheses Steven Bergner, Chris Demwell Lecture notes for Cmpt 882 Machine Learning February 19, 2004 Abstract In these notes, a

More information

Cluster Analysis using SaTScan. Patrick DeLuca, M.A. APHEO 2007 Conference, Ottawa October 16 th, 2007

Cluster Analysis using SaTScan. Patrick DeLuca, M.A. APHEO 2007 Conference, Ottawa October 16 th, 2007 Cluster Analysis using SaTScan Patrick DeLuca, M.A. APHEO 2007 Conference, Ottawa October 16 th, 2007 Outline Clusters & Cluster Detection Spatial Scan Statistic Case Study 28 September 2007 APHEO Conference

More information

Machine Learning 2017

Machine Learning 2017 Machine Learning 2017 Volker Roth Department of Mathematics & Computer Science University of Basel 21st March 2017 Volker Roth (University of Basel) Machine Learning 2017 21st March 2017 1 / 41 Section

More information

Generative Clustering, Topic Modeling, & Bayesian Inference

Generative Clustering, Topic Modeling, & Bayesian Inference Generative Clustering, Topic Modeling, & Bayesian Inference INFO-4604, Applied Machine Learning University of Colorado Boulder December 12-14, 2017 Prof. Michael Paul Unsupervised Naïve Bayes Last week

More information

Introduction to Machine Learning Midterm Exam Solutions

Introduction to Machine Learning Midterm Exam Solutions 10-701 Introduction to Machine Learning Midterm Exam Solutions Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes,

More information

Least Absolute Shrinkage is Equivalent to Quadratic Penalization

Least Absolute Shrinkage is Equivalent to Quadratic Penalization Least Absolute Shrinkage is Equivalent to Quadratic Penalization Yves Grandvalet Heudiasyc, UMR CNRS 6599, Université de Technologie de Compiègne, BP 20.529, 60205 Compiègne Cedex, France Yves.Grandvalet@hds.utc.fr

More information

Creating Non-Gaussian Processes from Gaussian Processes by the Log-Sum-Exp Approach. Radford M. Neal, 28 February 2005

Creating Non-Gaussian Processes from Gaussian Processes by the Log-Sum-Exp Approach. Radford M. Neal, 28 February 2005 Creating Non-Gaussian Processes from Gaussian Processes by the Log-Sum-Exp Approach Radford M. Neal, 28 February 2005 A Very Brief Review of Gaussian Processes A Gaussian process is a distribution over

More information

Spatiotemporal Outbreak Detection

Spatiotemporal Outbreak Detection Spatiotemporal Outbreak Detection A Scan Statistic Based on the Zero-Inflated Poisson Distribution Benjamin Kjellson Masteruppsats i matematisk statistik Master Thesis in Mathematical Statistics Masteruppsats

More information

Introduction to Gaussian Processes

Introduction to Gaussian Processes Introduction to Gaussian Processes Iain Murray murray@cs.toronto.edu CSC255, Introduction to Machine Learning, Fall 28 Dept. Computer Science, University of Toronto The problem Learn scalar function of

More information

Parametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a

Parametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a Parametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a Some slides are due to Christopher Bishop Limitations of K-means Hard assignments of data points to clusters small shift of a

More information

Statistical Data Analysis Stat 3: p-values, parameter estimation

Statistical Data Analysis Stat 3: p-values, parameter estimation Statistical Data Analysis Stat 3: p-values, parameter estimation London Postgraduate Lectures on Particle Physics; University of London MSci course PH4515 Glen Cowan Physics Department Royal Holloway,

More information

Naïve Bayesian. From Han Kamber Pei

Naïve Bayesian. From Han Kamber Pei Naïve Bayesian From Han Kamber Pei Bayesian Theorem: Basics Let X be a data sample ( evidence ): class label is unknown Let H be a hypothesis that X belongs to class C Classification is to determine H

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning 3. Instance Based Learning Alex Smola Carnegie Mellon University http://alex.smola.org/teaching/cmu2013-10-701 10-701 Outline Parzen Windows Kernels, algorithm Model selection

More information

Introduction to Bayesian Learning

Introduction to Bayesian Learning Course Information Introduction Introduction to Bayesian Learning Davide Bacciu Dipartimento di Informatica Università di Pisa bacciu@di.unipi.it Apprendimento Automatico: Fondamenti - A.A. 2016/2017 Outline

More information

Introduction to Probabilistic Machine Learning

Introduction to Probabilistic Machine Learning Introduction to Probabilistic Machine Learning Piyush Rai Dept. of CSE, IIT Kanpur (Mini-course 1) Nov 03, 2015 Piyush Rai (IIT Kanpur) Introduction to Probabilistic Machine Learning 1 Machine Learning

More information

Lossless Online Bayesian Bagging

Lossless Online Bayesian Bagging Lossless Online Bayesian Bagging Herbert K. H. Lee ISDS Duke University Box 90251 Durham, NC 27708 herbie@isds.duke.edu Merlise A. Clyde ISDS Duke University Box 90251 Durham, NC 27708 clyde@isds.duke.edu

More information

Bayesian Networks Inference with Probabilistic Graphical Models

Bayesian Networks Inference with Probabilistic Graphical Models 4190.408 2016-Spring Bayesian Networks Inference with Probabilistic Graphical Models Byoung-Tak Zhang intelligence Lab Seoul National University 4190.408 Artificial (2016-Spring) 1 Machine Learning? Learning

More information

Introduction to Machine Learning Midterm Exam

Introduction to Machine Learning Midterm Exam 10-701 Introduction to Machine Learning Midterm Exam Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes, but

More information

CS-E3210 Machine Learning: Basic Principles

CS-E3210 Machine Learning: Basic Principles CS-E3210 Machine Learning: Basic Principles Lecture 4: Regression II slides by Markus Heinonen Department of Computer Science Aalto University, School of Science Autumn (Period I) 2017 1 / 61 Today s introduction

More information

A fast and simple algorithm for training neural probabilistic language models

A fast and simple algorithm for training neural probabilistic language models A fast and simple algorithm for training neural probabilistic language models Andriy Mnih Joint work with Yee Whye Teh Gatsby Computational Neuroscience Unit University College London 25 January 2013 1

More information

Bayes Rule. CS789: Machine Learning and Neural Network Bayesian learning. A Side Note on Probability. What will we learn in this lecture?

Bayes Rule. CS789: Machine Learning and Neural Network Bayesian learning. A Side Note on Probability. What will we learn in this lecture? Bayes Rule CS789: Machine Learning and Neural Network Bayesian learning P (Y X) = P (X Y )P (Y ) P (X) Jakramate Bootkrajang Department of Computer Science Chiang Mai University P (Y ): prior belief, prior

More information

CSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18

CSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18 CSE 417T: Introduction to Machine Learning Final Review Henry Chai 12/4/18 Overfitting Overfitting is fitting the training data more than is warranted Fitting noise rather than signal 2 Estimating! "#$

More information

Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.

Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Bayesian learning: Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Let y be the true label and y be the predicted

More information

Efficient Likelihood-Free Inference

Efficient Likelihood-Free Inference Efficient Likelihood-Free Inference Michael Gutmann http://homepages.inf.ed.ac.uk/mgutmann Institute for Adaptive and Neural Computation School of Informatics, University of Edinburgh 8th November 2017

More information

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 8. Chapter 8. Classification: Basic Concepts

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 8. Chapter 8. Classification: Basic Concepts Data Mining: Concepts and Techniques (3 rd ed.) Chapter 8 1 Chapter 8. Classification: Basic Concepts Classification: Basic Concepts Decision Tree Induction Bayes Classification Methods Rule-Based Classification

More information

Probabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016

Probabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016 Probabilistic classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2016 Topics Probabilistic approach Bayes decision theory Generative models Gaussian Bayes classifier

More information

Probabilistic Time Series Classification

Probabilistic Time Series Classification Probabilistic Time Series Classification Y. Cem Sübakan Boğaziçi University 25.06.2013 Y. Cem Sübakan (Boğaziçi University) M.Sc. Thesis Defense 25.06.2013 1 / 54 Problem Statement The goal is to assign

More information

Learning a probabalistic model of rainfall using graphical models

Learning a probabalistic model of rainfall using graphical models Learning a probabalistic model of rainfall using graphical models Byoungkoo Lee Computational Biology Carnegie Mellon University Pittsburgh, PA 15213 byounko@andrew.cmu.edu Jacob Joseph Computational Biology

More information

Introduction to Bayesian Learning. Machine Learning Fall 2018

Introduction to Bayesian Learning. Machine Learning Fall 2018 Introduction to Bayesian Learning Machine Learning Fall 2018 1 What we have seen so far What does it mean to learn? Mistake-driven learning Learning by counting (and bounding) number of mistakes PAC learnability

More information

Machine Learning, Midterm Exam

Machine Learning, Midterm Exam 10-601 Machine Learning, Midterm Exam Instructors: Tom Mitchell, Ziv Bar-Joseph Wednesday 12 th December, 2012 There are 9 questions, for a total of 100 points. This exam has 20 pages, make sure you have

More information

CSCE 478/878 Lecture 6: Bayesian Learning

CSCE 478/878 Lecture 6: Bayesian Learning Bayesian Methods Not all hypotheses are created equal (even if they are all consistent with the training data) Outline CSCE 478/878 Lecture 6: Bayesian Learning Stephen D. Scott (Adapted from Tom Mitchell

More information

Disease mapping with Gaussian processes

Disease mapping with Gaussian processes EUROHEIS2 Kuopio, Finland 17-18 August 2010 Aki Vehtari (former Helsinki University of Technology) Department of Biomedical Engineering and Computational Science (BECS) Acknowledgments Researchers - Jarno

More information

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 Exam policy: This exam allows two one-page, two-sided cheat sheets; No other materials. Time: 2 hours. Be sure to write your name and

More information

Gaussian Models

Gaussian Models Gaussian Models ddebarr@uw.edu 2016-04-28 Agenda Introduction Gaussian Discriminant Analysis Inference Linear Gaussian Systems The Wishart Distribution Inferring Parameters Introduction Gaussian Density

More information

A Bayesian Approach to Concept Drift

A Bayesian Approach to Concept Drift A Bayesian Approach to Concept Drift Stephen H. Bach Marcus A. Maloof Department of Computer Science Georgetown University Washington, DC 20007, USA {bach, maloof}@cs.georgetown.edu Abstract To cope with

More information

Fundamentals to Biostatistics. Prof. Chandan Chakraborty Associate Professor School of Medical Science & Technology IIT Kharagpur

Fundamentals to Biostatistics. Prof. Chandan Chakraborty Associate Professor School of Medical Science & Technology IIT Kharagpur Fundamentals to Biostatistics Prof. Chandan Chakraborty Associate Professor School of Medical Science & Technology IIT Kharagpur Statistics collection, analysis, interpretation of data development of new

More information

EEL 851: Biometrics. An Overview of Statistical Pattern Recognition EEL 851 1

EEL 851: Biometrics. An Overview of Statistical Pattern Recognition EEL 851 1 EEL 851: Biometrics An Overview of Statistical Pattern Recognition EEL 851 1 Outline Introduction Pattern Feature Noise Example Problem Analysis Segmentation Feature Extraction Classification Design Cycle

More information

Nonparametric Bayesian Methods (Gaussian Processes)

Nonparametric Bayesian Methods (Gaussian Processes) [70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent

More information

STA 4273H: Sta-s-cal Machine Learning

STA 4273H: Sta-s-cal Machine Learning STA 4273H: Sta-s-cal Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 2 In our

More information

Linear Model Selection and Regularization

Linear Model Selection and Regularization Linear Model Selection and Regularization Recall the linear model Y = β 0 + β 1 X 1 + + β p X p + ɛ. In the lectures that follow, we consider some approaches for extending the linear model framework. In

More information

Recent Advances in Bayesian Inference Techniques

Recent Advances in Bayesian Inference Techniques Recent Advances in Bayesian Inference Techniques Christopher M. Bishop Microsoft Research, Cambridge, U.K. research.microsoft.com/~cmbishop SIAM Conference on Data Mining, April 2004 Abstract Bayesian

More information