Bayesian Analysis of Massive Datasets Via Particle Filters

Bartholomew Webb
1 Bayesian Analysis of Massive Datasets Via Particle Filters

2 Bayesian Analysis
- Use Bayes' theorem to learn about model parameters from data
- Examples:
  - Clustered data: hospitals, schools
  - Spatial models: public health
  - Support vector machines
  - Model-based clustering

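3 Metropolis-Hastings algorithm
1. Initialize $\theta_1$
2. For $i$ in $2, \ldots, M$
   a. Draw a proposal $\theta'$ from $q(\theta' \mid \theta_{i-1})$
   b. Compute the acceptance probability $\alpha = \min\left\{1, \frac{f(\theta' \mid x)\, q(\theta_{i-1} \mid \theta')}{f(\theta_{i-1} \mid x)\, q(\theta' \mid \theta_{i-1})}\right\}$
   c. Set $\theta_i = \theta'$ with probability $\alpha$; otherwise $\theta_i = \theta_{i-1}$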
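The steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's code: it assumes a scalar parameter and a symmetric Gaussian random-walk proposal (so the $q$ terms cancel), and the function name `metropolis_hastings`, the `step` parameter, and the N(3, 1) target are all hypothetical choices made for the example.

```python
import numpy as np

def metropolis_hastings(log_target, theta0, n_draws, step=2.4, seed=0):
    """Random-walk Metropolis-Hastings for a scalar parameter.

    Assumption of this sketch: the Gaussian proposal is symmetric, so the
    proposal densities cancel in the acceptance ratio.
    """
    rng = np.random.default_rng(seed)
    draws = np.empty(n_draws)
    draws[0] = theta0
    log_p = log_target(theta0)
    for i in range(1, n_draws):
        # a. Draw a proposal centred on the previous draw.
        proposal = draws[i - 1] + step * rng.normal()
        log_p_new = log_target(proposal)
        # b./c. Accept with probability min(1, f(proposal) / f(current)).
        if np.log(rng.uniform()) < log_p_new - log_p:
            draws[i] = proposal
            log_p = log_p_new
        else:
            draws[i] = draws[i - 1]
    return draws

# Hypothetical target: a N(3, 1) density, known only up to a constant.
draws = metropolis_hastings(lambda t: -0.5 * (t - 3.0) ** 2,
                            theta0=0.0, n_draws=20000)
posterior_mean = draws[2000:].mean()   # discard an initial burn-in
```

Note that every evaluation of `log_target` would require a full pass over the data when the target is a posterior $f(\theta \mid x)$, which is the cost the rest of the talk tries to avoid.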

4 Important ideas
- Metropolis makes Bayesian analysis practical
- Metropolis often requires an enormous number of laps through the dataset
- Given a $\theta$ drawn from $f(\theta \mid x)$, the Metropolis algorithm produces a new draw having the same distribution
- Using particle filtering, we reverse the inner and outer for-loops of Metropolis

5 Importance sampling
- Target distribution is $f(\theta \mid x)$
- Sampling distribution is $g(\theta)$
- $\theta_i$ has density $g(\theta)$ and $w_i = f(\theta_i \mid x) / g(\theta_i)$
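As a small illustration of the weights $w_i = f(\theta_i \mid x)/g(\theta_i)$, the sketch below estimates the mean of a target density by self-normalized importance sampling. The two Gaussian densities are hypothetical stand-ins chosen so the answer is known; because the weights are normalized to sum to one, both densities only need to be known up to a constant.

```python
import numpy as np

rng = np.random.default_rng(1)

def log_f(theta):
    # Hypothetical target, unnormalized: N(1, 0.5^2).
    return -0.5 * ((theta - 1.0) / 0.5) ** 2

def log_g(theta):
    # Hypothetical sampling density, unnormalized: N(0, 1.5^2).
    return -0.5 * (theta / 1.5) ** 2

theta = rng.normal(0.0, 1.5, size=50000)      # theta_i drawn from g
log_w = log_f(theta) - log_g(theta)           # log of f / g
w = np.exp(log_w - log_w.max())               # subtract the max for stability
w /= w.sum()                                  # self-normalize
is_mean = np.sum(w * theta)                   # weighted estimate of E_f[theta]
```

Working with log weights and subtracting the maximum before exponentiating is a common precaution against underflow; it does not change the normalized weights.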

6 Important ideas
- We cannot sample from $f(\theta \mid x)$ directly because the model is complex and $x$ is massive
- Importance sampling allows us to sample from distributions that are difficult to sample from
- For efficiency, $g(\theta)$ and $f(\theta \mid x)$ should be similar

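7 Importance sampling for massive datasets
- Set the sampling distribution as $g(\theta) = f(\theta \mid x_1, \ldots, x_n)$, where $n \ll N$
- The importance weights greatly simplify: with $D_1 = \{x_1, \ldots, x_n\}$ and $D_2 = \{x_{n+1}, \ldots, x_N\}$,
  $w_i = \frac{f(\theta_i \mid D_1, D_2)}{f(\theta_i \mid D_1)} \propto f(D_2 \mid \theta_i) = \prod_{j=n+1}^{N} f(x_j \mid \theta_i)$
- Use Metropolis to sample from $g(\theta)$ and reweight the draws to look like a sample from $f(\theta \mid x)$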
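The simplification of the weights follows from Bayes' theorem applied to numerator and denominator. The derivation below is a sketch that assumes the observations are conditionally independent given $\theta$, so that the likelihood factors over $D_1 = \{x_1, \ldots, x_n\}$ and $D_2 = \{x_{n+1}, \ldots, x_N\}$.

```latex
\begin{align*}
w_i &= \frac{f(\theta_i \mid D_1, D_2)}{f(\theta_i \mid D_1)}
     = \frac{f(D_1 \mid \theta_i)\, f(D_2 \mid \theta_i)\, f(\theta_i) \,/\, f(D_1, D_2)}
            {f(D_1 \mid \theta_i)\, f(\theta_i) \,/\, f(D_1)} \\
    &= \frac{f(D_1)}{f(D_1, D_2)}\, f(D_2 \mid \theta_i)
     \;\propto\; \prod_{j=n+1}^{N} f(x_j \mid \theta_i)
\end{align*}
```

The prior and the likelihood of $D_1$ cancel, and the remaining ratio of marginal likelihoods does not depend on $\theta_i$, so it disappears once the weights are normalized.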

9 The algorithm
- Load as much data into memory as possible to form $D_1$
- Draw $M$ times from $f(\theta \mid D_1)$ via, e.g., Metropolis
- Purge $D_1$ from memory
- Set $w_i = 1$, $i = 1, \ldots, M$
- One pass through $D_2$:
  For $j = n+1, \ldots, N$ {
    for $i = 1, \ldots, M$
      $w_i = w_i \, f(x_j \mid \theta_i)$
  }
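A minimal sketch of this algorithm follows, under several assumptions that are not in the slides: the data are synthetic normal observations with known standard deviation `sigma`, the prior on the mean is flat, and the initial draws come from the exact conjugate posterior rather than from a Metropolis run. The weights are accumulated on the log scale, since a product of thousands of likelihood terms would underflow.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma = 1.0                           # known observation s.d. (assumed)
N, n, M = 2000, 1000, 5000            # illustrative sizes only
x = rng.normal(0.3, sigma, size=N)    # synthetic data for the example

# Form D1 and draw M times from f(theta | D1). With a flat prior this
# posterior is N(mean(D1), sigma^2 / n), so we sample it directly here;
# in general this step would be a Metropolis run.
d1 = x[:n]
theta = rng.normal(d1.mean(), sigma / np.sqrt(n), size=M)
del d1                                # stands in for purging D1 from memory

log_w = np.zeros(M)                   # w_i = 1 for all i
for x_j in x[n:]:                     # one pass through D2
    # log f(x_j | theta_i), up to a constant shared by all particles
    log_w += -0.5 * ((x_j - theta) / sigma) ** 2

w = np.exp(log_w - log_w.max())
w /= w.sum()
weighted_mean = np.sum(w * theta)     # estimate of E[theta | D1, D2]
full_mean = x.mean()                  # exact posterior mean under the flat prior
```

Each observation in $D_2$ is touched exactly once, and the inner loop over particles is vectorized, which is the loop reversal relative to Metropolis.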

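11 Effective sample size
- $\mathrm{ESS} = \frac{M}{1 + \mathrm{cv}^2(w)}$, where $\mathrm{cv}(w)$ is the coefficient of variation of the weights
- Why? Suppose $S$ weights are $1/S$ and $M - S$ are zero. Then $\mathrm{ESS} = S$
- Also can argue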
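The example on this slide can be checked numerically. The sketch below uses two common, algebraically equivalent expressions for the effective sample size, $(\sum_i w_i)^2 / \sum_i w_i^2$ and $M / (1 + \mathrm{cv}^2(w))$ with the population variance; the function names are hypothetical.

```python
import numpy as np

def effective_sample_size(w):
    """ESS as (sum w)^2 / sum(w^2); invariant to rescaling of the weights."""
    w = np.asarray(w, dtype=float)
    return w.sum() ** 2 / np.sum(w ** 2)

def ess_from_cv(w):
    """ESS as M / (1 + cv^2), with cv the coefficient of variation of w."""
    w = np.asarray(w, dtype=float)
    cv2 = w.var() / w.mean() ** 2     # population variance (ddof=0)
    return w.size / (1.0 + cv2)

# S weights equal to 1/S and the remaining M - S equal to zero.
M, S = 10, 4
w = np.array([1.0 / S] * S + [0.0] * (M - S))
ess_a = effective_sample_size(w)
ess_b = ess_from_cv(w)
```

Both expressions return $S$ for this weight vector, matching the intuition that only $S$ particles carry any information.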

12 Note:

13 ESS deterioration
- $\mathrm{ESS} \approx M (n/N)^d$
- [Figure: effective sample size versus additional observations, with curves for known variance and unknown variance]

14 Gilks and Berzuini rejuvenation
- $\theta_1, \ldots, \theta_M$ are particles, and the weights filter out the $\theta_i$ with little posterior mass
- Get an initial sample from $f(\theta \mid x_1, \ldots, x_n)$
- While ESS is large enough, incorporate new observations using importance reweighting
- Sample with replacement from $\theta_1, \ldots, \theta_M$ with probability proportional to $w_i$ (new weights $1/M$)
- Rejuvenate: for each $\theta_i$, do a single Metropolis step
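The reweight, resample, and rejuvenate cycle can be sketched as follows. This is an illustration under stated assumptions rather than the presenter's implementation: the model is a normal mean with known `sigma` and a flat prior, the data are synthetic, the threshold `p` and all sizes are arbitrary, and the rejuvenation step uses a running sum as a sufficient statistic. For a general model, the Metropolis step would instead require a scan of all observations seen so far.

```python
import numpy as np

rng = np.random.default_rng(3)
sigma, N, n, M, p = 1.0, 4000, 200, 2000, 0.5   # illustrative values
x = rng.normal(-0.7, sigma, size=N)             # synthetic data

def log_post(theta, data_sum, count):
    # log f(theta | x_1..x_count) up to a constant (flat prior, known sigma).
    return (theta * data_sum - 0.5 * count * theta ** 2) / sigma ** 2

# Initial sample from f(theta | x_1..x_n): exact here, Metropolis in general.
theta = rng.normal(x[:n].mean(), sigma / np.sqrt(n), size=M)
log_w = np.zeros(M)
data_sum, n_rejuvenations = x[:n].sum(), 0

for j in range(n, N):
    log_w += -0.5 * ((x[j] - theta) / sigma) ** 2   # importance reweighting
    data_sum += x[j]
    w = np.exp(log_w - log_w.max())
    ess = w.sum() ** 2 / np.sum(w ** 2)
    if ess < p * M:
        # Resample with probability proportional to w_i; weights reset to equal.
        theta = rng.choice(theta, size=M, replace=True, p=w / w.sum())
        log_w = np.zeros(M)
        # Rejuvenate: one random-walk Metropolis step per particle, targeting
        # the posterior given the j + 1 observations seen so far.
        count = j + 1
        prop = theta + rng.normal(0.0, 2.4 * sigma / np.sqrt(count), size=M)
        log_ratio = log_post(prop, data_sum, count) - log_post(theta, data_sum, count)
        accept = np.log(rng.uniform(size=M)) < log_ratio
        theta = np.where(accept, prop, theta)
        n_rejuvenations += 1

w = np.exp(log_w - log_w.max())
pf_mean = np.sum(w * theta) / w.sum()   # estimate of the full-data posterior mean
```

Resampling alone would leave many duplicate particles; the single Metropolis step moves the duplicates apart while leaving the target distribution unchanged.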

15 Sample from $g(\theta)$
[Figure: draws from $g(\theta)$]

16 Reweight, resample to get $f(\theta \mid x)$
[Figure: $f(\theta \mid x)$ and $g(\theta)$]

17 Rejuvenate

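18 Frequency of rejuvenation
- Let $N_k$ be the total number of observations up to the $k$th rejuvenation
- Suppose one rejuvenates every time ESS drops below $pM$
- Recall: $\mathrm{ESS} \approx M (n/N)^d$
- Thus: $pM \approx M (N_{k-1}/N_k)^d$
- So that: $N_k \approx p^{-1/d} N_{k-1}$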
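If the effective sample size after a rejuvenation at $N_{k-1}$ observations decays roughly as $M (N_{k-1}/N_k)^d$, then the rejuvenation points grow geometrically by a factor $p^{-1/d}$. The sketch below tabulates the predicted points under that assumption; the function name and the values of `n`, `N`, `d`, and `p` are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math

def rejuvenation_points(n, N, d, p):
    """Predicted observation counts N_k at which ESS falls to p * M.

    Assumes ESS ~= M * (N_{k-1} / N_k) ** d after each rejuvenation,
    so that N_k = p ** (-1 / d) * N_{k-1}, starting from N_0 = n.
    """
    growth = p ** (-1.0 / d)
    points, n_k = [], float(n)
    while n_k * growth < N:
        n_k *= growth
        points.append(n_k)
    return points

# Illustrative values: with d = 1 and p = 0.5 the points double each time.
points = rejuvenation_points(n=1000, N=25_000_000, d=1, p=0.5)
n_rejuvenations = len(points)
```

Because the spacing is geometric, the number of rejuvenations grows only logarithmically in $N/n$, although it grows linearly in the dimension $d$.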

19 Frequency of rejuvenation

21 Example 1: Mixture of transition models
- Set of sequences of length 5 to 20: the states visited by each observation; 4 possible states
- Each sequence was generated by one of two first-order probability transition matrices
- We know neither the transition probabilities nor the cluster assignments
- Properties:
  - 25 million observations
  - 1 GB of data
  - Only 1,000 sequences allowed in memory

22 Number of accesses
[Figure: number of accesses versus observation index (in millions)]

24 Example 2: Fully Bayes regression
- AT&T outpic dataset
- Predict whether a customer has churned, i.e., switched to a competitor's service
- Five continuous and two three-level categorical variables
- 744,963 records, 57 MB when stored in double precision

25 Example 2: Fully Bayes regression
- Logistic regression with a Laplace shrinkage prior
- Related to the LASSO
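For concreteness, an unnormalized log posterior for such a model might look like the sketch below. The parameterization is an assumption, since the slide's model equation did not survive extraction: independent Laplace priors with density proportional to $\exp(-\lambda |\beta_j|)$ on every coefficient (including the intercept), a fixed `lam`, and a tiny made-up design matrix.

```python
import numpy as np

def log_posterior(beta, X, y, lam=1.0):
    """Unnormalized log posterior: Bernoulli-logit likelihood plus
    independent Laplace priors proportional to exp(-lam * |beta_j|)."""
    eta = X @ beta
    # sum_i [ y_i * eta_i - log(1 + exp(eta_i)) ], computed stably
    log_lik = np.sum(y * eta - np.logaddexp(0.0, eta))
    log_prior = -lam * np.sum(np.abs(beta))
    return log_lik + log_prior

# Made-up data: an intercept column and one covariate.
X = np.array([[1.0, 0.5], [1.0, -1.2], [1.0, 2.0]])
y = np.array([1.0, 0.0, 1.0])
value_at_zero = log_posterior(np.zeros(2), X, y)
```

The connection to the LASSO is that maximizing this log posterior is the same as fitting a logistic regression with an L1 penalty of strength `lam`; a fully Bayesian analysis samples from the posterior instead of maximizing it.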

26 Parameter estimates
- Draws from the Metropolis algorithm are strongly dependent
- Additional steps, at the cost of additional scans, will fix this

27 Number of accesses

28 Conclusions
- Requires one good Metropolis-Hastings run up front with a small dataset
- Greatly reduces data access requirements
- Number of data accesses does not depend on $M$
- The Chopin (2002) Biometrika article offers a similar strategy with interesting measures of sample quality
