Asymptotics of ABC

Paul Fearnhead
Department of Mathematics and Statistics, Lancaster University
Correspondence: p.fearnhead@lancaster.ac.uk
arXiv:1706.07712v1 [stat.ME] 23 Jun 2017

Abstract

This document is due to appear as a chapter of the forthcoming Handbook of Approximate Bayesian Computation (ABC) edited by S. Sisson, Y. Fan, and M. Beaumont. We present an informal review of recent work on the asymptotics of Approximate Bayesian Computation (ABC). In particular we focus on how the ABC posterior, or point estimates obtained by ABC, behave in the limit as we have more data. The results we review show that ABC can perform well in terms of point estimation, but standard implementations will over-estimate the uncertainty about the parameters. If we use the regression correction of Beaumont et al. then ABC can also accurately quantify this uncertainty. The theoretical results also have practical implications for how to implement ABC.

1 Introduction

This chapter aims to give an overview of recent work on the asymptotics of Approximate Bayesian Computation (ABC). By asymptotics here we mean how the ABC posterior, or point estimates obtained by ABC, behave in the limit as we have more data. The chapter summarises results from three papers, Li and Fearnhead (2015), Frazier et al. (2016) and Li and Fearnhead (2016). The presentation in this chapter is deliberately informal, with the hope of conveying both the intuition behind the theoretical results from these papers and the practical consequences of this theory. As such we will not present all the technical conditions for the results we give: the interested reader should consult the relevant papers for these, and the results we state should be interpreted as holding under appropriate regularity conditions.

We will focus on ABC for a $p$-dimensional parameter, $\boldsymbol{\theta}$, from a prior $p(\boldsymbol{\theta})$ (we use the common convention of denoting vectors in bold, and we will assume these are column vectors). We assume we have data of size $n$ that is summarised through a $d$-dimensional summary statistic. The asymptotic results we review consider the limit $n \to \infty$, but assume that the summary statistic is of fixed dimension. Furthermore all results assume that the dimension of the summary statistic is at least as large as the dimension of the parameters, $d \geq p$; this is implicit in the identifiability condition that we will introduce later. Examples of such a setting are where the summaries are sample means of functions of individual data points, quantiles of the data, or, for time-series data, empirical auto-correlations of the data. It also includes summaries based on fixed-dimensional auxiliary models (Drovandi et al., 2015) or on composite likelihood score functions (Ruli et al., 2016).

To distinguish the summary statistic for the observed data from the summary statistic of data simulated within ABC, we will denote the former by $\mathbf{s}_{obs}$, and the latter by $\mathbf{s}$. Our model for the data will define a probability model for the summary. We assume that this in turn specifies a probability density function, or likelihood, for the summary, $f_n(\mathbf{s}; \boldsymbol{\theta})$, which depends on the parameter. In some situations we will want to refer to the random variable for the summary statistic, and this will be $\mathbf{S}_{n,\boldsymbol{\theta}}$. As is standard with ABC, we assume that we can simulate from the model but cannot calculate $f_n(\mathbf{s}; \boldsymbol{\theta})$.

The most basic ABC algorithm is a rejection sampler (Pritchard et al., 1999), which iterates the following three steps:

(RS1) Simulate a parameter from the prior: $\boldsymbol{\theta}_i \sim p(\boldsymbol{\theta})$.
(RS2) Simulate a summary statistic from the model given $\boldsymbol{\theta}_i$: $\mathbf{s}_i \sim f_n(\mathbf{s} \mid \boldsymbol{\theta}_i)$.
(RS3) Accept $\boldsymbol{\theta}_i$ if $\|\mathbf{s}_{obs} - \mathbf{s}_i\| < \epsilon$.

Here $\|\mathbf{s}_{obs} - \mathbf{s}_i\|$ is a suitably chosen distance between the observed and simulated summary statistics, and $\epsilon$ is a suitably chosen bandwidth. In the following we will assume that $\|\mathbf{x}\|$ is either Euclidean distance, $\|\mathbf{x}\|^2 = \mathbf{x}^T \mathbf{x}$, or a Mahalanobis distance, $\|\mathbf{x}\|^2 = \mathbf{x}^T \Gamma \mathbf{x}$ for some chosen positive-definite $d \times d$ matrix $\Gamma$.
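Steps (RS1) to (RS3) can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the N(0, 2^2) prior, the IID N(theta, 1) data model with the sample mean as summary (the scenario used for the figures, but with an identity mean function), and the names `abc_rejection` and `num_proposals` are all assumptions made for the example.

```python
import random
import statistics

def abc_rejection(s_obs, n, eps, num_proposals, seed=0):
    """Basic ABC rejection sampler (RS1-RS3) for an assumed toy Gaussian model."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(num_proposals):
        theta = rng.gauss(0.0, 2.0)            # RS1: draw from the assumed N(0, 2^2) prior
        s = rng.gauss(theta, 1.0 / n ** 0.5)   # RS2: sample mean of n IID N(theta, 1) draws
        if abs(s_obs - s) < eps:               # RS3: Euclidean distance in one dimension
            accepted.append((theta, s))
    return accepted

accepted = abc_rejection(s_obs=1.0, n=100, eps=0.1, num_proposals=20000)
thetas = [theta for theta, _ in accepted]
# Monte Carlo estimate of the ABC posterior mean of theta
print(len(thetas), statistics.mean(thetas))
```

Because the sample mean of $n$ IID N(theta, 1) observations is exactly N(theta, 1/n), the sketch simulates the summary directly instead of simulating the full data set.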
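If we define a (uniform) kernel function, $K(\mathbf{x})$, to be 1 if $\|\mathbf{x}\| < 1$ and 0 otherwise, then this rejection sampler is drawing from the following distribution

$$\pi_{ABC}(\boldsymbol{\theta}) \propto p(\boldsymbol{\theta}) \int f_n(\mathbf{s} \mid \boldsymbol{\theta}) \, K\!\left(\frac{\mathbf{s}_{obs} - \mathbf{s}}{\epsilon}\right) d\mathbf{s}.$$

We call this the ABC posterior. If we are interested in estimating a function of the parameter $h(\boldsymbol{\theta})$ we can use the ABC posterior mean

$$h_{ABC} = \int h(\boldsymbol{\theta}) \, \pi_{ABC}(\boldsymbol{\theta}) \, d\boldsymbol{\theta}.$$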

In practice we cannot calculate this posterior mean analytically, but would have to estimate it based on the sample mean of $h(\boldsymbol{\theta}_i)$ for parameter values $\boldsymbol{\theta}_i$ simulated using the above rejection sampler.

In this chapter we review results on the behaviour of the ABC posterior, the ABC posterior mean, and Monte Carlo estimates of this mean as $n \to \infty$. In particular we consider whether the ABC posterior concentrates around the true parameter value in Section 2. We then consider the limiting form of the ABC posterior and the frequentist asymptotic distribution of the ABC posterior mean in Section 3. For the latter two results we compare these asymptotic distributions with those of the true posterior given the summary, which is the best we can hope for once we have chosen our summary statistics.

The results in these two sections ignore any Monte Carlo error. The impact of Monte Carlo error on the asymptotic variance of our ABC posterior mean estimate is the focus of Section 4. This impact depends on the choice of algorithm we use to sample from the ABC posterior (whereas the choice of algorithm has no effect on the actual ABC posterior or posterior mean that are analysed in the earlier sections). The rejection sampling algorithm above is inefficient in the limit as $n \to \infty$ and thus we consider more efficient importance sampling and MCMC generalisations in that section. We then review results that show how post-processing the output of ABC can lead to substantially stronger asymptotic results. The chapter then finishes with a discussion that aims to draw out the key practical insights from the theory.

Before we review these results, it is worth mentioning that we can generalise the definition of the ABC posterior, and the associated posterior mean, given above. Namely we can use a more general form of kernel than the uniform kernel. Most of the results we review apply if we replace the uniform kernel by a different kernel, $K(\mathbf{x})$, that is monotonically decreasing in $\|\mathbf{x}\|$.
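To make the role of the kernel concrete, the following sketch estimates the ABC posterior mean as a kernel-weighted average of simulated parameter values. With the uniform kernel it reduces to the sample mean over accepted values. The function names and the four (theta, s) pairs are hypothetical and chosen only for illustration, and the Gaussian kernel is left unnormalised because the normalising constant cancels in the ratio.

```python
import math

def uniform_kernel(u):
    """Uniform kernel: 1 inside the unit ball, 0 outside."""
    return 1.0 if abs(u) < 1.0 else 0.0

def gaussian_kernel(u):
    """Unnormalised Gaussian kernel, monotonically decreasing in |u|."""
    return math.exp(-0.5 * u * u)

def kernel_posterior_mean(pairs, s_obs, eps, kernel):
    """Weighted mean of theta with weights K(|s_obs - s| / eps)."""
    weights = [kernel(abs(s_obs - s) / eps) for _, s in pairs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no simulation received positive weight")
    return sum(w * theta for w, (theta, _) in zip(weights, pairs)) / total

# Hypothetical (theta, s) pairs simulated from the prior and the model.
pairs = [(0.8, 0.9), (1.1, 1.05), (1.6, 1.7), (0.2, 0.1)]
uniform_mean = kernel_posterior_mean(pairs, s_obs=1.0, eps=0.2, kernel=uniform_kernel)
gaussian_mean = kernel_posterior_mean(pairs, s_obs=1.0, eps=0.2, kernel=gaussian_kernel)
print(uniform_mean, gaussian_mean)
```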
Furthermore the specific form of the kernel has little effect on the asymptotic results; what matters most is how we choose the bandwidth and, in some cases, the choice of distance. The fact that most of the theoretical results do not depend on the choice of kernel means that, for concreteness, we will primarily assume a uniform kernel in our presentation below. The exception is Section 3, where it is easier to get an intuition for the results if we use a Gaussian kernel. By focusing on these two choices we do not mean to suggest that they are necessarily better than other choices; it is just that they simplify the exposition. We will return to the choice of kernel in the Discussion.

2 Posterior Concentration

The results we present in this section are from Frazier et al. (2016) (though see also Martin et al., 2016), and consider the question of whether the ABC posterior will place increasing probability mass around the true parameter value as $n \to \infty$. It is the most basic convergence result we would wish for, requires weaker conditions than the results we give in Section 3, and is thus easier to apply to other ABC settings (see for example Marin et al., 2014; Bernton et al., 2017).

We will denote the true parameter value by $\boldsymbol{\theta}_0$. If we define

$$\mathrm{Pr}_{ABC}(\|\boldsymbol{\theta} - \boldsymbol{\theta}_0\| < \delta) = \int_{\boldsymbol{\theta} : \|\boldsymbol{\theta} - \boldsymbol{\theta}_0\| < \delta} \pi_{ABC}(\boldsymbol{\theta}) \, d\boldsymbol{\theta},$$

the ABC posterior probability that $\boldsymbol{\theta}$ is within some distance $\delta$ of the true parameter value, then for posterior concentration we want that for any $\delta > 0$

$$\mathrm{Pr}_{ABC}(\|\boldsymbol{\theta} - \boldsymbol{\theta}_0\| < \delta) \to 1$$

as $n \to \infty$. That is, for any strictly positive choice of distance, $\delta$, regardless of how small it is, as $n \to \infty$ we need the ABC posterior to place all its probability on the event that $\boldsymbol{\theta}$ is within $\delta$ of the true parameter value. To obtain posterior concentration for ABC we will need to let the bandwidth depend on $n$, and henceforth we denote the bandwidth by $\epsilon_n$.

2.1 ABC Posterior Concentration

The posterior concentration result of Frazier et al. (2016) is based upon assuming a law of large numbers for the summary statistics. Specifically we need the existence of a binding function, $\mathbf{b}(\boldsymbol{\theta})$, such that for any $\boldsymbol{\theta}$

$$\mathbf{S}_{n,\boldsymbol{\theta}} \to \mathbf{b}(\boldsymbol{\theta})$$

in probability as $n \to \infty$. If this holds, and the binding function satisfies an identifiability condition, that $\mathbf{b}(\boldsymbol{\theta}) = \mathbf{b}(\boldsymbol{\theta}_0)$ implies $\boldsymbol{\theta} = \boldsymbol{\theta}_0$, then we have posterior concentration providing the bandwidth tends to zero, $\epsilon_n \to 0$.

To gain some insight into this result and the assumptions behind it, we present an example. To be able to visualise what is happening we will assume that the parameter and summary statistic are both 1-dimensional. Figure 1 shows an example binding function, a value of $\theta_0$ and $s_{obs}$, and output from the ABC rejection sampler. As $n$ increases we can see the plotted points, which show proposed parameter and summary statistic values, converge towards the line that shows the binding function. This stems from our assumption of a law of large numbers for the summaries, so that for each $\theta$ value the summaries should tend to $b(\theta)$ as $n$ increases. We also have that the observed summary statistic, $s_{obs}$, converges towards $b(\theta_0)$.
Furthermore we are decreasing the bandwidth as we increase $n$, which corresponds to narrower acceptance regions for the summaries, which means that the accepted summary statistics converge towards $b(\theta_0)$. Asymptotically, only parameter values close to $\theta_0$, which have values $b(\theta)$ which are close to $b(\theta_0)$, will simulate summaries close to $b(\theta_0)$. Hence the only accepted parameter values will be close to, and asymptotically will concentrate on, $\theta_0$. This can be seen in practice from the plots in the bottom row of Figure 1. The identifiability condition on the binding function is used to ensure that concentration of accepted summaries around $b(\theta_0)$ results in ABC posterior concentration around $\theta_0$. What happens when this identifiability condition does not hold is discussed in Section 2.3.

Figure 1: Example binding function, $b(\theta)$ (top-left plot). Pairs of parameter and summary statistic values proposed by a rejection sampler (top-middle). Output of rejection sampler (top-right): $\theta_0$ and $b(\theta_0)$ (blue dotted vertical and horizontal lines respectively); $s_{obs}$ (bold red circle, and red dashed horizontal line) and acceptance regions for proposed summaries (bold red dashed horizontal lines); pairs of parameter and summary statistic values accepted (bold) and rejected (grey) by the rejection sampler. Bottom-row plots are the same as the top-right plot but for increasing $n$ and decreasing $\epsilon_n$. Here, and for all plots, our results are for a simple scenario where data is IID Gaussian with a mean that is a function of the parameter, and the summary statistic is the sample mean. (In this case the binding function is, by definition, equal to the mean function.)

2.2 Rate of Concentration

We can obtain stronger results by looking at the rate at which concentration occurs. Informally we can think of this as the supremum of rates, $\lambda_n \to 0$, such that

$$\mathrm{Pr}_{ABC}(\|\boldsymbol{\theta} - \boldsymbol{\theta}_0\| < \lambda_n) \to 1$$

as $n \to \infty$. For parametric Bayesian inference with independent and identically distributed data this rate would be $1/\sqrt{n}$.

Assuming the binding function is continuous at $\boldsymbol{\theta}_0$, the rate of concentration will be determined by the rate at which accepted summaries concentrate on $\mathbf{b}(\boldsymbol{\theta}_0)$. As described above, this depends on the variability (or "noise") of the simulated summaries around the binding function and on the bandwidth, $\epsilon_n$. The rate of concentration will be the slower of the rate at which the noise in the summary statistics tends to 0 and the rate at which $\epsilon_n$ tends to 0. We can see this from the example in Figure 2, where we show output from the ABC rejection sampler for different values of $n$, but with $\epsilon_n$ tending to 0 at either a faster or slower rate than that of the noise in the summaries. For each regime the rate of concentration of both the accepted summaries and of the accepted parameter values is determined by the slower of the two rates.

Figure 2: Example of ABC concentration for differing rates of the noise in the summary statistics and rates of $\epsilon_n$. Plots are as in Figure 1. Top row: noise in summary statistics halving, or equivalently sample size increasing by a factor of 4, while $\epsilon_n$ decreases by a factor of $1/\sqrt{2}$ as we move from left to right. Bottom row: noise in summary statistics decreasing by a factor of $1/\sqrt{2}$, or equivalently sample size doubling, while $\epsilon_n$ halves as we move from left to right.

2.3 Effect of Binding Function

The shape of the binding function for values of $\boldsymbol{\theta}$ for which $\mathbf{b}(\boldsymbol{\theta})$ is close to $\mathbf{b}(\boldsymbol{\theta}_0)$ affects the ABC posterior, as it affects the range of $\boldsymbol{\theta}$ values that will have a reasonable chance of producing summary statistic values that would be accepted by the ABC rejection sampler. If the identifiability condition holds and the binding function is differentiable at $\boldsymbol{\theta}_0$ then the value of this gradient will directly impact the ABC posterior variance. This is shown in the top row of Figure 3. If this gradient is large (top-left plot) then even quite large differences in summary statistics would correspond to small differences in the parameter, and hence a small ABC posterior variance. By comparison if the gradient is small (top-right plot) then large differences in parameters may mean only small differences in summary statistics. In this case we expect a much larger ABC posterior variance for the same width of the region in which the summary statistics are accepted.

Figure 3: Example of the effect of the shape of the binding function on the ABC posterior (plots are as in Figure 1). Top row: the gradient of the binding function at $b(\theta_0)$ affects the ABC posterior variance, with a larger gradient (left-hand plot) resulting in lower ABC posterior variance than a smaller gradient (right-hand plot). Bottom row: effect of non-identifiability on the ABC posterior.

The bottom row of Figure 3 shows what can happen if the identifiability condition does not hold. The bottom-left plot gives an example where there are two distinct parameter values for which the binding function is equal to $b(\theta_0)$. In this case we have a bi-modal ABC posterior that concentrates on these two values. The bottom-right plot shows an example where there is a range of parameter values whose binding function value is equal to $b(\theta_0)$, and in this case the ABC posterior will concentrate on this range of parameter values.

It can be difficult in practice to know whether the identifiability condition holds. In large data settings, observing a multi-modal posterior as in the bottom-left plot of Figure 3 would suggest that it does not hold. In such cases it may be possible to obtain identifiability by adding extra summaries. The wish to ensure identifiability is one reason for choosing a higher dimensional summary than parameter. However this does not come without potential cost, as we show in Section 3.
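The effect of the binding function's shape can be illustrated in the noise-free limit, where a parameter value is accepted exactly when its binding function value lies within the bandwidth of $b(\theta_0)$. The sketch below is a deliberate simplification under that assumption (no simulation noise, a grid of candidate values instead of prior draws), and the three binding functions are hypothetical examples, not those used for Figure 3.

```python
def accepted_region(binding, theta0, eps, grid):
    """Grid values of theta whose binding-function value is within eps of b(theta0)."""
    target = binding(theta0)
    return [theta for theta in grid if abs(binding(theta) - target) < eps]

grid = [i / 1000.0 for i in range(-3000, 3001)]  # candidate theta values in [-3, 3]
eps = 0.1

steep = accepted_region(lambda t: 4.0 * t, 1.0, eps, grid)    # large gradient
shallow = accepted_region(lambda t: 0.5 * t, 1.0, eps, grid)  # small gradient
two_modes = accepted_region(lambda t: t * t, 1.0, eps, grid)  # b(theta) = b(-theta)

print(min(steep), max(steep))          # narrow range around theta0 = 1
print(min(shallow), max(shallow))      # much wider range for the same eps
print(min(two_modes), max(two_modes))  # values near both -1 and +1 are accepted
```

For the same bandwidth, the steep binding function accepts a range of parameter values roughly eight times narrower than the shallow one, and the non-identifiable quadratic accepts two separated clusters.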

Figure 4: Example of the effect of model error in ABC for the Gaussian model with incorrect variance described in the text. The plots, from left to right and top to bottom, correspond to increasing sample size. Each plot shows the 2-dimensional binding function as we vary $\theta$ (line); the observed summary statistic (red circle) and accepted (black dots) and rejected (grey dots) summary statistic values. (For this model the parameter value used to simulate the summary statistics will be close to the first summary statistic, $s_1$.)

2.4 Model Error

One of the implicit assumptions behind the result on posterior concentration is that our model is correct. This manifests itself within the assumption that as we get more data the observed summary statistic will converge to the value $\mathbf{b}(\boldsymbol{\theta}_0)$. If the model we assume in ABC is incorrect then this may not be the case (see Frazier et al., 2017, for a fuller discussion of the impact of model error). There are then two possibilities. The first is that the observed summary statistic will converge to a value $\mathbf{b}(\tilde{\boldsymbol{\theta}})$ for some parameter value $\tilde{\boldsymbol{\theta}} \neq \boldsymbol{\theta}_0$. In this case, by the arguments above, we can still expect posterior concentration but to $\tilde{\boldsymbol{\theta}}$ and not $\boldsymbol{\theta}_0$.

The other possibility is that the observed summary statistic converges to a value that is not equal to $\mathbf{b}(\boldsymbol{\theta})$ for any $\boldsymbol{\theta}$. This is most likely to occur when the dimension of the summary statistic is greater than the dimension of the parameter. To give some insight into this scenario, we give an example in Figure 4, where we have independent identically distributed data from a Gaussian distribution with mean $\theta$ and variance $\theta^2 + 2$, but our model assumes the mean and variance are $\theta$ and $\theta^2 + 1$ respectively. This corresponds to a wrong assumption about the variance. We then apply ABC with summary statistics that are the sample mean and variance. As shown in the figure, we can still get posterior concentration in this setting. If we denote the limiting value of the binding function for the true model as $\mathbf{b}_0$, then the posterior concentrates on the parameter value, or values, whose binding function value is closest to $\mathbf{b}_0$, according to the distance we use for deciding whether to accept simulated summaries.

In this second scenario it may be possible to detect the model error by monitoring the closeness of the accepted summaries to the observed summaries. If the model is correct, then the distance between accepted and observed summaries tends to 0 with increasing $n$, whereas in this second model error scenario these distances will tend towards some non-zero constant.

3 ABC Posterior and Posterior Mean

We now consider stronger asymptotic results for ABC. To obtain these results we need extra assumptions in addition to those required for posterior concentration (see Frazier et al., 2016; Li and Fearnhead, 2015, for full details). The most important of these is that the summary statistics obey a central limit theorem

$$\sqrt{n} \, \{\mathbf{S}_{n,\boldsymbol{\theta}} - \mathbf{b}(\boldsymbol{\theta})\} \to N\{0, A(\boldsymbol{\theta})\},$$

for some $d \times d$ positive definite matrix $A(\boldsymbol{\theta})$. In the above central limit theorem we have assumed a $1/\sqrt{n}$ rate of convergence, but it is trivial to generalise this (Li and Fearnhead, 2015).

3.1 ABC Posterior

Under this central limit assumption we first consider convergence of the ABC posterior. Formal results can be found in Frazier et al. (2016) (but see also Li and Fearnhead, 2016). Here we give an informal presentation of these results.

To gain intuition about the limiting form of the ABC posterior, we can use the fact from the previous section that there is posterior concentration around $\boldsymbol{\theta}_0$. Thus asymptotically we need only consider the behaviour of the model for $\boldsymbol{\theta}$ close to $\boldsymbol{\theta}_0$. Also asymptotically the noise in the summaries is Gaussian. So if we make a linear approximation to $\mathbf{b}(\boldsymbol{\theta})$ for $\boldsymbol{\theta}$ close to $\boldsymbol{\theta}_0$, our model will be well approximated by

$$\mathbf{S}_{n,\boldsymbol{\theta}} = \mathbf{b}(\boldsymbol{\theta}_0) + D_0 (\boldsymbol{\theta} - \boldsymbol{\theta}_0) + \frac{1}{\sqrt{n}} \mathbf{Z},$$

where $D_0$ is the $d \times p$ matrix of first derivatives of $\mathbf{b}(\boldsymbol{\theta})$ with respect to $\boldsymbol{\theta}$, with these derivatives evaluated at $\boldsymbol{\theta}_0$; and $\mathbf{Z}$ is a $d$-dimensional Gaussian random variable with covariance matrix $A(\boldsymbol{\theta}_0)$. Furthermore, for $\boldsymbol{\theta}$ close to $\boldsymbol{\theta}_0$ the prior will be well approximated by a uniform prior. For the following we assume that $D_0$ is of rank $p$.
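Wilkinson (2013) shows that the effect of the approximation in ABC, whereby we accept simulated summaries which are similar, but not identical, to the observed summary, is equivalent to performing exact Bayesian inference under a different model. This different model has additional additive noise, where the distribution of the noise is given by the kernel, $K(\cdot)$, we use in ABC. So if $\mathbf{V}$ is a $d$-dimensional random variable with density $K(\cdot)$, independent of $\mathbf{Z}$, then our ABC posterior will behave like the true posterior for the model

$$\mathbf{S}_{n,\boldsymbol{\theta}} = \mathbf{b}(\boldsymbol{\theta}_0) + D_0 (\boldsymbol{\theta} - \boldsymbol{\theta}_0) + \frac{1}{\sqrt{n}} \mathbf{Z} + \epsilon_n \mathbf{V}. \qquad (1)$$

From Section 2.2, we know that the rate of concentration is the slower of the rate of the noise in the summaries, $1/\sqrt{n}$ under our central limit theorem, and the bandwidth $\epsilon_n$. This means that we get different limiting results depending on whether $\epsilon_n = O(1/\sqrt{n})$ or not. This can be seen from (1), as whether $\epsilon_n = O(1/\sqrt{n})$ or not will affect whether the $\epsilon_n \mathbf{V}$ noise term dominates or not.

If $\sqrt{n} \epsilon_n \to \infty$, so $\epsilon_n$ is the slower rate, then to get convergence of the ABC posterior we need to consider the re-scaled variable $\mathbf{t} = (\boldsymbol{\theta} - \boldsymbol{\theta}_0)/\epsilon_n$. If we further define $\tilde{\mathbf{S}}_{n,\boldsymbol{\theta}} = \{\mathbf{S}_{n,\boldsymbol{\theta}} - \mathbf{b}(\boldsymbol{\theta}_0)\}/\epsilon_n$ then we can re-write (1) as

$$\tilde{\mathbf{S}}_{n,\boldsymbol{\theta}} = D_0 \mathbf{t} + \mathbf{V} + \frac{1}{\epsilon_n \sqrt{n}} \mathbf{Z} \to D_0 \mathbf{t} + \mathbf{V}.$$

Thus the limiting form of the ABC posterior is equivalent to the true posterior for this model, given observation $\tilde{\mathbf{s}}_{obs} = \{\mathbf{s}_{obs} - \mathbf{b}(\boldsymbol{\theta}_0)\}/\epsilon_n$, with a uniform prior for $\mathbf{t}$. The shape of this posterior will be determined by the ABC kernel. If we use the standard uniform kernel, then the ABC posterior will asymptotically be uniform. By converting from $\mathbf{t}$ to $\boldsymbol{\theta}$, using $\boldsymbol{\theta} = \boldsymbol{\theta}_0 + \epsilon_n \mathbf{t}$, we see that the asymptotic variance for $\boldsymbol{\theta}$ is $O(\epsilon_n^2)$ in this case.

The other case is that $\sqrt{n} \epsilon_n \to c$ for some non-negative, finite constant $c$. In this case we consider the re-scaled variable $\mathbf{t} = \sqrt{n}(\boldsymbol{\theta} - \boldsymbol{\theta}_0)$, and re-scaled observation $\tilde{\mathbf{S}}_{n,\boldsymbol{\theta}} = \sqrt{n}\{\mathbf{S}_{n,\boldsymbol{\theta}} - \mathbf{b}(\boldsymbol{\theta}_0)\}$. The ABC posterior will asymptotically be equivalent to the true posterior for $\mathbf{t}$ under a uniform prior, for a model

$$\tilde{\mathbf{S}}_{n,\boldsymbol{\theta}} = D_0 \mathbf{t} + \mathbf{Z} + \epsilon_n \sqrt{n} \, \mathbf{V} \to D_0 \mathbf{t} + \mathbf{Z} + c \mathbf{V},$$

and given an observation $\tilde{\mathbf{s}}_{obs} = \sqrt{n}\{\mathbf{s}_{obs} - \mathbf{b}(\boldsymbol{\theta}_0)\}$. We make three observations from this.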
First if $\epsilon_n = o(1/\sqrt{n})$, so $c = 0$, then using standard results for the posterior distribution of a linear model, the ABC posterior for $\mathbf{t}$ will converge to a Gaussian with mean

$$\left\{ D_0^T A(\boldsymbol{\theta}_0)^{-1} D_0 \right\}^{-1} D_0^T A(\boldsymbol{\theta}_0)^{-1} \tilde{\mathbf{s}}_{obs}, \qquad (2)$$

and variance $I^{-1}$ where $I = D_0^T A(\boldsymbol{\theta}_0)^{-1} D_0$. This is the same limiting form as the true posterior given the summaries. The matrix $I$ can be viewed as an information matrix, and note that this is larger if the derivatives of the binding function, $D_0$, are larger; in line with the intuition we presented in Section 2.3.

Second if $c \neq 0$, the ABC posterior will have a larger variance than the posterior given the summaries. This inflation of the ABC posterior variance will increase as $c$ increases. In general it is hard to say what form the posterior takes, as it will depend on the distribution of noise in our limiting model, $\mathbf{Z} + c\mathbf{V}$, which is a convolution of the limiting Gaussian noise of the summaries and a random variable drawn from the ABC kernel.

Our final observation is that we can get some insight into the behaviour of the ABC posterior when $c \neq 0$ if we assume a Gaussian kernel, as again the limiting ABC posterior will be the true posterior for a linear model with Gaussian noise. If the Gaussian kernel has variance $\Sigma$, which corresponds to measuring distances between summary statistics using the scaled distance $\|\mathbf{x}\|^2 = \mathbf{x}^T \Sigma^{-1} \mathbf{x}$, then the ABC posterior for $\mathbf{t}$ will converge to a Gaussian with mean

$$\left\{ D_0^T \left( A(\boldsymbol{\theta}_0) + c^2 \Sigma \right)^{-1} D_0 \right\}^{-1} D_0^T \left\{ A(\boldsymbol{\theta}_0) + c^2 \Sigma \right\}^{-1} \tilde{\mathbf{s}}_{obs} \qquad (3)$$

and variance $\tilde{I}^{-1}$, where $\tilde{I} = D_0^T \{A(\boldsymbol{\theta}_0) + c^2 \Sigma\}^{-1} D_0$.

3.2 ABC Posterior Mean

We now consider the asymptotic distribution of the ABC posterior mean. By this we mean the frequentist distribution, whereby we view the posterior mean as a function of the data, and look at the distribution of this under repeated sampling of the data. Formal results appear in Li and Fearnhead (2015), but we will give informal results, building on the results we gave for the ABC posterior. We will focus on the case where $\epsilon_n = O(1/\sqrt{n})$, but note that results hold for the situation where $\epsilon_n$ decays more slowly; in fact Li and Fearnhead (2015) show that if $\epsilon_n = o(n^{-3/10})$ then the ABC posterior mean will have the same asymptotic distribution as for the case we consider, where $\epsilon_n = O(1/\sqrt{n})$.

The results we stated for the ABC posterior in Section 3.1 for the case $\epsilon_n = O(1/\sqrt{n})$ included expressions for the posterior mean; see (2) and (3). The latter expression was under the assumption of a Gaussian kernel in ABC, but most of the exposition we give below holds for a general kernel (see Li and Fearnhead, 2015, for more details).
The first of these, (2), is the true posterior mean given the summaries. Asymptotically our re-scaled observation $\tilde{\mathbf{s}}_{obs}$ has a Gaussian distribution with mean 0 and variance $A(\boldsymbol{\theta}_0)$ due to the central limit theorem assumption, and the posterior mean for $\mathbf{t}$ is a linear transformation of $\tilde{\mathbf{s}}_{obs}$. This immediately gives that the asymptotic distribution of the ABC posterior mean of $\mathbf{t}$ is Gaussian with mean 0 and variance $I^{-1}$. Equivalently, for large $n$, the ABC posterior mean for $\boldsymbol{\theta}$ will be approximately normally distributed with mean $\boldsymbol{\theta}_0$ and variance $I^{-1}/n$.
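As a small numerical illustration of expression (2) and the variance $I^{-1}/n$, the sketch below evaluates the information $I = D_0^T A(\boldsymbol{\theta}_0)^{-1} D_0$ for a case with $d = 2$ summaries and $p = 1$ parameter. The values of `A`, `D` and the re-scaled observation are hypothetical, and the helper names are chosen only for this example.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def information(D, A):
    """I = D^T A^{-1} D for a length-2 derivative vector D and 2x2 covariance A."""
    Ainv = inv2(A)
    return sum(D[i] * Ainv[i][j] * D[j] for i in range(2) for j in range(2))

def limiting_mean(D, A, s_tilde):
    """Expression (2): (D^T A^{-1} D)^{-1} D^T A^{-1} s_tilde, for p = 1."""
    Ainv = inv2(A)
    score = sum(D[i] * Ainv[i][j] * s_tilde[j] for i in range(2) for j in range(2))
    return score / information(D, A)

A = [[1.0, 0.0], [0.0, 4.0]]   # hypothetical asymptotic covariance of the summaries
D = [1.0, 2.0]                 # hypothetical derivatives of the binding function
n = 400                        # hypothetical sample size

info = information(D, A)
print(info, 1.0 / info)                  # information and limiting variance for t
print(1.0 / (info * n))                  # approximate variance of the posterior mean of theta
print(limiting_mean(D, A, [0.5, 1.0]))   # limiting posterior mean of t
print(information([2.0, 4.0], A))        # doubling the derivatives quadruples I
```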

The case where $\sqrt{n} \epsilon_n \to c$ for some $c > 0$ is more interesting. If we have $d = p$, so we have the same number of summaries as we have parameters, then $D_0$ is a square matrix. Assuming this matrix is invertible, we see that the ABC posterior mean simplifies to $D_0^{-1} \tilde{\mathbf{s}}_{obs}$. Alternatively if $d > p$ but $\Sigma = \gamma A(\boldsymbol{\theta}_0)$ for some scalar $\gamma > 0$, so that the variance of our ABC kernel is proportional to the asymptotic variance of the noise in our summary statistics, then the ABC posterior mean again simplifies; this time to

$$\left\{ D_0^T A(\boldsymbol{\theta}_0)^{-1} D_0 \right\}^{-1} D_0^T A(\boldsymbol{\theta}_0)^{-1} \tilde{\mathbf{s}}_{obs}.$$

In both cases the expressions for the ABC posterior mean are the same as for the $c = 0$ case, and are identical to the true posterior mean given the summaries. Thus the ABC posterior mean has the same limiting Gaussian distribution as the true posterior mean in these cases.

More generally for the $c > 0$ case, the ABC posterior mean will be different from the true posterior mean given the summaries. In particular the asymptotic variance of the ABC posterior mean can be greater than the asymptotic variance of the true posterior mean given the summaries. Li and Fearnhead (2015) show that it is always possible to project a $d > p$ dimensional summary to a $p$ dimensional summary such that the asymptotic variance of the true posterior mean is not changed. This suggests using such a $p$ dimensional summary statistic for ABC (see Fearnhead and Prangle, 2012, for a different argument for choosing $d = p$). An alternative conclusion from these results is to scale the distance used when deciding whether to accept or reject summaries to be proportional to an estimate of the variance of the noise in the summaries.

It is interesting to compare the asymptotic variance of the ABC posterior mean to the limiting value of the ABC posterior variance. Ideally these would be the same, as that implies that the ABC posterior is correctly quantifying uncertainty. We do get equality when $\epsilon_n = o(1/\sqrt{n})$; but in other cases we can see that the ABC posterior variance is larger than the asymptotic variance of the ABC posterior mean, and thus ABC over-estimates uncertainty.
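The comparison between the ABC posterior variance and the asymptotic variance of the ABC posterior mean can be checked numerically under the Gaussian-kernel expression (3). In the sketch below the posterior mean is written as a linear map $M \tilde{\mathbf{s}}_{obs}$, so its variance under $\tilde{\mathbf{s}}_{obs} \sim N(0, A)$ is $M A M^T$; this step is an elementary consequence of (3) rather than a formula quoted from the text. The matrices and the function name `abc_variances` are hypothetical, with $d = 2$ and $p = 1$.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def abc_variances(D, A, Sigma, c):
    """Return (ABC posterior variance of t, asymptotic variance of the ABC posterior mean of t)."""
    W = [[A[i][j] + c * c * Sigma[i][j] for j in range(2)] for i in range(2)]
    Winv = inv2(W)
    row = [sum(D[i] * Winv[i][j] for i in range(2)) for j in range(2)]  # D^T W^{-1}
    info_tilde = sum(row[j] * D[j] for j in range(2))                   # I-tilde in (3)
    M = [r / info_tilde for r in row]                                   # posterior mean = M . s_tilde
    mean_var = sum(M[i] * A[i][j] * M[j] for i in range(2) for j in range(2))
    return 1.0 / info_tilde, mean_var

A = [[1.0, 0.0], [0.0, 4.0]]         # hypothetical asymptotic covariance of the summaries
D = [1.0, 2.0]                       # hypothetical derivatives of the binding function
identity = [[1.0, 0.0], [0.0, 1.0]]

exact = abc_variances(D, A, identity, 0.0)      # c = 0: both equal 1/I
euclid = abc_variances(D, A, identity, 1.0)     # c = 1, kernel variance not proportional to A
matched = abc_variances(D, A, A, 1.0)           # c = 1, Sigma = A (gamma = 1)
print(exact, euclid, matched)
```

With these values the kernel matched to $A$ leaves the variance of the posterior mean at $I^{-1} = 0.5$ while the posterior variance doubles, and the identity kernel inflates both.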
We will return to this over-estimation of uncertainty in Section 5.

4 Monte Carlo Error

The previous section included results on the asymptotic variance of the ABC posterior mean, which gives a measure of the accuracy of using the ABC posterior mean as a point estimate for the parameter. In practice we cannot calculate the ABC posterior mean analytically and we need to use output from a Monte Carlo algorithm, such as the rejection sampler described in the introduction. A natural question is what effect the resulting Monte Carlo error has. Can we implement ABC in such a way that, for a fixed Monte Carlo sample size, the Monte Carlo estimate of the ABC posterior mean is an accurate point estimate? Or do we necessarily require the Monte Carlo sample size to increase as $n$ increases?

Li and Fearnhead (2015) explore these questions. To do so they consider an importance sampling version of the rejection sampling algorithm we previously introduced. This algorithm requires the specification of a proposal distribution for the parameter, $q(\boldsymbol{\theta})$, and involves iterating the following $N$ times:

(IS1) Simulate a parameter from the proposal distribution: $\boldsymbol{\theta}_i \sim q(\boldsymbol{\theta})$.
(IS2) Simulate a summary statistic from the model given $\boldsymbol{\theta}_i$: $\mathbf{s}_i \sim f_n(\mathbf{s} \mid \boldsymbol{\theta}_i)$.
(IS3) If $\|\mathbf{s}_{obs} - \mathbf{s}_i\| < \epsilon_n$ accept $\boldsymbol{\theta}_i$ and assign it a weight proportional to $\pi(\boldsymbol{\theta}_i)/q(\boldsymbol{\theta}_i)$, where $\pi$ denotes the prior density (written $p(\boldsymbol{\theta})$ in the introduction).

The output is a set of, $N_{acc}$ say, weighted parameter values which can be used to estimate, for example, posterior means. With a slight abuse of notation, if the accepted parameter values are denoted $\boldsymbol{\theta}_k$ and their weights $w_k$ for $k = 1, \ldots, N_{acc}$ then we would estimate the posterior mean of $\boldsymbol{\theta}$ by

$$\hat{\boldsymbol{\theta}}_N = \frac{1}{\sum_{k=1}^{N_{acc}} w_k} \sum_{k=1}^{N_{acc}} w_k \boldsymbol{\theta}_k.$$

The use of this Monte Carlo estimator will inflate the error in our point estimate of the parameter by $\mathrm{Var}(\hat{\boldsymbol{\theta}}_N)$, where we calculate variance with respect to the randomness of the Monte Carlo algorithm. If the asymptotic variance of the ABC posterior mean is $O(1/n)$ we would want the Monte Carlo variance to be $O(1/(nN))$. This would mean that the overall impact of the Monte Carlo error is to inflate the mean square error of our estimator of the parameter by a factor $1 + O(1/N)$ (similar to other likelihood-free methods; e.g. Gourieroux et al., 1993; Heggland and Frigessi, 2004).

Now the best we can hope for with a rejection or importance sampler would be equally weighted, independent samples from the ABC posterior. The Monte Carlo variance of such an algorithm would be proportional to the ABC posterior variance. Thus if we want the Monte Carlo variance to be $O(1/n)$ then we need $\epsilon_n = O(1/\sqrt{n})$, as for slower rates the ABC posterior variance will decay more slowly than $O(1/n)$. Thus we will focus on $\epsilon_n = O(1/\sqrt{n})$. The key limiting factor in terms of the Monte Carlo error of our rejection or importance sampler is the acceptance probability.
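Steps (IS1) to (IS3) and the weighted estimator can be sketched as follows. The toy model (IID N(theta, 1) data summarised by the sample mean), the N(0, 2^2) prior, the Gaussian proposal centred near the observed summary and all function names are assumptions for illustration; this is not the family of proposals studied by Li and Fearnhead (2015).

```python
import math
import random

def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def abc_importance(s_obs, n, eps, num_proposals, q_mean, q_sd, seed=0):
    """Importance sampling ABC (IS1-IS3) with an assumed Gaussian proposal q."""
    rng = random.Random(seed)
    thetas, weights = [], []
    for _ in range(num_proposals):
        theta = rng.gauss(q_mean, q_sd)              # IS1: draw from the proposal
        s = rng.gauss(theta, 1.0 / math.sqrt(n))     # IS2: simulate the summary
        if abs(s_obs - s) < eps:                     # IS3: accept and weight by prior / proposal
            thetas.append(theta)
            weights.append(normal_pdf(theta, 0.0, 2.0) / normal_pdf(theta, q_mean, q_sd))
    return thetas, weights

def weighted_mean(thetas, weights):
    """Self-normalised importance sampling estimate of the posterior mean."""
    return sum(w * t for w, t in zip(weights, thetas)) / sum(weights)

n = 100
num_proposals = 5000
thetas, weights = abc_importance(s_obs=1.0, n=n, eps=1.0 / math.sqrt(n),
                                 num_proposals=num_proposals,
                                 q_mean=1.0, q_sd=2.0 / math.sqrt(n))
acceptance_rate = len(thetas) / num_proposals
print(acceptance_rate, weighted_mean(thetas, weights))
```

The proposal standard deviation is scaled as $1/\sqrt{n}$ here so that a sizeable fraction of proposals is accepted even though $\epsilon_n = 1/\sqrt{n}$.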
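To have a Monte Carlo variance that is $O(1/n)$ we will need an implementation whereby the acceptance probability is bounded away from 0 as $n$ increases. To see whether and how this is possible we can examine the acceptance criteria in step (RS3) or (IS3):

$$\mathbf{s}_{obs} - \mathbf{s}_i = \{\mathbf{s}_{obs} - \mathbf{b}(\boldsymbol{\theta}_0)\} + \{\mathbf{b}(\boldsymbol{\theta}_0) - \mathbf{b}(\boldsymbol{\theta}_i)\} + \{\mathbf{b}(\boldsymbol{\theta}_i) - \mathbf{s}_i\}.$$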

We need this distance to have a non-negligible probability of being less than $\epsilon_n$. Now the first and third bracketed terms on the right-hand side will be $O_p(1/\sqrt{n})$ under our assumption of a central limit theorem for the summaries. Thus this distance is at best $O_p(1/\sqrt{n})$, and if $\epsilon_n = o(1/\sqrt{n})$ the probability of the distance being less than $\epsilon_n$ should tend to 0 as $n$ increases. This suggests we need $\sqrt{n} \epsilon_n \to c$ for some $c > 0$. For this choice, if we have a proposal which has a reasonable probability of simulating $\boldsymbol{\theta}$ values within $O(1/\sqrt{n})$ of $\boldsymbol{\theta}_0$, then we could expect the distance to have a non-zero probability of being less than $\epsilon_n$ as $n$ increases.

This rules out the rejection sampler, or any importance sampler with a pre-chosen proposal distribution. But an adaptive importance sampler that learns a good proposal distribution (e.g. Sisson et al., 2007; Beaumont et al., 2009; Peters et al., 2012) can have this property. Note that such an importance sampler would need a proposal distribution for which the importance sampling weights are also well-behaved. Li and Fearnhead (2015) give a family of proposal distributions that have both an acceptance probability that is non-zero as $n \to \infty$ and well-behaved importance sampling weights.

Whilst Li and Fearnhead (2015) did not consider MCMC based implementations of ABC (Marjoram et al., 2003; Bortot et al., 2007), the intuition behind the results for the importance sampler suggests that we can implement such algorithms in a way that the Monte Carlo variance will be $O(1/(nN))$. For example if we use a random walk proposal distribution with a variance that is $O(1/n)$ then after convergence the proposed $\boldsymbol{\theta}$ values will be a distance $O_p(1/\sqrt{n})$ away from $\boldsymbol{\theta}_0$ as required. Thus the acceptance probability should be bounded away from 0 as $n$ increases. Furthermore such a scaling is appropriate for a random walk proposal to efficiently explore a target whose variance is $O(1/n)$ (Roberts et al., 2001). Note that care would be needed whilst the MCMC algorithm is converging to stationarity, as the proposed parameter values at this stage will be far away from $\boldsymbol{\theta}_0$.
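The argument about acceptance probabilities can be checked in closed form for a toy case. Assume, purely for illustration, a scalar model with $s \mid \theta \sim N(\theta, 1/n)$, a Gaussian proposal $N(\theta_0, \tau^2)$ and an observed summary $s_{obs} = \theta_0 + z_{obs}/\sqrt{n}$. Then $s - s_{obs}$ is Gaussian with mean $-z_{obs}/\sqrt{n}$ and variance $\tau^2 + 1/n$, so the acceptance probability is a difference of two normal distribution function values. The function names and the value `z_obs=0.5` are hypothetical.

```python
import math

def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def acceptance_probability(n, eps, proposal_sd, z_obs=0.5):
    """P(|s_obs - s| < eps) for theta ~ N(theta0, proposal_sd^2), s | theta ~ N(theta, 1/n)."""
    offset = z_obs / math.sqrt(n)                   # |s_obs - theta0|
    sd = math.sqrt(proposal_sd ** 2 + 1.0 / n)      # sd of s around theta0
    return norm_cdf((eps - offset) / sd) - norm_cdf((-eps - offset) / sd)

rows = []
for n in (100, 10_000, 1_000_000):
    root_n = math.sqrt(n)
    fixed = acceptance_probability(n, 1.0 / root_n, proposal_sd=1.0)               # pre-chosen proposal
    adaptive = acceptance_probability(n, 1.0 / root_n, proposal_sd=1.0 / root_n)   # O(1/sqrt(n)) proposal
    too_small = acceptance_probability(n, 1.0 / n, proposal_sd=1.0 / root_n)       # eps = o(1/sqrt(n))
    rows.append((n, fixed, adaptive, too_small))
    print(n, fixed, adaptive, too_small)
```

In this toy case only the combination of $\epsilon_n \propto 1/\sqrt{n}$ with a proposal whose spread is $O(1/\sqrt{n})$ keeps the acceptance probability constant in $n$; the other two columns decay to 0.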
5 The Benefits of Regression Adjustment

We finish this chapter by briefly reviewing asymptotic results for a popular version of ABC which post-processes the output of ABC using regression adjustment. This idea was first proposed by Beaumont et al. (2002) (see Nott et al., 2014, for links to Bayes linear methods). We will start with a brief description, then show how using regression adjustment can enable the adjusted ABC posterior to have the same asymptotic properties as the true posterior given the summaries, even if $\epsilon_n$ decays slightly slower than $1/\sqrt{n}$.

Figure 5 provides an example of the ABC adjustment. The idea is to run an ABC algorithm that accepts pairs of parameters and summaries. Denote these by $(\boldsymbol{\theta}_k, \mathbf{s}_k)$ for $k = 1, \ldots, N_{acc}$. These are shown in the top-left plot of Figure 5. We then fit $p$ linear models that, in turn, aim to predict each component of the parameter vector from the summaries. The output of this fitting procedure is a $p$-dimensional vector $\hat{\boldsymbol{\alpha}}$, the intercepts in the $p$ linear models, and a $p \times d$ matrix $\hat{B}$, whose $ij$th entry is the coefficient of the $j$th summary statistic in the linear model for estimating the $i$th component of $\boldsymbol{\theta}$.

An example of such a fit is shown in the top-right plot of Figure 5. This fit is indicative of biases in our accepted $\boldsymbol{\theta}$ values which correspond to different values of the summaries. In our example, the fit suggests that $\theta$ values accepted for smaller, or larger, values of the summary statistic will, on average, be less than, or greater than, the true parameter value. We can then use the fit to correct for this bias. In particular we can adjust each of the accepted parameter values to $\tilde{\boldsymbol{\theta}}_k$ for $k = 1, \ldots, N_{acc}$ where

$$\tilde{\boldsymbol{\theta}}_k = \boldsymbol{\theta}_k - \hat{B} (\mathbf{s}_k - \mathbf{s}_{obs}).$$

The adjusted parameter values are shown in the bottom-left plot of Figure 5, and a comparison of the ABC posterior before and after adjustment is shown in the bottom-right plot. From the latter we see that the adjusted ABC posterior has a smaller variance and has more posterior mass close to the true parameter value.

The vector $\hat{\boldsymbol{\alpha}}$ and the matrix $\hat{B}$ can be viewed as estimates of the vector $\boldsymbol{\alpha}$ and the matrix $B$ that minimise the expectation of

$$\sum_{i=1}^{p} \left( \theta_i - \alpha_i - \sum_{j=1}^{d} B_{ij} S_j \right)^2,$$

where the expectation is with respect to parameter, summary statistic pairs drawn from our ABC algorithm. Li and Fearnhead (2016) show that if we adjust our ABC output using this optimal $B$ then, for any $\epsilon_n = o(n^{-3/10})$, the adjusted ABC posterior has the same asymptotic limit as the true posterior given the summaries. Consequently the mean of this adjusted posterior will also have the same asymptotic distribution as the mean of the true posterior given the summaries.

The intuition behind this result is that, asymptotically, if we choose $\epsilon_n = o(n^{-3/10})$, then our accepted samples will concentrate around the true parameter value.
As we focus on an increasingly small ball around the true parameter value, the binding function will be well approximated by the linear regression model we are fitting. Thus the regression correction step is able to correct for the biases we obtain from accepting summaries that are slightly different from the observed summary statistics. From this intuition we see that a key requirement of our model, implicit within the assumptions needed for the theoretical result, is that the binding function is differentiable at the true parameter value, as such a differentiability condition is needed for the linear regression model to be accurate.
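
The adjustment described in this section can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used for Figure 5: it assumes a uniform acceptance kernel, so that an unweighted least-squares fit is appropriate, and the toy model (a uniform prior on a scalar parameter, with the summary equal to the parameter plus Gaussian noise of standard deviation $1/\sqrt{n}$) and the names `regression_adjust`, `theta_sim` and `s_sim` are hypothetical choices made only for this example.

```python
import numpy as np

def regression_adjust(theta, s, s_obs):
    """Linear regression adjustment of accepted ABC output.

    theta : (N_acc,) or (N_acc, p) accepted parameter values
    s     : (N_acc,) or (N_acc, d) accepted summary statistics
    s_obs : scalar or (d,) observed summary statistic
    Returns the adjusted parameters, the intercepts and the p x d slope matrix.
    """
    theta = np.asarray(theta, dtype=float).reshape(len(theta), -1)
    s = np.asarray(s, dtype=float).reshape(len(s), -1)
    s_obs = np.atleast_1d(np.asarray(s_obs, dtype=float))
    # Design matrix: intercept column followed by the d summaries.
    X = np.column_stack([np.ones(len(s)), s])
    # One least-squares fit per parameter component (columns of theta).
    coef, *_ = np.linalg.lstsq(X, theta, rcond=None)  # shape (d + 1, p)
    alpha_hat = coef[0]      # intercepts, shape (p,)
    B_hat = coef[1:].T       # slopes, shape (p, d)
    # theta_tilde^k = theta^k - B_hat (s^k - s_obs)
    theta_adj = theta - (s - s_obs) @ B_hat.T
    return theta_adj, alpha_hat, B_hat

# Toy example (assumed model): theta ~ Uniform(0, 10), s | theta ~ N(theta, 1/n).
rng = np.random.default_rng(0)
n, n_sim, eps, s_obs = 100, 20000, 1.0, 5.0
theta_sim = rng.uniform(0.0, 10.0, size=n_sim)
s_sim = theta_sim + rng.normal(scale=1.0 / np.sqrt(n), size=n_sim)

# Rejection step with a bandwidth much larger than 1/sqrt(n).
keep = np.abs(s_sim - s_obs) < eps
theta_acc, s_acc = theta_sim[keep], s_sim[keep]

theta_adj, alpha_hat, B_hat = regression_adjust(theta_acc, s_acc, s_obs)
print("sd before adjustment:", theta_acc.std())
print("sd after adjustment: ", theta_adj.std())
```

In this toy setting the bandwidth is deliberately large relative to $1/\sqrt{n}$, so the unadjusted sample is far more dispersed than the adjusted one, which is the qualitative behaviour described for the bottom-right panel of Figure 5.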

Figure 5: Example of the regression correction procedure of Beaumont et al. (2002) for a single parameter, single summary statistic. Top-left: output of an ABC algorithm showing accepted pairs of parameter and summary values (dots), the binding function for this model (solid black line), and $\theta_0$ and $s_{obs}$ (red circle, and also blue vertical and red horizontal lines respectively). Top-right: the fit from a linear model predicting the parameter value from the summary (blue solid line). Bottom-left: the adjusted output (black dots; with original output in grey); we plot both old and adjusted parameter values against original summary statistic values. Bottom-right: the ABC posterior based on the original accepted parameter values (black solid line) and the adjusted values (red dashed line).

In practice we use an estimate $\hat{B}$, and this will inflate the asymptotic variance of the adjusted posterior mean by a factor that is $1 + O(1/N_{acc})$, a similar effect to that of using Monte Carlo draws to estimate the mean. Importantly we get these strong asymptotic results even when $\epsilon_n$ decays more slowly than $1/\sqrt{n}$. For such a choice, for example $\epsilon_n = O(n^{-1/3})$, and with a good importance sampling or MCMC implementation, the asymptotic acceptance rate of the algorithm will tend to 1 as $n$ increases.

6 Discussion

The theoretical results we have reviewed are positive for ABC. If initially we ignore using regression adjustment, then the results suggest that ABC with $\epsilon_n = O(1/\sqrt{n})$ and with an efficient adaptive importance sampling or MCMC algorithm will have performance that is close to that of using the true posterior given the summaries. Ignoring Monte Carlo error, the accuracy of using the ABC posterior mean will be the same as that of using the true posterior mean if either we have the same number of summaries as parameters, or we choose an appropriate Mahalanobis distance for measuring the discrepancy in summary statistics. However, for this scenario the ABC posterior will over-estimate the uncertainty in our point estimate. The impact of Monte Carlo error will only be to inflate the asymptotic variance of our estimator by a factor $1 + O(1/N)$, where $N$ is the Monte Carlo sample size.

We suggest that this scaling of the bandwidth, $\epsilon_n = O(1/\sqrt{n})$, is optimal if we do not use regression adjustment. Choosing either a faster or slower rate will result in Monte Carlo error that will dominate. One way of achieving this scaling is by using an adaptive importance sampling algorithm and fixing the proportion of samples to accept. Thus the theory supports the common practice of choosing the bandwidth indirectly in this manner. Also based on these results, we suggest choosing the number of summary statistics to be close to, or equal to, the number of parameters, and choosing a distance for measuring the discrepancy in summary statistics that is based on the variance of the summary statistics.
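
As a rough illustration of choosing the bandwidth indirectly, the sketch below sets $\epsilon$ to a fixed quantile of the simulated distances and measures those distances with a Mahalanobis norm. It is a simplification under stated assumptions: it uses plain rejection sampling from the prior rather than an adaptive importance sampler, the matrix $\Gamma$ is taken to be the inverse of a summary covariance estimated from pilot simulations at a single hypothetical pilot parameter value, and the two-parameter Gaussian toy model and all function and variable names are invented for the example.

```python
import numpy as np

def accept_fixed_proportion(theta, s, s_obs, cov_s, proportion=0.01):
    """Keep the given proportion of simulations closest to s_obs.

    Distances are Mahalanobis: ||x||^2 = x^T Gamma x with Gamma = cov_s^{-1}.
    Returns accepted parameters, accepted summaries and the implied bandwidth.
    """
    theta = np.asarray(theta, dtype=float)
    s = np.asarray(s, dtype=float).reshape(len(s), -1)
    diff = s - np.atleast_1d(np.asarray(s_obs, dtype=float))
    gamma = np.linalg.inv(np.atleast_2d(cov_s))
    # Row-wise quadratic form diff_i^T Gamma diff_i.
    dist = np.sqrt(np.einsum("ij,jk,ik->i", diff, gamma, diff))
    # The bandwidth is implied by the acceptance proportion.
    eps = np.quantile(dist, proportion)
    keep = dist <= eps
    return theta[keep], s[keep], eps

# Toy example (assumed model): two parameters, two summaries that are
# sample means with different noise levels, s | theta ~ N(theta, diag(sd^2)).
rng = np.random.default_rng(1)
n_sim = 50000
noise_sd = np.array([0.1, 0.2])
s_obs = np.array([1.0, -2.0])

theta_sim = rng.uniform(-5.0, 5.0, size=(n_sim, 2))
s_sim = theta_sim + rng.normal(size=(n_sim, 2)) * noise_sd

# Pilot run at a hypothetical pilot value (here s_obs itself) to estimate
# the variance of the summaries.
pilot = s_obs + rng.normal(size=(2000, 2)) * noise_sd
cov_s = np.cov(pilot, rowvar=False)

theta_acc, s_acc, eps = accept_fixed_proportion(
    theta_sim, s_sim, s_obs, cov_s, proportion=0.01
)
print("accepted:", len(theta_acc), "implied bandwidth:", eps)
```

Fixing the proportion rather than $\epsilon$ means the bandwidth shrinks automatically as the simulated summaries concentrate around the observed one, which is the sense in which the bandwidth is chosen indirectly.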
In situations where there are many potentially informative summary statistics, one of the many dimension reduction approaches that try to construct low-dimensional summaries that are informative about the parameters should be used (e.g. Wegmann et al., 2009; Fearnhead and Prangle, 2012; Blum et al., 2013; Prangle et al., 2014).

The results for ABC with regression adjustment are stronger still. These show that the adjusted ABC posterior and its mean can have the same asymptotics as the true posterior and its mean given the summaries. Furthermore this is possible with $\epsilon_n$ decreasing more slowly than $1/\sqrt{n}$, in which case the acceptance rate of a good ABC algorithm will increase as $n$ increases. These strong results suggest that regression adjustment should be routinely applied. One word of caution is that the regression adjustment involves fitting a number of linear models to predict the parameters from the summaries. If a large number of summaries are used then the errors in fitting these models can be large (Fearnhead and Prangle, 2012) and lead to under-estimation of uncertainty in the adjusted posterior (Marin et al., 2016). This again suggests using a small number of summary statistics, close or equal to the number of parameters.

Whilst the choice of bandwidth is crucial to the performance of ABC, and the choice of distance can also have an important impact on the asymptotic accuracy, the actual choice of kernel asymptotically has little impact. It affects the form of the ABC posterior, but does not affect the asymptotic variance of the ABC posterior mean (at least under relatively mild conditions). These asymptotic results ignore any higher-order effects of the kernel that become negligible as $n$ gets large; so there may be some small advantages of one kernel over another for finite $n$, but these are hard to quantify. Intuitively the uniform kernel seems the most sensible choice, as for a fixed acceptance proportion it accepts the summaries closest to the observed. Furthermore, in situations where there is model error it is natural to conjecture that a kernel with bounded support, such as the uniform kernel, will be optimal. For such a case we want to only accept summaries that are $d_0 + O(1/\sqrt{n})$, for some constant distance $d_0 > 0$, away from the observed summary (see Figure 4). This is only possible for a kernel with bounded support.

Acknowledgements

This work was supported by EPSRC through the i-like programme grant. It also benefitted from discussions during the BIRS workshop on Validating and Expanding ABC Methods in February 2017.

References

Beaumont, M. A., Zhang, W. and Balding, D. J. (2002). Approximate Bayesian computation in population genetics. Genetics 162, 2025–2035.

Beaumont, M. A., Cornuet, J.-M., Marin, J.-M. and Robert, C. P. (2009). Adaptive approximate Bayesian computation. Biometrika 96(4), 983–990.

Bernton, E., Jacob, P. E., Gerber, M. and Robert, C. P. (2017). Inference in generative models using the Wasserstein distance. arXiv:1701.05146.

Blum, M. G., Nunes, M. A., Prangle, D., Sisson, S. A. et al. (2013). A comparative review of dimension reduction methods in approximate Bayesian computation. Statistical Science 28(2), 189–208.

Bortot, P., Coles, S. G. and Sisson, S. A. (2007). Inference for stereological extremes. Journal of the American Statistical Association 102(477), 84–92.

Drovandi, C. C., Pettitt, A. N., Lee, A. et al. (2015). Bayesian indirect inference using a parametric auxiliary model. Statistical Science 30(1), 72–95.

Fearnhead, P. and Prangle, D. (2012). Constructing summary statistics for approximate Bayesian computation: semi-automatic approximate Bayesian computation. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 74(3), 419–474.

Frazier, D. T., Martin, G. M., Robert, C. P. and Rousseau, J. (2016). Asymptotic Properties of Approximate Bayesian Computation. arXiv:1607.06903.

Frazier, D. T., Robert, C. P. and Rousseau, J. (2017). Model misspecification in ABC: Consequences and diagnostics. In preparation.

Gourieroux, C., Monfort, A. and Renault, E. (1993). Indirect inference. Journal of Applied Econometrics 8(S1), S85–S118.

Heggland, K. and Frigessi, A. (2004). Estimating functions in indirect inference. Journal of the Royal Statistical Society: Series B 66, 447–462.

Li, W. and Fearnhead, P. (2015). On the asymptotic efficiency of ABC estimators. arXiv:1506.03481.

Li, W. and Fearnhead, P. (2016). Improved convergence of regression adjusted Approximate Bayesian Computation. arXiv:1609.07135.

Marin, J.-M., Pillai, N. S., Robert, C. P. and Rousseau, J. (2014). Relevant statistics for Bayesian model choice. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 76(5), 833–859.

Marin, J.-M., Raynal, L., Pudlo, P., Ribatet, M. and Robert, C. P. (2016). ABC random forests for Bayesian parameter inference. arXiv:1605.05537.

Marjoram, P., Molitor, J., Plagnol, V. and Tavaré, S. (2003). Markov chain Monte Carlo without likelihoods. Proceedings of the National Academy of Sciences 100, 15324–15328.

Martin, G. M., McCabe, B. P., Maneesoonthorn, W. and Robert, C. P. (2016). Approximate Bayesian computation in state space models. arXiv:1409.8363.

Nott, D. J., Fan, Y., Marshall, L. and Sisson, S. (2014). Approximate Bayesian computation and Bayes linear analysis: toward high-dimensional ABC. Journal of Computational and Graphical Statistics 23(1), 65–86.

Peters, G. W., Fan, Y. and Sisson, S. A. (2012). On sequential Monte Carlo, partial rejection control and approximate Bayesian computation. Statistics and Computing 22(6), 1209–1222.

Prangle, D., Fearnhead, P., Cox, M. P., Biggs, P. J. and French, N. P. (2014). Semi-automatic selection of summary statistics for ABC model choice. Statistical Applications in Genetics and Molecular Biology 13(1), 67–82.

Pritchard, J. K., Seielstad, M. T., Perez-Lezaun, A. and Feldman, M. W. (1999). Population growth of human Y chromosomes: a study of Y chromosome microsatellites. Molecular Biology and Evolution 16, 1791–1798.

Roberts, G. O., Rosenthal, J. S. et al. (2001). Optimal scaling for various Metropolis-Hastings algorithms. Statistical Science 16(4), 351–367.

Ruli, E., Sartori, N. and Ventura, L. (2016). Approximate Bayesian computation with composite score functions. Statistics and Computing 26(3), 679–692.

Sisson, S. A., Fan, Y. and Tanaka, M. M. (2007). Sequential Monte Carlo without likelihoods. Proceedings of the National Academy of Sciences 104(6), 1760–1765.

Wegmann, D., Leuenberger, C. and Excoffier, L. (2009). Efficient approximate Bayesian computation coupled with Markov chain Monte Carlo without likelihood. Genetics 182(4), 1207–1218.

Wilkinson, R. D. (2013). Approximate Bayesian computation (ABC) gives exact results under the assumption of model error. Statistical Applications in Genetics and Molecular Biology 12(2), 129–141.