Bayesian Aggregation for Extraordinarily Large Dataset

Bayesian Aggregation for Extraordinarily Large Dataset
Guang Cheng, Department of Statistics, Purdue University
Department Seminar, Statistics@LSE, May 19
A joint work with Zuofeng Shang. Acknowledge NSF, ONR and Simons Foundation.

Big Data: large and complex
"There were 5 exabytes of information created between the dawn of civilization through 2003, but that much information is now created every 2 days." Eric Schmidt, Google CEO.
(1 EB = 10^18 bytes and 1 ZB = 10^21 bytes.)

Big Data: large and complex [figure]

Bayesian Aggregation
This generic aggregation procedure applies to both finite-dimensional and infinite-dimensional parameters.
[Diagram] Big Data (N) → Super machine → R_oracle(α). Divide: Subset 1 (n_1) → Machine 1 → R_1(α); Subset 2 (n_2) → Machine 2 → R_2(α); ...; Subset s (n_s) → Machine s → R_s(α). Aggregate → R(α).
R_oracle(α): (1 − α) oracle credible region constructed from the entire data (computationally prohibitive in practice, though); R_j(α): (1 − α) credible region constructed from the j-th subset.
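A minimal sketch, in Python, of the divide-and-aggregate workflow in the diagram above. The routine names subset_credible_region and divide_and_conquer are hypothetical placeholders (not from the talk), and the per-subset summary and combination rule shown here are stand-ins; the talk's actual aggregation rule appears on the later slides.

```python
import numpy as np

def subset_credible_region(data_j, alpha):
    """Hypothetical per-machine routine: returns the (center, radius) of a
    (1 - alpha) credible region computed from one subset (e.g., via MCMC)."""
    center = data_j.mean(axis=0)                    # placeholder posterior summary
    radius = data_j.std() / np.sqrt(len(data_j))    # placeholder radius
    return center, radius

def divide_and_conquer(data, s, alpha=0.05):
    subsets = np.array_split(data, s)               # divide the N observations into s subsets
    regions = [subset_credible_region(d, alpha) for d in subsets]  # one machine per subset
    centers = np.array([c for c, _ in regions])
    radii = np.array([r for _, r in regions])
    # Placeholder aggregation; the talk's actual rule is spelled out later.
    return centers.mean(axis=0), np.sqrt(np.mean(radii ** 2))

rng = np.random.default_rng(0)
center, radius = divide_and_conquer(rng.normal(size=(10000, 1)), s=10)
print(center, radius)
```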

A Series of Theoretical Questions...
How to define an aggregation rule s.t. R(α) covers (1 − α) posterior mass, with the same radius as R_oracle(α)?
How to construct a prior s.t. R(α) covers the true parameter (generating the data) with probability (1 − α)?
How fast can we allow s to diverge ("splitotics" theory)?
The above tasks are particularly challenging when the parameter in consideration is infinite-dimensional, which is the focus of our talk today.

Literature Review
In the Bayesian community, the existing statistical studies mostly focus on computational or methodological aspects of MCMC-based distributed methods;
Nonetheless, not much effort has been devoted to theoretically understanding scalable Bayesian procedures, especially in a general nonparametric context;
One particular reason is the failure of the Bernstein–von Mises theorem in the nonparametric setting, found by Cox (1993) and Freedman (1999).

Outline

What is the Bernstein–von Mises (BvM) Theorem?
The BvM theorem (named after two mathematicians, S. Bernstein and R. von Mises) characterizes the asymptotic shape of the posterior distribution: d(Π(· | D_n), P_0(·)) → 0 as n → ∞, where Π(· | D_n) represents a posterior measure based on a sample D_n of size n, P_0(·) is a limiting probability measure, and d denotes a distance measure;
For example, in parametric models the BvM theorem says
  sup_{B ∈ B} | Π(B | D_n) − N(θ̃_n, (n I_{θ0})^{−1})(B) | = o_{P^n_{θ0}}(1),
where B is the Borel σ-algebra on R^d.
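As a hedged numerical illustration (mine, not from the talk), the parametric BvM effect can be watched in a Beta–Bernoulli model: the exact posterior and its normal approximation N(θ̂_n, (n I_{θ̂_n})^{−1}) get closer as n grows. The sup-distance over intervals (a Kolmogorov-type distance) is used below as a simple proxy for the supremum over Borel sets.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
theta0 = 0.3

for n in (50, 500, 5000):
    y = rng.binomial(1, theta0, size=n)
    k = y.sum()
    post = stats.beta(1 + k, 1 + n - k)            # exact posterior under a Beta(1,1) prior
    theta_hat = k / n
    fisher = 1.0 / (theta_hat * (1 - theta_hat))   # per-observation Fisher information at the MLE
    approx = stats.norm(theta_hat, np.sqrt(1.0 / (n * fisher)))
    grid = np.linspace(1e-4, 1 - 1e-4, 2000)
    dist = np.max(np.abs(post.cdf(grid) - approx.cdf(grid)))
    print(f"n={n:5d}  sup-CDF distance ~ {dist:.4f}")
```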

A Graphical Illustration
More importantly, the BvM theorem implies the frequentist validity of Bayesian credible sets, called the BvM phenomenon:
  P^n_{θ0}(θ0 ∈ (1 − α) credible set) → 1 − α.
[Diagram] Π(· | D_n) converges to P_0(·) (BvM Theorem); MCMC produces the (1 − α) credible set, whose frequentist coverage is the BvM phenomenon.

Nonparametric BvM: a negative example
Consider the Gaussian sequence model: Y_i = θ_{0i} + (1/√n) ε_i, i = 1, 2, ..., where ε_i ~iid N(0, 1). The true mean sequence {θ_{0i}}_{i=1}^∞ is square-summable, i.e., Σ_{i=1}^∞ θ_{0i}² < ∞;
Assign a (very innocent) Gaussian prior Π_0: θ_i ~ N(0, i^{−2p}) for some p > 1/2;
Freedman (1999) demonstrated the failure of BvM:
  P^n_{θ0}(θ0 ∈ (1 − α) credible set) → 0.
The credible set is based on the ℓ²-norm.
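The failure can be probed numerically. The sketch below is hedged: it truncates the sequence at K coordinates and picks a particular square-summable θ_0 (both choices are mine, not in the slides). In this conjugate model the posterior is Gaussian coordinate-wise, the ℓ²-ball radius is obtained as a Monte Carlo posterior quantile, and the frequentist coverage of the credible ball is then estimated over repeated data sets.

```python
import numpy as np

rng = np.random.default_rng(2)
K, n, p, alpha = 2000, 10000, 1.0, 0.05
i = np.arange(1, K + 1)
theta0 = i ** -1.0                       # assumed square-summable truth (sum of 1/i^2 is finite)
prior_var = i ** (-2 * p)                # prior: theta_i ~ N(0, i^{-2p})

def coverage(reps=200, mc=400):
    hits = 0
    for _ in range(reps):
        y = theta0 + rng.normal(size=K) / np.sqrt(n)     # Y_i = theta_{0i} + eps_i / sqrt(n)
        post_var = 1.0 / (1.0 / prior_var + n)           # conjugate Gaussian posterior variance
        post_mean = post_var * n * y                     # conjugate Gaussian posterior mean
        # Monte Carlo (1 - alpha) quantile of ||theta - posterior mean||_2 under the posterior
        draws = np.sqrt(post_var) * rng.normal(size=(mc, K))
        r = np.quantile(np.linalg.norm(draws, axis=1), 1 - alpha)
        hits += np.linalg.norm(post_mean - theta0) <= r
    return hits / reps

print("estimated frequentist coverage of the l2 credible ball:", coverage())
```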

A Solution: Tuning Prior
The power of smoothing splines (Wahba, 1990)! We will show that a nonparametric BvM theorem can be rescued under a new class of Gaussian process (GP) priors motivated by smoothing splines, named "tuning priors";
Take the Gaussian regression model as an example (our nonparametric BvM results hold in a general exponential family; no conjugacy is needed):
  Y_i = f_0(X_i) + ε_i, i = 1, 2, ..., n,
where ε_i ~iid N(0, 1) and f ∈ H^m(0, 1), the m-th order Sobolev space. Denote its log-likelihood function as
  ℓ_n(f) = −Σ_{i=1}^n (Y_i − f(X_i))² / 2.

Tuning Prior: A General Framework
Assume that f follows a probability measure Π_λ;
Specify Π_λ through its Radon–Nikodym derivative w.r.t. a base measure Π (also on H^m(0, 1)) as follows:
  (dΠ_λ / dΠ)(f) ∝ exp(−(nλ/2) J(f)),   (1.1)
where J(f) is a type of roughness penalty used in the smoothing spline literature.

Tuning Prior: Duality
Based on (1.1), we have the posterior as
  dP(f | D_n) := exp(ℓ_n(f)) dΠ_λ(f) / ∫_{H^m(0,1)} exp(ℓ_n(f)) dΠ_λ(f)
              = exp(ℓ_{n,λ}(f)) dΠ(f) / ∫_{H^m(0,1)} exp(ℓ_{n,λ}(f)) dΠ(f),
where ℓ_{n,λ}(f) = ℓ_n(f) − (nλ/2) J(f);
Smoothing spline estimate: f̂_{n,λ} := argmax_{f ∈ H^m(0,1)} ℓ_{n,λ}(f);
The name "tuning prior" now makes sense. So, we can employ GCV to select a proper tuning prior (and we did!);
More importantly, we are able to borrow recent advances in smoothing spline inference theory (Shang and C., 2013, AoS) to build a foundation for nonparametric BvM.
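A hedged sketch of GCV-based selection of λ for a penalized fit of this kind. It uses a truncated cosine basis with ρ_ν = (πν)^{2m} as an assumed stand-in for the smoothing-spline eigensystem (not the construction of the talk); the penalized estimator is then a linear smoother Y ↦ A_λ Y, and GCV(λ) = n‖(I − A_λ)Y‖² / (n − tr A_λ)² is minimized over a grid.

```python
import numpy as np

def cosine_basis(x, K):
    # phi_0 = 1, phi_k = sqrt(2) cos(pi k x): an assumed stand-in basis on [0, 1]
    cols = [np.ones_like(x)] + [np.sqrt(2) * np.cos(np.pi * k * x) for k in range(1, K)]
    return np.column_stack(cols)

def gcv_select(x, y, m=2, K=50, grid=np.logspace(-8, 0, 60)):
    n = len(y)
    Phi = cosine_basis(x, K)
    rho = np.concatenate(([0.0], (np.pi * np.arange(1, K)) ** (2 * m)))   # penalty weights
    best = (np.inf, None)
    for lam in grid:
        # ridge-type penalized least squares: coef = (Phi'Phi + n*lam*diag(rho))^{-1} Phi'y
        M = Phi.T @ Phi + n * lam * np.diag(rho)
        coef = np.linalg.solve(M, Phi.T @ y)
        fitted = Phi @ coef
        trA = np.trace(Phi @ np.linalg.solve(M, Phi.T))   # effective degrees of freedom
        gcv = n * np.sum((y - fitted) ** 2) / (n - trA) ** 2
        if gcv < best[0]:
            best = (gcv, lam)
    return best[1]

rng = np.random.default_rng(3)
x = rng.uniform(size=500)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.5, size=500)
print("GCV-selected lambda:", gcv_select(x, y))
```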

Tuning Prior: Gaussian Process Construction
To satisfy (1.1), we choose Π_λ and Π as two Gaussian measures induced by the GP priors specified below (this can be verified by applying Hájek's lemma);
Assign a GP prior on f, i.e., Π_λ, as follows:
  f ~ G_λ(·) = Σ_{ν=1}^∞ w_ν φ_ν(·),
where (recall that m is the smoothness of f_0)
  w_ν ~ N(0, 1) for ν = 1, ..., m,
  w_ν ~ N(0, (ρ_ν^{1+β/(2m)} + nλ ρ_ν)^{−1}) for ν > m,
for a sequence ρ_ν ≍ ν^{2m};
Π is induced by a similar GP (by setting λ = 0).
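A hedged sketch of drawing sample paths from G_λ. Only the variance specification of the w_ν follows the slide; the basis φ_ν(x) = √2 cos(πνx) with ρ_ν = (πν)^{2m}, the truncation level K, and the particular (n, λ, m, β) values are assumptions made for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

def sample_G_lambda(x, n, lam, m=2, beta=2.0, K=200, rng=None):
    rng = rng or np.random.default_rng()
    nu = np.arange(1, K + 1)
    rho = (np.pi * nu) ** (2 * m)                        # assumed eigenvalues, rho_nu of order nu^{2m}
    var = np.where(nu <= m, 1.0,
                   1.0 / (rho ** (1 + beta / (2 * m)) + n * lam * rho))
    w = rng.normal(scale=np.sqrt(var))                   # w_nu per the slide's variance specification
    phi = np.sqrt(2) * np.cos(np.pi * np.outer(nu, x))   # assumed basis functions phi_nu
    return w @ phi

x = np.linspace(0, 1, 400)
n = 1000
lam = n ** (-2 * 2 / (2 * 2 + 2))                        # lambda of order n^{-2m/(2m+beta)}, m=2, beta=2
for _ in range(3):
    plt.plot(x, sample_G_lambda(x, n, lam))
plt.title("Sample paths from the (assumed-basis) tuning prior G_lambda")
plt.show()
```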

Our construction of the GP prior is motivated by Wahba's Bayesian view on smoothing splines (Wahba, 1990);
The RKHS induced by G_λ is essentially H^{m+β/2}(0, 1), where β adjusts the prior support;
In addition, we need to assume β ∈ (1, 2m + 1) to guarantee E{J(G_λ, G_λ)} < ∞, so that the sample path of G_λ belongs to H^m(0, 1) a.s.

Underlying Eigensystem (φ_ν(·), ρ_ν)
Under mild conditions, f admits a Fourier expansion:
  f(·) = Σ_{ν=1}^∞ f_ν φ_ν(·),
where the φ_ν(·)'s are basis functions in H^m(0, 1).
An example of (φ_ν, ρ_ν) is the solution of the following ODE:
  φ_ν^{(2m)}(·) = ρ_ν φ_ν(·), φ_ν^{(j)}(0) = φ_ν^{(j)}(1) = 0, j = 2, ..., 2m − 1,
where the φ_ν's have closed forms. This is also called the uniform free beam problem in physics.

Nonparametric BvM Theorem
Theorem 1. Given that λ ≍ n^{−2m/(2m+β)}, we have
  sup_{S ⊆ H^m(0,1)} | P(S | D_n) − Π_W(S) | = o_{P^n_{f0}}(1),
where Π_W(·) is the probability measure induced by a GP W.

Specifications of the Limiting GP W
Suppose that f̂_{n,λ}(·) = Σ_ν f̂_{n,ν} φ_ν(·);
The mean function of W (also the approximate posterior mode of P(· | D_n)) is
  f̃_{n,λ}(·) := Σ_ν a_{n,ν} f̂_{n,ν} φ_ν(·).
Hence, f̃_{n,λ} ≠ f̂_{n,λ} (but they are very close);
The mean-zero GP W_n := W − f̃_{n,λ} is expressed as W_n(·) = Σ_ν b_{n,ν} z_ν φ_ν(·) with z_ν ~iid N(0, 1);
Here, a_{n,ν} and b_{n,ν} are both non-random sequences.

Recall Bayesian Aggregation
[Diagram] Big Data (N) → Super machine → R_oracle(α). Divide: Subset 1 (n) → Machine 1 → R_{1,n}(α); Subset 2 (n) → Machine 2 → R_{2,n}(α); ...; Subset s (n) → Machine s → R_{s,n}(α). Aggregate → R_N(α).
Note that N = s × n. Both n and s are allowed to diverge.

Uniform Nonparametric BvM Theorem
The uniform BvM theorem characterizes the limit shapes of a sequence of s nonparametric posterior distributions (under proper tuning priors), as long as s does not grow too fast.
Theorem 2. Given that λ ≍ N^{−2m/(2m+β)} (used in each subset of size n), we have
  sup_{S ⊆ H^m(0,1)} max_{1 ≤ j ≤ s} | P(S | D_{j,n}) − Π_{W_j}(S) | = o_{P^n_{f0}}(1)
as long as s does not grow faster than N^{(β−1)/(2m+β)}.
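For a feel of how restrictive the constraint on s is, here is a small hedged calculation of the admissible order N^{(β−1)/(2m+β)} for a few illustrative values of N (the numbers are mine, not from the talk):

```python
m, beta = 2, 2.0
for N in (10**4, 10**6, 10**8):
    s_max = N ** ((beta - 1) / (2 * m + beta))   # upper growth rate for the number of subsets
    print(f"N = {N:>9d}:  s may grow like N^{(beta - 1) / (2 * m + beta):.3f}, i.e. about {s_max:8.1f}")
```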

Aggregated Credible Interval
The j-th credible ball is defined as
  R_{j,n}(α) = { f ∈ H^m(0, 1) : ‖f − f̃_{j,n}‖_2 ≤ r_{j,n}(α) },
where the radius r_{j,n}(α) is directly obtained via MCMC;
The aggregated credible ball is constructed as
  R_N(α) = { f ∈ H^m(0, 1) : ‖f − f̄_{N,λ}‖_2 ≤ r_N(α) };
As will be seen, the aggregation step amounts to weighted averaging of the individual centers (through their Fourier coefficients) and of the individual radii. No additional computation is needed.

Implication of the Uniform BvM Theorem
The uniform BvM theorem shows that R_N(α) (asymptotically) covers (1 − α) posterior mass and also possesses frequentist validity, as long as λ ≍ N^{−2m/(2m+β)} and s = o(N^{(β−1)/(2m+β)}).

Aggregation Details
Aggregated center:
  f̄_{N,λ}(·) = Σ_{ν=1}^∞ a_{N,ν} f̄_ν φ_ν(·), where f̄_ν = (1/s) Σ_{j=1}^s f̂^{(j)}_{n,ν};
Aggregated radius:
  r_N(α) = [ ζ_{1,N} + (n ζ_{2,N}) / (N ζ_{2,n}) · (1/s) Σ_{j=1}^s ( r_{j,n}²(α) − ζ_{1,n} ) ]^{1/2},
where ζ_{k,n} = Σ_{ν=1}^∞ ( n / (τ_ν² + n(1 + λ ρ_ν)) )^k.
In fact, the aggregated radius r_N is (asymptotically) the same as that of the oracle credible ball; see the simulations.
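A hedged sketch of the aggregation step as reconstructed above: per-subset Fourier coefficients of the centers are averaged, and the squared radii are combined by the weighted formula. The rescaling sequence a_{N,ν} and the ζ constants are taken as given inputs, since their full definitions are not reproduced in this transcription.

```python
import numpy as np

def aggregate(centers, radii, a_N, zeta1_n, zeta2_n, zeta1_N, zeta2_N, n, N):
    """centers: (s, K) array of per-subset Fourier coefficients hat f^{(j)}_{n,nu};
    radii: length-s array of per-subset credible-ball radii r_{j,n}(alpha)."""
    fbar = centers.mean(axis=0)              # bar f_nu = (1/s) sum_j hat f^{(j)}_{n,nu}
    center_N = a_N * fbar                    # aggregated center coefficients a_{N,nu} * bar f_nu
    # aggregated radius, following the reconstructed display above
    r2 = zeta1_N + (n * zeta2_N) / (N * zeta2_n) * np.mean(radii ** 2 - zeta1_n)
    return center_N, np.sqrt(max(r2, 0.0))
```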

Aggregated Credible Interval
We can also aggregate individual credible intervals for linear functionals of f, denoted F(f). Two examples:
Evaluation functional: F_z(f) = f(z);
Integral functional: F_ω(f) = ∫_0^1 f(z) ω(z) dz for a known function ω(·), such as an indicator function;
Individual credible interval for F(f):
  CI^F_{j,n}(α) := { f ∈ S^m(I) : |F(f) − F(f̃_{j,n})| ≤ r_{F,j,n}(α) };
The aggregated version is constructed as
  CI^F_N(α) := { f ∈ S^m(I) : |F(f) − F(f̄_{N,λ})| ≤ r_{F,N}(α) },
where r_{F,N}(α) is a weighted ℓ₂ average of the r_{F,j,n}(α)'s.
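A tiny hedged example of the two linear functionals applied to a basis expansion f = Σ f_ν φ_ν, with the same assumed cosine basis as before and toy coefficients (all choices are mine): evaluation at z = 0.25 and integration against the indicator weight ω = 1_{[0, 0.5]}.

```python
import numpy as np

f_nu = np.array([1.0, 0.5, -0.3])                        # toy Fourier coefficients

def phi(k, x):
    return np.sqrt(2) * np.cos(np.pi * k * x)            # assumed basis functions

def f(x):
    return sum(c * phi(k, x) for k, c in enumerate(f_nu, start=1))

F_eval = f(0.25)                                         # evaluation functional F_z(f) at z = 0.25
grid = np.linspace(0.0, 1.0, 10001)
omega = (grid <= 0.5).astype(float)                      # indicator weight on [0, 0.5]
F_int = np.sum(f(grid) * omega) * (grid[1] - grid[0])    # Riemann-sum approximation of F_omega(f)
print(F_eval, F_int)
```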

A Series of Theoretical Questions...
How to define an aggregation rule s.t. R(α) covers (1 − α) posterior mass, with the same radius as R_oracle(α)?
Weighted averaging of the individual centers (in terms of their Fourier coefficients) and radii by an analytical formula.
How to construct a prior s.t. R(α) covers the true parameter (generating the data) with probability (1 − α)?
Pick a proper tuning prior by GCV.
How fast can we allow s to diverge?
s cannot grow faster than a rate jointly determined by the smoothness of f_0 and the smoothness of the GP prior.

Simulations
Gaussian regression model:
  Y = f_0(X) + ε, where ε ~ N(0, 1) and f_0(x) = 3 β_{30,17}(x) + 2 β_{3,11}(x),
where β_{a,b} is the pdf of the Beta(a, b) distribution. Set m = 2;
Assign a tuning prior with β = 2 and with λ selected by GCV as follows;
Let λ_GCV be the GCV-selected tuning parameter, of order N^{−2m/(2m+1)}, obtained by applying GCV to the entire data (a practical formula needs to be developed here). Set λ = λ_GCV^{(2m+1)/(2m+β)} to match the order N^{−2m/(2m+β)}.
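A hedged sketch of the simulation setup: data are generated from the stated regression function, and a GCV-selected λ of order N^{−2m/(2m+1)} (here its theoretical order is plugged in as a stand-in value) is recalibrated to the order N^{−2m/(2m+β)}.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
m, beta_prior, N = 2, 2.0, 10000

def f0(x):
    # true regression function: 3 * Beta(30, 17) pdf + 2 * Beta(3, 11) pdf
    return 3 * stats.beta(30, 17).pdf(x) + 2 * stats.beta(3, 11).pdf(x)

X = rng.uniform(size=N)
Y = f0(X) + rng.normal(size=N)

# Suppose lambda_GCV (of order N^{-2m/(2m+1)}) has been obtained on the entire data;
# its theoretical order is used below as a stand-in value.
lam_gcv = N ** (-2 * m / (2 * m + 1))
lam = lam_gcv ** ((2 * m + 1) / (2 * m + beta_prior))   # now of order N^{-2m/(2m+beta)}
print(lam_gcv, lam)
```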

Figure 1: Plot of the true function f_0.

Computing Time
Figure 2: ρ versus γ based on FCR and ACR (three panels over increasing N, including N = 1200), where ρ = (T_0 − T)/T_0, T_0 is the computing time based on the entire (big) data, T is the divide-and-conquer computing time, and γ = log s / log N describes the growth of s.

Phase Transition: Coverage Probability
Figure 3: Frequentist coverage probability (CP) of R_N(α) against γ (panels for several confidence levels 1 − α, including 0.90, 0.50 and 0.10; ACR and FCR). The red dotted line indicates the position of 1 − α.

Phase Transition: Radius
Figure 4: Radius of R_N(α) against γ for various α.

Thanks for your attention. Questions are welcome.
