Bayesian Sparse Linear Regression with Unknown Symmetric Error
Bayesian Sparse Linear Regression with Unknown Symmetric Error

Minwoo Chae¹, joint work with Lizhen Lin² and David B. Dunson³

¹ Department of Mathematics, The University of Texas at Austin
² Department of Statistics and Data Sciences, The University of Texas at Austin
³ Department of Statistical Science, Duke University

June 17, 2016, BU-KEIO 2016 Workshop
Outline

1. Introduction
2. Sparse linear model
3. Linear model with unknown error distribution
4. Asymptotic results
Symmetric location problem

Y_i = μ + ε_i, where the ε_i are iid ~ η(·) (unknown).

If η is symmetric, efficient and adaptive estimation of μ is possible. [Beran, 1974; Stone, 1975; ...]

Linear regression [Bickel, 1982]: μ = x_i^T θ, θ ∈ R^p, i = 1, ..., n.

For Bayesians, the semi-parametric Bernstein-von Mises (BvM) theorem holds. [Chae, Kim and Kleijn, 2016]

We study a Bayesian approach when p is large.
Bayesian paradigm

A parameter θ is generated according to a prior distribution Π. Conditional on θ, the data X is generated according to a density p_θ. For given observed data X, statistical inference is based on the posterior distribution:

dΠ(θ | X) ∝ p_θ(X) dΠ(θ).

Typically, the posterior distribution can be approximated via MCMC.
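As a sanity check on the proportionality dΠ(θ | X) ∝ p_θ(X) dΠ(θ), the product of prior density and likelihood, normalized on a grid, should match the conjugate posterior in a tractable example. A minimal sketch (not from the talk; the Beta(5, 1)/Bernoulli setup mirrors the illustration used later):

```python
import math
import numpy as np

# Illustrative check: posterior = normalized (prior density x likelihood).
# For a Beta(5, 1) prior and Bernoulli data, the grid-normalized product
# should match the conjugate Beta(5 + k, 1 + n - k) posterior.
rng = np.random.default_rng(0)
n, theta0 = 50, 0.5
k = int(rng.binomial(n, theta0))          # number of successes

grid = np.linspace(1e-4, 1 - 1e-4, 20001)
h = grid[1] - grid[0]
prior = grid ** 4                          # Beta(5, 1) density up to a constant
lik = grid ** k * (1 - grid) ** (n - k)    # Bernoulli likelihood
post = prior * lik
post /= post.sum() * h                     # normalize on the grid

def beta_pdf(x, a, b):
    logc = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return np.exp(logc + (a - 1) * np.log(x) + (b - 1) * np.log(1 - x))

exact = beta_pdf(grid, 5 + k, 1 + n - k)   # conjugate posterior
print(np.max(np.abs(post - exact)))        # only small quadrature error remains
```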
Bayesian asymptotics

A frequentist would like to know how Bayesian procedures perform from a frequentist viewpoint. Assume that the data X_1, ..., X_n are generated according to a given parameter θ_0, and consider the posterior Π(θ | X_1, ..., X_n). For large enough n, we want Π(θ | X_1, ..., X_n) to put most of its mass near θ_0, for most realizations of X_1, ..., X_n. For infinite-dimensional θ, the choice of the prior is important.
Parametric Bernstein-von Mises theorem

Assume that a parametric model P = {P_θ : θ ∈ Θ} is regular and X_1, ..., X_n iid ~ P_{θ_0}, where θ_0 ∈ Θ.

THEOREM (Bernstein-von Mises) [Le Cam and Yang, 1990]. For any prior with positive density around θ_0,

‖Π(· | X_1, ..., X_n) − N(θ̂_n, I_{θ_0}^{-1}/n)‖_TV → 0 in probability,

where θ̂_n is an efficient estimator of θ and I_{θ_0} is the Fisher information matrix.

Consequently, a Bayesian credible interval is asymptotically a standard confidence interval.
Parametric BvM: illustration

θ ~ Beta(5, 1), X_1, ..., X_n | θ iid ~ Bernoulli(θ), θ_0 = 1/2.

[Figure: posterior densities for n = 10, n = 50, n = ..., concentrating around θ_0 = 1/2.]
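The convergence in the Beta-Bernoulli illustration above can be sketched numerically: the total-variation distance between the exact Beta posterior and its normal approximation N(θ̂, θ̂(1 − θ̂)/n) should shrink as n grows. A hedged sketch (grid-based TV distance; the sample sizes are illustrative):

```python
import math
import numpy as np

def tv_beta_vs_normal(n, k):
    """Grid-based TV distance between the Beta(5+k, 1+n-k) posterior
    (Beta(5, 1) prior) and its BvM normal approximation."""
    grid = np.linspace(1e-4, 1 - 1e-4, 20001)
    h = grid[1] - grid[0]
    a, b = 5 + k, 1 + n - k
    logc = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    beta = np.exp(logc + (a - 1) * np.log(grid) + (b - 1) * np.log(1 - grid))
    that = k / n                               # MLE, efficient here
    var = that * (1 - that) / n                # inverse Fisher information / n
    normal = np.exp(-(grid - that) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
    return 0.5 * np.sum(np.abs(beta - normal)) * h

rng = np.random.default_rng(1)
tv_small = tv_beta_vs_normal(50, int(rng.binomial(50, 0.5)))
tv_large = tv_beta_vs_normal(5000, int(rng.binomial(5000, 0.5)))
print(tv_small, tv_large)  # the TV distance decreases with n
```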
Semi-parametric BvM (fixed p)

Y_i = x_i^T θ + ε_i, where the ε_i are iid ~ η(·) (unknown).

Put a symmetrized Dirichlet process (DP) mixture prior on η.

THEOREM [Chae, Kim and Kleijn, 2016]. For any prior on θ with positive density around θ_0,

‖Π(θ ∈ · | X_1, ..., X_n) − N(θ̂_n, I_{θ_0,η_0}^{-1}/n)‖_TV → 0 in probability,

where θ̂_n is an efficient estimator of θ and I_{θ_0,η_0} is the efficient information matrix.

What if p is large?
Outline

1. Introduction
2. Sparse linear model
3. Linear model with unknown error distribution
4. Asymptotic results
Sparse linear model

Consider the linear regression model

Y_i = x_i^T θ + ε_i, i = 1, ..., n,

where θ = (θ_1, ..., θ_p)^T and possibly p ≫ n. In matrix form, Y = Xθ + ε.

A sparse model assumes that most of the θ_i are (nearly) zero. We apply fully Bayesian procedures and express the sparsity in the prior.
Sparse prior

A prior Π_Θ for θ ∈ R^p can be constructed as follows:

1. (Dimension) Choose s from a prior π_p on {0, 1, ..., p}.
2. (Model) Choose S ⊂ {1, ..., p} of size |S| = s at random.
3. (Nonzero coefficients) Choose θ_S = (θ_i)_{i∈S} from a density g_S on R^S and set θ_{S^c} = 0.

Formally,

(S, θ) ~ π_p(s) · C(p, s)^{-1} · g_S(θ_S) · δ_0(θ_{S^c}),

where C(p, s) denotes the binomial coefficient. The prior π_p on the dimension controls the level of sparsity.
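The three-step construction can be sketched directly; the particular choices below (complexity-type weights π_p(s) ∝ c^{−s} p^{−as} and a Laplace density for g) are illustrative, not the talk's exact specification:

```python
import numpy as np

def sample_sparse_prior(p, rng, c=2.0, a=1.0, lam=1.0):
    """One draw of theta from the three-step sparse prior (illustrative choices)."""
    s_vals = np.arange(p + 1)
    # 1. (Dimension) complexity-type prior pi_p(s) proportional to c^{-s} p^{-a s}
    w = np.exp(-s_vals * (np.log(c) + a * np.log(p)))
    s = rng.choice(s_vals, p=w / w.sum())
    # 2. (Model) support S of size s, uniformly at random
    S = rng.choice(p, size=s, replace=False)
    # 3. (Nonzero coefficients) iid Laplace entries on S, zero elsewhere
    theta = np.zeros(p)
    theta[S] = rng.laplace(scale=1.0 / lam, size=s)
    return theta

rng = np.random.default_rng(0)
draws = np.stack([sample_sparse_prior(200, rng) for _ in range(500)])
print((draws != 0).sum(axis=1).mean())  # average dimension is small
```

With these weights, the prior strongly favors very small supports, which is the sparsity-inducing behavior the slide describes.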
Sparse prior: example

Spike and slab [Ishwaran and Rao, 2005; and many other authors]: for some r ∈ (0, 1),

θ_i | r iid ~ (1 − r) δ_0 + r G, i ≤ p,

for some continuous distribution G; equivalently, the dimension satisfies s ~ Binomial(p, r).

Good asymptotic properties hold if r ~ Beta(1, p^u) for some u > 1 and the tail of G is at least as thick as Laplace. [Castillo and van der Vaart, 2015]
Sparse prior: example

Complexity prior [Castillo and van der Vaart, 2012]:

π_p(s) ∝ c^{-s} p^{-as}, s = 0, 1, ..., p,

for some constants a, c > 0. Roughly, π_p(s) ≍ C(p, s)^{-1} for s ≪ p.
Other priors

Continuous shrinkage priors that peak near zero. Typically, scale mixtures of normals: for i = 1, ..., p,

θ_i | τ², λ_i² ~ N(0, τ² λ_i²), λ_i² ~ π_λ(λ_i²), τ² ~ π_τ(τ²).

1. Bayesian lasso [Park and Casella, 2008]
2. Horseshoe [Carvalho, Polson and Scott, 2010]
3. Normal-gamma [Griffin and Brown, 2010]
4. Generalized double Pareto [Armagan, Dunson and Lee, 2013]
5. Dirichlet-Laplace [Bhattacharya et al., 2016]
6. ...
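As one concrete instance of this family, a draw from the horseshoe prior (half-Cauchy local scales λ_i) can be sketched as follows; the values of p and τ are illustrative:

```python
import numpy as np

# Illustrative horseshoe draw: a scale mixture of normals with
# half-Cauchy local scales, giving a spike at zero plus heavy tails.
rng = np.random.default_rng(0)
p, tau = 1000, 0.1
lam = np.abs(rng.standard_cauchy(p))   # half-Cauchy local scales lambda_i
theta = rng.normal(0.0, tau * lam)     # theta_i | tau, lam_i ~ N(0, tau^2 lam_i^2)

# Most coordinates are shrunk toward zero, while a few escape shrinkage.
print(np.median(np.abs(theta)), np.max(np.abs(theta)))
```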
Outline

1. Introduction
2. Sparse linear model
3. Linear model with unknown error distribution
4. Asymptotic results
Gaussian model

Y_i = x_i^T θ + ε_i, i = 1, ..., n. Assume that ε_i iid ~ η for some density η ∈ H.

Usually it is assumed that η(y) = φ_σ(y) because of

1. computational simplicity, and
2. good theoretical properties.

Some properties (e.g. consistency and rates) tend to be robust to misspecification.
Key problems

Y_i = x_i^T θ + ε_i, i = 1, ..., n. Assume that the ε_i are not really normally distributed. Key problems caused by model misspecification:

1. (Efficiency) The asymptotic variance of √n(θ̂_i − θ_i) can be large.
2. (Uncertainty quantification) Credible sets do not give valid confidence. [Kleijn and van der Vaart, 2012]
3. (Selection) Misspecification might result in serious overfitting. [Grünwald and van Ommen, 2014]

A good remedy: semi-parametric modelling.
Key problems: example [Grünwald and van Ommen, 2014]

Y_i = θ_int + θ_1 x_i + θ_2 x_i² + ... + θ_p x_i^p + ε_i, θ_0 = 0 ∈ R^{p+1}.
Frequentist's method for fixed p

Y_i = x_i^T θ + ε_i, ε_i ~ η. There is an efficient estimator of θ. [Bickel, 1982] One way to get an efficient estimator is:

1. Find an initial √n-consistent estimator θ̃_n.
2. Estimate the score function from the residuals ε̃_i = Y_i − θ̃_n^T x_i.
3. Solve the score equation using a one-step Newton-Raphson iteration.

Does it work if p ≫ n?
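The three steps can be sketched in a deliberately simplified setting: for illustration the error density is taken to be a known Laplace(0, 1), so the score of the location parameter is just the sign of the residual; in the actual procedure the score would be estimated from the residuals.

```python
import numpy as np

# Simplified one-step recipe (illustrative; score assumed known, p = 1).
rng = np.random.default_rng(0)
n, theta0 = 5000, 2.0
x = rng.normal(size=n)
y = theta0 * x + rng.laplace(size=n)      # Laplace(0, 1) errors

# 1. Initial sqrt(n)-consistent estimator: least squares.
theta_init = (x @ y) / (x @ x)
# 2. Residuals from the initial fit.
resid = y - theta_init * x
# 3. One Newton-Raphson step on the score equation.
#    For Laplace(0, 1), score = x * sign(eps) and information I = E[x^2].
score = (x * np.sign(resid)).mean()
info = (x ** 2).mean()
theta_onestep = theta_init + score / info
print(theta_init, theta_onestep)
```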
Bayesian method for fixed p

Y_i = x_i^T θ + ε_i, ε_i ~ η. Put a symmetrized DP mixture prior Π_H on η:

η(y) = ∫ φ_σ(y − z) dF̄(z, σ), F ~ DP(α),

where dF̄(z, σ) = (dF(z, σ) + dF(−z, σ))/2.

Then the BvM theorem holds. [Chae, Kim and Kleijn, 2016]

Inference: Gibbs sampler.
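For a truncated mixing measure with atoms (z_j, σ_j) and weights w_j, the symmetrization makes η symmetric by construction, which can be checked numerically. A sketch (the Dirichlet weights stand in for a truncated DP draw; all values are illustrative):

```python
import math
import numpy as np

def eta(y, w, z, sig):
    """Symmetrized normal mixture: equal weight on components at +z_j and -z_j."""
    y = np.asarray(y)[:, None]
    phi_plus = np.exp(-(y - z) ** 2 / (2 * sig ** 2)) / (sig * math.sqrt(2 * math.pi))
    phi_minus = np.exp(-(y + z) ** 2 / (2 * sig ** 2)) / (sig * math.sqrt(2 * math.pi))
    return (0.5 * (phi_plus + phi_minus) * w).sum(axis=1)

rng = np.random.default_rng(0)
w = rng.dirichlet(np.ones(5))        # illustrative stand-in for DP weights
z = rng.uniform(0, 2, size=5)        # atoms within [-M, M], here M = 2
sig = rng.uniform(0.5, 1.5, size=5)  # scales within [sigma_1, sigma_2]

ygrid = np.linspace(-4, 4, 101)
vals = eta(ygrid, w, z, sig)
print(np.max(np.abs(vals - vals[::-1])))  # symmetry: eta(y) == eta(-y)
```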
Bayesian inference

Y_i = x_i^T θ + ε_i, ε_i ~ η, is equivalent to the augmented model

Y_i = x_i^T θ + z_i + σ_i ε̃_i, (z_i, σ_i) ~ F̄, ε̃_i ~ N(0, 1).

Inference can be done through a Gibbs sampler:

1. Given (z_i, σ_i)_{i≤n}, θ can be sampled as in the Gaussian model.
2. Given θ, (z_i, σ_i)_{i≤n} can be sampled as in the DPM model.

The additional computational burden from semi-parametric modelling depends only on n. Feasible even when p ≫ n!
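A toy version of this two-block sampler can be sketched by replacing the nonparametric mixture with a fixed finite symmetrized mixture, so that both conditional updates are explicit; everything below (atoms, weights, sample size, flat prior on θ) is illustrative and not the talk's actual algorithm:

```python
import math
import numpy as np

rng = np.random.default_rng(0)
n, theta0 = 500, 1.0
x = rng.normal(size=n)
y = theta0 * x + rng.normal(size=n)

z = np.array([-1.0, 0.0, 1.0])       # fixed symmetric atoms (illustrative)
sig = np.array([1.0, 1.0, 1.0])      # fixed scales
w = np.array([0.25, 0.5, 0.25])      # fixed symmetric weights

theta, trace = 0.0, []
for it in range(500):
    # Block 2: given theta, sample a component label c_i for each observation.
    r = y - theta * x
    logp = np.log(w) - 0.5 * ((r[:, None] - z) / sig) ** 2 - np.log(sig)
    prob = np.exp(logp - logp.max(axis=1, keepdims=True))
    prob /= prob.sum(axis=1, keepdims=True)
    c = (rng.random(n)[:, None] > np.cumsum(prob, axis=1)).sum(axis=1)
    # Block 1: given (z_i, sigma_i) = (z[c_i], sig[c_i]), theta is
    # conditionally Gaussian under a flat prior, as in the Gaussian model.
    prec = np.sum(x ** 2 / sig[c] ** 2)
    mean = np.sum(x * (y - z[c]) / sig[c] ** 2) / prec
    theta = rng.normal(mean, 1.0 / math.sqrt(prec))
    trace.append(theta)

print(np.mean(trace[100:]))  # posterior mean should sit near theta0
```

Only the second block would change under the full DP mixture (the atoms and weights would themselves be updated), which is why the extra cost scales with n rather than p.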
Outline

1. Introduction
2. Sparse linear model
3. Linear model with unknown error distribution
4. Asymptotic results
Goal: frequentist properties (p ≫ n)

Assume a fixed design X, and that the response vector Y is really generated from a given (θ_0, η_0), possibly with p ≫ n. We want the (marginal) posterior Π(θ | Y):

1. (Recovery) to put most of its mass around θ_0;
2. (Uncertainty quantification) to express the remaining uncertainty;
3. (Selection) to find the true nonzero set S_0 of θ_0;
4. (Adaptation) to adapt to the unknown sparsity level and error density;

with high P_{θ_0,η_0}-probability.
Prior for θ

(i) The probability π_p(s) decreases exponentially [Castillo and van der Vaart, 2012; 2015]: for some constants A_1, A_2, A_3, A_4 > 0,

A_1 p^{-A_3} π_p(s − 1) ≤ π_p(s) ≤ A_2 p^{-A_4} π_p(s − 1), s = 1, ..., p.

(ii) The tails of the nonzero coefficients are at least as thick as the Laplace distribution [Castillo and van der Vaart, 2012; van der Pas et al., 2016]:

g_S(θ) = ∏_{i∈S} g(θ_i), g(θ_i) ∝ e^{−λ|θ_i|},

where λ satisfies √n/p ≤ λ ≤ √(n log p).
Prior for η

Put a symmetrized DP mixture prior Π_H on η [Chae, Kim and Kleijn, 2016]:

η(y) = ∫ φ_σ(y − z) dF̄(z, σ), F ~ DP(α),

where dF̄(z, σ) = (dF(z, σ) + dF(−z, σ))/2. Assume that supp(α) ⊂ [−M, M] × [σ_1, σ_2] for some positive constants M and σ_1 < σ_2.
Design matrix

Assume uniformly bounded covariates: |x_ij| ≤ 1. Define the uniform compatibility number and the restricted eigenvalue

φ²(s) = inf { s_θ ‖Xθ‖_2² / (n ‖θ‖_1²) : 0 < s_θ ≤ s },
ψ²(s) = inf { ‖Xθ‖_2² / (n ‖θ‖_2²) : 0 < s_θ ≤ s },

where s_θ denotes the number of nonzero coordinates of θ. The condition φ(Ks_0) ≳ 1 (resp. ψ(Ks_0) ≳ 1) for some constant K > 1 is sufficient for the recovery of θ in ℓ_1 (resp. ℓ_2) norm.
Design matrix: examples

By the Cauchy-Schwarz inequality, φ(s) ≥ ψ(s). ψ(s) ≳ 1 in many examples:

1. Typically, ψ(s) is bounded away from zero when s · max_{i≠j} |corr(x_i, x_j)| is small. [Lounici, 2008]
2. If the x_ij are i.i.d. random variables, then ψ(s) ≳ 1 with high probability for s ≲ n / log p. [Cai and Jiang, 2011]
3. If p = n and corr(x_i, x_j) = ρ^{|i−j|} for some ρ ∈ (0, 1), then ψ(p) ≳ 1. [Zhao and Yu, 2006]

There are examples for which φ(s) ≳ 1 holds but the corresponding bound for ψ(s) does not. [van de Geer and Bühlmann, 2009]
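For small p, ψ(s) can be computed by brute force from its definition, since the infimum over θ with support size at most s reduces to a minimum over supports S of the smallest eigenvalue of X_S^T X_S / n. A sketch (the orthogonal design is an illustrative check, for which ψ²(s) = 1 exactly):

```python
import itertools
import numpy as np

def psi_squared(X, s):
    """Brute-force restricted eigenvalue: min over supports |S| <= s of
    the smallest eigenvalue of X_S^T X_S / n."""
    n, p = X.shape
    out = np.inf
    for size in range(1, s + 1):
        for S in itertools.combinations(range(p), size):
            idx = list(S)
            G = X[:, idx].T @ X[:, idx] / n
            out = min(out, np.linalg.eigvalsh(G)[0])  # eigvalsh: ascending order
    return out

n = p = 6
X = np.sqrt(n) * np.eye(p)   # orthogonal design: psi^2(s) = 1 for every s
print(psi_squared(X, 3))
```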
Asymptotics: dimension

THEOREM [Chae, Lin and Dunson, 2016]. If λ‖θ_0‖_1 ≲ s_0 log p and s_0 log p ≲ n, then

E Π(s_θ > K s_0 | Y) → 0

for some constant K > 1.

A small value of λ is preferred when ‖θ_0‖_1 is large.
Asymptotics: consistency

d_n²((θ, η), (θ_0, η_0)) = (1/n) Σ_{i=1}^n d_H²(p_{θ,η,i}, p_{θ_0,η_0,i}).

The mean Hellinger distance d_n allows one to construct certain exponentially consistent tests for independent observations. [Birgé, 1983; Ghosal and van der Vaart, 2007]

THEOREM [Chae, Lin and Dunson, 2016]. If, furthermore, φ(Ks_0) ≳ 1, then

E Π( d_n((θ, η), (θ_0, η_0)) ≳ √(s_0 log p / n) | Y ) → 0.
Asymptotics: consistency (cont.)

THEOREM [Chae, Lin and Dunson, 2016]. Under the previous conditions,

E Π( d_H(η, η_0) ≳ √(s_0 log p / n) | Y ) → 0.

If, furthermore, s_0² log p / φ²(Ks_0) ≲ n, then

E Π( ‖θ − θ_0‖_1 ≳ (s_0 / φ(Ks_0)) √(log p / n) | Y ) → 0,
E Π( ‖θ − θ_0‖_2 ≳ (1 / ψ(Ks_0)) √(s_0 log p / n) | Y ) → 0,
E Π( ‖X(θ − θ_0)‖_2 ≳ √(s_0 log p) | Y ) → 0.
Asymptotics: LAN

r_n(θ, η) = L_n(θ, η) − L_n(θ_0, η) − { √n (θ − θ_0)^T G_n ℓ̇_{θ_0,η_0} − (n/2) (θ − θ_0)^T V_{n,η_0} (θ − θ_0) }.

THEOREM [Chae, Lin and Dunson, 2016]. If s_0 log p ≲ n^{1/6}, then

sup_{η ∈ H_n} sup_{θ ∈ Θ_n} |r_n(θ, η)| = o_P(1),

where Π(Θ_n × H_n | Y) → 1 in probability.
Asymptotics: BvM theorem

Let N_{n,S} be the |S|-dimensional normal distribution to which an efficient estimator √n(θ̂_S − θ_{0,S}) converges in distribution.

THEOREM [Chae, Lin and Dunson, 2016]. If, furthermore, λ s_0 log p ≲ √n and ψ(Ks_0) ≳ 1, then

sup_{S ∈ S_n} sup_B | Π( √n(θ_S − θ_{0,S}) ∈ B | Y, S_θ = S ) − N_{n,S}(B) | = o_P(1),

where Π(S_θ ∈ S_n | Y) → 1 in probability.

That is, the posterior distribution of the nonzero coefficients is asymptotically a mixture of normal distributions.
Asymptotics: selection

THEOREM [Chae, Lin and Dunson, 2016]. Under the previous conditions,

Π(S_θ ≠ S_0 | Y) → 0 in probability.

The true nonzero coefficients can be selected provided that no nonzero coefficient is too small (a beta-min condition).
Discussion

The condition s_0 log p ≲ n^{1/6} is required due to semi-parametric bias. If η is known (not necessarily Gaussian) and p = s_0, the condition may be reduced to s_0 ≲ n^{1/3}, and this cannot be improved. [Panov and Spokoiny, 2015] In some parametric models, s_0 ≲ n^{1/6} is required for the BvM theorem. [Ghosal, 2000]

The results can be extended to more general priors, e.g. with M → ∞, σ_2 → ∞ and σ_1 → 0, but a sub-Gaussian tail of ℓ_{η_0} is (perhaps) essential for selection. [Kim and Jeon, 2016]
Selected references

[1] Castillo, I., Schmidt-Hieber, J., and van der Vaart, A. W. (2015). Bayesian linear regression with sparse priors. Ann. Statist.
[2] Chae, M. (2015). The semi-parametric Bernstein-von Mises theorem for models with symmetric error. PhD thesis, Seoul National University. arXiv.
[3] Chae, M., Kim, Y., and Kleijn, B. J. K. (2016). The semi-parametric Bernstein-von Mises theorem for regression models with symmetric errors. arXiv.
[4] Chae, M., Lin, L., and Dunson, D. B. (2016). Bayesian sparse linear regression with unknown symmetric error. arXiv.
[5] Grünwald, P. and van Ommen, T. (2014). Inconsistency of Bayesian inference for misspecified linear models, and a proposal for repairing it. arXiv.
[6] Hanson, D. L. and Wright, F. T. (1971). A bound on tail probabilities for quadratic forms in independent random variables. Ann. Math. Statist.
[7] Pollard, D. (2001). Bracketing methods. Unpublished manuscript. Available at pollard/books/asymptopia/bracketing.pdf.
[8] van de Geer, S. A. and Bühlmann, P. (2009). On the conditions used to prove oracle results for the Lasso. Electron. J. Statist.
More informationDirichlet Processes: Tutorial and Practical Course
Dirichlet Processes: Tutorial and Practical Course (updated) Yee Whye Teh Gatsby Computational Neuroscience Unit University College London August 2007 / MLSS Yee Whye Teh (Gatsby) DP August 2007 / MLSS
More informationDISCUSSION OF A SIGNIFICANCE TEST FOR THE LASSO. By Peter Bühlmann, Lukas Meier and Sara van de Geer ETH Zürich
Submitted to the Annals of Statistics DISCUSSION OF A SIGNIFICANCE TEST FOR THE LASSO By Peter Bühlmann, Lukas Meier and Sara van de Geer ETH Zürich We congratulate Richard Lockhart, Jonathan Taylor, Ryan
More informationHierarchical models. Dr. Jarad Niemi. August 31, Iowa State University. Jarad Niemi (Iowa State) Hierarchical models August 31, / 31
Hierarchical models Dr. Jarad Niemi Iowa State University August 31, 2017 Jarad Niemi (Iowa State) Hierarchical models August 31, 2017 1 / 31 Normal hierarchical model Let Y ig N(θ g, σ 2 ) for i = 1,...,
More informationTime Series and Dynamic Models
Time Series and Dynamic Models Section 1 Intro to Bayesian Inference Carlos M. Carvalho The University of Texas at Austin 1 Outline 1 1. Foundations of Bayesian Statistics 2. Bayesian Estimation 3. The
More informationThanks. Presentation help: B. Narasimhan, N. El Karoui, J-M Corcuera Grant Support: NIH, NSF, ARC (via P. Hall) p.2
p.1 Collaborators: Felix Abramowich Yoav Benjamini David Donoho Noureddine El Karoui Peter Forrester Gérard Kerkyacharian Debashis Paul Dominique Picard Bernard Silverman Thanks Presentation help: B. Narasimhan,
More informationBERNSTEIN POLYNOMIAL DENSITY ESTIMATION 1265 bounded away from zero and infinity. Gibbs sampling techniques to compute the posterior mean and other po
The Annals of Statistics 2001, Vol. 29, No. 5, 1264 1280 CONVERGENCE RATES FOR DENSITY ESTIMATION WITH BERNSTEIN POLYNOMIALS By Subhashis Ghosal University of Minnesota Mixture models for density estimation
More informationMCMC algorithms for fitting Bayesian models
MCMC algorithms for fitting Bayesian models p. 1/1 MCMC algorithms for fitting Bayesian models Sudipto Banerjee sudiptob@biostat.umn.edu University of Minnesota MCMC algorithms for fitting Bayesian models
More informationBayesian linear regression
Bayesian linear regression Linear regression is the basis of most statistical modeling. The model is Y i = X T i β + ε i, where Y i is the continuous response X i = (X i1,..., X ip ) T is the corresponding
More informationFractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling
Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling Jae-Kwang Kim 1 Iowa State University June 26, 2013 1 Joint work with Shu Yang Introduction 1 Introduction
More informationA Tight Excess Risk Bound via a Unified PAC-Bayesian- Rademacher-Shtarkov-MDL Complexity
A Tight Excess Risk Bound via a Unified PAC-Bayesian- Rademacher-Shtarkov-MDL Complexity Peter Grünwald Centrum Wiskunde & Informatica Amsterdam Mathematical Institute Leiden University Joint work with
More informationConsistent high-dimensional Bayesian variable selection via penalized credible regions
Consistent high-dimensional Bayesian variable selection via penalized credible regions Howard Bondell bondell@stat.ncsu.edu Joint work with Brian Reich Howard Bondell p. 1 Outline High-Dimensional Variable
More informationNonparametric Bayes tensor factorizations for big data
Nonparametric Bayes tensor factorizations for big data David Dunson Department of Statistical Science, Duke University Funded from NIH R01-ES017240, R01-ES017436 & DARPA N66001-09-C-2082 Motivation Conditional
More informationAsymptotic properties of posterior distributions in nonparametric regression with non-gaussian errors
Ann Inst Stat Math (29) 61:835 859 DOI 1.17/s1463-8-168-2 Asymptotic properties of posterior distributions in nonparametric regression with non-gaussian errors Taeryon Choi Received: 1 January 26 / Revised:
More informationarxiv: v2 [math.st] 12 Feb 2008
arxiv:080.460v2 [math.st] 2 Feb 2008 Electronic Journal of Statistics Vol. 2 2008 90 02 ISSN: 935-7524 DOI: 0.24/08-EJS77 Sup-norm convergence rate and sign concentration property of Lasso and Dantzig
More informationIntroduction to Bayesian learning Lecture 2: Bayesian methods for (un)supervised problems
Introduction to Bayesian learning Lecture 2: Bayesian methods for (un)supervised problems Anne Sabourin, Ass. Prof., Telecom ParisTech September 2017 1/78 1. Lecture 1 Cont d : Conjugate priors and exponential
More informationBayesian Variable Selection for Skewed Heteroscedastic Response
Bayesian Variable Selection for Skewed Heteroscedastic Response arxiv:1602.09100v2 [stat.me] 3 Jul 2017 Libo Wang 1, Yuanyuan Tang 1, Debajyoti Sinha 1, Debdeep Pati 1, and Stuart Lipsitz 2 1 Department
More informationA Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness
A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A. Linero and M. Daniels UF, UT-Austin SRC 2014, Galveston, TX 1 Background 2 Working model
More informationCan we do statistical inference in a non-asymptotic way? 1
Can we do statistical inference in a non-asymptotic way? 1 Guang Cheng 2 Statistics@Purdue www.science.purdue.edu/bigdata/ ONR Review Meeting@Duke Oct 11, 2017 1 Acknowledge NSF, ONR and Simons Foundation.
More informationFoundations of Nonparametric Bayesian Methods
1 / 27 Foundations of Nonparametric Bayesian Methods Part II: Models on the Simplex Peter Orbanz http://mlg.eng.cam.ac.uk/porbanz/npb-tutorial.html 2 / 27 Tutorial Overview Part I: Basics Part II: Models
More informationIntroduction to Bayesian Statistics with WinBUGS Part 4 Priors and Hierarchical Models
Introduction to Bayesian Statistics with WinBUGS Part 4 Priors and Hierarchical Models Matthew S. Johnson New York ASA Chapter Workshop CUNY Graduate Center New York, NY hspace1in December 17, 2009 December
More informationOutline. Binomial, Multinomial, Normal, Beta, Dirichlet. Posterior mean, MAP, credible interval, posterior distribution
Outline A short review on Bayesian analysis. Binomial, Multinomial, Normal, Beta, Dirichlet Posterior mean, MAP, credible interval, posterior distribution Gibbs sampling Revisit the Gaussian mixture model
More informationSparsity and the truncated l 2 -norm
Lee H. Dicker Department of Statistics and Biostatistics utgers University ldicker@stat.rutgers.edu Abstract Sparsity is a fundamental topic in highdimensional data analysis. Perhaps the most common measures
More informationEstimating Sparse High Dimensional Linear Models using Global-Local Shrinkage
Estimating Sparse High Dimensional Linear Models using Global-Local Shrinkage Daniel F. Schmidt Centre for Biostatistics and Epidemiology The University of Melbourne Monash University May 11, 2017 Outline
More informationBayesian Methods for Machine Learning
Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),
More informationA New Bayesian Variable Selection Method: The Bayesian Lasso with Pseudo Variables
A New Bayesian Variable Selection Method: The Bayesian Lasso with Pseudo Variables Qi Tang (Joint work with Kam-Wah Tsui and Sijian Wang) Department of Statistics University of Wisconsin-Madison Feb. 8,
More informationMIT Spring 2016
MIT 18.655 Dr. Kempthorne Spring 2016 1 MIT 18.655 Outline 1 2 MIT 18.655 Decision Problem: Basic Components P = {P θ : θ Θ} : parametric model. Θ = {θ}: Parameter space. A{a} : Action space. L(θ, a) :
More information