Beyond Mean Regression
1 Beyond Mean Regression Thomas Kneib Lehrstuhl für Statistik Georg-August-Universität Göttingen Innsbruck
2 Introduction One of the top ten reasons to become a statistician (according to Friedman, Friedman & Amoo): Statisticians are mean lovers. Regression models in particular focus on means to reduce complexity. Obviously, a mean is not sufficient to fully describe a distribution. Beyond Mean Regression 1
3 Introduction Usual regression models are based on data (y_i, z_i) for a continuous response variable y and covariates z: y_i = η_i + ε_i, where η_i is a regression predictor formed in terms of the covariates z_i. Assumptions on the error term: E(ε_i) = 0, Var(ε_i) = σ², or ε_i ~ N(0, σ²).
4 Introduction The assumptions on the error term imply the following properties of the response distribution: The predictor determines the expectation of the response: E(y_i | z_i) = η_i. Homoscedasticity of the response: Var(y_i | z_i) = σ². Parallel quantile curves of the response (if the errors are also normal): Q_τ(y_i | z_i) = η_i + z_τ σ.
5 Introduction Why could this be problematic? The variance of the responses may depend on covariates (heteroscedasticity). Other higher-order characteristics (skewness, kurtosis, ...) of the responses may depend on covariates. Generic interest in extreme observations or the complete conditional distribution of the response.
6 Introduction Example: Munich rental guide (illustrative application in this talk). Explain the net rent for a specific flat in terms of covariates such as living area or year of construction. Published to give reference intervals of usual rents for both tenants and landlords. We are not interested in average rents but rather in an interval covering typical rents. [Figure: net rent in Euro versus living area and year of construction]
7 Introduction Some further examples: Analysing childhood BMI patterns in (post-)industrialized countries, where interest is mainly on extreme forms of overweight (obesity). Studying covariate effects on extreme forms of malnutrition in developing countries. Efficiency estimation in agricultural production, where interest is on evaluating above-average performance of farms. Modelling gas flow networks, where the behavior of the network in high or low demand situations shall be studied.
8 Introduction More flexible regression approaches considered in the following: Regression models for location, scale and shape. Quantile regression. Expectile regression.
9 Introduction Regression models for location, scale and shape: Retain the assumption of a specific error distribution but allow covariate effects not only on the mean. Simplest example: Regression for mean and variance of a normal distribution, where y_i = η_i1 + exp(η_i2) ε_i, ε_i ~ N(0, 1), such that E(y_i | z_i) = η_i1 and Var(y_i | z_i) = exp(η_i2)². In general: Specify a distribution for the response, where (potentially) all parameters are related to predictors.
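[Editorial sketch, not part of the original slides: the implied moment structure of this location-scale model can be checked with a short simulation. The predictors eta1 and eta2 below are hypothetical choices for illustration only.]

```python
import numpy as np

# Toy simulation of the location-scale model y_i = eta_i1 + exp(eta_i2)*eps_i
# with eps_i ~ N(0, 1), so that E(y|z) = eta1 and SD(y|z) = exp(eta2).
rng = np.random.default_rng(1)
n = 200_000
z = rng.uniform(0, 1, n)
eta1 = 1.0 + 2.0 * z            # hypothetical location predictor
eta2 = -1.0 + 1.5 * z           # hypothetical log-scale predictor
y = eta1 + np.exp(eta2) * rng.standard_normal(n)

# empirical check of the implied moments on a thin slice around z = 0.5,
# where eta1 = 2 and exp(eta2) = exp(-0.25)
sl = np.abs(z - 0.5) < 0.01
mean_hat, sd_hat = y[sl].mean(), y[sl].std()
```

The conditional mean and standard deviation recovered from the slice should match eta1(0.5) = 2 and exp(-0.25) up to Monte Carlo error.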
10 Introduction Quantile and expectile regression: Drop the parametric assumption for the error / response distribution and instead estimate separate models for different asymmetries τ ∈ (0, 1): y_i = η_iτ + ε_iτ. Instead of assuming E(ε_iτ) = 0, we can for example assume P(ε_iτ ≤ 0) = τ, i.e. the τ-quantile of the error term is zero. Yields a regression model for the quantiles of the response. A dense set of quantiles completely characterizes the conditional distribution of the response. Expectiles are a computationally attractive alternative to quantiles.
11 Introduction Estimated quantile curves for the Munich rental guide with linear effect of living area and quadratic effect for year of construction. Homoscedastic linear model: [Figure: rent in Euro versus living area and year of construction]
12 Introduction Heteroscedastic linear model: [Figure: rent in Euro versus living area and year of construction]
13 Introduction Quantile regression: [Figure: rent in Euro versus living area and year of construction]
14 Introduction Usually, modern regression data contain more complex structures such that linear predictors are not enough. For example, in the Munich rental guide the effects of living area and size of the flat may be of complex nonlinear form (instead of simply polynomial) and a spatial effect based on the subquarter information may be included to capture effects of missing covariates and spatial correlation. Consider semiparametric extensions.
15 Overview for the Rest of the Talk Semiparametric Predictor Specifications. More on Models: Generalized Additive Models for Location, Scale and Shape. Quantile Regression. Expectile Regression. Inferential Procedures & Comparison of the Approaches.
16 Semiparametric Regression Semiparametric regression provides a generic framework for flexible regression models with predictor η = β_0 + f_1(z) + ... + f_r(z), where f_1, ..., f_r are generic functions of the covariate vector z. Types of effects: Linear effects: f(z) = x'β. Nonlinear, smooth effects of continuous covariates: f(z) = f(x). Varying coefficients: f(z) = u f(x). Interaction surfaces: f(z) = f(x_1, x_2). Spatial effects: f(z) = f_spat(s). Random effects: f(z) = b_c with cluster index c.
17 Semiparametric Regression Generic model description based on a design matrix Z_j, such that the vector of function evaluations f_j = (f_j(z_1), ..., f_j(z_n))' can be written as f_j = Z_j γ_j, and a quadratic penalty term pen(f_j) = pen(γ_j) = γ_j' K_j γ_j which operationalises smoothness properties of f_j. From a Bayesian perspective, the penalty term corresponds to a multivariate Gaussian prior p(γ_j) ∝ exp(−(1 / (2 δ_j²)) γ_j' K_j γ_j).
18 Semiparametric Regression Estimation then relies on a penalised fit criterion, e.g. Σ_{i=1}^n (y_i − η_i)² + Σ_{j=1}^r λ_j γ_j' K_j γ_j with smoothing parameters λ_j ≥ 0.
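[Editorial sketch: for a single linear predictor η = Zγ, this penalised least-squares criterion has the closed-form minimiser γ̂ = (Z'Z + λK)⁻¹ Z'y. The data and the first-order difference penalty below are toy assumptions.]

```python
import numpy as np

# Minimal sketch of the penalised least-squares criterion
# (y - Z g)'(y - Z g) + lambda * g' K g, minimised by
# g_hat = (Z'Z + lambda K)^{-1} Z'y.
rng = np.random.default_rng(42)
n, p = 100, 8
Z = rng.standard_normal((n, p))                 # toy design matrix
y = Z @ rng.standard_normal(p) + 0.1 * rng.standard_normal(n)
D = np.diff(np.eye(p), 1, axis=0)               # first-order difference matrix
K = D.T @ D                                     # quadratic penalty matrix
lam = 2.0
g_hat = np.linalg.solve(Z.T @ Z + lam * K, Z.T @ y)

# verify: the gradient of the penalised criterion vanishes at g_hat
grad = -2 * Z.T @ (y - Z @ g_hat) + 2 * lam * K @ g_hat
```

The vanishing gradient confirms that the closed form is the stationary point of the penalised criterion.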
19 Semiparametric Regression Example 1. Penalised splines for nonlinear effects f(x): Approximate f(x) in terms of a linear combination of B-spline basis functions, f(x) = Σ_k γ_k B_k(x). Large variability in the estimates corresponds to large differences in adjacent coefficients, yielding the penalty term pen(γ) = Σ_k (Δ^d γ_k)² = γ' D_d' D_d γ with difference operator Δ^d and difference matrix D_d of order d. The corresponding Bayesian prior is a random walk of order d, e.g. γ_k = γ_{k−1} + u_k (first order) or γ_k = 2 γ_{k−1} − γ_{k−2} + u_k (second order) with u_k i.i.d. N(0, δ²).
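[Editorial sketch: the identity Σ_k (Δ^d γ_k)² = γ' D_d' D_d γ is easy to verify numerically; the coefficient vector below is a toy example.]

```python
import numpy as np

# Check that the sum of squared d-th order differences of the coefficient
# vector equals the quadratic form with the difference matrix D_d.
K_basis, d = 12, 2
gamma = np.sin(np.linspace(0, 3, K_basis))      # toy coefficient vector
D2 = np.diff(np.eye(K_basis), n=d, axis=0)      # D_2: second-difference matrix
pen_matrix = gamma @ D2.T @ D2 @ gamma          # gamma' D_d' D_d gamma
pen_direct = np.sum(np.diff(gamma, n=d) ** 2)   # sum_k (Delta^d gamma_k)^2
```

Both expressions give the same penalty value, which is why the smooth-curve penalty can be written as a quadratic form suitable for the generic framework above.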
20 Semiparametric Regression [Figure]
21 Semiparametric Regression Example 2. Markov random fields for the estimation of spatial effects based on regional data: Estimate a separate regression coefficient γ_s for each region, i.e. f = Zγ with Z[i, s] = 1 if observation i belongs to region s and Z[i, s] = 0 otherwise. Penalty term based on differences of neighboring regions: pen(γ) = Σ_s Σ_{r ∈ N(s)} (γ_s − γ_r)² = γ' K γ, where N(s) is the set of neighbors of region s and K is the penalty matrix induced by the adjacency structure. An equivalent Bayesian prior structure is obtained based on Gaussian Markov random fields.
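[Editorial sketch: with the sum taken over unordered neighbour pairs, the Markov random field penalty is the quadratic form with the graph Laplacian of the neighbourhood structure (degrees on the diagonal, −1 for neighbours). The four-region adjacency below is a toy assumption.]

```python
import numpy as np

# pen(gamma) = sum over neighbour pairs (gamma_s - gamma_r)^2 = gamma' K gamma,
# with K the graph Laplacian of a toy adjacency structure of four regions.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])
K = np.diag(A.sum(axis=1)) - A                  # Laplacian penalty matrix
gamma = np.array([0.5, -0.2, 0.1, 0.9])
# sum over unordered neighbour pairs (r < s)
pen_sum = sum(A[s, r] * (gamma[s] - gamma[r]) ** 2
              for s in range(4) for r in range(s))
```

The quadratic form gamma' K gamma reproduces the pairwise-difference penalty exactly.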
22 Inferential Procedures For each of the three model classes discussed in the following, we will consider three potential avenues for inference: Direct optimization of a fit criterion (e.g. maximum likelihood estimation for GAMLSS). Bayesian approaches. Functional gradient descent boosting.
23 Inferential Procedures Functional gradient descent boosting: Define the estimation problem in terms of a loss function ρ (e.g. the negative log-likelihood). Use the negative gradients of the loss function evaluated at the current fit as a measure for lack of fit. Iteratively fit simple base-learning procedures to the negative gradients to update the model fit. Componentwise updates of only the best-fitting model component yield automatic variable selection and model choice. For semiparametric regression, penalized least squares estimates provide suitable base-learners.
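[Editorial sketch: a minimal componentwise boosting loop with squared-error loss, where the negative gradient is simply the residual and each covariate has its own least-squares base-learner. Data, step length, and the number of iterations are toy assumptions.]

```python
import numpy as np

# Componentwise L2-boosting: in each iteration, fit each base-learner to the
# negative gradient (the residual) and update only the best-fitting component.
rng = np.random.default_rng(7)
n = 500
x1, x2 = rng.standard_normal(n), rng.standard_normal(n)
y = 2.0 * x1 + 0.1 * rng.standard_normal(n)     # only x1 is informative
fit, coefs, nu = np.zeros(n), np.zeros(2), 0.1  # nu: step length
for _ in range(200):
    u = y - fit                                 # negative gradient for L2 loss
    cands = []
    for x in (x1, x2):
        b = (x @ u) / (x @ x)                   # simple least-squares base-learner
        cands.append((np.sum((u - b * x) ** 2), b))
    j = int(np.argmin([rss for rss, _ in cands]))
    coefs[j] += nu * cands[j][1]                # update selected component only
    fit += nu * cands[j][1] * (x1, x2)[j]
```

Because only the best-fitting component is updated, the coefficient of the informative covariate converges towards its least-squares value while the uninformative one stays near zero: the automatic variable selection mentioned above.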
24 Generalized Additive Models for Location, Scale and Shape GAMLSS provide a unified framework for semiparametric regression models in the case of complex response distributions depending on up to four parameters (µ_i, σ_i, ν_i, ξ_i), where usually µ_i is the location parameter, σ_i is the scale parameter, and ν_i and ξ_i are shape parameters determining for example skewness or kurtosis. Each parameter is related to a regression predictor via a suitable response function, i.e. µ_i = h_1(η_i,µ), σ_i = h_2(η_i,σ), ...
25 Generalized Additive Models for Location, Scale and Shape A very broad class of distributions is supported for both discrete and continuous responses. Most important examples for continuous responses: Two-parameter normal distribution (location and scale). Three-parameter power exponential distribution (location, scale and kurtosis). Three-parameter t distribution (location, scale and degrees of freedom). Three-parameter gamma distribution (location, scale and shape). Four-parameter Box-Cox power distribution (location, scale, skewness and kurtosis).
26 Generalized Additive Models for Location, Scale and Shape Direct optimization: For GAMLSS, the likelihood is available due to the explicit assumption made for the distribution of the response. Maximization can be achieved by penalized iteratively weighted least squares (IWLS) estimation. Estimation and choice of the smoothing parameters is challenging at least for complex models. Bayesian inference: Inference based on Markov chain Monte Carlo (MCMC) simulations is in principle straightforward but requires careful choice of the proposal densities. Promising results obtained based on IWLS proposals. Smoothing parameter choice is immediately included.
27 Generalized Additive Models for Location, Scale and Shape Boosting: Due to the multiple predictors, the usual boosting framework has to be adapted but basically still works.
28 Generalized Additive Models for Location, Scale and Shape Results for the Munich rental guide obtained with an additive model for location and scale: [Figure: estimated mean effects of area in sqm and year of construction]
29 Generalized Additive Models for Location, Scale and Shape [Figure: estimated standard deviation effects of area in sqm and year of construction]
30 Quantile Regression The theoretical τ-quantile q_τ for a continuous random variable is characterized by P(Y ≤ q_τ) ≥ τ and P(Y ≥ q_τ) ≥ 1 − τ. Estimation of quantiles based on i.i.d. samples y_1, ..., y_n can be accomplished by q̂_τ = argmin_q Σ_{i=1}^n w_τ(y_i, q) |y_i − q| with asymmetric weights w_τ(y_i, q) = 1 − τ if y_i < q, 0 if y_i = q, and τ if y_i > q.
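[Editorial sketch: minimising the asymmetrically weighted absolute loss over a grid recovers the empirical τ-quantile on a toy i.i.d. sample.]

```python
import numpy as np

# The tau-quantile minimises sum_i w_tau(y_i, q) |y_i - q|; equivalently,
# losses above q are weighted by tau and losses below by 1 - tau.
def wloss(q, y, tau):
    u = y - q
    return np.sum(np.where(u > 0, tau * u, (tau - 1) * u))

rng = np.random.default_rng(0)
y = rng.standard_normal(1001)
tau = 0.8
grid = np.linspace(y.min(), y.max(), 20001)
q_hat = grid[np.argmin([wloss(q, y, tau) for q in grid])]
```

The grid minimiser agrees with the empirical 0.8-quantile up to the grid resolution.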
31 Quantile Regression [Figure: plot of the weighted losses w_τ(y, q) |y − q| (for q = 0)]
32 Quantile Regression Quantile regression starts with the regression model y_i = η_iτ + ε_iτ. Instead of assuming E(ε_iτ) = 0 as in mean regression, we assume F_ε_iτ(0) = P(ε_iτ ≤ 0) = τ, i.e. the τ-quantile of the error is zero. This implies that the predictor coincides with the τ-quantile of the conditional distribution of the response, i.e. F_y_i(η_iτ) = P(y_i ≤ η_iτ) = τ.
33 Quantile Regression Quantile regression therefore is distribution-free, since it does not make any specific assumptions on the type of errors; does not even require i.i.d. errors; allows for heteroscedasticity.
34 Quantile Regression Note that each parametric regression model also induces a quantile regression model. Example: The heteroscedastic normal model y ~ N(η_1, exp(η_2)²) yields q_τ = η_1 + exp(η_2) z_τ.
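[Editorial sketch: the induced quantile formula for the heteroscedastic normal example can be checked directly with the standard library; the values of η_1, η_2 and τ are arbitrary toy choices.]

```python
import math
from statistics import NormalDist

# For y ~ N(eta1, exp(eta2)^2), the tau-quantile is
# q_tau = eta1 + exp(eta2) * z_tau with z_tau the standard normal quantile.
eta1, eta2, tau = 1.5, 0.3, 0.9
z_tau = NormalDist().inv_cdf(tau)
q_tau = eta1 + math.exp(eta2) * z_tau
# direct computation from the response distribution itself
q_direct = NormalDist(mu=eta1, sigma=math.exp(eta2)).inv_cdf(tau)
```

Both routes give the same quantile, illustrating how a parametric location-scale model induces parallel-in-z_τ quantile curves.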
35 Quantile Regression Direct optimisation: Classical estimation is achieved by minimizing Σ_{i=1}^n w_τ(y_i, η_iτ) |y_i − η_iτ| + Σ_{j=1}^p λ_j pen(f_j). Can be solved with linear programming as long as the penalties are also linear functionals, e.g. for total variation penalization pen(f_j) = ∫ |f_j'(x)| dx. Does not fit well with the class of quadratic penalties we are considering. Smoothing parameter selection is still challenging, in particular with multiple smoothing parameters.
36 Quantile Regression Bayesian inference: Although quantile regression is distribution-free, there is an auxiliary error distribution that links ML estimation to quantile regression. Assume an asymmetric Laplace distribution for the responses, i.e. y_i ~ ALD(η_iτ, σ², τ) with density proportional to exp(−w_τ(y_i, η_iτ) |y_i − η_iτ| / σ²). Maximizing the resulting likelihood exp(−Σ_{i=1}^n w_τ(y_i, η_iτ) |y_i − η_iτ| / σ²) is equivalent to minimizing the quantile loss criterion.
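[Editorial sketch: on a toy i.i.d. sample, maximising the ALD likelihood kernel over a common location η returns the same point as minimising the quantile loss, namely an empirical τ-quantile (an order statistic). Data and grid are assumptions for illustration.]

```python
import numpy as np

# The ALD log-likelihood kernel is -quantile_loss / sigma^2, so its maximiser
# coincides with the quantile-loss minimiser regardless of sigma^2.
def qloss(y, eta, tau):
    w = np.where(y < eta, 1 - tau, tau)
    return np.sum(w * np.abs(y - eta))

rng = np.random.default_rng(3)
y = rng.standard_normal(9)
tau, sigma2 = 0.25, 1.3
etas = np.linspace(-3, 3, 601)
loglik = np.array([-qloss(y, e, tau) / sigma2 for e in etas])
eta_ml = etas[np.argmax(loglik)]
# for n*tau not an integer, the quantile-loss minimiser is the
# ceil(n*tau)-th order statistic
q_star = np.sort(y)[int(np.ceil(len(y) * tau)) - 1]
```

This is the equivalence the auxiliary-distribution trick relies on: the ALD is (usually) misspecified, but its ML estimate is exactly the quantile estimate.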
37 Quantile Regression A computationally attractive way of working with the ALD in a Bayesian framework is its scale-mixture representation: If z_i | σ² ~ Exp(1/σ²) and y_i | z_i, η_iτ, σ² ~ N(η_iτ + ξ z_i, σ² / w_i) with ξ = (1 − 2τ) / (τ(1 − τ)), w_i = 1 / (δ² z_i), δ² = 2 / (τ(1 − τ)), then y_i is marginally ALD(η_iτ, σ², τ) distributed. This allows one to construct efficient Gibbs samplers or variational Bayes approximations to explore the posterior after imputing the z_i as additional unknowns.
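[Editorial sketch: a Monte Carlo check of the scale-mixture representation with toy values of τ, σ² and η. Since the τ-quantile of an ALD(η, σ², τ) variable is η, draws generated through the mixture should satisfy P(y ≤ η) ≈ τ.]

```python
import numpy as np

# Draw z ~ Exp with rate 1/sigma^2 (mean sigma^2) and
# y | z ~ N(eta + xi*z, sigma^2 * delta^2 * z), i.e. variance sigma^2 / w
# with w = 1 / (delta^2 * z); marginally y ~ ALD(eta, sigma^2, tau).
rng = np.random.default_rng(11)
tau, sigma2, eta = 0.3, 1.0, 0.5
xi = (1 - 2 * tau) / (tau * (1 - tau))
delta2 = 2 / (tau * (1 - tau))
n = 1_000_000
z = rng.exponential(scale=sigma2, size=n)        # numpy's scale = mean
y = eta + xi * z + np.sqrt(sigma2 * delta2 * z) * rng.standard_normal(n)
p_hat = np.mean(y <= eta)                        # should be close to tau
```

Conditionally on z the model is Gaussian, which is exactly what makes Gibbs sampling with imputed z_i convenient.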
38 Quantile Regression Boosting: Boosting can be immediately applied in the quantile regression context since it is formulated in terms of a loss function. Negative gradients are defined almost everywhere, i.e. no conceptual problems.
39 Quantile Regression Results for a geoadditive Bayesian quantile regression model: [Figure: estimated spatial effects for different quantiles (τ = 0.1, τ = 0.5, ...)]
40 Quantile Regression [Figure: estimated effects f(living area) and f(year of construction) for different quantiles]
41/42 Expectile Regression What is expectile regression? Minimizing Σ |y_i − η_i| yields median regression; minimizing Σ w_τ(y_i, η_iτ) |y_i − η_iτ| yields quantile regression; minimizing Σ (y_i − η_i)² yields mean regression; minimizing Σ w_τ(y_i, η_iτ) (y_i − η_iτ)² yields expectile regression.
43 Expectile Regression Theoretical expectiles are obtained by solving τ = ∫_{−∞}^{e_τ} |y − e_τ| f_y(y) dy / ∫_{−∞}^{∞} |y − e_τ| f_y(y) dy = (G_y(e_τ) − e_τ F_y(e_τ)) / (2 (G_y(e_τ) − e_τ F_y(e_τ)) + (e_τ − µ)), where f_y(·) and F_y(·) denote the density and cumulative distribution function of y, G_y(e) = ∫_{−∞}^e y f_y(y) dy is the partial moment function of y, and G_y(∞) = µ is the expectation of y.
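[Editorial sketch: the sample analogue replaces f_y by the empirical distribution; the τ-expectile then solves an asymmetrically weighted least-squares problem and can be computed by iterating a weighted mean. The data and iteration count are toy assumptions.]

```python
import numpy as np

# Fixed-point iteration for the sample tau-expectile: weights are tau above
# and 1 - tau below the current value; the fixed point satisfies the
# asymmetric least-squares first-order condition.
def expectile(y, tau, n_iter=200):
    e = y.mean()                      # tau = 0.5 gives the mean itself
    for _ in range(n_iter):
        w = np.where(y > e, tau, 1 - tau)
        e = np.average(y, weights=w)
    return e

rng = np.random.default_rng(5)
y = rng.standard_normal(10_000)
e10, e50, e90 = (expectile(y, t) for t in (0.1, 0.5, 0.9))
```

The 0.5-expectile is the sample mean, and the expectiles are increasing in τ, mirroring the defining equation above.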
44 Expectile Regression Direct optimization: Since the expectile loss is differentiable, estimates for the basis coefficients can be obtained by iterating γ̂_jτ^[t+1] = (Z_j' W_τ^[t] Z_j + λ_j K_j)^{−1} Z_j' W_τ^[t] y. A combination with mixed model methodology allows to estimate the smoothing parameters.
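[Editorial sketch: the iteratively weighted least-squares update for a simple linear expectile regression on toy data, with the penalty dropped (λ_j = 0) for brevity. Data-generating values are assumptions for illustration.]

```python
import numpy as np

# IWLS for linear expectile regression: observation i gets weight tau if it
# lies above the current fit and 1 - tau otherwise, then solve weighted LS.
rng = np.random.default_rng(9)
n = 2000
x = rng.uniform(size=n)
y = 1.0 + 2.0 * x + rng.standard_normal(n)
Z = np.column_stack([np.ones(n), x])
tau = 0.8
gamma = np.zeros(2)
for _ in range(100):
    w = np.where(y > Z @ gamma, tau, 1 - tau)
    gamma = np.linalg.solve(Z.T @ (w[:, None] * Z), Z.T @ (w * y))
# the slope stays near 2, while the intercept is shifted upwards by the
# 0.8-expectile of the standard normal error (roughly 0.6)
```

Each iteration is a plain weighted least-squares solve, which is why expectile regression is computationally simpler than quantile regression.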
45 Expectile Regression Bayesian inference: Similarly as for quantile regression, an asymmetric normal distribution can be defined as auxiliary distribution for the responses. No scale mixture representation known so far. Bayesian formulation probably less important since inference is directly tractable. Boosting: Boosting can be immediately applied in the expectile regression context.
46 Comparison Advantages of GAMLSS: One joint model for the distribution of the responses. Interpretability of the estimated effects in terms of parameters of the response distribution. Quantiles (or expectiles) derived from GAMLSS will always be coherent, i.e. ordering will be preserved. Readily available in both frequentist and Bayesian formulation. Disadvantages of GAMLSS: Potential for misspecification of the observation model. Model checking difficult in complex settings. If quantiles are of ultimate interest, GAMLSS do not provide direct estimates for these.
47 Comparison Advantages of quantile regression: Completely distribution-free approach. Easy interpretation in terms of conditional quantiles. Bayesian formulation enables very flexible, fully data-driven semiparametric specifications of the predictor. Disadvantages of quantile regression: Bayesian formulation requires an auxiliary error distribution (that will usually be a misspecification). Estimated cumulative distribution function is a step function even for continuous data. Additional efforts required to avoid crossing of quantile curves.
48 Comparison Advantages of expectile regression: Computationally simple (iteratively weighted least squares). Still allows to characterize the complete conditional distribution of the response. Quantiles (or conditional distributions) can be computed based on expectiles. Expectiles seem to be more efficient than quantiles in close-to-Gaussian situations. Expectile crossing seems to be less of an issue as compared to quantile crossing. The estimated expectile curve is smooth. Disadvantages of expectile regression: Immediate interpretation of expectiles is difficult.
49 Summary There is more than mean regression! Semiparametric extensions become available also for models beyond mean regression. You can do this at home: Quantile regression: R-package quantreg. Bayesian quantile regression: BayesX (MCMC) and forthcoming R-package on variational Bayes approximations (VA). GAMLSS: R-packages gamlss and gamboostlss. Expectile regression: R-package expectreg. Interesting addition to the models considered: Modal regression (yet to be explored).
50 Summary Acknowledgements: This talk is mostly based on joint work with Nora Fenske, Benjamin Hofner, Torsten Hothorn, Göran Kauermann, Stefan Lang, Andreas Mayr, Matthias Schmid, Linda Schulze Waltrup, Fabian Sobotka, Elisabeth Waldmann and Yu Yue. Financial support has been provided by the German Research Foundation (DFG).
Using the package hypergsplines: some examples. Daniel Sabanés Bové 21st November 2013 This short vignette shall introduce into the usage of the package hypergsplines. For more information on the methodology,
More information17 : Markov Chain Monte Carlo
10-708: Probabilistic Graphical Models, Spring 2015 17 : Markov Chain Monte Carlo Lecturer: Eric P. Xing Scribes: Heran Lin, Bin Deng, Yun Huang 1 Review of Monte Carlo Methods 1.1 Overview Monte Carlo
More informationPattern Recognition and Machine Learning. Bishop Chapter 11: Sampling Methods
Pattern Recognition and Machine Learning Chapter 11: Sampling Methods Elise Arnaud Jakob Verbeek May 22, 2008 Outline of the chapter 11.1 Basic Sampling Algorithms 11.2 Markov Chain Monte Carlo 11.3 Gibbs
More informationSpatially Adaptive Smoothing Splines
Spatially Adaptive Smoothing Splines Paul Speckman University of Missouri-Columbia speckman@statmissouriedu September 11, 23 Banff 9/7/3 Ordinary Simple Spline Smoothing Observe y i = f(t i ) + ε i, =
More informationUniversität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Bayesian Learning. Tobias Scheffer, Niels Landwehr
Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Bayesian Learning Tobias Scheffer, Niels Landwehr Remember: Normal Distribution Distribution over x. Density function with parameters
More informationVariable Selection for Generalized Additive Mixed Models by Likelihood-based Boosting
Variable Selection for Generalized Additive Mixed Models by Likelihood-based Boosting Andreas Groll 1 and Gerhard Tutz 2 1 Department of Statistics, University of Munich, Akademiestrasse 1, D-80799, Munich,
More informationOnline appendix to On the stability of the excess sensitivity of aggregate consumption growth in the US
Online appendix to On the stability of the excess sensitivity of aggregate consumption growth in the US Gerdie Everaert 1, Lorenzo Pozzi 2, and Ruben Schoonackers 3 1 Ghent University & SHERPPA 2 Erasmus
More informationPattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions
Pattern Recognition and Machine Learning Chapter 2: Probability Distributions Cécile Amblard Alex Kläser Jakob Verbeek October 11, 27 Probability Distributions: General Density Estimation: given a finite
More informationBayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units
Bayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units Sahar Z Zangeneh Robert W. Keener Roderick J.A. Little Abstract In Probability proportional
More informationMarkov Chain Monte Carlo methods
Markov Chain Monte Carlo methods By Oleg Makhnin 1 Introduction a b c M = d e f g h i 0 f(x)dx 1.1 Motivation 1.1.1 Just here Supresses numbering 1.1.2 After this 1.2 Literature 2 Method 2.1 New math As
More informationGraphical Models for Collaborative Filtering
Graphical Models for Collaborative Filtering Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Sequence modeling HMM, Kalman Filter, etc.: Similarity: the same graphical model topology,
More informationRiemann Manifold Methods in Bayesian Statistics
Ricardo Ehlers ehlers@icmc.usp.br Applied Maths and Stats University of São Paulo, Brazil Working Group in Statistical Learning University College Dublin September 2015 Bayesian inference is based on Bayes
More informationBayesian Inference: Probit and Linear Probability Models
Utah State University DigitalCommons@USU All Graduate Plan B and other Reports Graduate Studies 5-1-2014 Bayesian Inference: Probit and Linear Probability Models Nate Rex Reasch Utah State University Follow
More informationBayesian Estimation of DSGE Models 1 Chapter 3: A Crash Course in Bayesian Inference
1 The views expressed in this paper are those of the authors and do not necessarily reflect the views of the Federal Reserve Board of Governors or the Federal Reserve System. Bayesian Estimation of DSGE
More informationEstimating Timber Volume using Airborne Laser Scanning Data based on Bayesian Methods J. Breidenbach 1 and E. Kublin 2
Estimating Timber Volume using Airborne Laser Scanning Data based on Bayesian Methods J. Breidenbach 1 and E. Kublin 2 1 Norwegian University of Life Sciences, Department of Ecology and Natural Resource
More informationBeyond MCMC in fitting complex Bayesian models: The INLA method
Beyond MCMC in fitting complex Bayesian models: The INLA method Valeska Andreozzi Centre of Statistics and Applications of Lisbon University (valeska.andreozzi at fc.ul.pt) European Congress of Epidemiology
More informationLecture 9: PGM Learning
13 Oct 2014 Intro. to Stats. Machine Learning COMP SCI 4401/7401 Table of Contents I Learning parameters in MRFs 1 Learning parameters in MRFs Inference and Learning Given parameters (of potentials) and
More informationFinal Overview. Introduction to ML. Marek Petrik 4/25/2017
Final Overview Introduction to ML Marek Petrik 4/25/2017 This Course: Introduction to Machine Learning Build a foundation for practice and research in ML Basic machine learning concepts: max likelihood,
More informationStat 451 Lecture Notes Markov Chain Monte Carlo. Ryan Martin UIC
Stat 451 Lecture Notes 07 12 Markov Chain Monte Carlo Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapters 8 9 in Givens & Hoeting, Chapters 25 27 in Lange 2 Updated: April 4, 2016 1 / 42 Outline
More informationStatistics & Data Sciences: First Year Prelim Exam May 2018
Statistics & Data Sciences: First Year Prelim Exam May 2018 Instructions: 1. Do not turn this page until instructed to do so. 2. Start each new question on a new sheet of paper. 3. This is a closed book
More informationKyle Reing University of Southern California April 18, 2018
Renormalization Group and Information Theory Kyle Reing University of Southern California April 18, 2018 Overview Renormalization Group Overview Information Theoretic Preliminaries Real Space Mutual Information
More informationUnsupervised Learning
Unsupervised Learning Bayesian Model Comparison Zoubin Ghahramani zoubin@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc in Intelligent Systems, Dept Computer Science University College
More informationPOSTERIOR ANALYSIS OF THE MULTIPLICATIVE HETEROSCEDASTICITY MODEL
COMMUN. STATIST. THEORY METH., 30(5), 855 874 (2001) POSTERIOR ANALYSIS OF THE MULTIPLICATIVE HETEROSCEDASTICITY MODEL Hisashi Tanizaki and Xingyuan Zhang Faculty of Economics, Kobe University, Kobe 657-8501,
More informationMonte Carlo in Bayesian Statistics
Monte Carlo in Bayesian Statistics Matthew Thomas SAMBa - University of Bath m.l.thomas@bath.ac.uk December 4, 2014 Matthew Thomas (SAMBa) Monte Carlo in Bayesian Statistics December 4, 2014 1 / 16 Overview
More informationLecture: Gaussian Process Regression. STAT 6474 Instructor: Hongxiao Zhu
Lecture: Gaussian Process Regression STAT 6474 Instructor: Hongxiao Zhu Motivation Reference: Marc Deisenroth s tutorial on Robot Learning. 2 Fast Learning for Autonomous Robots with Gaussian Processes
More informationLecture : Probabilistic Machine Learning
Lecture : Probabilistic Machine Learning Riashat Islam Reasoning and Learning Lab McGill University September 11, 2018 ML : Many Methods with Many Links Modelling Views of Machine Learning Machine Learning
More informationLecture 2: From Linear Regression to Kalman Filter and Beyond
Lecture 2: From Linear Regression to Kalman Filter and Beyond January 18, 2017 Contents 1 Batch and Recursive Estimation 2 Towards Bayesian Filtering 3 Kalman Filter and Bayesian Filtering and Smoothing
More informationOn the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models
On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models Thomas Kneib Department of Mathematics Carl von Ossietzky University Oldenburg Sonja Greven Department of
More informationBoosting structured additive quantile regression for longitudinal childhood obesity data
Zurich Open Repository and Archive University of Zurich Main Library Strickhofstrasse 39 CH-8057 Zurich wwwzorauzhch Year: 2013 Boosting structured additive quantile regression for longitudinal childhood
More informationCalibrating Environmental Engineering Models and Uncertainty Analysis
Models and Cornell University Oct 14, 2008 Project Team Christine Shoemaker, co-pi, Professor of Civil and works in applied optimization, co-pi Nikolai Blizniouk, PhD student in Operations Research now
More informationSupplement to A Hierarchical Approach for Fitting Curves to Response Time Measurements
Supplement to A Hierarchical Approach for Fitting Curves to Response Time Measurements Jeffrey N. Rouder Francis Tuerlinckx Paul L. Speckman Jun Lu & Pablo Gomez May 4 008 1 The Weibull regression model
More informationThe Poisson transform for unnormalised statistical models. Nicolas Chopin (ENSAE) joint work with Simon Barthelmé (CNRS, Gipsa-LAB)
The Poisson transform for unnormalised statistical models Nicolas Chopin (ENSAE) joint work with Simon Barthelmé (CNRS, Gipsa-LAB) Part I Unnormalised statistical models Unnormalised statistical models
More informationEstimation of Operational Risk Capital Charge under Parameter Uncertainty
Estimation of Operational Risk Capital Charge under Parameter Uncertainty Pavel V. Shevchenko Principal Research Scientist, CSIRO Mathematical and Information Sciences, Sydney, Locked Bag 17, North Ryde,
More informationBayesian inference for sample surveys. Roderick Little Module 2: Bayesian models for simple random samples
Bayesian inference for sample surveys Roderick Little Module : Bayesian models for simple random samples Superpopulation Modeling: Estimating parameters Various principles: least squares, method of moments,
More informationCSci 8980: Advanced Topics in Graphical Models Gaussian Processes
CSci 8980: Advanced Topics in Graphical Models Gaussian Processes Instructor: Arindam Banerjee November 15, 2007 Gaussian Processes Outline Gaussian Processes Outline Parametric Bayesian Regression Gaussian
More informationPrinciples of Bayesian Inference
Principles of Bayesian Inference Sudipto Banerjee 1 and Andrew O. Finley 2 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. 2 Department of Forestry & Department
More informationMODULE -4 BAYEIAN LEARNING
MODULE -4 BAYEIAN LEARNING CONTENT Introduction Bayes theorem Bayes theorem and concept learning Maximum likelihood and Least Squared Error Hypothesis Maximum likelihood Hypotheses for predicting probabilities
More informationPattern Recognition and Machine Learning
Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability
More informationMachine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall
Machine Learning Gaussian Mixture Models Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall 2012 1 The Generative Model POV We think of the data as being generated from some process. We assume
More informationAn Algorithm for Bayesian Variable Selection in High-dimensional Generalized Linear Models
Proceedings 59th ISI World Statistics Congress, 25-30 August 2013, Hong Kong (Session CPS023) p.3938 An Algorithm for Bayesian Variable Selection in High-dimensional Generalized Linear Models Vitara Pungpapong
More informationPart 6: Multivariate Normal and Linear Models
Part 6: Multivariate Normal and Linear Models 1 Multiple measurements Up until now all of our statistical models have been univariate models models for a single measurement on each member of a sample of
More informationNORGES TEKNISK-NATURVITENSKAPELIGE UNIVERSITET
NORGES TEKNISK-NATURVITENSKAPELIGE UNIVERSITET Investigating posterior contour probabilities using INLA: A case study on recurrence of bladder tumours by Rupali Akerkar PREPRINT STATISTICS NO. 4/2012 NORWEGIAN
More informationQuantile Regression for Extraordinarily Large Data
Quantile Regression for Extraordinarily Large Data Shih-Kang Chao Department of Statistics Purdue University November, 2016 A joint work with Stanislav Volgushev and Guang Cheng Quantile regression Two-step
More informationMarginal Specifications and a Gaussian Copula Estimation
Marginal Specifications and a Gaussian Copula Estimation Kazim Azam Abstract Multivariate analysis involving random variables of different type like count, continuous or mixture of both is frequently required
More informationLecture 13 Fundamentals of Bayesian Inference
Lecture 13 Fundamentals of Bayesian Inference Dennis Sun Stats 253 August 11, 2014 Outline of Lecture 1 Bayesian Models 2 Modeling Correlations Using Bayes 3 The Universal Algorithm 4 BUGS 5 Wrapping Up
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 254 Part V
More information