Optimal Screening for multiple regression models with interaction


Marília Antunes¹, Natércia Durão², Maria Antónia Amaral Turkman¹
¹ DEIO-CEAUL, Faculdade de Ciências da Universidade de Lisboa
² Universidade Portucalense

Abstract

The predictive approach to the screening problem, in which the relation between the variable of interest Y and a vector of covariates X is established through a multiple regression model with interaction, presented by Durão [2], is adapted and implemented. For the implementation, the methodology developed by Antunes [1] was followed. The results are illustrated by an example, and a simulation study was carried out for comparison purposes.

Keywords: predictive distributions, multiple regression, screening.

1 Introduction

An individual is considered a success if the measured value y of a random variable Y belongs to a certain region C_y. Let γ represent the proportion of successful individuals in the population, that is, γ = P(Y ∈ C_y). Admit that Y is difficult and/or expensive to measure and suppose that we wish to identify (usually to retain for future observation) a set of individuals for whom Y ∈ C_y. In that case, it is desirable to select (to be analyzed in detail) only individuals seen as having a high probability of being a success. This can be achieved by measuring a feature vector X = (X_1, ..., X_q)^t, q ≥ 1, which is correlated with Y and easier and/or cheaper to measure. The screening can then be described by a region C_x of IR^q such that, if x ∈ C_x, the individual is retained. C_x is called the specification region.

Let p(y, x | θ) be the joint density function of (Y, X), where θ is an unknown parameter vector. If data D = {(y_1, x_1), ..., (y_n, x_n)} from the unscreened population are available and a prior distribution p(θ) for the parameter is known, then the specification region C_x can be chosen so as to raise the predictive probability that an individual will be rated a success. That is, the predictive probability of an individual being a success,

    γ = P(Y ∈ C_y | D) = ∫_Θ P(Y ∈ C_y | θ) p(θ | D) dθ,    (1)

is raised by screening to a value δ such that

    δ = P(Y ∈ C_y | X ∈ C_x, D) = [ ∫_Θ P(Y ∈ C_y, X ∈ C_x | θ) p(θ | D) dθ ] / [ ∫_Θ P(X ∈ C_x | θ) p(θ | D) dθ ].    (2)

δ represents the predictive probability that an individual selected by the screening procedure is a success. For a future individual (y, x), the following predictive probabilities are also of interest:

    α = P(X ∈ C_x | D) = ∫_Θ P(X ∈ C_x | θ) p(θ | D) dθ,    (3)

the predictive probability of an individual being selected by the screening procedure, and

    ε = P(Y ∈ C_y | X ∉ C_x, D) = [ ∫_Θ P(Y ∈ C_y, X ∉ C_x | θ) p(θ | D) dθ ] / [ ∫_Θ P(X ∉ C_x | θ) p(θ | D) dθ ],    (4)

the predictive probability of a non-selected individual being a success. Note that ε = (γ − δα)/(1 − α).

The optimal screening problem consists in obtaining a specification region C_x which is optimal in some sense. From the predictive point of view, the specification region C_x of size α is optimal if it maximizes the probability of a selected individual being a success.

Definition 1. The specification region C_x of size α is optimal if

    P(Y ∈ C_y | X ∈ C_x, D) = sup_B P(Y ∈ C_y | X ∈ B, D),    (5)

where the supremum is taken over all sets B in the σ-algebra generated by X such that P(X ∈ B | D) = α.

Theorem 1 ([3]). Let p(x | Y ∈ C_y, D) and p(x | D) be, respectively, the predictive density function of X given Y ∈ C_y and the marginal predictive density function of X. The optimal specification region of size α is given by

    C_x = { x ∈ IR^q : p(x | Y ∈ C_y, D) / p(x | D) ≥ k }    (6)

or, equivalently,

    C_x = { x ∈ IR^q : P(Y ∈ C_y | x, D) / P(Y ∈ C_y | D) ≥ k },    (7)

where k is such that P(X ∈ C_x | D) = α.

This result ensures that, for a given size α, P(Y ∈ C_y | X ∈ C_x, D) is maximal. A reasonable choice for α seems to be α = γ = P(Y ∈ C_y | D), since, in principle, we do not wish to retain more cases than those potentially considered successes.
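To make the interplay between γ, δ, α and ε tangible, the following minimal Python sketch (not part of the paper) evaluates the screening rule (7) in a toy setting where the parameters are known and no data D are involved: a single standard normal covariate X correlated with Y, with C_y = (−∞, l); the values of ρ, l and α are arbitrary illustrative choices.

```python
import numpy as np
from scipy import stats

# Toy illustration of rule (7) with a single covariate and known parameters:
# (Y, X) standard bivariate normal with correlation rho, success region C_y = (-inf, l).
rho, l, alpha = 0.7, -1.0, 0.2

# P(Y <= l | X = x) = Phi((l - rho*x)/sqrt(1 - rho^2)) is decreasing in x for rho > 0,
# so the optimal size-alpha region of the form (7) is the lower tail {x <= q_alpha}.
q_alpha = stats.norm.ppf(alpha)                 # cut-off with P(X <= q_alpha) = alpha
gamma = stats.norm.cdf(l)                       # unconditional success probability

# delta = P(Y <= l, X <= q_alpha) / alpha, via the bivariate normal CDF.
joint = stats.multivariate_normal(mean=[0.0, 0.0],
                                  cov=[[1.0, rho], [rho, 1.0]]).cdf([l, q_alpha])
delta = joint / alpha
epsilon = (gamma - joint) / (1.0 - alpha)       # equals (gamma - delta*alpha)/(1 - alpha)
print(f"gamma = {gamma:.3f}, delta = {delta:.3f}, epsilon = {epsilon:.3f}")
```

In this toy case the region of the form (7) is simply a lower tail of X, and the printed values should satisfy δ > γ > ε: screening raises the success rate among the selected individuals.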

2 Optimal Screening for multiple regression models with interaction

We consider the regression model with interaction Y_i = β_0 + β_1 X_{1i} + β_2 X_{2i} + β_3 X_{1i} X_{2i} + e_i, where Y_i, i = 1, ..., n, is the response variable and X_{1i} and X_{2i}, i = 1, ..., n, are the two explanatory variables. Using matrix notation, the model can be written as

    Y = Xβ + e,    (8)

where Y is an n × 1 vector of observations Y_i, X is an n × 4 matrix (of full rank), β = (β_0, β_1, β_2, β_3)^t is the vector of parameters and e is an n × 1 vector of random errors, e ~ N_n(0, σ_0² I_n). The elements of e are uncorrelated, that is, E(e_i e_k) = E(e_i) E(e_k) for i ≠ k.

Given the model parameters β and σ_0², and assuming that (X_{1i}, X_{2i})^t ~ N_2(µ, Σ), i = 1, ..., n, independently of e_i, then X_i | µ, Σ ~ N_2(µ, Σ) and Y_i | X_i = x_i, β, σ_0² ~ N(β_0 + β_1 x_{1i} + β_2 x_{2i} + β_3 x_{1i} x_{2i}, σ_0²), for i = 1, ..., n, where X_i = (X_{1i}, X_{2i})^t and µ = (µ_1, µ_2)^t ∈ IR². Σ = [ σ_1² σ_{12} ; σ_{12} σ_2² ] is a symmetric positive definite matrix, β ∈ IR^4 and σ_0² ∈ IR^+.

The joint likelihood function, based on the data set D = {(y_i, x_{1i}, x_{2i}) : i = 1, ..., n} obtained independently from the unscreened population in a natural informative experiment, is L(β, σ_0², µ, Σ | y, X) = p(y | X, β, σ_0²) p(X | µ, Σ). Since the statistics m = (x̄_1, x̄_2)^t and V = (n − 1) [ s_1² s_{12} ; s_{12} s_2² ], where x̄_j = Σ_{i=1}^n x_{ji}/n, s_j² = Σ_{i=1}^n (x_{ji} − x̄_j)²/(n − 1) and s_{12} = Σ_{i=1}^n (x_{1i} − x̄_1)(x_{2i} − x̄_2)/(n − 1), for j = 1, 2, are jointly sufficient with respect to the model X_i | µ, Σ ~ N_2(µ, Σ), i = 1, ..., n, the joint likelihood function is

    L(β, σ_0², µ, Σ | y, X) = p(y | X, β, σ_0²) p(m, V | µ, Σ)
        ∝ (σ_0²)^{-n/2} exp{ −[k_1 s² + (β − β̂)^t X^t X (β − β̂)] / (2σ_0²) }
          × |Σ|^{-n/2} exp{ −(n/2) (m − µ)^t Σ^{-1} (m − µ) } exp{ −(1/2) tr(Σ^{-1} V) },    (9)

where k_1 = n − 4, k_1 s² = (y − Xβ̂)^t (y − Xβ̂) is the residual sum of squares and β̂ = (X^t X)^{-1} X^t y is the least squares (equivalently, maximum likelihood) estimate of β.

2.1 Bayesian Predictive Analysis with a non-informative prior distribution

Suppose that β, σ_0² and (µ, Σ) are, a priori, independently distributed. Applying Jeffreys' rules for the specification of the marginal distributions, the non-informative joint prior distribution is
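As a purely illustrative Python sketch of the quantities entering the likelihood (9) — the n × 4 design matrix with the interaction column, the estimate β̂, the residual sum of squares k_1 s², and the sufficient statistics m and V — the code below uses simulated data with the regression coefficients of the application in Section 3 (the covariates are drawn independently here, since no value of Σ_{12} is needed for this step):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative data: two covariates and a response from the interaction model.
n = 100
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 0.15 * x1 + 0.5 * x2 + 0.3 * x1 * x2 + rng.normal(size=n)

# Design matrix with intercept, main effects and the interaction term (n x 4).
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])

# Least squares / maximum likelihood estimate and residual sum of squares k1*s^2, k1 = n - 4.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
k1 = n - 4
s2 = resid @ resid / k1

# Sufficient statistics for (mu, Sigma): sample mean m and sums-of-squares matrix V = (n-1)*S.
Z = np.column_stack([x1, x2])
m = Z.mean(axis=0)
V = (n - 1) * np.cov(Z, rowvar=False)
print(beta_hat, s2, m, V, sep="\n")
```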

    p(β, σ_0², µ, Σ) = p(β) p(σ_0²) p(µ) p(Σ) ∝ (σ_0²)^{-1} |Σ|^{-3/2}.    (10)

The posterior distributions of the parameters are σ_0² | y, X ~ GI(k_1/2, k_1 s²/2), µ | Σ, m, V ~ N_2(m, Σ/n), β | σ_0², y, X ~ N_4(β̂, σ_0² (X^t X)^{-1}), and Σ | m, V ~ Inv-Wishart_{n−1}(V^{-1}).

For a future individual (y_{n+1}, x_{n+1}) = (y, x), the joint predictive distribution can be written as

    p(y, x | y, X) = p(y | x, y, X) p(x | m, V)
        = ∫_{IR^4} ∫_0^{+∞} p(y | x, β, σ_0²) p(β, σ_0² | y, X) dσ_0² dβ × ∫∫ p(x | µ, Σ) p(µ, Σ | m, V) dµ dΣ.    (11)

The marginal predictive density function of X, p(x | m, V), is given by

    p(x | m, V) ∝ [ 1 + (n/(n + 1)) (x − m)^t V^{-1} (x − m) ]^{-n/2},    (12)

that is, X | m, V ~ t_2( n − 2; m, (n + 1) V / (n(n − 2)) ), and the marginal predictive distribution of Y is a Student-t,

    Y | x, y, X ~ t( k_1; X_0 β̂, s² C ),    (13)

where C = 1 + X_0 (X^t X)^{-1} X_0^t and X_0 = [1  x_{1,n+1}  x_{2,n+1}  x_{1,n+1} x_{2,n+1}], since the distribution of the dependent variable for the future individual is N(X_0 β, σ_0²), β | σ_0², y, X ~ N_4(β̂, σ_0² (X^t X)^{-1}) and the marginal posterior distribution of σ_0² is GI(k_1/2, k_1 s²/2).
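The two predictive ingredients just derived can be evaluated directly. The sketch below (illustrative names, assuming numpy and scipy are available) computes P(Y ≤ l | x, y, X) from the Student-t predictive (13) and the marginal predictive density p(x | m, V) from the bivariate Student-t (12), as reconstructed above:

```python
import numpy as np
from scipy import stats

def pred_prob_success(x, l, X, y):
    """P(Y <= l | x, data): univariate Student-t predictive distribution, eq. (13)."""
    n, p = X.shape                                  # p = 4 columns: 1, x1, x2, x1*x2
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    k1 = n - p
    s2 = np.sum((y - X @ beta_hat) ** 2) / k1
    x0 = np.array([1.0, x[0], x[1], x[0] * x[1]])
    c = 1.0 + x0 @ np.linalg.solve(X.T @ X, x0)     # C = 1 + x0 (X'X)^{-1} x0'
    return stats.t.cdf(l, df=k1, loc=x0 @ beta_hat, scale=np.sqrt(s2 * c))

def pred_density_x(x, m, V, n):
    """Marginal predictive density of X: bivariate Student-t, eq. (12)."""
    shape = (n + 1) / (n * (n - 2)) * V             # scale matrix of t_2(n - 2; m, .)
    return stats.multivariate_t.pdf(x, loc=m, shape=shape, df=n - 2)
```

These two functions are all that is needed for the screening computations in the next subsection.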

2.2 Optimal specification region and predictive probabilities

Without loss of generality, admit that C_y = (−∞, l). Recall that X ∈ IR² and, therefore, C_x ⊂ IR². Also note that all the necessary information extracted from D is contained in y, X, m and V. The predictive probabilities can now be written in a more specific form as:

1. γ = P(Y ∈ C_y | D) = ∫_{IR²} ∫_{−∞}^{l} p(y | x, y, X) p(x | m, V) dy dx, the predictive probability of a future individual being a success;

2. δ = P(Y ∈ C_y | X ∈ C_x, D) = [ ∫_{C_x} ∫_{−∞}^{l} p(y | x, y, X) p(x | m, V) dy dx ] / α, the predictive probability of a selected individual being a success;

3. α = P(X ∈ C_x | D) = ∫_{C_x} p(x | m, V) dx, the predictive probability of an individual being selected by the screening procedure;

4. ε = P(Y ∈ C_y | X ∉ C_x, D) = [ ∫_{C_x^c} ∫_{−∞}^{l} p(y | x, y, X) p(x | m, V) dy dx ] / (1 − α), the predictive probability of a non-selected individual being a success.

Under these conditions, the specification region is given by

    C_x = { x ∈ IR² : ∫_{−∞}^{l} p(y | x, y, X) dy ≥ k },    (14)

where k is such that ∫_{C_x} p(x | m, V) dx = α. Since the condition in (14) cannot be solved analytically, it is not possible to fix α in advance and then obtain k and C_x as functions of α. We must start by fixing the value of k; then C_x(k), the specification region for that fixed k, is obtained by approximation and, finally, its size is evaluated. We followed the procedure introduced by Antunes [1], described as follows:

1. Build a sufficiently fine grid G = {(x_{1i}, x_{2i}) ∈ IR²} such that P(X ∈ G | D) ≈ 1.

2. For each x ∈ G, calculate P(Y ∈ C_y | x, y, X).

3. For several appropriately chosen values of k, form the sets Ĉ_x(k) = {x ∈ G : P(Y ∈ C_y | x, y, X) ≥ k}.

4. Fit suitable smooth functions to the borders of Ĉ_x(k) to further approximate the specification region C_x(k).

5. For each k, calculate α̂_k = P(X ∈ Ĉ_x(k)).

α̂_k and Ĉ_x(k) are used instead of α_k and C_x(k) to evaluate the predictive probabilities. The results are obtained by numerical integration (a numerical sketch of these steps is given below, after the application data are described).

3 Application

One hundred observations (x_{1i}, x_{2i}) from X = (X_1, X_2)^t ~ N_2(µ, Σ), with µ = (0, 0)^t, Σ_{11} = Σ_{22} = 1.0 and a fixed value of Σ_{12}, were generated. Then, the values of y_i = α + β_1 x_{1i} + β_2 x_{2i} + β_3 x_{1i} x_{2i} + ε_i, with α = 1.0, β_1 = 0.15, β_2 = 0.5, β_3 = 0.3 and ε_i ~ N(0, 1), were simulated. We chose l = 1.0, the 0.2 quantile of the generated data, and hence defined C_y = {y : y ≤ 1.0}.
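Below is a minimal end-to-end sketch of steps 1–3 and 5 on data simulated as just described; since the value of Σ_{12} used in the paper is not stated here, the off-diagonal value in the code is an arbitrary assumption, and the grid resolution is likewise an illustrative choice.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# --- simulate data as in the application (the Sigma_12 value is an assumption) ---
n = 100
Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
Z = rng.multivariate_normal([0.0, 0.0], Sigma, size=n)
x1, x2 = Z[:, 0], Z[:, 1]
y = 1.0 + 0.15 * x1 + 0.5 * x2 + 0.3 * x1 * x2 + rng.normal(size=n)
l = 1.0

# --- posterior/predictive ingredients (cf. Section 2.1) ---
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
k1 = n - 4
s2 = np.sum((y - X @ beta_hat) ** 2) / k1
m = Z.mean(axis=0)
V = (n - 1) * np.cov(Z, rowvar=False)

# --- step 1: grid over [-4, 4] x [-4, 4] ---
g1, g2 = np.meshgrid(np.linspace(-4, 4, 161), np.linspace(-4, 4, 161))
grid = np.column_stack([g1.ravel(), g2.ravel()])
cell = (8 / 160) ** 2                                # area of one grid cell

# --- step 2: P(Y <= l | x, data) on the grid, eq. (13) ---
X0 = np.column_stack([np.ones(len(grid)), grid[:, 0], grid[:, 1], grid[:, 0] * grid[:, 1]])
c = 1 + np.einsum("ij,ij->i", X0 @ np.linalg.inv(X.T @ X), X0)
p_succ = stats.t.cdf(l, df=k1, loc=X0 @ beta_hat, scale=np.sqrt(s2 * c))

# --- predictive density of X on the grid, eq. (12) ---
px = stats.multivariate_t.pdf(grid, loc=m, shape=(n + 1) / (n * (n - 2)) * V, df=n - 2)

# --- steps 3 and 5: threshold at k and approximate the region size alpha_k ---
for k in (0.2, 0.3, 0.4, 0.5):
    in_region = p_succ >= k
    alpha_k = np.sum(px[in_region]) * cell           # Riemann-sum approximation of the integral
    print(f"k = {k:.1f}: alpha_k ~ {alpha_k:.3f}")
```

Step 4 (smoothing the borders of the thresholded sets) is sketched separately together with the application results below.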

3.1 Specification Region and predictive probabilities

In order to obtain the specification region we considered a grid of points covering [−4.0, 4.0] × [−4.0, 4.0]. The results were as follows.

1. Predictive probability of a future individual from the population being a success:

    γ = P(Y ∈ C_y | D) = ∫_{IR²} ∫_{−∞}^{1.0} p(y | x, y, X) p(x | m, V) dy dx = 0.1.

2. Conditional probability of an individual being considered a success, given x ∈ [−4.0, 4.0] × [−4.0, 4.0]: P(Y ≤ 1.0 | x, y, X) was evaluated for every x in the grid. Figure 1 shows the points (x_{1i}, x_{2i}, P(Y ≤ 1.0 | x, y, X)), for (x_{1i}, x_{2i}) ∈ G.

Figure 1: The points (x_{1i}, x_{2i}, P(Y ≤ 1.0 | x, y, X)), for (x_{1i}, x_{2i}) ∈ G.

3. The sets Ĉ_x(k) = {x ∈ G : P(Y ∈ C_y | x, y, X) ≥ k}, for k = 0.2, 0.3, 0.4 and 0.5, are represented in Figure 2.

4. Functions were fitted to the borders of each Ĉ_x(k) (one possible way to perform such a fit is sketched below). Every fitted region consists of two branches of the form x_1 > g_k^U(x_2) and x_1 < g_k^L(x_2), with x_2 ranging over subintervals of [−4.0, 4.0], where g_k^U and g_k^L are the polynomial border functions fitted for each of k = 0.2, 0.3, 0.4 and 0.5.
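One possible, purely illustrative way to carry out step 4 of the procedure — fitting a polynomial border x_1 = g(x_2) to one branch of a thresholded grid set — is sketched below; the inputs grid and p_succ are assumed to come from the earlier grid sketch, and nothing here reproduces the actual fits of the paper.

```python
import numpy as np

def fit_border(grid, in_region, degree=3):
    """Fit a polynomial x1 = g(x2) to one border of a thresholded grid set.

    grid      -- (N, 2) array of grid points (x1, x2)
    in_region -- boolean mask, e.g. p_succ >= k from the previous sketch
    Only the branch with larger x1 is handled here, purely for illustration.
    """
    border_x1, border_x2 = [], []
    for v in np.unique(grid[:, 1]):
        row = (grid[:, 1] == v) & in_region & (grid[:, 0] > 0)
        if row.any():
            border_x1.append(grid[row, 0].min())     # smallest x1 still inside the branch
            border_x2.append(v)
    return np.polynomial.Polynomial.fit(border_x2, border_x1, deg=degree)

# Possible usage with the objects from the grid sketch:
# g_upper = fit_border(grid, p_succ >= 0.3)
# C_hat_03 = grid[:, 0] > g_upper(grid[:, 1])        # smoothed upper branch of C_x(0.3)
```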

Figure 2: The regions Ĉ_x(k) for k = 0.2, 0.3, 0.4 and 0.5.

5. For each k, the size of the specification region C_x(k) is approximated by the integral of the marginal predictive density of X over Ĉ_x(k),

    α(k) = P(X ∈ C_x(k) | D) ≈ ∫_{Ĉ_x(k)} p(x | m, V) dx.

Results are presented in Table 1.

Table 1: Size α(k) of the regions Ĉ_x(k).

6. To obtain the remaining predictive probabilities it is necessary to calculate, for each k,

    ζ(k) = P(X ∈ C_x(k), Y ∈ C_y | D) ≈ ∫_{Ĉ_x(k)} ∫_{−∞}^{1.0} p(y | x, y, X) p(x | m, V) dy dx.

Note that δ, the probability of a selected individual being a success, is given by ζ/α. Results are presented in Table 2.

Table 2: Probability of a selected individual being a success.
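All the quantities in items 5 and 6 are single or double integrals over the grid, so they can be approximated by the same Riemann sums. A small illustrative helper (its inputs are assumed precomputed as in the earlier grid sketch) is given below.

```python
import numpy as np

def screening_summary(p_succ, px, in_region, cell):
    """Riemann-sum approximations of gamma, alpha(k), zeta(k), delta(k) and epsilon(k).

    p_succ[i]    -- P(Y in C_y | x_i, data) at grid point x_i
    px[i]        -- predictive density p(x_i | m, V)
    in_region[i] -- True if x_i belongs to the (approximated) region C_x(k)
    cell         -- area of one grid cell
    """
    gamma = np.sum(p_succ * px) * cell
    alpha = np.sum(px[in_region]) * cell
    zeta = np.sum(p_succ[in_region] * px[in_region]) * cell
    delta = zeta / alpha
    epsilon = (gamma - zeta) / (1.0 - alpha)   # success mass falling outside C_x(k)
    return gamma, alpha, zeta, delta, epsilon
```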

A summary of the predictive probabilities is presented in Table 3. Note that γ, α and ζ were calculated directly, whereas δ and ε were obtained through the expressions relating them to those three probabilities. The specification region with size closest to γ was obtained for k = 0.3. Concerning the predictive capacity, with this region the probability of an individual being a success is about 2.3 times the success rate in the unscreened population. When larger values of k are considered, the success rate increases significantly. However, in real situations such regions may be a bad choice because of their small size: in the first stage of the screening procedure only a very small proportion of the screened individuals would be retained for further analysis, and the screening procedure could itself become very expensive. The introduction of a cost function is useful to find a good compromise.

Table 3: γ = P(individual is a success); α = P(individual is selected by the screening procedure); δ = P(selected individual is a success); ε = P(non-selected individual is a success).

3.2 Simulation Study

We generated data sets of size 100 according to the model considered earlier in the application. For each data set, the values of the predictive probabilities were estimated by the counting ratios described in Table 4 (a sketch of this ratio-based estimation is given after the tables). The mean and standard error of the estimates are given in Table 5. Except for δ, the standard error of the estimates decreases as k increases. Note that these values correspond to specification regions of very small size, all of them smaller than the probability of an individual being a success. Also note that these regions are located in the tails of the joint distribution of the covariates. Therefore, the number of observations falling in such regions will always be very small, especially if the sample size is not large, which originates larger standard errors.

predictive probability    numerator                           denominator
α                         #{x ∈ Ĉ_x}                          100
γ                         #{y ∈ C_y}                          100
δ                         #{(x, y) : x ∈ Ĉ_x and y ∈ C_y}     #{x ∈ Ĉ_x}
ε                         #{(x, y) : x ∉ Ĉ_x and y ∈ C_y}     #{x ∉ Ĉ_x}

Table 4: Ratios used to estimate α = P(individual is selected by the screening procedure); γ = P(individual is a success); δ = P(selected individual is a success); ε = P(non-selected individual is a success).

Table 5: Simulation study results: mean and standard error of the estimates γ̂, α̂, δ̂ and ε̂ for each k.
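A minimal sketch of the ratio-based estimates of Table 4 for one simulated data set follows; the names are illustrative, and the boolean inputs are assumed to have been computed beforehand from the fitted specification region and from C_y.

```python
import numpy as np

def empirical_rates(x_in_C, y_success):
    """Ratio estimates of alpha, gamma, delta and epsilon for one data set (cf. Table 4).

    x_in_C[i]    -- True if individual i falls inside the estimated specification region
    y_success[i] -- True if y_i belongs to C_y
    """
    x_in_C = np.asarray(x_in_C, dtype=bool)
    y_success = np.asarray(y_success, dtype=bool)
    n = x_in_C.size                                   # 100 in the simulation study
    alpha = x_in_C.sum() / n
    gamma = y_success.sum() / n
    delta = (x_in_C & y_success).sum() / max(x_in_C.sum(), 1)       # guard: region may be empty
    epsilon = (~x_in_C & y_success).sum() / max((~x_in_C).sum(), 1)
    return alpha, gamma, delta, epsilon
```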

4 Conclusions

The development of methodologies to obtain optimal specification regions in screening problems is of great importance and utility, not only in practical situations such as the ones referred to in this work, but also in situations where a huge amount of data is available and only a small part of it really needs to be analysed in detail. In real situations, it is useful to include the costs associated with bad decisions. Since each region is the one producing the best results, in predictive terms, among all regions of its size, the choice of the optimal procedure reduces to the choice of the value of k leading to the best compromise between the predictive probabilities of interest.

References

[1] Antunes, M. Some Problems in Non-Linear Prediction. PhD Thesis, Departamento de Estatística e Investigação Operacional, Faculdade de Ciências, Universidade de Lisboa, 2002.

[2] Durão, N. Metodologia Bayesiana na Análise de Problemas de Triagem. PhD Thesis, Departamento de Estatística e Investigação Operacional, Faculdade de Ciências, Universidade de Lisboa, 2004.

[3] Turkman, K. F. and Amaral Turkman, M. A. Optimal screening methods. J. Royal Statist. Soc. B, 51:287–295, 1989.
