GWAS IV: Bayesian linear (variance component) models
Dr. Oliver Stegle, Christoph Lippert, Prof. Dr. Karsten Borgwardt
Max Planck Institutes Tübingen, Germany
Tübingen, Summer 2011
Motivation
Regression. Linear regression: making predictions; comparison of alternative models. Bayesian and regularized regression: uncertainty in model parameters; generalized basis functions.
(Figure: regression example with inputs $X$, outputs $Y$, and a query point $x^*$.)
Further reading, useful material
Christopher M. Bishop: Pattern Recognition and Machine Learning [Bishop, 2006]
Sam Roweis: Gaussian Identities [Roweis, 1999]
Outline
Linear Regression II; Bayesian linear regression; Model comparison and hypothesis testing; Summary.

Linear Regression II
Regression: noise model and likelihood
Given a dataset $\mathcal{D} = \{\mathbf{x}_n, y_n\}_{n=1}^N$, where $\mathbf{x}_n = (x_{n,1}, \dots, x_{n,S})$ is $S$-dimensional (for example $S$ SNPs), fit parameters $\boldsymbol\theta$ of a regressor $f$ with added Gaussian noise:
$$y_n = f(\mathbf{x}_n; \boldsymbol\theta) + \epsilon_n, \quad \text{where } p(\epsilon \mid \sigma^2) = \mathcal{N}(\epsilon \mid 0, \sigma^2).$$
Equivalent likelihood formulation:
$$p(\mathbf{y} \mid \mathbf{X}) = \prod_{n=1}^N \mathcal{N}(y_n \mid f(\mathbf{x}_n), \sigma^2).$$
Choosing a regressor
Choose $f$ to be linear:
$$p(\mathbf{y} \mid \mathbf{X}) = \prod_{n=1}^N \mathcal{N}(y_n \mid \mathbf{x}_n \boldsymbol\theta + c, \sigma^2).$$
Consider the bias-free case, $c = 0$; otherwise include an additional column of ones in each $\mathbf{x}_n$.
(Figure: equivalent graphical model.)
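To make the later steps concrete, here is a minimal simulation sketch of this model (assuming Python/numpy; the cohort size, SNP count, allele frequency, effect sizes, and noise level are all made up for illustration). The sketches below reuse this `X`, `y`, and `rng`.

```python
import numpy as np

rng = np.random.default_rng(0)

N, S = 200, 10      # individuals, SNPs (illustrative sizes)
sigma2 = 0.5        # noise variance

# Genotypes coded as minor-allele counts 0/1/2; the frequency is made up.
X = rng.binomial(2, 0.3, size=(N, S)).astype(float)
theta_true = rng.normal(0.0, 0.5, size=S)  # hypothetical true effects

# y_n = x_n theta + eps_n, with eps_n ~ N(0, sigma2)
y = X @ theta_true + rng.normal(0.0, np.sqrt(sigma2), size=N)
```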
Maximum likelihood
Taking the logarithm, we obtain
$$\ln p(\mathbf{y} \mid \boldsymbol\theta, \mathbf{X}, \sigma^2) = \sum_{n=1}^N \ln \mathcal{N}(y_n \mid \mathbf{x}_n \boldsymbol\theta, \sigma^2) = -\frac{N}{2} \ln 2\pi\sigma^2 - \frac{1}{2\sigma^2} \underbrace{\sum_{n=1}^N (y_n - \mathbf{x}_n \boldsymbol\theta)^2}_{\text{sum of squares}}.$$
The likelihood is maximized when the squared error is minimized: least squares and maximum likelihood are equivalent.
Linear regression and least squares
(Figure: target values $y_n$, fitted function $f(x_n, \mathbf{w})$, and the displacement errors at the inputs $x_n$; C.M. Bishop, Pattern Recognition and Machine Learning.)
$$E(\boldsymbol\theta) = \frac{1}{2} \sum_{n=1}^N (y_n - \mathbf{x}_n \boldsymbol\theta)^2$$
Linear regression and least squares
Derivative w.r.t. a single weight entry $\theta_i$:
$$\frac{d}{d\theta_i} \ln p(\mathbf{y} \mid \boldsymbol\theta, \sigma^2) = \frac{d}{d\theta_i} \left[ -\frac{1}{2\sigma^2} \sum_{n=1}^N (y_n - \mathbf{x}_n \boldsymbol\theta)^2 \right] = \frac{1}{\sigma^2} \sum_{n=1}^N (y_n - \mathbf{x}_n \boldsymbol\theta) \, x_{n,i}$$
Set the gradient w.r.t. $\boldsymbol\theta$ to zero:
$$\nabla_{\boldsymbol\theta} \ln p(\mathbf{y} \mid \boldsymbol\theta, \sigma^2) = \frac{1}{\sigma^2} \sum_{n=1}^N (y_n - \mathbf{x}_n \boldsymbol\theta) \, \mathbf{x}_n^T = \mathbf{0} \;\Longrightarrow\; \boldsymbol\theta_{\text{ML}} = \underbrace{(\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T}_{\text{pseudo-inverse}} \mathbf{y}$$
Here, the design matrix $\mathbf{X}$ has the $\mathbf{x}_n$ as rows:
$$\mathbf{X} = \begin{pmatrix} x_{1,1} & \dots & x_{1,D} \\ \vdots & & \vdots \\ x_{N,1} & \dots & x_{N,D} \end{pmatrix}$$
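A minimal sketch of the closed-form estimate, reusing the simulated `X` and `y` from above. Solving the normal equations directly is shown for clarity; `np.linalg.lstsq` is the numerically safer route:

```python
# Normal equations: theta_ML = (X^T X)^{-1} X^T y
theta_ml = np.linalg.solve(X.T @ X, X.T @ y)

# Equivalent least-squares solve via SVD (preferred in practice)
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(theta_ml, theta_lstsq)
```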
Polynomial curve fitting: motivation
Non-linear relationships; multiple SNPs playing a role for a particular phenotype.
(Figure: regression example with inputs $X$, outputs $Y$, and a query point $x^*$.)
Polynomial curve fitting: univariate input x
Use the polynomials up to degree $K$ to construct new features from $x$:
$$f(x, \boldsymbol\theta) = \theta_0 + \theta_1 x + \theta_2 x^2 + \dots + \theta_K x^K = \sum_{k=0}^K \theta_k \phi_k(x) = \boldsymbol\theta^T \boldsymbol\phi(x),$$
where we defined $\boldsymbol\phi(x) = (1, x, x^2, \dots, x^K)$. $\boldsymbol\phi$ can be any feature mapping. Possible to show: the feature map $\boldsymbol\phi$ can be expressed in terms of kernels (kernel trick).
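A sketch of the polynomial feature map, using numpy's Vandermonde matrix (the degree and the inputs are arbitrary):

```python
def poly_features(x, K):
    """Map each scalar in x to phi(x) = (1, x, x^2, ..., x^K)."""
    return np.vander(x, N=K + 1, increasing=True)

x_1d = np.linspace(-1.0, 1.0, 50)
Phi = poly_features(x_1d, K=3)  # shape (50, 4)
```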
Polynomial curve fitting: overfitting
The degree of the polynomial is crucial to avoid under- and overfitting.
(Figure: polynomial fits of increasing degree $M$ to the same data $(x, t)$; C.M. Bishop, Pattern Recognition and Machine Learning.)
Multivariate regression
Polynomial curve fitting:
$$f(x, \boldsymbol\theta) = \theta_0 + \theta_1 x + \dots + \theta_K x^K = \sum_{k=0}^K \theta_k \phi_k(x) = \boldsymbol\phi(x) \, \boldsymbol\theta$$
Multivariate regression (SNPs):
$$f(\mathbf{x}, \boldsymbol\theta) = \sum_{s=1}^S \theta_s x_s = \mathbf{x} \, \boldsymbol\theta$$
Note: when fitting a single binary SNP genotype $x_i$, a linear model is already the most general choice!
Regularized least squares
Solutions to avoid overfitting:
1. Intelligently choose the number of dimensions
2. Regularize the regression weights $\boldsymbol\theta$ (see the sketch after the objective below)
Quadratically regularized objective function:
$$E(\boldsymbol\theta) = \underbrace{\frac{1}{2} \sum_{n=1}^N (y_n - \boldsymbol\phi(\mathbf{x}_n) \boldsymbol\theta)^2}_{\text{squared error}} + \underbrace{\frac{\lambda}{2} \boldsymbol\theta^T \boldsymbol\theta}_{\text{regularizer}}$$
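With the quadratic regularizer the minimizer stays in closed form, $\boldsymbol\theta = (\lambda \mathbf{I} + \boldsymbol\Phi^T \boldsymbol\Phi)^{-1} \boldsymbol\Phi^T \mathbf{y}$. A sketch, reusing the simulated data ($\lambda = 1$ is an arbitrary choice):

```python
def ridge_fit(Phi, y, lam):
    """Minimize 0.5 * ||y - Phi @ theta||^2 + 0.5 * lam * ||theta||^2."""
    D = Phi.shape[1]
    # theta = (lam * I + Phi^T Phi)^{-1} Phi^T y
    return np.linalg.solve(lam * np.eye(D) + Phi.T @ Phi, Phi.T @ y)

theta_ridge = ridge_fit(X, y, lam=1.0)  # X, y from the simulation sketch
```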
Regularized least squares: more general regularizers
More general regularization:
$$E(\boldsymbol\theta) = \underbrace{\frac{1}{2} \sum_{n=1}^N (y_n - \boldsymbol\phi(\mathbf{x}_n) \boldsymbol\theta)^2}_{\text{squared error}} + \underbrace{\frac{\lambda}{2} \sum_{d=1}^D |\theta_d|^q}_{\text{regularizer}}$$
(Figure: regularizer contours for $q = 0.5, 1, 2, 4$; $q = 0.5$ and $q = 1$ (Lasso) yield sparse solutions, $q = 2$ is the quadratic regularizer; C.M. Bishop, Pattern Recognition and Machine Learning.)
Loss functions and related methods
Even more general: a general loss function
$$E(\boldsymbol\theta) = \underbrace{\frac{1}{2} \sum_{n=1}^N L(y_n - \boldsymbol\phi(\mathbf{x}_n) \boldsymbol\theta)}_{\text{loss}} + \underbrace{\frac{\lambda}{2} \sum_{d=1}^D |\theta_d|^q}_{\text{regularizer}}$$
Many state-of-the-art machine learning methods can be expressed within this framework:
Linear regression: squared loss, squared regularizer.
Support vector machine: hinge loss, squared regularizer.
Lasso: squared loss, L1 regularizer.
Inference: minimize the cost function $E(\boldsymbol\theta)$, yielding a point estimate for $\boldsymbol\theta$.
Q: How to determine $q$ and a suitable loss function?
Cross validation: minimization of expected loss
For each candidate model $\mathcal{H}$: split the data into $K$ folds; run a training-test evaluation for each fold; assess the average loss on the test sets:
$$E_{\mathcal{H}} = \frac{1}{K} \sum_{k=1}^K L^k_{\text{test}}$$
(Figure: the total set of samples split into training and test sets across folds 1, 2, 3.)
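A plain-numpy sketch of K-fold cross-validation for choosing $\lambda$ in the ridge objective (the fold count, random seed, and candidate grid are arbitrary; `ridge_fit` is from the sketch above):

```python
def cv_error(Phi, y, lam, K=3, seed=0):
    """Average held-out squared error of ridge regression over K folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), K)
    losses = []
    for k in range(K):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(K) if j != k])
        theta = ridge_fit(Phi[train], y[train], lam)
        losses.append(np.mean((y[test] - Phi[test] @ theta) ** 2))
    return np.mean(losses)

best_lam = min([0.01, 0.1, 1.0, 10.0], key=lambda l: cv_error(X, y, l))
```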
Probabilistic interpretation
So far: minimization of error functions. Back to probabilities?
$$E(\boldsymbol\theta) = \underbrace{\frac{1}{2} \sum_{n=1}^N (y_n - \boldsymbol\phi(\mathbf{x}_n) \boldsymbol\theta)^2}_{\text{squared error}} + \underbrace{\frac{\lambda}{2} \boldsymbol\theta^T \boldsymbol\theta}_{\text{regularizer}}$$
$$= -\sum_{n=1}^N \ln \mathcal{N}(y_n \mid \boldsymbol\phi(\mathbf{x}_n) \boldsymbol\theta, \sigma^2) - \ln \mathcal{N}\!\left(\boldsymbol\theta \,\Big|\, \mathbf{0}, \frac{1}{\lambda} \mathbf{I}\right) = -\ln p(\mathbf{y} \mid \boldsymbol\theta, \boldsymbol\Phi(\mathbf{X}), \sigma^2) - \ln p(\boldsymbol\theta)$$
(equality up to additive constants independent of $\boldsymbol\theta$, taking $\sigma^2 = 1$). Most alternative choices of regularizers and loss functions can be mapped to an equivalent probabilistic representation in a similar way.
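A quick numerical check of this correspondence under the stated assumption $\sigma^2 = 1$ (constants independent of $\boldsymbol\theta$ are dropped; `ridge_fit`, `X`, `y`, and `rng` are from the sketches above):

```python
def neg_log_posterior(theta, Phi, y, lam, sigma2=1.0):
    """-log p(y|theta) - log p(theta), up to theta-independent constants."""
    fit = 0.5 / sigma2 * np.sum((y - Phi @ theta) ** 2)
    prior = 0.5 * lam * (theta @ theta)
    return fit + prior

# The ridge minimizer is also the posterior mode: any perturbation of it
# can only increase the negative log posterior.
theta_hat = ridge_fit(X, y, lam=1.0)
perturbed = theta_hat + rng.normal(0, 0.01, size=theta_hat.shape)
assert neg_log_posterior(theta_hat, X, y, 1.0) <= neg_log_posterior(perturbed, X, y, 1.0)
```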
Bayesian linear regression
Bayesian linear regression
Likelihood as before:
$$p(\mathbf{y} \mid \mathbf{X}, \boldsymbol\theta, \sigma^2) = \prod_{n=1}^N \mathcal{N}(y_n \mid \boldsymbol\phi(\mathbf{x}_n) \boldsymbol\theta, \sigma^2)$$
Define a conjugate prior over $\boldsymbol\theta$:
$$p(\boldsymbol\theta) = \mathcal{N}(\boldsymbol\theta \mid \mathbf{m}_0, \mathbf{S}_0)$$
Posterior probability of $\boldsymbol\theta$:
$$p(\boldsymbol\theta \mid \mathbf{y}, \mathbf{X}, \sigma^2) \propto \prod_{n=1}^N \mathcal{N}(y_n \mid \boldsymbol\phi(\mathbf{x}_n) \boldsymbol\theta, \sigma^2) \, \mathcal{N}(\boldsymbol\theta \mid \mathbf{m}_0, \mathbf{S}_0) = \mathcal{N}(\mathbf{y} \mid \boldsymbol\Phi(\mathbf{X}) \boldsymbol\theta, \sigma^2 \mathbf{I}) \, \mathcal{N}(\boldsymbol\theta \mid \mathbf{m}_0, \mathbf{S}_0) = \mathcal{N}(\boldsymbol\theta \mid \boldsymbol\mu_\theta, \boldsymbol\Sigma_\theta)$$
where
$$\boldsymbol\mu_\theta = \boldsymbol\Sigma_\theta \left( \mathbf{S}_0^{-1} \mathbf{m}_0 + \frac{1}{\sigma^2} \boldsymbol\Phi(\mathbf{X})^T \mathbf{y} \right), \qquad \boldsymbol\Sigma_\theta = \left[ \mathbf{S}_0^{-1} + \frac{1}{\sigma^2} \boldsymbol\Phi(\mathbf{X})^T \boldsymbol\Phi(\mathbf{X}) \right]^{-1}$$
Prior choice
Choice of prior: regularized (ridge) regression. In this case $p(\boldsymbol\theta) = \mathcal{N}(\boldsymbol\theta \mid \mathbf{0}, \frac{1}{\lambda} \mathbf{I})$:
$$p(\boldsymbol\theta \mid \mathbf{y}, \mathbf{X}, \sigma^2) = \mathcal{N}(\boldsymbol\theta \mid \boldsymbol\mu_\theta, \boldsymbol\Sigma_\theta), \qquad \boldsymbol\mu_\theta = \boldsymbol\Sigma_\theta \, \frac{1}{\sigma^2} \boldsymbol\Phi(\mathbf{X})^T \mathbf{y}, \qquad \boldsymbol\Sigma_\theta = \left[ \lambda \mathbf{I} + \frac{1}{\sigma^2} \boldsymbol\Phi(\mathbf{X})^T \boldsymbol\Phi(\mathbf{X}) \right]^{-1}$$
Equivalent to the maximum likelihood estimate for $\lambda \to 0$!
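A sketch of the posterior computation under this ridge prior, with the SNP matrix playing the role of $\boldsymbol\Phi(\mathbf{X})$ ($\lambda$ and $\sigma^2$ are hypothetical values):

```python
def posterior(Phi, y, lam, sigma2):
    """Mean and covariance of p(theta | y, Phi) under the prior N(0, I/lam)."""
    D = Phi.shape[1]
    Sigma = np.linalg.inv(lam * np.eye(D) + (Phi.T @ Phi) / sigma2)
    mu = Sigma @ (Phi.T @ y) / sigma2
    return mu, Sigma

mu_theta, Sigma_theta = posterior(X, y, lam=1.0, sigma2=0.5)
```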
Example
(Figure: evolution of the posterior with 0, 1, and 20 data points; C.M. Bishop, Pattern Recognition and Machine Learning.)
Making predictions
Prediction for a fixed weight $\hat{\boldsymbol\theta}$ at an input $\mathbf{x}^*$ is trivial:
$$p(y^* \mid \mathbf{x}^*, \hat{\boldsymbol\theta}, \sigma^2) = \mathcal{N}(y^* \mid \boldsymbol\phi(\mathbf{x}^*) \hat{\boldsymbol\theta}, \sigma^2)$$
Integrate over $\boldsymbol\theta$ to take the posterior uncertainty into account:
$$p(y^* \mid \mathbf{x}^*, \mathcal{D}) = \int_{\boldsymbol\theta} p(y^* \mid \mathbf{x}^*, \boldsymbol\theta, \sigma^2) \, p(\boldsymbol\theta \mid \mathbf{X}, \mathbf{y}, \sigma^2) = \int_{\boldsymbol\theta} \mathcal{N}(y^* \mid \boldsymbol\phi(\mathbf{x}^*) \boldsymbol\theta, \sigma^2) \, \mathcal{N}(\boldsymbol\theta \mid \boldsymbol\mu_\theta, \boldsymbol\Sigma_\theta) = \mathcal{N}(y^* \mid \boldsymbol\phi(\mathbf{x}^*) \boldsymbol\mu_\theta, \; \sigma^2 + \boldsymbol\phi(\mathbf{x}^*)^T \boldsymbol\Sigma_\theta \boldsymbol\phi(\mathbf{x}^*))$$
Key: the prediction is again Gaussian. The predictive variance is increased due to the posterior uncertainty in $\boldsymbol\theta$.
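Continuing the posterior sketch above, the predictive mean and variance at a new input:

```python
def predict(phi_star, mu, Sigma, sigma2):
    """p(y*|x*, D) = N(y* | phi*^T mu, sigma2 + phi*^T Sigma phi*)."""
    mean = phi_star @ mu
    var = sigma2 + phi_star @ Sigma @ phi_star
    return mean, var

m, v = predict(X[0], mu_theta, Sigma_theta, sigma2=0.5)  # e.g. first individual
```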
Model comparison and hypothesis testing
Model comparison: motivation
What degree of polynomial describes the data best? Is the linear model at all appropriate? Association testing.
(Figure: genome-to-phenome map; SNPs across the sequences of several individuals paired with their phenotype measurements.)
Bayesian model comparison
How do we choose among alternative models? Assume we want to choose among models $\mathcal{H}_0, \dots, \mathcal{H}_M$ for a dataset $\mathcal{D}$. Posterior probability for a particular model $i$:
$$p(\mathcal{H}_i \mid \mathcal{D}) \propto \underbrace{p(\mathcal{D} \mid \mathcal{H}_i)}_{\text{evidence}} \, \underbrace{p(\mathcal{H}_i)}_{\text{prior}}$$
How to calculate the evidence
The evidence is not the model likelihood!
$$p(\mathcal{D} \mid \mathcal{H}_i) = \int_{\boldsymbol\Theta} p(\mathcal{D} \mid \boldsymbol\Theta) \, p(\boldsymbol\Theta) \, d\boldsymbol\Theta$$
for model parameters $\boldsymbol\Theta$. Remember:
$$p(\boldsymbol\Theta \mid \mathcal{H}_i, \mathcal{D}) = \frac{p(\mathcal{D} \mid \mathcal{H}_i, \boldsymbol\Theta) \, p(\boldsymbol\Theta)}{p(\mathcal{D} \mid \mathcal{H}_i)}, \qquad \text{posterior} = \frac{\text{likelihood} \times \text{prior}}{\text{evidence}}$$
Occam's razor
The evidence integral penalizes overly complex models. A model with few parameters and a lower maximum likelihood ($\mathcal{H}_1$) may win over a model with a peaked likelihood that requires many more parameters ($\mathcal{H}_2$).
(Figure: likelihood curves for $\mathcal{H}_1$ and $\mathcal{H}_2$ as a function of $w$, peaked around $w_{\text{MAP}}$; C.M. Bishop, Pattern Recognition and Machine Learning.)
Application to GWA: relevance of a single SNP
Consider an association study.
$\mathcal{H}_0$: no association:
$$p(\mathbf{y} \mid \mathcal{H}_0, \mathbf{X}, \boldsymbol\Theta_0) = \mathcal{N}(\mathbf{y} \mid \mathbf{0}, \sigma^2 \mathbf{I}), \qquad p(\mathcal{D} \mid \mathcal{H}_0) = \int_{\sigma^2} \mathcal{N}(\mathbf{y} \mid \mathbf{0}, \sigma^2 \mathbf{I}) \, p(\sigma^2)$$
$\mathcal{H}_1$: linear association:
$$p(\mathbf{y} \mid \mathcal{H}_1, \mathbf{x}_i, \boldsymbol\Theta_1) = \mathcal{N}(\mathbf{y} \mid \mathbf{x}_i \theta, \sigma^2 \mathbf{I}), \qquad p(\mathcal{D} \mid \mathcal{H}_1) = \int_{\sigma^2, \theta} \mathcal{N}(\mathbf{y} \mid \mathbf{x}_i \theta, \sigma^2 \mathbf{I}) \, p(\sigma^2) \, p(\theta)$$
Depending on the choice of priors $p(\sigma^2)$ and $p(\theta)$, the required integrals are often tractable in closed form.
Application to GWA: scoring models
Similar to likelihood ratios, the ratio of the evidences, the (log) Bayes factor, can be used to score alternative models:
$$\text{BF} = \ln \frac{p(\mathcal{D} \mid \mathcal{H}_1)}{p(\mathcal{D} \mid \mathcal{H}_0)}$$
(Figure: LOD scores and Bayes factors along chromosome 7 near SLC35B4, with 0.01% FPR thresholds.)
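A sketch of per-SNP log Bayes factors. For tractability, $\sigma^2$ is held fixed here rather than integrated out with a prior as on the slide, and the effect prior is $\theta \sim \mathcal{N}(0, \sigma_g^2)$, so both evidences are zero-mean Gaussians in $\mathbf{y}$ (the variance values are made up; `X`, `y` as above):

```python
def gauss_logpdf0(y, cov):
    """log N(y | 0, cov) via a Cholesky factorization."""
    L = np.linalg.cholesky(cov)
    alpha = np.linalg.solve(L, y)
    return -0.5 * (alpha @ alpha) - np.log(np.diag(L)).sum() \
           - 0.5 * len(y) * np.log(2 * np.pi)

def log_bayes_factor(y, x_i, sigma2, sigma2_g):
    """ln p(D|H1) - ln p(D|H0) for a single SNP x_i, sigma2 fixed."""
    n = len(y)
    # H1 evidence: theta integrated out, y ~ N(0, sigma2_g x x^T + sigma2 I)
    log_ev_h1 = gauss_logpdf0(y, sigma2_g * np.outer(x_i, x_i) + sigma2 * np.eye(n))
    log_ev_h0 = gauss_logpdf0(y, sigma2 * np.eye(n))
    return log_ev_h1 - log_ev_h0

# In practice y would be mean-centered first; skipped in this sketch.
bfs = [log_bayes_factor(y, X[:, s], 0.5, 0.25) for s in range(X.shape[1])]
```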
Application to GWA: posterior probability of an association
Bayes factors are useful; however, we would like a probabilistic answer to how certain an association really is. Posterior probability of $\mathcal{H}_1$:
$$p(\mathcal{H}_1 \mid \mathcal{D}) = \frac{p(\mathcal{D} \mid \mathcal{H}_1) \, p(\mathcal{H}_1)}{p(\mathcal{D})} = \frac{p(\mathcal{D} \mid \mathcal{H}_1) \, p(\mathcal{H}_1)}{p(\mathcal{D} \mid \mathcal{H}_1) \, p(\mathcal{H}_1) + p(\mathcal{D} \mid \mathcal{H}_0) \, p(\mathcal{H}_0)}$$
using $p(\mathcal{H}_1 \mid \mathcal{D}) + p(\mathcal{H}_0 \mid \mathcal{D}) = 1$; here $p(\mathcal{H}_1)$ is the prior probability of observing a real association.
Bayes factor versus likelihood ratio
Bayes factor: models of different complexity can be objectively compared; statistical significance as the posterior probability of a model; typically hard to compute.
Likelihood ratio: the likelihood ratio scales with the number of parameters; likelihood ratios have a known null distribution, yielding p-values; often easy to compute.
Marginal likelihood of variance component models
Consider a linear model accounting for a set of measured SNPs $\mathbf{X}$:
$$p(\mathbf{y} \mid \mathbf{X}, \boldsymbol\theta, \sigma^2) = \mathcal{N}\!\left(\mathbf{y} \,\Big|\, \sum_{s=1}^S \mathbf{x}_s \theta_s, \, \sigma^2 \mathbf{I}\right)$$
Choose an identical Gaussian prior for all weights:
$$p(\boldsymbol\theta) = \prod_{s=1}^S \mathcal{N}(\theta_s \mid 0, \sigma_g^2)$$
Marginal likelihood:
$$p(\mathbf{y} \mid \mathbf{X}, \sigma^2, \sigma_g^2) = \int_{\boldsymbol\theta} \mathcal{N}(\mathbf{y} \mid \mathbf{X} \boldsymbol\theta, \sigma^2 \mathbf{I}) \, \mathcal{N}(\boldsymbol\theta \mid \mathbf{0}, \sigma_g^2 \mathbf{I}) = \mathcal{N}(\mathbf{y} \mid \mathbf{0}, \, \sigma_g^2 \mathbf{X} \mathbf{X}^T + \sigma^2 \mathbf{I})$$
The number of hyperparameters is independent of the number of SNPs.
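In code, the marginal likelihood only requires the $N \times N$ covariance $\sigma_g^2 \mathbf{X} \mathbf{X}^T + \sigma^2 \mathbf{I}$, no matter how many SNPs enter $\mathbf{X}$; a sketch reusing `gauss_logpdf0` from the Bayes-factor example (the variance values are made up):

```python
def log_marginal(y, X, sigma2_g, sigma2):
    """log N(y | 0, sigma2_g * X X^T + sigma2 * I)."""
    K = sigma2_g * (X @ X.T) + sigma2 * np.eye(len(y))
    return gauss_logpdf0(y, K)

ll = log_marginal(y, X, sigma2_g=0.25, sigma2=0.5)
```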
Application to GWAS: the missing heritability paradox
Complex traits are regulated by a large number of small effects. Human height: the best single SNP explains little variance. But: the parents are highly predictive for the height of the child!
Multivariate additive models for complex traits
Multivariate model over causal SNPs:
$$p(\mathbf{y} \mid \mathbf{X}, \boldsymbol\theta, \sigma^2) = \mathcal{N}\!\left(\mathbf{y} \,\Big|\, \sum_{s \,\in\, \text{causal}} \mathbf{x}_s \theta_s, \, \sigma^2 \mathbf{I}\right)$$
Common variance prior for causal SNPs:
$$p(\theta_s) = \mathcal{N}(\theta_s \mid 0, \sigma_g^2)$$
Marginalize out the weights:
$$p(\mathbf{y} \mid \mathbf{X}, \sigma_g^2, \sigma_e^2) = \mathcal{N}\!\left(\mathbf{y} \,\Big|\, \mathbf{0}, \, \sigma_g^2 \sum_{s \,\in\, \text{causal}} \mathbf{x}_s \mathbf{x}_s^T + \sigma_e^2 \mathbf{I}\right)$$
Which SNPs are causal? Approximation: consider all SNPs [Yang et al., 2011]:
$$p(\mathbf{y} \mid \mathbf{X}, \sigma_g^2, \sigma_e^2) = \mathcal{N}(\mathbf{y} \mid \mathbf{0}, \, \sigma_g^2 \mathbf{X} \mathbf{X}^T + \sigma_e^2 \mathbf{I})$$
Approximate variance model
$$p(\mathbf{y} \mid \mathbf{X}, \sigma_g^2, \sigma_e^2) = \mathcal{N}(\mathbf{y} \mid \mathbf{0}, \, \sigma_g^2 \mathbf{X} \mathbf{X}^T + \sigma_e^2 \mathbf{I})$$
Genetic variance $\sigma_g^2$ across chromosomes; heritability
$$h^2 = \frac{\sigma_g^2}{\sigma_g^2 + \sigma_e^2}$$
(Figure: genetic variance explained per chromosome; [Yang et al., 2011].)
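A coarse sketch of estimating the variance components by maximizing the marginal likelihood over a grid and reading off $h^2$ (the grid is arbitrary and chosen for illustration; real implementations such as GCTA [Yang et al., 2011] use REML rather than a grid search; `log_marginal` is from the sketch above):

```python
grid = np.linspace(0.05, 2.0, 30)
sg2, se2 = max(((sg, se) for sg in grid for se in grid),
               key=lambda p: log_marginal(y, X, p[0], p[1]))

h2 = sg2 / (sg2 + se2)  # estimated heritability
print(f"sigma_g^2 = {sg2:.2f}, sigma_e^2 = {se2:.2f}, h^2 = {h2:.2f}")
```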
Summary
Summary
Generalized linear models for curve fitting and multivariate regression.
Maximum likelihood and least squares regression are identical.
Construction of features using a mapping $\boldsymbol\phi$.
Regularized least squares and other models corresponding to different choices of loss function.
Bayesian linear regression.
Model comparison and Occam's razor.
Variance component models in GWAS.
Tasks
Prove that the product of two Gaussians is Gaussian distributed. Try to understand the convolution formula for Gaussian random variables.
References
C. M. Bishop. Pattern Recognition and Machine Learning. Springer, New York, 2006.
S. Roweis. Gaussian Identities. Technical report, 1999.
J. Yang, T. Manolio, L. Pasquale, E. Boerwinkle, N. Caporaso, J. Cunningham, M. de Andrade, B. Feenstra, E. Feingold, M. Hayes, et al. Genome partitioning of genetic variation for complex traits using common SNPs. Nature Genetics, 43(6), 2011.