GWAS IV: Bayesian linear (variance component) models

1 GWAS IV: Bayesian linear (variance component) models. Dr. Oliver Stegle, Christoph Lippert, Prof. Dr. Karsten Borgwardt. Max-Planck-Institutes Tübingen, Germany. Tübingen, Summer 2011.

2 Motivation: Regression. Linear regression: making predictions; comparison of alternative models. Bayesian and regularized regression: uncertainty in model parameters; generalized basis functions.

5 Motivation: Further reading, useful material. Christopher M. Bishop: Pattern Recognition and Machine Learning [Bishop, 2006]. Sam Roweis: Gaussian identities [Roweis, 1999].

6 Outline

7 Linear Regression II: Outline

8 Linear Regression II: Regression, noise model and likelihood. Given a dataset $\mathcal{D} = \{x_n, y_n\}_{n=1}^{N}$, where $x_n = \{x_{n,1}, \ldots, x_{n,S}\}$ is $S$-dimensional (for example $S$ SNPs), fit parameters $\theta$ of a regressor $f$ with added Gaussian noise: $y_n = f(x_n; \theta) + \epsilon_n$, where $p(\epsilon \mid \sigma^2) = \mathcal{N}(\epsilon \mid 0, \sigma^2)$. Equivalent likelihood formulation: $p(\mathbf{y} \mid \mathbf{X}) = \prod_{n=1}^{N} \mathcal{N}(y_n \mid f(x_n), \sigma^2)$.
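
A minimal sketch of this noise model in Python (the variable names, toy sizes, and simulated genotypes are illustrative assumptions, not from the slides); it generates phenotypes under the linear-Gaussian model and evaluates the log-likelihood above:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
N, S = 100, 5                                         # individuals, SNPs (toy sizes)
X = rng.integers(0, 3, size=(N, S)).astype(float)     # genotypes coded 0/1/2
theta_true = rng.normal(0.0, 0.5, size=S)
sigma2 = 0.25
y = X @ theta_true + rng.normal(0.0, np.sqrt(sigma2), size=N)   # y_n = f(x_n; theta) + eps_n

def log_likelihood(theta, X, y, sigma2):
    """ln p(y | X, theta, sigma^2) = sum_n ln N(y_n | x_n^T theta, sigma^2)."""
    return norm.logpdf(y, loc=X @ theta, scale=np.sqrt(sigma2)).sum()

print(log_likelihood(theta_true, X, y, sigma2))
```

The later sketches below reuse these toy variables.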

9 Linear Regression II: Regression, choosing a regressor. Choose $f$ to be linear: $p(\mathbf{y} \mid \mathbf{X}) = \prod_{n=1}^{N} \mathcal{N}(y_n \mid x_n^{\mathsf{T}} \theta + c, \sigma^2)$. Consider the bias-free case, $c = 0$; otherwise include an additional column of ones in each $x_n$. (The slide also shows the equivalent graphical model.)

11 Linear Regression II: Linear regression, maximum likelihood. Taking the logarithm, we obtain $\ln p(\mathbf{y} \mid \theta, \mathbf{X}, \sigma^2) = \sum_{n=1}^{N} \ln \mathcal{N}(y_n \mid x_n^{\mathsf{T}} \theta, \sigma^2) = -\frac{N}{2} \ln 2\pi\sigma^2 - \frac{1}{2\sigma^2} \underbrace{\sum_{n=1}^{N} (y_n - x_n^{\mathsf{T}} \theta)^2}_{\text{sum of squares}}$. The likelihood is maximized when the squared error is minimized: least squares and maximum likelihood are equivalent.

14 Linear Regression II: Linear regression and least squares. (Figure: the squared error sums the vertical distances between the observations $y_n$ and the fit $f(x_n, \theta)$; from C.M. Bishop, Pattern Recognition and Machine Learning.) $E(\theta) = \frac{1}{2} \sum_{n=1}^{N} (y_n - x_n^{\mathsf{T}} \theta)^2$.

15 Linear Regression II: Linear regression and least squares. Derivative w.r.t. a single weight entry $\theta_i$: $\frac{d}{d\theta_i} \ln p(\mathbf{y} \mid \theta, \sigma^2) = \frac{d}{d\theta_i} \left[ -\frac{1}{2\sigma^2} \sum_{n=1}^{N} (y_n - x_n^{\mathsf{T}} \theta)^2 \right] = \frac{1}{\sigma^2} \sum_{n=1}^{N} (y_n - x_n^{\mathsf{T}} \theta) x_{n,i}$. Setting the gradient w.r.t. $\theta$ to zero, $\nabla_\theta \ln p(\mathbf{y} \mid \theta, \sigma^2) = \frac{1}{\sigma^2} \sum_{n=1}^{N} (y_n - x_n^{\mathsf{T}} \theta) x_n^{\mathsf{T}} = \mathbf{0} \;\Rightarrow\; \theta_{\mathrm{ML}} = \underbrace{(\mathbf{X}^{\mathsf{T}} \mathbf{X})^{-1} \mathbf{X}^{\mathsf{T}}}_{\text{pseudo-inverse}} \mathbf{y}$. Here the matrix $\mathbf{X}$ is defined row-wise by the inputs, with entries $x_{n,d}$ for $n = 1, \ldots, N$ and $d = 1, \ldots, D$.
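
Continuing the toy example above, a sketch of the maximum-likelihood solution; the normal-equation form mirrors the pseudo-inverse, while a least-squares solver is numerically preferable:

```python
# Maximum likelihood = least squares (reusing X and y from the simulation above).
theta_ml = np.linalg.solve(X.T @ X, X.T @ y)              # (X^T X)^{-1} X^T y
theta_ml_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)    # numerically more stable
print(np.allclose(theta_ml, theta_ml_lstsq))
```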

18 Linear Regression II: Polynomial curve fitting, motivation. Non-linear relationships; multiple SNPs playing a role for a particular phenotype.

19 Linear Regression II: Polynomial curve fitting, univariate input $x$. Use the polynomials up to degree $K$ to construct new features from $x$: $f(x, \theta) = \theta_0 + \theta_1 x + \theta_2 x^2 + \cdots + \theta_K x^K = \sum_{k=0}^{K} \theta_k \phi_k(x) = \theta^{\mathsf{T}} \phi(x)$, where we defined $\phi(x) = (1, x, x^2, \ldots, x^K)$. $\phi$ can be any feature mapping. Possible to show: the feature map $\phi$ can be expressed in terms of kernels (kernel trick).
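
A short sketch of such a polynomial feature map and a least-squares fit in feature space (the degree, toy inputs, and target are illustrative):

```python
def poly_features(x, K):
    """phi(x) = (1, x, x^2, ..., x^K) for a vector of scalar inputs x."""
    return np.vander(x, N=K + 1, increasing=True)

x1d = rng.uniform(-1.0, 1.0, size=50)
t = np.sin(3 * x1d) + rng.normal(0.0, 0.1, size=50)    # toy non-linear target
Phi = poly_features(x1d, K=5)
theta_poly, *_ = np.linalg.lstsq(Phi, t, rcond=None)   # least squares in feature space
```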

21 Linear Regression II: Polynomial curve fitting, overfitting. The degree of the polynomial is crucial to avoid under- and overfitting. (Figure: polynomial fits of increasing degree $M$ to the same data set; from C.M. Bishop, Pattern Recognition and Machine Learning.)

25 Linear Regression II: Multivariate regression. Polynomial curve fitting: $f(x, \theta) = \theta_0 + \theta_1 x + \cdots + \theta_K x^K = \sum_{k=0}^{K} \theta_k \phi_k(x) = \phi(x)^{\mathsf{T}} \theta$. Multivariate regression (SNPs): $f(\mathbf{x}, \theta) = \sum_{s=1}^{S} \theta_s x_s = \mathbf{x}^{\mathsf{T}} \theta$. Note: when fitting a single binary SNP genotype $x_i$, a linear model is most general!

27 Linear Regression II: Regularized least squares. Solutions to avoid overfitting: 1. intelligently choose the number of dimensions; 2. regularize the regression weights $\theta$. Quadratically regularized objective function: $E(\theta) = \underbrace{\frac{1}{2} \sum_{n=1}^{N} (y_n - \phi(x_n)^{\mathsf{T}} \theta)^2}_{\text{squared error}} + \underbrace{\frac{\lambda}{2} \theta^{\mathsf{T}} \theta}_{\text{regularizer}}$.

29 Linear Regression II: Regularized least squares, more general regularizers. More general regularization: $E(\theta) = \underbrace{\frac{1}{2} \sum_{n=1}^{N} (y_n - \phi(x_n)^{\mathsf{T}} \theta)^2}_{\text{squared error}} + \underbrace{\frac{\lambda}{2} \sum_{d=1}^{D} |\theta_d|^q}_{\text{regularizer}}$. (Figure: contours of the regularizer for $q = 0.5, 1, 2, 4$; $q = 1$ gives the sparse Lasso penalty, $q = 2$ the quadratic penalty; from C.M. Bishop, Pattern Recognition and Machine Learning.)
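
For the quadratic case ($q = 2$, ridge regression) the minimizer is available in closed form; a sketch, with lam as an illustrative regularization strength:

```python
def ridge_fit(Phi, y, lam):
    """Minimize 0.5 * ||y - Phi theta||^2 + 0.5 * lam * ||theta||^2 in closed form."""
    D = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(D), Phi.T @ y)

theta_ridge = ridge_fit(Phi, t, lam=1e-2)
```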

32 Linear Regression II: Loss functions and related methods. Even more general, a generic loss function: $E(\theta) = \underbrace{\frac{1}{2} \sum_{n=1}^{N} L\big(y_n, \phi(x_n)^{\mathsf{T}} \theta\big)}_{\text{loss}} + \underbrace{\frac{\lambda}{2} \sum_{d=1}^{D} |\theta_d|^q}_{\text{regularizer}}$. Many state-of-the-art machine learning methods can be expressed within this framework: linear regression (squared loss, squared regularizer); support vector machine (hinge loss, squared regularizer); Lasso (squared loss, L1 regularizer). Inference: minimize the cost function $E(\theta)$, yielding a point estimate for $\theta$. Q: how to determine $q$ and a suitable loss function?

36 Linear Regression II: Loss functions and related methods. Cross validation: minimization of expected loss. For each candidate model $\mathcal{H}$: split the data into $K$ folds; run a training-test evaluation for each fold; assess the average loss on the test sets, $E_{\mathcal{H}} = \frac{1}{K} \sum_{k=1}^{K} L^{\text{test}}_k$. (Figure: the total set of samples split into training set and test set for fold 1, fold 2, fold 3.)
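
A bare-bones K-fold cross-validation loop, a sketch that reuses the ridge fit from above and squared error as the loss (both choices are for illustration only):

```python
def kfold_cv_mse(Phi, y, lam, K=3):
    """Average held-out squared error over K folds for the ridge model above."""
    folds = np.array_split(rng.permutation(len(y)), K)
    losses = []
    for k in range(K):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(K) if j != k])
        theta = ridge_fit(Phi[train], y[train], lam)
        losses.append(np.mean((y[test] - Phi[test] @ theta) ** 2))
    return np.mean(losses)

print(kfold_cv_mse(Phi, t, lam=1e-2))
```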

37 Linear Regression II: Probabilistic interpretation. So far: minimization of error functions. Back to probabilities? $E(\theta) = \underbrace{\frac{1}{2} \sum_{n=1}^{N} (y_n - \phi(x_n)^{\mathsf{T}} \theta)^2}_{\text{squared error}} + \underbrace{\frac{\lambda}{2} \theta^{\mathsf{T}} \theta}_{\text{regularizer}} = -\sum_{n=1}^{N} \ln \mathcal{N}(y_n \mid \phi(x_n)^{\mathsf{T}} \theta, \sigma^2) - \ln \mathcal{N}\big(\theta \mid \mathbf{0}, \tfrac{1}{\lambda} \mathbf{I}\big) = -\ln p(\mathbf{y} \mid \theta, \Phi(\mathbf{X}), \sigma^2) - \ln p(\theta)$. Most alternative choices of regularizers and loss functions can be mapped to an equivalent probabilistic representation in a similar way.
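
This correspondence can be checked numerically; a sketch that assumes unit noise variance (so the regularized objective and the negative log-posterior agree up to constants) and reuses Phi, t, and ridge_fit from the sketches above:

```python
from scipy.optimize import minimize

lam, sigma2_n = 1e-2, 1.0   # unit noise variance for the correspondence to hold exactly

def neg_log_posterior(theta):
    ll = norm.logpdf(t, loc=Phi @ theta, scale=np.sqrt(sigma2_n)).sum()   # ln p(y | theta)
    lp = norm.logpdf(theta, loc=0.0, scale=np.sqrt(1.0 / lam)).sum()      # ln p(theta)
    return -(ll + lp)

theta_map = minimize(neg_log_posterior, np.zeros(Phi.shape[1])).x
print(np.allclose(theta_map, ridge_fit(Phi, t, lam), atol=1e-4))
```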

41 Bayesian linear regression: Outline

42 Bayesian linear regression. Likelihood as before: $p(\mathbf{y} \mid \mathbf{X}, \theta, \sigma^2) = \prod_{n=1}^{N} \mathcal{N}(y_n \mid \phi(x_n)^{\mathsf{T}} \theta, \sigma^2)$. Define a conjugate prior over $\theta$: $p(\theta) = \mathcal{N}(\theta \mid \mathbf{m}_0, \mathbf{S}_0)$.

44 Bayesian linear regression: Posterior probability of $\theta$. $p(\theta \mid \mathbf{y}, \mathbf{X}, \sigma^2) \propto \prod_{n=1}^{N} \mathcal{N}(y_n \mid \phi(x_n)^{\mathsf{T}} \theta, \sigma^2) \, \mathcal{N}(\theta \mid \mathbf{m}_0, \mathbf{S}_0) = \mathcal{N}(\mathbf{y} \mid \Phi(\mathbf{X}) \theta, \sigma^2 \mathbf{I}) \, \mathcal{N}(\theta \mid \mathbf{m}_0, \mathbf{S}_0) = \mathcal{N}(\theta \mid \mu_\theta, \Sigma_\theta)$, where $\mu_\theta = \Sigma_\theta \big( \mathbf{S}_0^{-1} \mathbf{m}_0 + \frac{1}{\sigma^2} \Phi(\mathbf{X})^{\mathsf{T}} \mathbf{y} \big)$ and $\Sigma_\theta = \big[ \mathbf{S}_0^{-1} + \frac{1}{\sigma^2} \Phi(\mathbf{X})^{\mathsf{T}} \Phi(\mathbf{X}) \big]^{-1}$.

45 Bayesian linear regression: Prior choice. Choice of prior: regularized (ridge) regression. In this case $p(\theta) = \mathcal{N}\big(\theta \mid \mathbf{0}, \tfrac{1}{\lambda} \mathbf{I}\big)$, and $p(\theta \mid \mathbf{y}, \mathbf{X}, \sigma^2) = \mathcal{N}(\theta \mid \mu_\theta, \Sigma_\theta)$ with $\mu_\theta = \Sigma_\theta \big( \frac{1}{\sigma^2} \Phi(\mathbf{X})^{\mathsf{T}} \mathbf{y} \big)$ and $\Sigma_\theta = \big[ \lambda \mathbf{I} + \frac{1}{\sigma^2} \Phi(\mathbf{X})^{\mathsf{T}} \Phi(\mathbf{X}) \big]^{-1}$. Equivalent to the maximum likelihood estimate for $\lambda \to 0$!
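
A sketch of this ridge-prior posterior update in code, reusing Phi and t from the polynomial example (lam and the noise variance are illustrative values):

```python
def blr_posterior(Phi, y, lam, sigma2):
    """Posterior N(theta | mu, Sigma) under the prior N(0, I / lam) and noise variance sigma2."""
    D = Phi.shape[1]
    Sigma = np.linalg.inv(lam * np.eye(D) + Phi.T @ Phi / sigma2)
    mu = Sigma @ (Phi.T @ y / sigma2)
    return mu, Sigma

mu_theta, Sigma_theta = blr_posterior(Phi, t, lam=1e-2, sigma2=0.1 ** 2)
```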

48 Bayesian linear regression: Example. (Figures: evolution of the posterior over $\theta$ and of sampled regression functions after observing 0, 1, and 20 data points; from C.M. Bishop, Pattern Recognition and Machine Learning.)

51 Bayesian linear regression: Making predictions. Prediction for a fixed weight $\hat{\theta}$ at an input $x^*$ is trivial: $p(y^* \mid x^*, \hat{\theta}, \sigma^2) = \mathcal{N}(y^* \mid \phi(x^*)^{\mathsf{T}} \hat{\theta}, \sigma^2)$. Integrate over $\theta$ to take the posterior uncertainty into account: $p(y^* \mid x^*, \mathcal{D}) = \int_\theta p(y^* \mid x^*, \theta, \sigma^2) \, p(\theta \mid \mathbf{X}, \mathbf{y}, \sigma^2) = \int_\theta \mathcal{N}(y^* \mid \phi(x^*)^{\mathsf{T}} \theta, \sigma^2) \, \mathcal{N}(\theta \mid \mu_\theta, \Sigma_\theta) = \mathcal{N}\big(y^* \mid \phi(x^*)^{\mathsf{T}} \mu_\theta, \; \sigma^2 + \phi(x^*)^{\mathsf{T}} \Sigma_\theta \phi(x^*)\big)$. Key: the prediction is again Gaussian. The predictive variance is increased due to the posterior uncertainty in $\theta$.
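
The corresponding predictive mean and variance, a sketch continuing the posterior computed above:

```python
def blr_predict(x_star, mu, Sigma, sigma2, K=5):
    """Predictive mean and variance at a new scalar input x_star."""
    phi_star = poly_features(np.array([x_star]), K)[0]
    mean = phi_star @ mu
    var = sigma2 + phi_star @ Sigma @ phi_star
    return mean, var

print(blr_predict(0.3, mu_theta, Sigma_theta, sigma2=0.1 ** 2))
```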

54 Model comparison and hypothesis testing: Outline

55 Model comparison and hypothesis testing: Model comparison, motivation. What degree of polynomial describes the data best? Is the linear model at all appropriate? Association testing. (Figure: from genome to phenome; a matrix of SNPs across individuals is tested against the phenotypes.)

57 Model comparison and hypothesis testing: Bayesian model comparison. How do we choose among alternative models? Assume we want to choose among models $\mathcal{H}_0, \ldots, \mathcal{H}_M$ for a dataset $\mathcal{D}$. Posterior probability for a particular model $i$: $p(\mathcal{H}_i \mid \mathcal{D}) \propto \underbrace{p(\mathcal{D} \mid \mathcal{H}_i)}_{\text{evidence}} \, \underbrace{p(\mathcal{H}_i)}_{\text{prior}}$.

59 Model comparison and hypothesis testing: Bayesian model comparison, how to calculate the evidence. The evidence is not the model likelihood! $p(\mathcal{D} \mid \mathcal{H}_i) = \int_\Theta \mathrm{d}\Theta \, p(\mathcal{D} \mid \Theta) \, p(\Theta)$ for model parameters $\Theta$. Remember: $p(\Theta \mid \mathcal{H}_i, \mathcal{D}) = \frac{p(\mathcal{D} \mid \mathcal{H}_i, \Theta) \, p(\Theta)}{p(\mathcal{D} \mid \mathcal{H}_i)}$, i.e. $\text{posterior} = \frac{\text{likelihood} \times \text{prior}}{\text{evidence}}$.

61 Model comparison and hypothesis testing: Bayesian model comparison, Occam's razor. The evidence integral penalizes overly complex models. A model with few parameters and a lower maximum likelihood ($\mathcal{H}_1$) may win over a model with a peaked likelihood that requires many more parameters ($\mathcal{H}_2$). (Figure: likelihood curves of $\mathcal{H}_1$ and $\mathcal{H}_2$ around $w_{\mathrm{MAP}}$; from C.M. Bishop, Pattern Recognition and Machine Learning.)

63 Model comparison and hypothesis testing: Application to GWAS, relevance of a single SNP. Consider an association study. $\mathcal{H}_0$: no association, $p(\mathbf{y} \mid \mathcal{H}_0, \mathbf{X}, \Theta_0) = \mathcal{N}(\mathbf{y} \mid \mathbf{0}, \sigma^2 \mathbf{I})$, $p(\mathcal{D} \mid \mathcal{H}_0) = \int_{\sigma^2} \mathcal{N}(\mathbf{y} \mid \mathbf{0}, \sigma^2 \mathbf{I}) \, p(\sigma^2)$. $\mathcal{H}_1$: linear association, $p(\mathbf{y} \mid \mathcal{H}_1, \mathbf{x}_i, \Theta_1) = \mathcal{N}(\mathbf{y} \mid \mathbf{x}_i \theta, \sigma^2 \mathbf{I})$, $p(\mathcal{D} \mid \mathcal{H}_1) = \int_{\sigma^2, \theta} \mathcal{N}(\mathbf{y} \mid \mathbf{x}_i \theta, \sigma^2 \mathbf{I}) \, p(\sigma^2) \, p(\theta)$. Depending on the choice of priors, $p(\sigma^2)$ and $p(\theta)$, the required integrals are often tractable in closed form.

66 Model comparison and hypothesis testing: Application to GWAS, scoring models. Similar to likelihood ratios, the ratio of the evidences, the Bayes factor, can be used to score alternative models: $\mathrm{BF} = \ln \frac{p(\mathcal{D} \mid \mathcal{H}_1)}{p(\mathcal{D} \mid \mathcal{H}_0)}$. (Figure: LOD scores and Bayes factors along chromosome 7 around SLC35B4, with the 0.01% FPR threshold marked.)
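
A numerical sketch of such a Bayes factor for a single SNP on the toy data, under the simplifying assumptions that the noise variance is fixed rather than integrated out and that the effect has a Gaussian prior $\mathcal{N}(0, \sigma_g^2)$, so that both evidences are Gaussian marginal likelihoods:

```python
from scipy.stats import multivariate_normal

def log_bayes_factor(y, x, sigma2, sigma2_g):
    """ln p(y | H1) - ln p(y | H0), with the SNP effect marginalized under N(0, sigma2_g)."""
    N = len(y)
    log_ev0 = multivariate_normal.logpdf(y, mean=np.zeros(N), cov=sigma2 * np.eye(N))
    cov1 = sigma2_g * np.outer(x, x) + sigma2 * np.eye(N)
    log_ev1 = multivariate_normal.logpdf(y, mean=np.zeros(N), cov=cov1)
    return log_ev1 - log_ev0

print(log_bayes_factor(y - y.mean(), X[:, 0], sigma2=0.25, sigma2_g=0.5))
```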

68 Model comparison and hypothesis testing: Application to GWAS, posterior probability of an association. Bayes factors are useful; however, we would like a probabilistic answer for how certain an association really is. Posterior probability of $\mathcal{H}_1$: $p(\mathcal{H}_1 \mid \mathcal{D}) = \frac{p(\mathcal{D} \mid \mathcal{H}_1) \, p(\mathcal{H}_1)}{p(\mathcal{D})} = \frac{p(\mathcal{D} \mid \mathcal{H}_1) \, p(\mathcal{H}_1)}{p(\mathcal{D} \mid \mathcal{H}_1) \, p(\mathcal{H}_1) + p(\mathcal{D} \mid \mathcal{H}_0) \, p(\mathcal{H}_0)}$, using $p(\mathcal{H}_1 \mid \mathcal{D}) + p(\mathcal{H}_0 \mid \mathcal{D}) = 1$; $p(\mathcal{H}_1)$ is the prior probability of observing a real association.

71 Model comparison and hypothesis testing: Bayes factor versus likelihood ratio. Bayes factor: models of different complexity can be objectively compared; statistical significance as the posterior probability of a model; typically hard to compute. Likelihood ratio: the likelihood ratio scales with the number of parameters; likelihood ratios have a known null distribution, yielding p-values; often easy to compute.

73 Model comparison and hypothesis testing: Marginal likelihood of variance component models. Consider a linear model accounting for a set of measured SNPs $\mathbf{X}$: $p(\mathbf{y} \mid \mathbf{X}, \theta, \sigma^2) = \mathcal{N}\big(\mathbf{y} \mid \sum_{s=1}^{S} \mathbf{x}_s \theta_s, \, \sigma^2 \mathbf{I}\big)$. Choose an identical Gaussian prior for all weights: $p(\theta) = \prod_{s=1}^{S} \mathcal{N}(\theta_s \mid 0, \sigma_g^2)$. Marginal likelihood: $p(\mathbf{y} \mid \mathbf{X}, \sigma^2, \sigma_g^2) = \int_\theta \mathcal{N}(\mathbf{y} \mid \mathbf{X} \theta, \sigma^2 \mathbf{I}) \, \mathcal{N}(\theta \mid \mathbf{0}, \sigma_g^2 \mathbf{I}) = \mathcal{N}(\mathbf{y} \mid \mathbf{0}, \, \sigma_g^2 \mathbf{X} \mathbf{X}^{\mathsf{T}} + \sigma^2 \mathbf{I})$. The number of hyperparameters is independent of the number of SNPs.
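
This marginal likelihood is cheap to evaluate directly; a sketch on the toy genotype matrix (the variance values passed in are illustrative):

```python
def vc_log_marginal(y, X, sigma2_g, sigma2_e):
    """ln N(y | 0, sigma2_g * X X^T + sigma2_e * I)."""
    N = len(y)
    K = sigma2_g * X @ X.T + sigma2_e * np.eye(N)
    return multivariate_normal.logpdf(y, mean=np.zeros(N), cov=K)

print(vc_log_marginal(y - y.mean(), X, sigma2_g=0.2, sigma2_e=0.3))
```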

77 Model comparison and hypothesis testing: Marginal likelihood of variance component models, application to GWAS. The missing heritability paradox: complex traits are regulated by a large number of small effects. Human height: the best single SNP explains little variance, but the parents are highly predictive for the height of the child!

78 Model comparison and hypothesis testing: Marginal likelihood of variance component models, application to GWAS. Multivariate additive models for complex traits. Multivariate model over causal SNPs: $p(\mathbf{y} \mid \mathbf{X}, \theta, \sigma^2) = \mathcal{N}\big(\mathbf{y} \mid \sum_{s \in \text{causal}} \mathbf{x}_s \theta_s, \, \sigma^2 \mathbf{I}\big)$. Common variance prior for the causal SNPs: $p(\theta_s) = \mathcal{N}(\theta_s \mid 0, \sigma_g^2)$. Marginalize out the weights: $p(\mathbf{y} \mid \mathbf{X}, \sigma_g^2, \sigma_e^2) = \mathcal{N}\big(\mathbf{y} \mid \mathbf{0}, \, \sigma_g^2 \sum_{s \in \text{causal}} \mathbf{x}_s \mathbf{x}_s^{\mathsf{T}} + \sigma_e^2 \mathbf{I}\big)$. Which SNPs are causal? Approximation: consider all SNPs [Yang et al., 2011], $p(\mathbf{y} \mid \mathbf{X}, \sigma_g^2, \sigma_e^2) = \mathcal{N}(\mathbf{y} \mid \mathbf{0}, \, \sigma_g^2 \mathbf{X} \mathbf{X}^{\mathsf{T}} + \sigma_e^2 \mathbf{I})$.

82 Model comparison and hypothesis testing: Marginal likelihood of variance component models, application to GWAS. Approximate variance model: $p(\mathbf{y} \mid \mathbf{X}, \sigma_g^2, \sigma_e^2) = \mathcal{N}(\mathbf{y} \mid \mathbf{0}, \, \sigma_g^2 \mathbf{X} \mathbf{X}^{\mathsf{T}} + \sigma_e^2 \mathbf{I})$. The genetic variance $\sigma_g^2$ can be partitioned across chromosomes; heritability $h^2 = \frac{\sigma_g^2}{\sigma_g^2 + \sigma_e^2}$. (Figure: genome partitioning of genetic variation across chromosomes; from [Yang et al., 2011].)
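
As a rough illustration of fitting such variance components, the sketch below standardizes the toy genotypes, grid-searches the two variances that maximize the marginal likelihood above, and reports $h^2$; dedicated tools (e.g. the REML-based approach of [Yang et al., 2011]) do this far more efficiently, so this is only a toy:

```python
# Standardize genotypes so that diag(X X^T) is roughly 1 and h^2 has the stated form.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
Xs = Xs / np.sqrt(Xs.shape[1])
yc = (y - y.mean()) / y.std()

grid = np.linspace(0.05, 2.0, 30)
best = max((vc_log_marginal(yc, Xs, sg, se), sg, se) for sg in grid for se in grid)
_, sg_hat, se_hat = best
print("h^2 estimate:", sg_hat / (sg_hat + se_hat))
```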

85 Summary: Outline

86 Summary. Generalized linear models for curve fitting and multivariate regression. Maximum likelihood and least squares regression are identical. Construction of features using a mapping $\phi$. Regularized least squares and other models that correspond to different choices of loss functions. Bayesian linear regression. Model comparison and Occam's razor. Variance component models in GWAS.

87 Summary: Tasks. Prove that the product of two Gaussians is Gaussian distributed. Try to understand the convolution formula of Gaussian random variables.

88 References. C. Bishop. Pattern Recognition and Machine Learning. Springer, New York, 2006. S. Roweis. Gaussian identities. Technical report, 1999. J. Yang, T. Manolio, L. Pasquale, E. Boerwinkle, N. Caporaso, J. Cunningham, M. de Andrade, B. Feenstra, E. Feingold, M. Hayes, et al. Genome partitioning of genetic variation for complex traits using common SNPs. Nature Genetics, 43(6), 2011.


More information

Parametric Techniques

Parametric Techniques Parametric Techniques Jason J. Corso SUNY at Buffalo J. Corso (SUNY at Buffalo) Parametric Techniques 1 / 39 Introduction When covering Bayesian Decision Theory, we assumed the full probabilistic structure

More information

Bayesian RL Seminar. Chris Mansley September 9, 2008

Bayesian RL Seminar. Chris Mansley September 9, 2008 Bayesian RL Seminar Chris Mansley September 9, 2008 Bayes Basic Probability One of the basic principles of probability theory, the chain rule, will allow us to derive most of the background material in

More information

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Gaussian Processes Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 01 Pictorial view of embedding distribution Transform the entire distribution to expected features Feature space Feature

More information

Lecture 5: GPs and Streaming regression

Lecture 5: GPs and Streaming regression Lecture 5: GPs and Streaming regression Gaussian Processes Information gain Confidence intervals COMP-652 and ECSE-608, Lecture 5 - September 19, 2017 1 Recall: Non-parametric regression Input space X

More information

Machine Learning Basics: Maximum Likelihood Estimation

Machine Learning Basics: Maximum Likelihood Estimation Machine Learning Basics: Maximum Likelihood Estimation Sargur N. srihari@cedar.buffalo.edu This is part of lecture slides on Deep Learning: http://www.cedar.buffalo.edu/~srihari/cse676 1 Topics 1. Learning

More information

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008 Gaussian processes Chuong B Do (updated by Honglak Lee) November 22, 2008 Many of the classical machine learning algorithms that we talked about during the first half of this course fit the following pattern:

More information

Expectation Propagation for Approximate Bayesian Inference

Expectation Propagation for Approximate Bayesian Inference Expectation Propagation for Approximate Bayesian Inference José Miguel Hernández Lobato Universidad Autónoma de Madrid, Computer Science Department February 5, 2007 1/ 24 Bayesian Inference Inference Given

More information

Logistic Regression. COMP 527 Danushka Bollegala

Logistic Regression. COMP 527 Danushka Bollegala Logistic Regression COMP 527 Danushka Bollegala Binary Classification Given an instance x we must classify it to either positive (1) or negative (0) class We can use {1,-1} instead of {1,0} but we will

More information

Bayesian Deep Learning

Bayesian Deep Learning Bayesian Deep Learning Mohammad Emtiyaz Khan AIP (RIKEN), Tokyo http://emtiyaz.github.io emtiyaz.khan@riken.jp June 06, 2018 Mohammad Emtiyaz Khan 2018 1 What will you learn? Why is Bayesian inference

More information

Modeling Data with Linear Combinations of Basis Functions. Read Chapter 3 in the text by Bishop

Modeling Data with Linear Combinations of Basis Functions. Read Chapter 3 in the text by Bishop Modeling Data with Linear Combinations of Basis Functions Read Chapter 3 in the text by Bishop A Type of Supervised Learning Problem We want to model data (x 1, t 1 ),..., (x N, t N ), where x i is a vector

More information

Probabilistic Graphical Models & Applications

Probabilistic Graphical Models & Applications Probabilistic Graphical Models & Applications Learning of Graphical Models Bjoern Andres and Bernt Schiele Max Planck Institute for Informatics The slides of today s lecture are authored by and shown with

More information

Introduction to Bayesian Inference

Introduction to Bayesian Inference University of Pennsylvania EABCN Training School May 10, 2016 Bayesian Inference Ingredients of Bayesian Analysis: Likelihood function p(y φ) Prior density p(φ) Marginal data density p(y ) = p(y φ)p(φ)dφ

More information

Probabilistic & Bayesian deep learning. Andreas Damianou

Probabilistic & Bayesian deep learning. Andreas Damianou Probabilistic & Bayesian deep learning Andreas Damianou Amazon Research Cambridge, UK Talk at University of Sheffield, 19 March 2019 In this talk Not in this talk: CRFs, Boltzmann machines,... In this

More information

PATTERN RECOGNITION AND MACHINE LEARNING

PATTERN RECOGNITION AND MACHINE LEARNING PATTERN RECOGNITION AND MACHINE LEARNING Chapter 1. Introduction Shuai Huang April 21, 2014 Outline 1 What is Machine Learning? 2 Curve Fitting 3 Probability Theory 4 Model Selection 5 The curse of dimensionality

More information

Bayesian Interpretations of Regularization

Bayesian Interpretations of Regularization Bayesian Interpretations of Regularization Charlie Frogner 9.50 Class 15 April 1, 009 The Plan Regularized least squares maps {(x i, y i )} n i=1 to a function that minimizes the regularized loss: f S

More information

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) = Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,

More information