
Regression methods
(CIV6540 - Machine Learning for Civil Engineers)
Professor: James-A. Goulet
Département des génies civil, géologique et des mines
Chapter 8, Goulet (2018). Probabilistic Machine Learning for Civil Engineers.

What is regression?
[Figures: three examples of data with a fitted trend - pathogens [ppm] vs. chloride content; # of accidents vs. hour of the day; concrete strength vs. water/cement ratio]
Data: D = {(x_i, y_i), i = 1:D}
Covariate x_i ∈ ℝ: attribute, regressor
Observation y_i ∈ ℝ
Model: g(x) = fct(x)
Regression methods: mathematical models for g(x)

Regression in >1D
Data: D = {(x_i, y_i), i = 1:D}
Covariates x_i = [x_1, x_2]_i ∈ ℝ²: attributes, regressors
Observation y_i ∈ ℝ
Model: g(x) = fct(x) ∈ ℝ
General case: x_i = [x_1, x_2, ⋯, x_X]_i ∈ ℝ^X, y_i ∈ ℝ, g(x_i) ∈ ℝ

Module #5 Outline - Topics organization
[Figure: course map - probability theory; revision of probability & linear algebra; probability distributions; Bayesian estimation, p(A|B) = p(B|A)p(A)/p(B); MCMC sampling & Newton; machine learning basics; regression (model-driven and data-driven methods); model calibration; classification; state-space models for time series; decision theory; AI & sequential decision problems]

Which one is an example of linear regression?
[Figures: two training sets, each with a fitted g(x) and its training loss J_train(b*): one straight-line fit and one non-linear fit]
Both.

Section Outline - Linear Regression
2.1 Introduction
2.2 Mathematical formulation 1D
2.3 Cross-validation
2.4 Mathematical formulation >1D
2.5 Limitations

Mathematical formulation 1D - Nomenclature & hypotheses
[Figure: reality and the observed data points (x_i, y_i)]
Data: D = {(x_i, y_i), i = 1:D}
Covariate x_i ∈ ℝ: attribute, regressor
Observation y_i ∈ ℝ
Model: g(x) = fct(x)
Hypothesis: g(x) ≡ reality, i.e. no model variability
y = g(x) + v, v: V ~ N(v; 0, σ_V²), with independent observation errors V_i ⊥ V_j, ∀ i ≠ j

Mathematical formulation - special case
Let us assume that our model takes the form
g(x) = b_0 + b_1 x = [b_0 b_1] [1 x]ᵀ
y = g(x) + v, v: V ~ N(v; 0, σ_V²)
For the entire dataset:
b = [b_0, b_1]ᵀ, y = [y_1, y_2, ⋯, y_D]ᵀ, X = [1 x_1; 1 x_2; ⋯; 1 x_D], ε = [ε_1, ε_2, ⋯, ε_D]ᵀ
y = Xb + v, v: V ~ N(v; 0, σ_V² I)

Joint likelihood of observations
y = Xb + v, v: V ~ N(v; 0, σ_V² I)
f(y|x, b) = ∏_{i=1}^D N(y_i; g(x_i), σ_V²)
∝ ∏_{i=1}^D exp(−(1/2)(y_i − g(x_i))²/σ_V²)
= exp(−(1/2) Σ_{i=1}^D (y_i − g(x_i))²/σ_V²)
= exp(−(1/(2σ_V²)) (y − Xb)ᵀ(y − Xb))

Estimating the optimal parameters b*
The goal is to estimate the optimal parameters b* which maximize the likelihood f(y|x, b), or equivalently, which maximize the log-likelihood
ln f(y|x, b) = −(1/(2σ_V²))(y − Xb)ᵀ(y − Xb) ∝ −(1/2)(y − Xb)ᵀ(y − Xb) = −J(b)   (loss function)
In the context of linear regression, we minimize J(b), which is equivalent to maximizing f(y|x, b).

Estimating the optimal parameters b* (cont.)
Loss function: measures the model quality
J(b) = (1/2) Σ_{i=1}^D (y_i − g(x_i))² = (1/2)(y − Xb)ᵀ(y − Xb)
b* = arg min_b (1/2)(y − Xb)ᵀ(y − Xb)
∂J(b*)/∂b = XᵀX b* − Xᵀy = 0
b* = (XᵀX)⁻¹ Xᵀ y
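This closed-form solution maps directly to code. A minimal MATLAB sketch (the synthetic data and variable names are assumptions for illustration, not from the slides):

% Minimal sketch: ordinary least squares via the normal equations
x = linspace(0, 1, 20)';          % covariates (synthetic, assumed)
y = 1 + 4*x + 0.5*randn(20, 1);   % noisy observations (assumed model)
X = [ones(size(x)) x];            % design matrix with rows [1 x_i]
b = (X'*X) \ (X'*y);              % b* = (X'X)^{-1} X'y
% In practice, b = X \ y solves the same least-squares problem with
% better numerical stability than forming X'X explicitly.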

Example #1
g(x) = b_0 + b_1 x, b* = [0.96, 4.9]
[Figure: training set, fitted line g(x), and training loss J_train(b*)]
[CIV ML/Linear regression 1D.m]

What if the data is non-linear?
[Figure: training set with a clearly non-linear trend]

Mathematical formulation - general case
g(x) = b_0 + b_1 φ_1(x) + b_2 φ_2(x) + ⋯ + b_B φ_B(x)
where φ_j(x) can be any linear or non-linear function, e.g. φ_j(x) = x², φ_j(x) = sin(x), ⋯
For the entire dataset: b = [b_0, b_1, ⋯, b_B]ᵀ, y = [y_1, y_2, ⋯, y_D]ᵀ,
X = [1 φ_1(x_1) φ_2(x_1) ⋯ φ_B(x_1); 1 φ_1(x_2) φ_2(x_2) ⋯ φ_B(x_2); ⋯; 1 φ_1(x_D) φ_2(x_D) ⋯ φ_B(x_D)]
Linear regression: linear with respect to φ_j(x) and b, not x
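The design matrix generalizes accordingly. A short sketch, continuing with x and y from the previous sketch (the basis functions chosen here are assumptions for illustration):

% Sketch: design matrix for a basis-function model, e.g. g(x) = b0 + b1*x.^2 + b2*sin(x)
phi = {@(z) z.^2, @(z) sin(z)};   % illustrative basis functions
X = ones(numel(x), 1);            % leading column of ones
for j = 1:numel(phi)
    X = [X phi{j}(x)];            % append column phi_j(x_i)
end
b = X \ y;                        % least-squares estimate, as before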

Example #2a
g(x) = b_0 + b_1 x²
[Figure: training set, fitted g(x), and training loss J_train(b*)]
[CIV ML/Linear regression 1D.m]

Example #2b
g(x) = b_0 + b_1 x³ + b_2 sin(10x)
[Figure: training set and fitted g(x)]
[CIV ML/Linear regression 1D.m]

Cross-validation - What model structure is best?
g(x) = b_0 + b_1 x   vs.   g(x) = b_0 + b_1 x + b_2 x² + b_3 x³ + b_4 x⁴ + b_5 x⁵
[Figures: both models fitted to the same training set, with their training losses J_train(b*)]
Linear regression is sensitive to over-fitting

Cross-validation - Overfitting & bias-variance tradeoff
[Figure: four panels - (a) low bias & low variance; (b) high bias & low variance; (c) low bias & high variance; (d) high bias & high variance]

Cross-validation methodology
Over-fitting is prevented by employing cross-validation.
Cross-validation: do not employ the same data to train (i.e. estimate the optimal parameters) and to test the loss function.
N-fold cross-validation: split the data into N subsets that are employed one at a time as an independent test set.
[Figure: the entire dataset split into folds; for each fold #, one testing subset and the remaining training subsets]

Algorithm 1: N-fold cross-validation
define x, y, g_m(x);
CV_idx = randperm(D);                          % random permutation of the D data indices
n_CV = round(D/N);                             % fold size
for m = 1, 2, ⋯, M do                          % loop over candidate model structures
    for i = 1, 2, ⋯, N do                      % loop over folds
        test_idx = CV_idx((i−1)·n_CV + 1 : min(D, i·n_CV));
        x_train = x;  x_train(test_idx,:) = [];
        y_train = y;  y_train(test_idx,:) = [];
        x_test = x(test_idx,:);
        y_test = y(test_idx,:);
        estimate b_i: y_train = g_m(b, x_train);
        compute J_i(b_i, x_test, y_test);
    J_test^m = Σ_{i=1}^N J_i;
select m*: J_test^{m*} = min_m J_test^m;
estimate b*: y = g_{m*}(b, x);

Example #3 - Cross-validation
g(x) = b_0 + b_1 x   vs.   g(x) = b_0 + b_1 x + b_2 x² + b_3 x³ + b_4 x⁴ + b_5 x⁵
[Figures: training and test sets with both fits and their losses; the simple model gives J_train = 1.97 and J_test = 2.2, while the 5th-order polynomial gives the lower training loss J_train = 1.14 but a higher test loss]
[CIV ML/Linear regression 1D.m]

N-fold cross-validation
The greater the number of folds N, the more accurate the results. If N = D, it is called leave-one-out cross-validation (LOOCV).
In practice, LOOCV is too computationally expensive when D is large because it requires as many calibration loops as there are data points.
Instead, it is common in practice to employ 5- to 10-fold cross-validation.

Regression in >1D
Data: D = {(x_i, y_i), i = 1:D}
Covariates x_i = [x_1, x_2]_i ∈ ℝ²: attributes, regressors
Observation y_i ∈ ℝ
Model: g(x) = fct(x) ∈ ℝ
General case: x_i = [x_1, x_2, ⋯, x_X]_i ∈ ℝ^X, y_i ∈ ℝ, g(x_i) ∈ ℝ

Mathematical formulation - general case >1D
g(x) = b_0 + b_1 φ_1(x_1) + b_2 φ_2(x_2) + ⋯
For the entire dataset: b = [b_0, b_1, ⋯, b_B]ᵀ, y = [y_1, y_2, ⋯, y_D]ᵀ,
X = [1 φ_1(x_{1,1}) φ_2(x_{1,2}) ⋯; 1 φ_1(x_{2,1}) φ_2(x_{2,2}) ⋯; ⋯; 1 φ_1(x_{D,1}) φ_2(x_{D,2}) ⋯]

Example #4
g(x) = b_0 + b_1 x_1^{1/2} + b_2 x_2²
[Figure: training set and fitted surface g(x) over (x_1, x_2)]
[CIV ML/Linear regression 2D.m]

Limitations
- Does not differentiate between interpolating and extrapolating
- Cannot handle autocorrelation, which is common in time series
- Sensitive to outliers
- Assumes homoscedastic errors
- The performance depends on the capacity to hand-pick the correct transformation functions (difficult when X > 1)

Summary
In short: linear regression is simple and quick, yet it is outperformed by modern techniques such as Gaussian Process Regression.
Note: cross-validation is not limited to linear regression and can be employed with any regression technique.

Example - T° distribution along a pipeline
[Photo: pipeline; credit Bloomberg]
Given that we know the pipeline temperature to be y = 8 °C at x = 52 km:
What is the temperature at x = 52.1 km?
What is the temperature at x = 7 km?
We will take advantage of the conditional dependency between a system response y and covariate values x.

Section Outline - Gaussian Process Regression (GPR)
3.1 Introduction
3.2 Definition
3.3 Correlation functions
3.4 Updating a GP using exact observations
3.5 Updating a GP using noise-contaminated observations
3.6 Multiple covariates: x = [x_1, x_2, ⋯, x_X]_i
3.7 Parameter estimation
3.8 Evaluating GPR predictive capacity
3.9 GPML

Definition
Given a system response such that g_i = g(x_i), where g_i is the (exact) observation, g(x_i) the reality, and x_i = [x_1, x_2, ⋯, x_X]_i the covariates.
Data: D = {(x_i, g_i), i = 1:D}, g = [g_1, g_2, ⋯, g_D]ᵀ, x = [x_1, x_2, ⋯, x_D] ∈ ℝ^{X×D}
Gaussian process: g(x): G ~ N(m_G, Σ_G)
A set of discrete Gaussian random variables for which the pairwise correlation between G_i and G_j is a function of the distance between attributes:
[Σ_G]_ij = ρ(x_i, x_j) σ_{G_i} σ_{G_j}, ρ(x_i, x_j) = fct(|x_i − x_j|)

Example - T° distribution along a pipeline
[Photo: pipeline; credit Bloomberg]

Example - T° distribution along a pipeline
Prior knowledge: µ_{G_i} = µ_G = 0 °C, σ_{G_i} = σ_G = 2.5 °C
Gaussian correlation function: ρ(x_i, x_j) = exp(−(1/2)(x_i − x_j)²/l²)
For l = 25 km: ρ(x_1, x_2) = 0.999, ρ(x_1, x_3) = 0.278, ρ(x_2, x_3) = 0.296
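This correlation function is a one-liner in code. A small sketch (the evaluation points are assumptions; l = 25 km is from the slide):

% Sketch: Gaussian (squared-exponential) correlation function
rho = @(xi, xj, l) exp(-0.5 * (xi - xj).^2 / l^2);
l = 25;                 % length-scale [km]
rho(52, 52.1, l)        % nearby points: correlation close to 1
rho(52, 120, l)         % distant points: correlation close to 0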

Example - T° distribution along a pipeline
For x = [0, 1, ⋯, 100]:
G ~ N(m_G, Σ_G), [Σ_G]_ij = ρ(x_i, x_j) σ_G²   (prior knowledge)
[Figure: realizations g_i of our prior knowledge G]
[CIV ML/M14 GP pipeline.m]

Updating a GP using exact observations
Given D = {(x_i, g_i), i = 1:D}, a set of D observations, and x*, a set of covariates for which we want to predict f(g*|x*, D).
Reminder: Gaussian conditionals are also Gaussian.
Prior knowledge: [G; G*] with m = [m_G; m_G*] and Σ = [Σ_G, Σ_{GG*}; Σ_{G*G}, Σ_G*], where
[Σ_G]_ij = ρ(x_i, x_j) σ_G², [Σ_{G*G}]_ij = ρ(x*_i, x_j) σ_G², [Σ_G*]_ij = ρ(x*_i, x*_j) σ_G²
Posterior knowledge: f_{G*|x*,D}(g*|x*, D) = N(g*; m_{G*|D}, Σ_{G*|D}), with
m_{G*|D} = m_G* + Σ_{G*G} Σ_G⁻¹ (g − m_G)
Σ_{G*|D} = Σ_G* − Σ_{G*G} Σ_G⁻¹ Σ_{GG*}
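These two update equations translate directly into code. A minimal MATLAB sketch (the prediction grid and the reading of the observation locations as [31, 70] km are assumptions; the prior values follow the pipeline example):

% Sketch: GP posterior from exact (noise-free) observations
x_obs = [31; 70];  g_obs = [4.13; 3.62];   % observations (locations partly assumed)
x_star = (0:100)';                          % prediction grid [km] (assumed)
m_G = 0;  s_G = 2.5;  l = 25;               % prior mean, std, and length-scale
k = @(a, b) s_G^2 * exp(-0.5 * (a - b.').^2 / l^2);   % covariance function
K   = k(x_obs,  x_obs);                     % Sigma_G
Ks  = k(x_star, x_obs);                     % Sigma_{G*G}
Kss = k(x_star, x_star);                    % Sigma_{G*}
m_post = m_G + Ks * (K \ (g_obs - m_G));    % posterior mean
S_post = Kss - Ks * (K \ Ks');              % posterior covariance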

Example - Multivariate Gaussian conditional
Let µ_G = 0 °C, σ_G = 2.5 °C, ρ(x_1, x_2) = 0.8 (prior knowledge) and g_2 = −2 °C (observation).
f_{G_1,G_2}(g_1, g_2) = N(m_G, Σ_G), m_G = [0; 0], Σ_G = [2.5², 0.8·2.5²; 0.8·2.5², 2.5²]
f_{G_1|g_2}(g_1|g_2) = f_{G_1,G_2}(g_1, g_2)/f_{G_2}(g_2) = N(µ_{1|2}, σ_{1|2}²)
µ_{1|2} = µ_1 + ρ σ_1 (g_2 − µ_2)/σ_2 = 0 + 0.8 · 2.5 · (−2 − 0)/2.5 = −1.6 °C
σ_{1|2} = σ_1 √(1 − ρ²) = 2.5 √(1 − 0.8²) = 1.5 °C

Example - Multivariate Gaussian conditional
f_{G_1|g_2}(g_1|g_2) = f_{G_1,G_2}(g_1, g_2)/f_{G_2}(g_2) = N(µ_{1|2}, σ_{1|2}²), with µ_{1|2} = µ_1 + ρ σ_1 (g_2 − µ_2)/σ_2 and σ_{1|2} = σ_1 √(1 − ρ²)
[Figure: joint density f_{G_1,G_2}(g_1, g_2) and the conditional f_{G_1|g_2}(g_1|g_2) for g_2 = −2, giving µ_{1|2} = −1.6 and σ_{1|2} = 1.5]
[CIV ML/M14 Gauss conditionnelle.m]

Example - T° distribution along a pipeline
[Figures: posterior mean and uncertainty of g(x) along x [km], updated as observations are added one at a time]
Observations: D = {([ ], [ ])}
Observations: D = {([31], [4.13])}
Observations: D = {([31, 7], [4.13, 3.62]ᵀ)}
Observations: D = {([31, 7, 3], [4.13, 3.62, 3.71]ᵀ)}
[CIV ML/M14 GP pipeline obs.m]

Noise-contaminated observations
Given a system response contaminated by measurement noise, such that
Y_i = g(x_i) + V, V ~ N(v; 0, σ_V²)
where Y_i is the observation, g(x_i) the reality, and V the measurement error.
[Σ_Y]_ij = ρ(x_i, x_j) σ_G² + σ_V² δ_ij, δ_ij = 1 if i = j, else 0
Prior knowledge: [Y; G*] with m = [m_Y; m_G*] and Σ = [Σ_Y, Σ_{YG*}; Σ_{G*Y}, Σ_G*]
Posterior knowledge: f_{G*|x*,D}(g*|x*, D) = N(g*; m_{G*|D}, Σ_{G*|D})
m_{G*|D} = m_G* + Σ_{G*Y} Σ_Y⁻¹ (y − m_Y), Σ_{G*|D} = Σ_G* − Σ_{G*Y} Σ_Y⁻¹ Σ_{YG*}
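Relative to the noise-free sketch above, only the observation-side covariance changes. Continuing the earlier sketch (σ_V = 0.5 is taken from the following slides):

% Sketch: GP posterior with noise-contaminated observations
s_V = 0.5;                                          % measurement noise std
K_Y = k(x_obs, x_obs) + s_V^2 * eye(numel(x_obs));  % Sigma_Y = Sigma_G + s_V^2 I
m_post = m_G + Ks * (K_Y \ (g_obs - m_G));          % posterior mean
S_post = Kss - Ks * (K_Y \ Ks');                    % posterior covariance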

Example - T° distribution (V ≠ 0)
Measurement error: σ_V = 0.5
[Figures: posterior mean and uncertainty along x [km], updated as noisy observations are added]
Observations: D = {([ ], [ ])}
Observations: D = {([31], [3.42])}
Observations: D = {([31, 7], [3.42, 2.92]ᵀ)}
Observations: D = {([31, 7, 3], [3.42, 2.92, 3.38]ᵀ)}
[CIV ML/M14 GP pipeline obs.m]

Multiple covariates: x = [x_1, x_2, ⋯, x_X]_i
y_i = g(x_i) + v_i, x_i = [x_1, x_2, ⋯, x_X]_i (covariates), for i = 1, ⋯, D
where y_i is the observation and g(x_i) the reality.
y = [y_1, y_2, ⋯, y_D]ᵀ, x = [x_1, x_2, ⋯, x_D] ∈ ℝ^{X×D}, l = [l_1, l_2, ⋯, l_X] ∈ ℝ^{X×1} (one l_k per covariate x_k)
ρ(x_i, x_j) = exp(−(1/2) Σ_{k=1}^X ([x_k]_i − [x_k]_j)²/l_k²) = exp(−(1/2)(x_i − x_j)ᵀ diag(l²)⁻¹ (x_i − x_j))
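A sketch of this anisotropic correlation function with one length-scale per covariate (the numeric values are assumptions for illustration):

% Sketch: multivariate correlation function, one length-scale per covariate
rho = @(xi, xj, l) exp(-0.5 * sum(((xi - xj) ./ l).^2));
xi = [1.0; 2.0];  xj = [1.5; 2.5];   % two covariate vectors (assumed)
l  = [0.5; 5.0];   % small l(1): x_1 drives the correlation; large l(2): x_2 barely matters
rho(xi, xj, l)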

Multivariate correlation function
ρ(x_i, x_j) = exp(−(1/2) Σ_{k=1}^X ([x_i]_k − [x_j]_k)²/l_k²) = exp(−(1/2)(x_i − x_j)ᵀ diag(l)⁻² (x_i − x_j))
If l_k >> σ_{X_k}, the covariate X_k has little importance for y; if l_k < σ_{X_k}, X_k is important for y.
[Figure: ρ(x_i, x_j) as a function of [x_i]_1 − [x_j]_1 and [x_i]_2 − [x_j]_2]
[CIV ML/M14 multivariate corr fct.m]

Parameter estimation: σ_G & l
Up to now, we made the hypothesis that we knew σ_G and l = [l_1, l_2, ⋯, l_X]. In practice, we need to learn the hyper-parameters θ = [σ_G, l] using training data D = {D_x, D_y} = {(x_i, y_i), i = 1:D}.
Bayes rule: p(θ|D_x, D_y) = p(D_y|D_x, θ) p(θ) / p(D_y)   (posterior = likelihood × prior / constant)
! In practice, evaluating p(θ|x, y) is computationally expensive…

MLE approximation
p(θ|D_x, D_y) = p(D_y|D_x, θ) p(θ) / p(D_y)
If p(θ) = cte., then p(θ|D_x, D_y) ∝ p(D_y|D_x, θ)
θ* = arg max_θ p(D_y|D_x, θ), with p(D_y|D_x, θ) = N(D_y; m_Y, Σ_Y)

MLE approximation (cont.)
When the size of Σ_Y increases, p(y|x, θ) → 0, causing numerical instabilities!
Solution: θ* = arg max_θ p(y|x, θ) ≡ arg max_θ log p(y|x, θ)
p(y|x, θ)'s maximum corresponds to log p(y|x, θ)'s maximum, and there are no more instabilities.
log p(y|x, θ) = log N(y; m_Y, Σ_Y) = −(1/2)(y − m_Y)ᵀ Σ_Y⁻¹ (y − m_Y) − (1/2) log|Σ_Y| − (N/2) log 2π
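A direct transcription of this log-likelihood (a sketch; using a Cholesky factorization for the solve and the log-determinant is an implementation choice here, not prescribed by the slides):

% Sketch: Gaussian log marginal likelihood, assuming y, m_Y, and Sigma_Y exist
r = y - m_Y;                         % residual w.r.t. the prior mean
L = chol(Sigma_Y, 'lower');          % Sigma_Y = L*L'
alpha = L' \ (L \ r);                % Sigma_Y^{-1} * r
logdet = 2 * sum(log(diag(L)));      % log |Sigma_Y|
logp = -0.5 * (r' * alpha) - 0.5 * logdet - numel(y)/2 * log(2*pi);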

Maximization
How do we find θ*?
d log p(y|x, θ)/dθ_j = (1/2) yᵀ Σ_Y⁻¹ (dΣ_Y/dθ_j) Σ_Y⁻¹ y − (1/2) tr(Σ_Y⁻¹ dΣ_Y/dθ_j) = 0
We employ iterative optimization techniques (such as Newton-Raphson) to find θ*. Efficient algorithms are already implemented in GPML.

Interactive example - T° distribution along a pipeline
[Figure: GPR fit g(x) along x [km]]
[CIV ML/GPR example.m]

Evaluating GPR predictive capacity
We should employ leave-one-out cross-validation to estimate the GPR predictive capacity:
1. set up the GPR model
2. estimate the hyper-parameters {l, σ_G, σ_V} using the entire dataset¹
3. use LOO-CV to predict each data point while using only the remaining data as the training set
4. plot the histogram of the normalized prediction errors: (m_{G|D,i} − y_i)/σ_{G|D,i}
5. plot a scatter of predicted and observed values (see the sketch below)
¹ This can be done because GPR's hyper-parameters l and σ_V are not sensitive to over-fitting (this is not true for linear regression or for more advanced formulations).
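A sketch of the two diagnostics in steps 4 and 5 (it assumes vectors y_obs, gp_mean, and gp_std collected from a LOO-CV loop; these names are assumptions):

% Sketch: predictive-capacity diagnostics from LOO-CV outputs
% gp_mean(i), gp_std(i): LOO prediction for point i; y_obs(i): held-out value
z = (gp_mean - y_obs) ./ gp_std;    % normalized prediction errors
histogram(z)                        % should resemble N(0, 1)
figure; scatter(y_obs, gp_mean)     % should cluster along the 1:1 line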

Example - Evaluating GPR predictive capacity
[Figures: histogram of the normalized prediction errors (m_{G|D,i} − y_i)/σ_{G|D,i} against the theoretical N(0, 1) density; scatter of the predictions m_{G|D,i} ± σ_{G|D,i} against the observed y_i, with the line y_i = m_{G|D,i}]

GPML package
%% Data
x_obs;   % attribute values associated with observations
y_obs;   % observed / simulated values
s_v;     % measurement noise std
%% GPML
initial_guess_l = 1;     % initial guess for l
initial_guess_s_g = 1;   % initial guess for s_g
covfunc = @covSEiso;
hyp2.cov = log([initial_guess_l; initial_guess_s_g]);
hyp2.lik = log(s_v);
likfunc = @likGauss;
hyp2 = minimize(hyp2, @gp, -5, @infExact, [], covfunc, likfunc, x_obs, y_obs);
%% Estimated values
l_GPML = exp(hyp2.cov(1));     % estimated value for l
s_g_GPML = exp(hyp2.cov(2));   % estimated value for s_g
[CIV ML/M14 demogmpl.m]

Cross-Validation
%% GPML set-up
meanfunc = @meanConst;     % GP mean fct.
hyp.mean = [?];            % prior mean
likfunc = @likGauss;       % GP likelihood fct.
hyp.lik = log([?]);        % measurement std
covfunc = {'covSEard'};    % Gaussian cov. function
ell = log([?]);            % GP correlation lengths
sg = log([?]);             % GP prior standard deviation
hyp.cov = [ell; sg];       % covariance function hyper-parameters
%% Parameter optimization
[hyp, fx] = minimize(hyp, @gp, [?], @infExact, meanfunc, covfunc, likfunc, x_train, y_train);
%% Leave-one-out cross-validation
for cv_loop = 1:opt.nb_train
    idx_cv_loop = idx_cv;          % re-initialize CV indices
    idx_cv_loop(cv_loop) = [];     % remove current CV test sample
    [~, ~, GP_m, GP_s2] = gp(hyp, @infExact, meanfunc, covfunc, likfunc, x_train(idx_cv_loop,:), y_train(idx_cv_loop), x_train(cv_loop,:));
    LL_cv = log(normpdf(y_train(cv_loop), GP_m, sqrt(GP_s2)));   % log-likelihood
    residual = y_train(cv_loop) - GP_m;                          % prediction error
    cv_results(cv_loop,:) = [GP_m, GP_s2, LL_cv, residual];      % write-up results
end
%% Normalized prediction error histogram
histogram(cv_results(:,4) ./ sqrt(cv_results(:,2)))
%% Predicted vs. observed scatter
scatter(y_train, cv_results(:,1))

Section Outline - Examples
4.1 Soil contamination characterization
4.2 Metamodel

Soil contamination characterization
[Figure: observed lead concentration, Pb [mg/kg], at coordinates x [m], y [m], z [dm]]
[CIV ML/Soil contamination data.fig]

Problem set-up
Covariates and observations:
l_i^{Y} = [x, y, z]_i (geodesic coordinates)
D = {(l_i^{Y}, y_i), i = 1:116}
Observation model:
log y_i = g(l_i^{Y}) + v_i, v: V ~ N(0, σ_V²)
where log y_i is the observation, g(l_i^{Y}) the true contaminant concentration in log space, and v_i the measurement error in log space.

Problem set-up (cont.)
Prior: m = [m_Y; m_G*], Σ = [Σ_Y, Σ_{YG*}; Σ_{YG*}ᵀ, Σ_G*]
[Σ_Y]_ij = ρ(l_i^{Y}, l_j^{Y}) σ_G² + σ_V² δ_ij, δ_ij = 1 if i = j, else δ_ij = 0
[Σ_G*]_kl = ρ(l_k^{G*}, l_l^{G*}) σ_G²
[Σ_{YG*}]_ik = ρ(l_i^{Y}, l_k^{G*}) σ_G²
ρ(l_i, l_j) = exp(−(1/2)(l_i − l_j)ᵀ diag(l²)⁻¹ (l_i − l_j))
Posterior knowledge:
m_{G*|D} = m_G* + Σ_{YG*}ᵀ Σ_Y⁻¹ (y − m_Y)
Σ_{G*|D} = Σ_G* − Σ_{YG*}ᵀ Σ_Y⁻¹ Σ_{YG*}

Problem set-up (cont.)
Hyper-parameter estimation: P = {l_x, l_y, l_z, σ_G, σ_V}
P* = arg max_P log p(y|l^{Y}, P)
Results: l_x = 56.4 m, l_y = 53.7 m, l_z = 0.64 m, σ_G = 2.86, σ_V = 0.18

Results - contaminant concentration
[Figure: predicted contaminant concentration field]
[CIV ML/Soil contamination concentration.fig]

Results - Prediction quality
[Figures: histogram of the normalized prediction errors (log m_{G|D,i} − log y_i)/σ_{G|D,i} against the theoretical density; scatter of log m_{G|D,i} ± σ_{G|D,i} against log y_i, with the line log y_i = log m_{G|D,i}]
! Overall, the predictive capacity is poor → more data are needed

Soil contamination concentration
All the details of this study are presented in:
Quach, A., Tabor, L., Dumont, D., Courcelles, B., and Goulet, J.-A. (2017). A machine learning approach for characterizing soil contamination in the presence of physical site discontinuities and aggregated samples. Submitted to Advanced Engineering Informatics.

GPR & Metamodels
y = g(x), x = [x_1, x_2, ⋯, x_N]
g(x): mathematical model taking the parameters x as inputs
x: vector containing the model parameters (e.g. boundary conditions, material properties, etc.)
y: model response (e.g. displacement, strain, temperature, etc.)

Metamodel GPR set-up
y_i = g(x_i), x_i = [x_1, x_2, ⋯, x_X]_i, for i = 1, ⋯, D, where D is the number of simulations generated; y_i is a simulation output and g(x_i) the model.
y = [y_1, y_2, ⋯, y_D]ᵀ, x = [x_1, x_2, ⋯, x_D] ∈ ℝ^{X×D}, l = [l_1, l_2, ⋯, l_X] ∈ ℝ^{X×1} (one l_k per covariate x_k)
m = cte., σ_V = 0 (no observation errors)
ρ(x_i, x_j) = exp(−(1/2)(x_i − x_j)ᵀ diag(l)⁻² (x_i − x_j))

Example - Metamodel: Tamar Bridge
[Photo: Tamar Bridge]

Metamodel - Tamar Bridge - Response y_i
We are interested in modelling the bridge's first natural frequency.
We have a set of model simulations where the model parameters are randomly sampled.

Metamodel - Tamar Bridge - Covariates x_i = [x_1 : x_5]_i
[Figure: bridge schematic - Saltash side, Plymouth side, sidespan cables, main cable initial strains, and the deck expansion joints at the Saltash tower and at each end]
1. Main-cable initial strain: (5E-4, 3E-3) mm/mm
2. Sidespan cable initial strains: (5E-4, 3E-3) mm/mm
3. Plymouth-side support longitudinal stiffness: 10^(4, 11) kN/mm
4. Saltash-side support longitudinal stiffness: 10^(4, 11) kN/mm
5. Saltash tower deck expansion joint longitudinal stiffness: 10^(4, 11) kN/mm

Problem set-up
Covariates and observations: x_i = [x_1, x_2, x_3, x_4, x_5]_i, D = {(x_i, y_i), i = 1:D}
Observation model: Y_i = g(x_i)
where y_i is an observation (simulation output) and g(x_i) the model prediction.

Metamodel - Tamar Bridge - Hyper-parameter estimation
P = {l_1, l_2, l_3, l_4, l_5, σ_G}
P* = arg max_P log p(y|x, P)
Results: l_1 = 0.6 mm/mm, l_2 = 0.4 mm/mm, l_3 = 2.21 kN/mm, l_4 = 2.31 kN/mm, l_5 = 1.76 kN/mm, σ_G = 0.7 Hz

Metamodel - Tamar Bridge - Prediction quality
[Figures: histogram of the normalized prediction errors against the theoretical density; scatter of the predictions m_{G|D,i} ± σ_{G|D,i} against the observed y_i, with the line y_i = m_{G|D,i}]
[CIV ML/Meta model tamar.m]

Section Outline - Adv. topics
5.1 Prior knowledge mean values
5.2 Covariance functions
5.3 Homo/Heteroscedasticity
5.4 Other methods

Prior knowledge mean values
So far, we have modelled our prior mean as being either 0 or a constant.
When predicting in the neighbourhood of observed data, the prior knowledge has a negligible impact. However, the impact of the prior increases as the distance from the observed data increases.
Solution: your prior knowledge can be parameterized by a function depending on the covariates, e.g. m(x) = ax + b, where {a, b} are hyper-parameters that need to be estimated using data.

Interactive example - T° distribution along a pipeline
[Figure: GPR fit g(x) along x [km] with a parameterized prior mean]
[CIV ML/GPR example.m]

GPML package mean functions
%% Zero prior mean
meanfunc = @meanZero;
%% Constant prior mean
meanfunc = @meanConst;    % GP mean fct.
hyp.mean = 0.39;          % prior mean
%% Linear prior mean
meanfunc = @meanLinear;   % GP mean fct.
hyp.mean = 0.1;           % prior mean slope
%% Composite prior mean
msu = {'meanSum', {@meanConst, @meanLinear}};
hyp.mean = [0.39; 0.1];
Note: the parameters in hyp.mean will be estimated using the minimize function

Square-exponential covariance function
ρ(x_i, x_j) = exp(−(1/2)(x_i − x_j)ᵀ diag(l)⁻² (x_i − x_j)) = exp(−(1/2) Σ_{k=1}^X ([x_i]_k − [x_j]_k)²/[l]_k²)

Matern covariance function
ρ(x_i, x_j) = exp(−Σ_{k=1}^X |[x_i]_k − [x_j]_k| / [l]_k)
(the exponential, i.e. Matern 1/2, form)
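A small sketch contrasting the two correlation functions on a 1D distance grid (l = 1 is an assumed value for illustration):

% Sketch: squared-exponential vs. Matern (exponential) correlation, 1D, l = 1
d = linspace(-5, 5, 200);        % distance between covariates
rho_se = exp(-0.5 * d.^2);       % squared-exponential: smooth at d = 0
rho_ma = exp(-abs(d));           % Matern (exponential form): kink at d = 0
plot(d, rho_se, d, rho_ma); legend('squared-exponential', 'Matern')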

Covariance function and cross-validation
How do we choose between two covariance function structures?
→ Cross-validation or Bayesian model selection

Homo/Heteroscedasticity
[Figures: observations, mean, and 90% CI for (a) homoscedastic and independent, (b) heteroscedastic and independent, and (c) heteroscedastic and dependent errors]
a) Standard GPR; GPML
b) Tolvanen et al. (2014). Expectation propagation for non-stationary heteroscedastic Gaussian process regression. IEEE-MLSP; GPStuff
c) Tabor et al. (2017). Probabilistic modeling of heteroscedastic laboratory experiments using Gaussian process regression. Journal of Engineering Mechanics

Regression methods
Linear regression and Gaussian process regression are not the only approaches available:
- Neural networks (deep learning)
- Support vector machines (SVM)
- Decision trees (random forests)
- …
Most current research focuses on deep learning.

Summary
Linear regression: quick & easy, yet outperformed by modern methods.
Gaussian process regression:
- quick & easy
- allows interpolating and extrapolating (with uncertainty estimates)
- l allows quantifying the relevance of each covariate
Cross-validation:
- prevents over-fitting
- the standard method to employ for quantifying prediction accuracy
- allows comparing model structures

Gaussian Process Limitations
- The GPR method covered here only allows modelling deterministic phenomena
- More advanced methods are required to handle heteroscedasticity
- Depends on the selection of proper prior mean and covariance functions
- P* is a point estimate for the model hyper-parameters; a full Bayesian approach is required if the amount of data is small
- The GP is assumed to be stationary, i.e., the same l for any covariate value x


Bayesian Deep Learning

Bayesian Deep Learning Bayesian Deep Learning Mohammad Emtiyaz Khan AIP (RIKEN), Tokyo http://emtiyaz.github.io emtiyaz.khan@riken.jp June 06, 2018 Mohammad Emtiyaz Khan 2018 1 What will you learn? Why is Bayesian inference

More information

Introduction to SVM and RVM

Introduction to SVM and RVM Introduction to SVM and RVM Machine Learning Seminar HUS HVL UIB Yushu Li, UIB Overview Support vector machine SVM First introduced by Vapnik, et al. 1992 Several literature and wide applications Relevance

More information

Bayesian Approaches Data Mining Selected Technique

Bayesian Approaches Data Mining Selected Technique Bayesian Approaches Data Mining Selected Technique Henry Xiao xiao@cs.queensu.ca School of Computing Queen s University Henry Xiao CISC 873 Data Mining p. 1/17 Probabilistic Bases Review the fundamentals

More information

NONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition

NONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition NONLINEAR CLASSIFICATION AND REGRESSION Nonlinear Classification and Regression: Outline 2 Multi-Layer Perceptrons The Back-Propagation Learning Algorithm Generalized Linear Models Radial Basis Function

More information

Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.

Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Bayesian learning: Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Let y be the true label and y be the predicted

More information

Machine Learning. Bayesian Regression & Classification. Marc Toussaint U Stuttgart

Machine Learning. Bayesian Regression & Classification. Marc Toussaint U Stuttgart Machine Learning Bayesian Regression & Classification learning as inference, Bayesian Kernel Ridge regression & Gaussian Processes, Bayesian Kernel Logistic Regression & GP classification, Bayesian Neural

More information

Expectation Propagation Algorithm

Expectation Propagation Algorithm Expectation Propagation Algorithm 1 Shuang Wang School of Electrical and Computer Engineering University of Oklahoma, Tulsa, OK, 74135 Email: {shuangwang}@ou.edu This note contains three parts. First,

More information

Bayesian Machine Learning

Bayesian Machine Learning Bayesian Machine Learning Andrew Gordon Wilson ORIE 6741 Lecture 4 Occam s Razor, Model Construction, and Directed Graphical Models https://people.orie.cornell.edu/andrew/orie6741 Cornell University September

More information

Machine Learning, Midterm Exam: Spring 2009 SOLUTION

Machine Learning, Midterm Exam: Spring 2009 SOLUTION 10-601 Machine Learning, Midterm Exam: Spring 2009 SOLUTION March 4, 2009 Please put your name at the top of the table below. If you need more room to work out your answer to a question, use the back of

More information

Introduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Lior Wolf

Introduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Lior Wolf 1 Introduction to Machine Learning Maximum Likelihood and Bayesian Inference Lecturers: Eran Halperin, Lior Wolf 2014-15 We know that X ~ B(n,p), but we do not know p. We get a random sample from X, a

More information

Bayesian Learning (II)

Bayesian Learning (II) Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Bayesian Learning (II) Niels Landwehr Overview Probabilities, expected values, variance Basic concepts of Bayesian learning MAP

More information

Linear Regression (continued)

Linear Regression (continued) Linear Regression (continued) Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 6, 2017 1 / 39 Outline 1 Administration 2 Review of last lecture 3 Linear regression

More information

Nonparameteric Regression:

Nonparameteric Regression: Nonparameteric Regression: Nadaraya-Watson Kernel Regression & Gaussian Process Regression Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro,

More information

y Xw 2 2 y Xw λ w 2 2

y Xw 2 2 y Xw λ w 2 2 CS 189 Introduction to Machine Learning Spring 2018 Note 4 1 MLE and MAP for Regression (Part I) So far, we ve explored two approaches of the regression framework, Ordinary Least Squares and Ridge Regression:

More information

Announcements. Proposals graded

Announcements. Proposals graded Announcements Proposals graded Kevin Jamieson 2018 1 Bayesian Methods Machine Learning CSE546 Kevin Jamieson University of Washington November 1, 2018 2018 Kevin Jamieson 2 MLE Recap - coin flips Data:

More information

Qualifying Exam in Machine Learning

Qualifying Exam in Machine Learning Qualifying Exam in Machine Learning October 20, 2009 Instructions: Answer two out of the three questions in Part 1. In addition, answer two out of three questions in two additional parts (choose two parts

More information

Gaussian Processes for Computer Experiments

Gaussian Processes for Computer Experiments Gaussian Processes for Computer Experiments Jeremy Oakley School of Mathematics and Statistics, University of Sheffield www.jeremy-oakley.staff.shef.ac.uk 1 / 43 Computer models Computer model represented

More information

9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering

9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering Types of learning Modeling data Supervised: we know input and targets Goal is to learn a model that, given input data, accurately predicts target data Unsupervised: we know the input only and want to make

More information

EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING

EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING DATE AND TIME: August 30, 2018, 14.00 19.00 RESPONSIBLE TEACHER: Niklas Wahlström NUMBER OF PROBLEMS: 5 AIDING MATERIAL: Calculator, mathematical

More information

Tutorial on Gaussian Processes and the Gaussian Process Latent Variable Model

Tutorial on Gaussian Processes and the Gaussian Process Latent Variable Model Tutorial on Gaussian Processes and the Gaussian Process Latent Variable Model (& discussion on the GPLVM tech. report by Prof. N. Lawrence, 06) Andreas Damianou Department of Neuro- and Computer Science,

More information

Parametric Techniques

Parametric Techniques Parametric Techniques Jason J. Corso SUNY at Buffalo J. Corso (SUNY at Buffalo) Parametric Techniques 1 / 39 Introduction When covering Bayesian Decision Theory, we assumed the full probabilistic structure

More information