
Regression methods
(CIV6540 - Machine Learning for Civil Engineers)
Professor: James-A. Goulet
Département des génies civil, géologique et des mines
Chapter 8, Goulet (2018). Probabilistic Machine Learning for Civil Engineers.

What is regression?
[Figures: three examples of data with a fitted trend - pathogens [ppm] vs. chloride content; # of accidents vs. hour of the day; concrete strength vs. water/cement ratio]
Data: D = {(x_i, y_i), i = 1:D}
Covariate x_i ∈ ℝ: attribute, regressor
Observation y_i ∈ ℝ
Model: g(x) = fct(x)
Regression methods: mathematical models for g(x)

Regression in >1D
Data: D = {(x_i, y_i), i = 1:D}
Covariates x_i = [x_1, x_2]_i ∈ ℝ²: attributes, regressors
Observation y_i ∈ ℝ
Model: g(x) = fct(x) ∈ ℝ
General case: x_i = [x_1, x_2, ⋯, x_X]_i ∈ ℝ^X, y_i ∈ ℝ, g(x_i) ∈ ℝ

Module #5 Outline - Topics organization
[Figure: course map - probability theory; revision of probability & linear algebra; probability distributions; Bayesian estimation, p(A|B) = p(B|A)p(A)/p(B); MCMC sampling & Newton; machine learning basics; regression (model-driven and data-driven methods); model calibration; classification; state-space models for time series; decision theory; AI & sequential decision problems]

Which one is an example of linear regression?
[Figures: two training sets, each with a fitted g(x) and its training loss J_train(b*): one straight-line fit and one non-linear fit]
Both.

Section Outline - Linear Regression
2.1 Introduction
2.2 Mathematical formulation 1D
2.3 Cross-validation
2.4 Mathematical formulation >1D
2.5 Limitations

Mathematical formulation 1D - Nomenclature & hypotheses
[Figure: reality and the observed data points (x_i, y_i)]
Data: D = {(x_i, y_i), i = 1:D}
Covariate x_i ∈ ℝ: attribute, regressor
Observation y_i ∈ ℝ
Model: g(x) = fct(x)
Hypothesis: g(x) ≡ reality, i.e. no model variability
y = g(x) + v, v: V ~ N(v; 0, σ_V²), with independent observation errors V_i ⊥ V_j, ∀ i ≠ j

Mathematical formulation - special case
Let us assume that our model takes the form
g(x) = b_0 + b_1 x = [b_0 b_1] [1 x]ᵀ
y = g(x) + v, v: V ~ N(v; 0, σ_V²)
For the entire dataset:
b = [b_0, b_1]ᵀ, y = [y_1, y_2, ⋯, y_D]ᵀ, X = [1 x_1; 1 x_2; ⋯; 1 x_D], ε = [ε_1, ε_2, ⋯, ε_D]ᵀ
y = Xb + v, v: V ~ N(v; 0, σ_V² I)

Joint likelihood of observations
y = Xb + v, v: V ~ N(v; 0, σ_V² I)
f(y|x, b) = ∏_{i=1}^D N(y_i; g(x_i), σ_V²)
∝ ∏_{i=1}^D exp(−(1/2)(y_i − g(x_i))²/σ_V²)
= exp(−(1/2) Σ_{i=1}^D (y_i − g(x_i))²/σ_V²)
= exp(−(1/(2σ_V²)) (y − Xb)ᵀ(y − Xb))

Estimating the optimal parameters b*
The goal is to estimate the optimal parameters b* which maximize the likelihood f(y|x, b), or equivalently, which maximize the log-likelihood
ln f(y|x, b) = −(1/(2σ_V²))(y − Xb)ᵀ(y − Xb) ∝ −(1/2)(y − Xb)ᵀ(y − Xb) = −J(b)   (loss function)
In the context of linear regression, we minimize J(b), which is equivalent to maximizing f(y|x, b).

Estimating the optimal parameters b* (cont.)
Loss function: measures the model quality
J(b) = (1/2) Σ_{i=1}^D (y_i − g(x_i))² = (1/2)(y − Xb)ᵀ(y − Xb)
b* = arg min_b (1/2)(y − Xb)ᵀ(y − Xb)
∂J(b*)/∂b = XᵀX b* − Xᵀy = 0
b* = (XᵀX)⁻¹ Xᵀ y
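This closed-form solution maps directly to code. A minimal MATLAB sketch (the synthetic data and variable names are assumptions for illustration, not from the slides):

% Minimal sketch: ordinary least squares via the normal equations
x = linspace(0, 1, 20)';          % covariates (synthetic, assumed)
y = 1 + 4*x + 0.5*randn(20, 1);   % noisy observations (assumed model)
X = [ones(size(x)) x];            % design matrix with rows [1 x_i]
b = (X'*X) \ (X'*y);              % b* = (X'X)^{-1} X'y
% In practice, b = X \ y solves the same least-squares problem with
% better numerical stability than forming X'X explicitly.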

Example #1
g(x) = b_0 + b_1 x, b* = [0.96, 4.9]
[Figure: training set, fitted line g(x), and training loss J_train(b*)]
[CIV ML/Linear regression 1D.m]

What if the data is non-linear?
[Figure: training set with a clearly non-linear trend]

Mathematical formulation - general case
g(x) = b_0 + b_1 φ_1(x) + b_2 φ_2(x) + ⋯ + b_B φ_B(x)
where φ_j(x) can be any linear or non-linear function, e.g. φ_j(x) = x², φ_j(x) = sin(x), ⋯
For the entire dataset: b = [b_0, b_1, ⋯, b_B]ᵀ, y = [y_1, y_2, ⋯, y_D]ᵀ,
X = [1 φ_1(x_1) φ_2(x_1) ⋯ φ_B(x_1); 1 φ_1(x_2) φ_2(x_2) ⋯ φ_B(x_2); ⋯; 1 φ_1(x_D) φ_2(x_D) ⋯ φ_B(x_D)]
Linear regression: linear with respect to φ_j(x) and b, not x
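The design matrix generalizes accordingly. A short sketch, continuing with x and y from the previous sketch (the basis functions chosen here are assumptions for illustration):

% Sketch: design matrix for a basis-function model, e.g. g(x) = b0 + b1*x.^2 + b2*sin(x)
phi = {@(z) z.^2, @(z) sin(z)};   % illustrative basis functions
X = ones(numel(x), 1);            % leading column of ones
for j = 1:numel(phi)
    X = [X phi{j}(x)];            % append column phi_j(x_i)
end
b = X \ y;                        % least-squares estimate, as before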

Example #2a
g(x) = b_0 + b_1 x²
[Figure: training set, fitted g(x), and training loss J_train(b*)]
[CIV ML/Linear regression 1D.m]

Example #2b
g(x) = b_0 + b_1 x³ + b_2 sin(10x)
[Figure: training set and fitted g(x)]
[CIV ML/Linear regression 1D.m]

Cross-validation - What model structure is best?
g(x) = b_0 + b_1 x   vs.   g(x) = b_0 + b_1 x + b_2 x² + b_3 x³ + b_4 x⁴ + b_5 x⁵
[Figures: both models fitted to the same training set, with their training losses J_train(b*)]
Linear regression is sensitive to over-fitting

Cross-validation - Overfitting & bias-variance tradeoff
[Figure: four panels - (a) low bias & low variance; (b) high bias & low variance; (c) low bias & high variance; (d) high bias & high variance]

Cross-validation methodology
Over-fitting is prevented by employing cross-validation.
Cross-validation: do not employ the same data to train (i.e. estimate the optimal parameters) and to test the loss function.
N-fold cross-validation: split the data into N subsets that are employed one at a time as an independent test set.
[Figure: the entire dataset split into folds; for each fold #, one testing subset and the remaining training subsets]

Algorithm 1: N-fold cross-validation
define x, y, g_m(x);
CV_idx = randperm(D);                          % random permutation of the D data indices
n_CV = round(D/N);                             % fold size
for m = 1, 2, ⋯, M do                          % loop over candidate model structures
    for i = 1, 2, ⋯, N do                      % loop over folds
        test_idx = CV_idx((i−1)·n_CV + 1 : min(D, i·n_CV));
        x_train = x;  x_train(test_idx,:) = [];
        y_train = y;  y_train(test_idx,:) = [];
        x_test = x(test_idx,:);
        y_test = y(test_idx,:);
        estimate b_i: y_train = g_m(b, x_train);
        compute J_i(b_i, x_test, y_test);
    J_test^m = Σ_{i=1}^N J_i;
select m*: J_test^{m*} = min_m J_test^m;
estimate b*: y = g_{m*}(b, x);

Example #3 - Cross-validation
g(x) = b_0 + b_1 x   vs.   g(x) = b_0 + b_1 x + b_2 x² + b_3 x³ + b_4 x⁴ + b_5 x⁵
[Figures: training and test sets with both fits and their losses; the simple model gives J_train = 1.97 and J_test = 2.2, while the 5th-order polynomial gives the lower training loss J_train = 1.14 but a higher test loss]
[CIV ML/Linear regression 1D.m]

N-fold cross-validation
The greater the number of folds N, the more accurate the results. If N = D, it is called leave-one-out cross-validation (LOOCV).
In practice, LOOCV is too computationally expensive when D is large because it requires as many calibration loops as there are data points.
Instead, it is common in practice to employ 5- to 10-fold cross-validation.

Regression in >1D
Data: D = {(x_i, y_i), i = 1:D}
Covariates x_i = [x_1, x_2]_i ∈ ℝ²: attributes, regressors
Observation y_i ∈ ℝ
Model: g(x) = fct(x) ∈ ℝ
General case: x_i = [x_1, x_2, ⋯, x_X]_i ∈ ℝ^X, y_i ∈ ℝ, g(x_i) ∈ ℝ

Mathematical formulation - general case >1D
g(x) = b_0 + b_1 φ_1(x_1) + b_2 φ_2(x_2) + ⋯
For the entire dataset: b = [b_0, b_1, ⋯, b_B]ᵀ, y = [y_1, y_2, ⋯, y_D]ᵀ,
X = [1 φ_1(x_{1,1}) φ_2(x_{1,2}) ⋯; 1 φ_1(x_{2,1}) φ_2(x_{2,2}) ⋯; ⋯; 1 φ_1(x_{D,1}) φ_2(x_{D,2}) ⋯]

Example #4
g(x) = b_0 + b_1 x_1^{1/2} + b_2 x_2²
[Figure: training set and fitted surface g(x) over (x_1, x_2)]
[CIV ML/Linear regression 2D.m]

Limitations
- Does not differentiate between interpolating and extrapolating
- Cannot handle autocorrelation, which is common in time series
- Sensitive to outliers
- Assumes homoscedastic errors
- The performance depends on the capacity to hand-pick the correct transformation functions (difficult when X > 1)

Summary
In short: linear regression is simple and quick, yet it is outperformed by modern techniques such as Gaussian Process Regression.
Note: cross-validation is not limited to linear regression and can be employed with any regression technique.

Example - T° distribution along a pipeline
[Photo: pipeline; credit Bloomberg]
Given that we know the pipeline temperature to be y = 8 °C at x = 52 km:
What is the temperature at x = 52.1 km?
What is the temperature at x = 7 km?
We will take advantage of the conditional dependency between a system response y and covariate values x.

Section Outline - Gaussian Process Regression (GPR)
3.1 Introduction
3.2 Definition
3.3 Correlation functions
3.4 Updating a GP using exact observations
3.5 Updating a GP using noise-contaminated observations
3.6 Multiple covariates: x = [x_1, x_2, ⋯, x_X]_i
3.7 Parameter estimation
3.8 Evaluating GPR predictive capacity
3.9 GPML

Definition
Given a system response such that g_i = g(x_i), where g_i is the (exact) observation, g(x_i) the reality, and x_i = [x_1, x_2, ⋯, x_X]_i the covariates.
Data: D = {(x_i, g_i), i = 1:D}, g = [g_1, g_2, ⋯, g_D]ᵀ, x = [x_1, x_2, ⋯, x_D] ∈ ℝ^{X×D}
Gaussian process: g(x): G ~ N(m_G, Σ_G)
A set of discrete Gaussian random variables for which the pairwise correlation between G_i and G_j is a function of the distance between attributes:
[Σ_G]_ij = ρ(x_i, x_j) σ_{G_i} σ_{G_j}, ρ(x_i, x_j) = fct(|x_i − x_j|)

Example - T° distribution along a pipeline
[Photo: pipeline; credit Bloomberg]

Example - T° distribution along a pipeline
Prior knowledge: µ_{G_i} = µ_G = 0 °C, σ_{G_i} = σ_G = 2.5 °C
Gaussian correlation function: ρ(x_i, x_j) = exp(−(1/2)(x_i − x_j)²/l²)
For l = 25 km: ρ(x_1, x_2) = 0.999, ρ(x_1, x_3) = 0.278, ρ(x_2, x_3) = 0.296
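This correlation function is a one-liner in code. A small sketch (the evaluation points are assumptions; l = 25 km is from the slide):

% Sketch: Gaussian (squared-exponential) correlation function
rho = @(xi, xj, l) exp(-0.5 * (xi - xj).^2 / l^2);
l = 25;                 % length-scale [km]
rho(52, 52.1, l)        % nearby points: correlation close to 1
rho(52, 120, l)         % distant points: correlation close to 0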

Example - T° distribution along a pipeline
For x = [0, 1, ⋯, 100]:
G ~ N(m_G, Σ_G), [Σ_G]_ij = ρ(x_i, x_j) σ_G²   (prior knowledge)
[Figure: realizations g_i of our prior knowledge G]
[CIV ML/M14 GP pipeline.m]

Updating a GP using exact observations
Given D = {(x_i, g_i), i = 1:D}, a set of D observations, and x*, a set of covariates for which we want to predict f(g*|x*, D).
Reminder: Gaussian conditionals are also Gaussian.
Prior knowledge: [G; G*] with m = [m_G; m_G*] and Σ = [Σ_G, Σ_{GG*}; Σ_{G*G}, Σ_G*], where
[Σ_G]_ij = ρ(x_i, x_j) σ_G², [Σ_{G*G}]_ij = ρ(x*_i, x_j) σ_G², [Σ_G*]_ij = ρ(x*_i, x*_j) σ_G²
Posterior knowledge: f_{G*|x*,D}(g*|x*, D) = N(g*; m_{G*|D}, Σ_{G*|D}), with
m_{G*|D} = m_G* + Σ_{G*G} Σ_G⁻¹ (g − m_G)
Σ_{G*|D} = Σ_G* − Σ_{G*G} Σ_G⁻¹ Σ_{GG*}
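These two update equations translate directly into code. A minimal MATLAB sketch (the prediction grid and the reading of the observation locations as [31, 70] km are assumptions; the prior values follow the pipeline example):

% Sketch: GP posterior from exact (noise-free) observations
x_obs = [31; 70];  g_obs = [4.13; 3.62];   % observations (locations partly assumed)
x_star = (0:100)';                          % prediction grid [km] (assumed)
m_G = 0;  s_G = 2.5;  l = 25;               % prior mean, std, and length-scale
k = @(a, b) s_G^2 * exp(-0.5 * (a - b.').^2 / l^2);   % covariance function
K   = k(x_obs,  x_obs);                     % Sigma_G
Ks  = k(x_star, x_obs);                     % Sigma_{G*G}
Kss = k(x_star, x_star);                    % Sigma_{G*}
m_post = m_G + Ks * (K \ (g_obs - m_G));    % posterior mean
S_post = Kss - Ks * (K \ Ks');              % posterior covariance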

Example - Multivariate Gaussian conditional
Let µ_G = 0 °C, σ_G = 2.5 °C, ρ(x_1, x_2) = 0.8 (prior knowledge) and g_2 = −2 °C (observation).
f_{G_1,G_2}(g_1, g_2) = N(m_G, Σ_G), m_G = [0; 0], Σ_G = [2.5², 0.8·2.5²; 0.8·2.5², 2.5²]
f_{G_1|g_2}(g_1|g_2) = f_{G_1,G_2}(g_1, g_2)/f_{G_2}(g_2) = N(µ_{1|2}, σ_{1|2}²)
µ_{1|2} = µ_1 + ρ σ_1 (g_2 − µ_2)/σ_2 = 0 + 0.8 · 2.5 · (−2 − 0)/2.5 = −1.6 °C
σ_{1|2} = σ_1 √(1 − ρ²) = 2.5 √(1 − 0.8²) = 1.5 °C

Example - Multivariate Gaussian conditional
f_{G_1|g_2}(g_1|g_2) = f_{G_1,G_2}(g_1, g_2)/f_{G_2}(g_2) = N(µ_{1|2}, σ_{1|2}²), with µ_{1|2} = µ_1 + ρ σ_1 (g_2 − µ_2)/σ_2 and σ_{1|2} = σ_1 √(1 − ρ²)
[Figure: joint density f_{G_1,G_2}(g_1, g_2) and the conditional f_{G_1|g_2}(g_1|g_2) for g_2 = −2, giving µ_{1|2} = −1.6 and σ_{1|2} = 1.5]
[CIV ML/M14 Gauss conditionnelle.m]

Example - T° distribution along a pipeline
[Figures: posterior mean and uncertainty of g(x) along x [km], updated as observations are added one at a time]
Observations: D = {([ ], [ ])}
Observations: D = {([31], [4.13])}
Observations: D = {([31, 7], [4.13, 3.62]ᵀ)}
Observations: D = {([31, 7, 3], [4.13, 3.62, 3.71]ᵀ)}
[CIV ML/M14 GP pipeline obs.m]

Noise-contaminated observations
Given a system response contaminated by measurement noise, such that
Y_i = g(x_i) + V, V ~ N(v; 0, σ_V²)
where Y_i is the observation, g(x_i) the reality, and V the measurement error.
[Σ_Y]_ij = ρ(x_i, x_j) σ_G² + σ_V² δ_ij, δ_ij = 1 if i = j, else 0
Prior knowledge: [Y; G*] with m = [m_Y; m_G*] and Σ = [Σ_Y, Σ_{YG*}; Σ_{G*Y}, Σ_G*]
Posterior knowledge: f_{G*|x*,D}(g*|x*, D) = N(g*; m_{G*|D}, Σ_{G*|D})
m_{G*|D} = m_G* + Σ_{G*Y} Σ_Y⁻¹ (y − m_Y), Σ_{G*|D} = Σ_G* − Σ_{G*Y} Σ_Y⁻¹ Σ_{YG*}
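Relative to the noise-free sketch above, only the observation-side covariance changes. Continuing the earlier sketch (σ_V = 0.5 is taken from the following slides):

% Sketch: GP posterior with noise-contaminated observations
s_V = 0.5;                                          % measurement noise std
K_Y = k(x_obs, x_obs) + s_V^2 * eye(numel(x_obs));  % Sigma_Y = Sigma_G + s_V^2 I
m_post = m_G + Ks * (K_Y \ (g_obs - m_G));          % posterior mean
S_post = Kss - Ks * (K_Y \ Ks');                    % posterior covariance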

Example - T° distribution (V ≠ 0)
Measurement error: σ_V = 0.5
[Figures: posterior mean and uncertainty along x [km], updated as noisy observations are added]
Observations: D = {([ ], [ ])}
Observations: D = {([31], [3.42])}
Observations: D = {([31, 7], [3.42, 2.92]ᵀ)}
Observations: D = {([31, 7, 3], [3.42, 2.92, 3.38]ᵀ)}
[CIV ML/M14 GP pipeline obs.m]

Multiple covariates: x = [x_1, x_2, ⋯, x_X]_i
y_i = g(x_i) + v_i, x_i = [x_1, x_2, ⋯, x_X]_i (covariates), for i = 1, ⋯, D
where y_i is the observation and g(x_i) the reality.
y = [y_1, y_2, ⋯, y_D]ᵀ, x = [x_1, x_2, ⋯, x_D] ∈ ℝ^{X×D}, l = [l_1, l_2, ⋯, l_X] ∈ ℝ^{X×1} (one l_k per covariate x_k)
ρ(x_i, x_j) = exp(−(1/2) Σ_{k=1}^X ([x_k]_i − [x_k]_j)²/l_k²) = exp(−(1/2)(x_i − x_j)ᵀ diag(l²)⁻¹ (x_i − x_j))
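A sketch of this anisotropic correlation function with one length-scale per covariate (the numeric values are assumptions for illustration):

% Sketch: multivariate correlation function, one length-scale per covariate
rho = @(xi, xj, l) exp(-0.5 * sum(((xi - xj) ./ l).^2));
xi = [1.0; 2.0];  xj = [1.5; 2.5];   % two covariate vectors (assumed)
l  = [0.5; 5.0];   % small l(1): x_1 drives the correlation; large l(2): x_2 barely matters
rho(xi, xj, l)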

Multivariate correlation function
ρ(x_i, x_j) = exp(−(1/2) Σ_{k=1}^X ([x_i]_k − [x_j]_k)²/l_k²) = exp(−(1/2)(x_i − x_j)ᵀ diag(l)⁻² (x_i − x_j))
If l_k >> σ_{X_k}, the covariate X_k has little importance for y; if l_k < σ_{X_k}, X_k is important for y.
[Figure: ρ(x_i, x_j) as a function of [x_i]_1 − [x_j]_1 and [x_i]_2 − [x_j]_2]
[CIV ML/M14 multivariate corr fct.m]

Parameter estimation: σ_G & l
Up to now, we made the hypothesis that we knew σ_G and l = [l_1, l_2, ⋯, l_X]. In practice, we need to learn the hyper-parameters θ = [σ_G, l] using training data D = {D_x, D_y} = {(x_i, y_i), i = 1:D}.
Bayes rule: p(θ|D_x, D_y) = p(D_y|D_x, θ) p(θ) / p(D_y)   (posterior = likelihood × prior / constant)
! In practice, evaluating p(θ|x, y) is computationally expensive…

MLE approximation
p(θ|D_x, D_y) = p(D_y|D_x, θ) p(θ) / p(D_y)
If p(θ) = cte., then p(θ|D_x, D_y) ∝ p(D_y|D_x, θ)
θ* = arg max_θ p(D_y|D_x, θ), with p(D_y|D_x, θ) = N(D_y; m_Y, Σ_Y)

MLE approximation (cont.)
When the size of Σ_Y increases, p(y|x, θ) → 0, causing numerical instabilities!
Solution: θ* = arg max_θ p(y|x, θ) ≡ arg max_θ log p(y|x, θ)
p(y|x, θ)'s maximum corresponds to log p(y|x, θ)'s maximum, and there are no more instabilities.
log p(y|x, θ) = log N(y; m_Y, Σ_Y) = −(1/2)(y − m_Y)ᵀ Σ_Y⁻¹ (y − m_Y) − (1/2) log|Σ_Y| − (N/2) log 2π
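A direct transcription of this log-likelihood (a sketch; using a Cholesky factorization for the solve and the log-determinant is an implementation choice here, not prescribed by the slides):

% Sketch: Gaussian log marginal likelihood, assuming y, m_Y, and Sigma_Y exist
r = y - m_Y;                         % residual w.r.t. the prior mean
L = chol(Sigma_Y, 'lower');          % Sigma_Y = L*L'
alpha = L' \ (L \ r);                % Sigma_Y^{-1} * r
logdet = 2 * sum(log(diag(L)));      % log |Sigma_Y|
logp = -0.5 * (r' * alpha) - 0.5 * logdet - numel(y)/2 * log(2*pi);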

Maximization
How do we find θ*?
d log p(y|x, θ)/dθ_j = (1/2) yᵀ Σ_Y⁻¹ (dΣ_Y/dθ_j) Σ_Y⁻¹ y − (1/2) tr(Σ_Y⁻¹ dΣ_Y/dθ_j) = 0
We employ iterative optimization techniques (such as Newton-Raphson) to find θ*. Efficient algorithms are already implemented in GPML.

Interactive example - T° distribution along a pipeline
[Figure: GPR fit g(x) along x [km]]
[CIV ML/GPR example.m]

Evaluating GPR predictive capacity
We should employ leave-one-out cross-validation to estimate the GPR predictive capacity:
1. set up the GPR model
2. estimate the hyper-parameters {l, σ_G, σ_V} using the entire dataset¹
3. use LOO-CV to predict each data point while using only the remaining data as the training set
4. plot the histogram of the normalized prediction errors: (m_{G|D,i} − y_i)/σ_{G|D,i}
5. plot a scatter of predicted and observed values (see the sketch below)
¹ This can be done because GPR's hyper-parameters l and σ_V are not sensitive to over-fitting (this is not true for linear regression or for more advanced formulations).
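A sketch of the two diagnostics in steps 4 and 5 (it assumes vectors y_obs, gp_mean, and gp_std collected from a LOO-CV loop; these names are assumptions):

% Sketch: predictive-capacity diagnostics from LOO-CV outputs
% gp_mean(i), gp_std(i): LOO prediction for point i; y_obs(i): held-out value
z = (gp_mean - y_obs) ./ gp_std;    % normalized prediction errors
histogram(z)                        % should resemble N(0, 1)
figure; scatter(y_obs, gp_mean)     % should cluster along the 1:1 line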

Example - Evaluating GPR predictive capacity
[Figures: histogram of the normalized prediction errors (m_{G|D,i} − y_i)/σ_{G|D,i} against the theoretical N(0, 1) density; scatter of the predictions m_{G|D,i} ± σ_{G|D,i} against the observed y_i, with the line y_i = m_{G|D,i}]

GPML package
%% Data
x_obs;   % attribute values associated with observations
y_obs;   % observed / simulated values
s_v;     % measurement noise std
%% GPML
initial_guess_l = 1;     % initial guess for l
initial_guess_s_g = 1;   % initial guess for s_g
covfunc = @covSEiso;
hyp2.cov = log([initial_guess_l; initial_guess_s_g]);
hyp2.lik = log(s_v);
likfunc = @likGauss;
hyp2 = minimize(hyp2, @gp, -5, @infExact, [], covfunc, likfunc, x_obs, y_obs);
%% Estimated values
l_GPML = exp(hyp2.cov(1));     % estimated value for l
s_g_GPML = exp(hyp2.cov(2));   % estimated value for s_g
[CIV ML/M14 demogmpl.m]

Cross-Validation
%% GPML set-up
meanfunc = @meanConst;     % GP mean fct.
hyp.mean = [?];            % prior mean
likfunc = @likGauss;       % GP likelihood fct.
hyp.lik = log([?]);        % measurement std
covfunc = {'covSEard'};    % Gaussian cov. function
ell = log([?]);            % GP correlation lengths
sg = log([?]);             % GP prior standard deviation
hyp.cov = [ell; sg];       % covariance function hyper-parameters
%% Parameter optimization
[hyp, fx] = minimize(hyp, @gp, [?], @infExact, meanfunc, covfunc, likfunc, x_train, y_train);
%% Leave-one-out cross-validation
for cv_loop = 1:opt.nb_train
    idx_cv_loop = idx_cv;          % re-initialize CV indices
    idx_cv_loop(cv_loop) = [];     % remove current CV test sample
    [~, ~, GP_m, GP_s2] = gp(hyp, @infExact, meanfunc, covfunc, likfunc, x_train(idx_cv_loop,:), y_train(idx_cv_loop), x_train(cv_loop,:));
    LL_cv = log(normpdf(y_train(cv_loop), GP_m, sqrt(GP_s2)));   % log-likelihood
    residual = y_train(cv_loop) - GP_m;                          % prediction error
    cv_results(cv_loop,:) = [GP_m, GP_s2, LL_cv, residual];      % write-up results
end
%% Normalized prediction error histogram
histogram(cv_results(:,4) ./ sqrt(cv_results(:,2)))
%% Predicted vs. observed scatter
scatter(y_train, cv_results(:,1))

Section Outline - Examples
4.1 Soil contamination characterization
4.2 Metamodel

Soil contamination characterization
[Figure: observed lead concentration, Pb [mg/kg], at coordinates x [m], y [m], z [dm]]
[CIV ML/Soil contamination data.fig]

Problem set-up
Covariates and observations:
l_i^{Y} = [x, y, z]_i (geodesic coordinates)
D = {(l_i^{Y}, y_i), i = 1:116}
Observation model:
log y_i = g(l_i^{Y}) + v_i, v: V ~ N(0, σ_V²)
where log y_i is the observation, g(l_i^{Y}) the true contaminant concentration in log space, and v_i the measurement error in log space.

Problem set-up (cont.)
Prior: m = [m_Y; m_G*], Σ = [Σ_Y, Σ_{YG*}; Σ_{YG*}ᵀ, Σ_G*]
[Σ_Y]_ij = ρ(l_i^{Y}, l_j^{Y}) σ_G² + σ_V² δ_ij, δ_ij = 1 if i = j, else δ_ij = 0
[Σ_G*]_kl = ρ(l_k^{G*}, l_l^{G*}) σ_G²
[Σ_{YG*}]_ik = ρ(l_i^{Y}, l_k^{G*}) σ_G²
ρ(l_i, l_j) = exp(−(1/2)(l_i − l_j)ᵀ diag(l²)⁻¹ (l_i − l_j))
Posterior knowledge:
m_{G*|D} = m_G* + Σ_{YG*}ᵀ Σ_Y⁻¹ (y − m_Y)
Σ_{G*|D} = Σ_G* − Σ_{YG*}ᵀ Σ_Y⁻¹ Σ_{YG*}

Problem set-up (cont.)
Hyper-parameter estimation: P = {l_x, l_y, l_z, σ_G, σ_V}
P* = arg max_P log p(y|l^{Y}, P)
Results: l_x = 56.4 m, l_y = 53.7 m, l_z = 0.64 m, σ_G = 2.86, σ_V = 0.18

Results - contaminant concentration
[Figure: predicted contaminant concentration field]
[CIV ML/Soil contamination concentration.fig]

Results - Prediction quality
[Figures: histogram of the normalized prediction errors (log m_{G|D,i} − log y_i)/σ_{G|D,i} against the theoretical density; scatter of log m_{G|D,i} ± σ_{G|D,i} against log y_i, with the line log y_i = log m_{G|D,i}]
! Overall, the predictive capacity is poor → more data are needed

Soil contamination concentration
All the details of this study are presented in:
Quach, A., Tabor, L., Dumont, D., Courcelles, B., and Goulet, J.-A. (2017). A machine learning approach for characterizing soil contamination in the presence of physical site discontinuities and aggregated samples. Submitted to Advanced Engineering Informatics.

GPR & Metamodels
y = g(x), x = [x_1, x_2, ⋯, x_N]
g(x): mathematical model taking the parameters x as inputs
x: vector containing the model parameters (e.g. boundary conditions, material properties, etc.)
y: model response (e.g. displacement, strain, temperature, etc.)

Metamodel GPR set-up
y_i = g(x_i), x_i = [x_1, x_2, ⋯, x_X]_i, for i = 1, ⋯, D, where D is the number of simulations generated; y_i is a simulation output and g(x_i) the model.
y = [y_1, y_2, ⋯, y_D]ᵀ, x = [x_1, x_2, ⋯, x_D] ∈ ℝ^{X×D}, l = [l_1, l_2, ⋯, l_X] ∈ ℝ^{X×1} (one l_k per covariate x_k)
m = cte., σ_V = 0 (no observation errors)
ρ(x_i, x_j) = exp(−(1/2)(x_i − x_j)ᵀ diag(l)⁻² (x_i − x_j))

Example - Metamodel: Tamar Bridge
[Photo: Tamar Bridge]

Metamodel - Tamar Bridge - Response y_i
We are interested in modelling the bridge's first natural frequency.
We have a set of model simulations where the model parameters are randomly sampled.

Metamodel - Tamar Bridge - Covariates x_i = [x_1 : x_5]_i
[Figure: bridge schematic - Saltash side, Plymouth side, sidespan cables, main cable initial strains, and the deck expansion joints at the Saltash tower and at each end]
1. Main-cable initial strain: (5E-4, 3E-3) mm/mm
2. Sidespan cable initial strains: (5E-4, 3E-3) mm/mm
3. Plymouth-side support longitudinal stiffness: 10^(4, 11) kN/mm
4. Saltash-side support longitudinal stiffness: 10^(4, 11) kN/mm
5. Saltash tower deck expansion joint longitudinal stiffness: 10^(4, 11) kN/mm

Problem set-up
Covariates and observations: x_i = [x_1, x_2, x_3, x_4, x_5]_i, D = {(x_i, y_i), i = 1:D}
Observation model: Y_i = g(x_i)
where y_i is an observation (simulation output) and g(x_i) the model prediction.

Metamodel - Tamar Bridge - Hyper-parameter estimation
P = {l_1, l_2, l_3, l_4, l_5, σ_G}
P* = arg max_P log p(y|x, P)
Results: l_1 = 0.6 mm/mm, l_2 = 0.4 mm/mm, l_3 = 2.21 kN/mm, l_4 = 2.31 kN/mm, l_5 = 1.76 kN/mm, σ_G = 0.7 Hz

Metamodel - Tamar Bridge - Prediction quality
[Figures: histogram of the normalized prediction errors against the theoretical density; scatter of the predictions m_{G|D,i} ± σ_{G|D,i} against the observed y_i, with the line y_i = m_{G|D,i}]
[CIV ML/Meta model tamar.m]

Section Outline - Adv. topics
5.1 Prior knowledge mean values
5.2 Covariance functions
5.3 Homo/Heteroscedasticity
5.4 Other methods

Prior knowledge mean values
So far, we have modelled our prior mean as being either 0 or a constant.
When predicting in the neighbourhood of observed data, the prior knowledge has a negligible impact. However, the impact of the prior increases as the distance from the observed data increases.
Solution: your prior knowledge can be parameterized by a function depending on the covariates, e.g. m(x) = ax + b, where {a, b} are hyper-parameters that need to be estimated using data.

Interactive example - T° distribution along a pipeline
[Figure: GPR fit g(x) along x [km] with a parameterized prior mean]
[CIV ML/GPR example.m]

GPML package mean functions
%% Zero prior mean
meanfunc = @meanZero;
%% Constant prior mean
meanfunc = @meanConst;    % GP mean fct.
hyp.mean = 0.39;          % prior mean
%% Linear prior mean
meanfunc = @meanLinear;   % GP mean fct.
hyp.mean = 0.1;           % prior mean slope
%% Composite prior mean
msu = {'meanSum', {@meanConst, @meanLinear}};
hyp.mean = [0.39; 0.1];
Note: the parameters in hyp.mean will be estimated using the minimize function

Square-exponential covariance function
ρ(x_i, x_j) = exp(−(1/2)(x_i − x_j)ᵀ diag(l)⁻² (x_i − x_j)) = exp(−(1/2) Σ_{k=1}^X ([x_i]_k − [x_j]_k)²/[l]_k²)

Matern covariance function
ρ(x_i, x_j) = exp(−Σ_{k=1}^X |[x_i]_k − [x_j]_k| / [l]_k)
(the exponential, i.e. Matern 1/2, form)
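A small sketch contrasting the two correlation functions on a 1D distance grid (l = 1 is an assumed value for illustration):

% Sketch: squared-exponential vs. Matern (exponential) correlation, 1D, l = 1
d = linspace(-5, 5, 200);        % distance between covariates
rho_se = exp(-0.5 * d.^2);       % squared-exponential: smooth at d = 0
rho_ma = exp(-abs(d));           % Matern (exponential form): kink at d = 0
plot(d, rho_se, d, rho_ma); legend('squared-exponential', 'Matern')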

Covariance function and cross-validation
How do we choose between two covariance function structures?
→ Cross-validation or Bayesian model selection

Homo/Heteroscedasticity
[Figures: observations, mean, and 90% CI for (a) homoscedastic and independent, (b) heteroscedastic and independent, and (c) heteroscedastic and dependent errors]
a) Standard GPR; GPML
b) Tolvanen et al. (2014). Expectation propagation for non-stationary heteroscedastic Gaussian process regression. IEEE-MLSP; GPStuff
c) Tabor et al. (2017). Probabilistic modeling of heteroscedastic laboratory experiments using Gaussian process regression. Journal of Engineering Mechanics

Regression methods
Linear regression and Gaussian process regression are not the only approaches available:
- Neural networks (deep learning)
- Support vector machines (SVM)
- Decision trees (random forests)
- …
Most current research focuses on deep learning.

Summary
Linear regression: quick & easy, yet outperformed by modern methods.
Gaussian process regression:
- quick & easy
- allows interpolating and extrapolating (with uncertainty estimates)
- l allows quantifying the relevance of each covariate
Cross-validation:
- prevents over-fitting
- the standard method to employ for quantifying prediction accuracy
- allows comparing model structures

Gaussian Process Limitations
- The GPR method covered here only allows modelling deterministic phenomena
- More advanced methods are required to handle heteroscedasticity
- Depends on the selection of proper prior mean and covariance functions
- P* is a point estimate for the model hyper-parameters; a full Bayesian approach is required if the amount of data is small
- The GP is assumed to be stationary, i.e., the same l for any covariate value x


Bayesian Deep Learning

Bayesian Deep Learning Bayesian Deep Learning Mohammad Emtiyaz Khan AIP (RIKEN), Tokyo http://emtiyaz.github.io emtiyaz.khan@riken.jp June 06, 2018 Mohammad Emtiyaz Khan 2018 1 What will you learn? Why is Bayesian inference

More information

Introduction to SVM and RVM

Introduction to SVM and RVM Introduction to SVM and RVM Machine Learning Seminar HUS HVL UIB Yushu Li, UIB Overview Support vector machine SVM First introduced by Vapnik, et al. 1992 Several literature and wide applications Relevance

More information

Bayesian Approaches Data Mining Selected Technique

Bayesian Approaches Data Mining Selected Technique Bayesian Approaches Data Mining Selected Technique Henry Xiao xiao@cs.queensu.ca School of Computing Queen s University Henry Xiao CISC 873 Data Mining p. 1/17 Probabilistic Bases Review the fundamentals

More information

NONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition

NONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition NONLINEAR CLASSIFICATION AND REGRESSION Nonlinear Classification and Regression: Outline 2 Multi-Layer Perceptrons The Back-Propagation Learning Algorithm Generalized Linear Models Radial Basis Function

More information

Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.

Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Bayesian learning: Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Let y be the true label and y be the predicted

More information

Machine Learning. Bayesian Regression & Classification. Marc Toussaint U Stuttgart

Machine Learning. Bayesian Regression & Classification. Marc Toussaint U Stuttgart Machine Learning Bayesian Regression & Classification learning as inference, Bayesian Kernel Ridge regression & Gaussian Processes, Bayesian Kernel Logistic Regression & GP classification, Bayesian Neural

More information

Expectation Propagation Algorithm

Expectation Propagation Algorithm Expectation Propagation Algorithm 1 Shuang Wang School of Electrical and Computer Engineering University of Oklahoma, Tulsa, OK, 74135 Email: {shuangwang}@ou.edu This note contains three parts. First,

More information

Bayesian Machine Learning

Bayesian Machine Learning Bayesian Machine Learning Andrew Gordon Wilson ORIE 6741 Lecture 4 Occam s Razor, Model Construction, and Directed Graphical Models https://people.orie.cornell.edu/andrew/orie6741 Cornell University September

More information

Machine Learning, Midterm Exam: Spring 2009 SOLUTION

Machine Learning, Midterm Exam: Spring 2009 SOLUTION 10-601 Machine Learning, Midterm Exam: Spring 2009 SOLUTION March 4, 2009 Please put your name at the top of the table below. If you need more room to work out your answer to a question, use the back of

More information

Introduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Lior Wolf

Introduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Lior Wolf 1 Introduction to Machine Learning Maximum Likelihood and Bayesian Inference Lecturers: Eran Halperin, Lior Wolf 2014-15 We know that X ~ B(n,p), but we do not know p. We get a random sample from X, a

More information

Bayesian Learning (II)

Bayesian Learning (II) Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Bayesian Learning (II) Niels Landwehr Overview Probabilities, expected values, variance Basic concepts of Bayesian learning MAP

More information

Linear Regression (continued)

Linear Regression (continued) Linear Regression (continued) Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 6, 2017 1 / 39 Outline 1 Administration 2 Review of last lecture 3 Linear regression

More information

Nonparameteric Regression:

Nonparameteric Regression: Nonparameteric Regression: Nadaraya-Watson Kernel Regression & Gaussian Process Regression Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro,

More information

y Xw 2 2 y Xw λ w 2 2

y Xw 2 2 y Xw λ w 2 2 CS 189 Introduction to Machine Learning Spring 2018 Note 4 1 MLE and MAP for Regression (Part I) So far, we ve explored two approaches of the regression framework, Ordinary Least Squares and Ridge Regression:

More information

Announcements. Proposals graded

Announcements. Proposals graded Announcements Proposals graded Kevin Jamieson 2018 1 Bayesian Methods Machine Learning CSE546 Kevin Jamieson University of Washington November 1, 2018 2018 Kevin Jamieson 2 MLE Recap - coin flips Data:

More information

Qualifying Exam in Machine Learning

Qualifying Exam in Machine Learning Qualifying Exam in Machine Learning October 20, 2009 Instructions: Answer two out of the three questions in Part 1. In addition, answer two out of three questions in two additional parts (choose two parts

More information

Gaussian Processes for Computer Experiments

Gaussian Processes for Computer Experiments Gaussian Processes for Computer Experiments Jeremy Oakley School of Mathematics and Statistics, University of Sheffield www.jeremy-oakley.staff.shef.ac.uk 1 / 43 Computer models Computer model represented

More information

9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering

9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering Types of learning Modeling data Supervised: we know input and targets Goal is to learn a model that, given input data, accurately predicts target data Unsupervised: we know the input only and want to make

More information

EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING

EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING DATE AND TIME: August 30, 2018, 14.00 19.00 RESPONSIBLE TEACHER: Niklas Wahlström NUMBER OF PROBLEMS: 5 AIDING MATERIAL: Calculator, mathematical

More information

Tutorial on Gaussian Processes and the Gaussian Process Latent Variable Model

Tutorial on Gaussian Processes and the Gaussian Process Latent Variable Model Tutorial on Gaussian Processes and the Gaussian Process Latent Variable Model (& discussion on the GPLVM tech. report by Prof. N. Lawrence, 06) Andreas Damianou Department of Neuro- and Computer Science,

More information

Parametric Techniques

Parametric Techniques Parametric Techniques Jason J. Corso SUNY at Buffalo J. Corso (SUNY at Buffalo) Parametric Techniques 1 / 39 Introduction When covering Bayesian Decision Theory, we assumed the full probabilistic structure

More information