Regression methods: Introduction, Linear Regression, Gaussian Process Regression (GPR), Examples, Advanced topics, Summary
- Calvin Patterson
- 6 years ago
1 Regression methods (CIV6540 - Machine Learning for Civil Engineers). Professor: James-A. Goulet, Département des génies civil, géologique et des mines. Chapter 8, Goulet (2018). Probabilistic Machine Learning for Civil Engineers. Regression methods V1. Machine Learning for Civil Engineers
2 What is regression?
[Figure: pathogens [ppm] versus [cl+]]
Data: $\mathcal{D} = \{(x_i, y_i),\ i = 1 : \mathtt{D}\}$
Covariate (attribute, regressor): $x_i \in \mathbb{R}$
Observation: $y_i \in \mathbb{R}$
Regression methods: mathematical models for $g(x)$
3 What is regression?
[Figure: number of accidents versus hour of the day]
Data: $\mathcal{D} = \{(x_i, y_i),\ i = 1 : \mathtt{D}\}$, with covariate $x_i \in \mathbb{R}$ and observation $y_i \in \mathbb{R}$
Model: $g(x) \equiv \text{fct}(x)$
Regression methods: mathematical models for $g(x)$
4 What is regression?
[Figure: concrete strength versus water/cement ratio]
Data: $\mathcal{D} = \{(x_i, y_i),\ i = 1 : \mathtt{D}\}$, with covariate $x_i \in \mathbb{R}$ and observation $y_i \in \mathbb{R}$
Model: $g(x) \equiv \text{fct}(x)$
Regression methods: mathematical models for $g(x)$
5 What is regression? Regression in > 1D
Data: $\mathcal{D} = \{(\mathbf{x}_i, y_i),\ i = 1 : \mathtt{D}\}$
Covariates (attributes, regressors): $\mathbf{x}_i = [x_1, x_2]_i^\top \in \mathbb{R}^2$
Observation: $y_i \in \mathbb{R}^1$
Model: $g(\mathbf{x}) \equiv \text{fct}(\mathbf{x}) \in \mathbb{R}^1$
General case: $\mathbf{x}_i = [x_1, x_2, \dots, x_\mathtt{X}]_i^\top \in \mathbb{R}^\mathtt{X}$, $y_i \in \mathbb{R}^1$, $g(\mathbf{x}_i) \in \mathbb{R}^1$
6 What is regression? Module #5 outline
[Figure: organization of the course topics. Revision of probability & linear algebra; Probability theory; Probability distributions; Machine Learning Basics; Bayesian Estimation; MCMC sampling & Newton; Data-driven methods (Regression, Classification, State-space model for time-series); Model-driven methods (Model Calibration); Decision Making (Decision Theory, AI & Sequential decision problems). Polytechnique Montréal]
7 Introduction: Which one is an example of linear regression?
[Figure: two training sets, each shown with a fitted model $g(x)$ and its training loss $J_{\text{train}}(\mathbf{b})$]
Answer: both.
8 Section outline: Linear Regression
2.1 Introduction
2.2 Mathematical formulation 1D
2.3 Cross-validation
2.4 Mathematical formulation >1D
2.5 Limitation
9 Mathematical formulation 1D: Nomenclature & hypotheses
[Figure: reality and observations $y_i$ versus $x_i$]
Data: $\mathcal{D} = \{(x_i, y_i),\ i = 1 : \mathtt{D}\}$
Covariate (attribute, regressor): $x_i \in \mathbb{R}$
Observation: $y_i \in \mathbb{R}$
Model: $g(x) \equiv \text{fct}(x)$
Hypothesis: $g(x)$ corresponds to reality, i.e. no variability.
Observation model: $y = g(x) + v$, $v : V \sim \mathcal{N}(v; 0, \sigma_V^2)$, with independent observation errors, $V_i \perp V_j,\ \forall i \neq j$
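10 Mathematical formulation 1D: special case
Let us assume that our model takes the form
$$g(x) = b_0 + b_1 x = [b_0\ \ b_1] \begin{bmatrix} 1 \\ x \end{bmatrix}, \qquad y = g(x) + v,\quad v : V \sim \mathcal{N}(v; 0, \sigma_V^2)$$
For the entire dataset:
$$\mathbf{b} = \begin{bmatrix} b_0 \\ b_1 \end{bmatrix},\quad \mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_\mathtt{D} \end{bmatrix},\quad \mathbf{X} = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_\mathtt{D} \end{bmatrix},\quad \mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_\mathtt{D} \end{bmatrix}$$
$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{v},\quad \mathbf{v} : \mathbf{V} \sim \mathcal{N}(\mathbf{v}; \mathbf{0}, \sigma_V^2 \mathbf{I})$$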
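11 Mathematical formulation 1D: Joint likelihood of observations
$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{v},\quad \mathbf{v} : \mathbf{V} \sim \mathcal{N}(\mathbf{v}; \mathbf{0}, \sigma_V^2 \mathbf{I})$$
$$f(\mathbf{y} | \mathbf{x}, \mathbf{b}) = \prod_{i=1}^{\mathtt{D}} \mathcal{N}(y_i; g(x_i), \sigma_V^2) \propto \prod_{i=1}^{\mathtt{D}} \exp\left(-\frac{1}{2} \frac{(y_i - g(x_i))^2}{\sigma_V^2}\right) = \exp\left(-\frac{1}{2} \sum_{i=1}^{\mathtt{D}} \frac{(y_i - g(x_i))^2}{\sigma_V^2}\right) = \exp\left(-\frac{1}{2\sigma_V^2} (\mathbf{y} - \mathbf{X}\mathbf{b})^\top (\mathbf{y} - \mathbf{X}\mathbf{b})\right)$$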
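12 Mathematical formulation 1D: Estimating optimal parameters $\mathbf{b}^*$
The goal is to estimate the optimal parameters $\mathbf{b}^*$ that maximize the likelihood $f(\mathbf{y} | \mathbf{x}, \mathbf{b})$ or, equivalently, the log-likelihood
$$\ln f(\mathbf{y} | \mathbf{x}, \mathbf{b}) \propto -\frac{1}{2\sigma_V^2} (\mathbf{y} - \mathbf{X}\mathbf{b})^\top (\mathbf{y} - \mathbf{X}\mathbf{b}) \propto -\frac{1}{2} (\mathbf{y} - \mathbf{X}\mathbf{b})^\top (\mathbf{y} - \mathbf{X}\mathbf{b}) = -J(\mathbf{b})$$
where $J(\mathbf{b})$ is the loss function. In the context of linear regression, we minimize $J(\mathbf{b})$, which is equivalent to maximizing $f(\mathbf{y} | \mathbf{x}, \mathbf{b})$.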
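13 Mathematical formulation 1D: Estimating optimal parameters $\mathbf{b}^*$ (cont.)
Loss function: measures the model quality
$$J(\mathbf{b}) = \frac{1}{2} \sum_{i=1}^{\mathtt{D}} (y_i - g(x_i))^2 = \frac{1}{2} (\mathbf{y} - \mathbf{X}\mathbf{b})^\top (\mathbf{y} - \mathbf{X}\mathbf{b})$$
$$\mathbf{b}^* = \arg\min_{\mathbf{b}} J(\mathbf{b})$$
$$\frac{\partial J(\mathbf{b})}{\partial \mathbf{b}} = \mathbf{X}^\top \mathbf{X} \mathbf{b} - \mathbf{X}^\top \mathbf{y} = \mathbf{0} \quad\Rightarrow\quad \mathbf{b}^* = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}$$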
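The closed-form estimate above can be checked numerically. The following is a minimal sketch in Python with NumPy (the course files themselves are MATLAB scripts); the data are hypothetical and noise-free, chosen so that the expected answer is obvious.

```python
import numpy as np

# Hypothetical noise-free data generated from y = 1 + 2x (illustration only)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x

# Design matrix X with a column of ones for the intercept b_0
X = np.column_stack([np.ones_like(x), x])

# Solve the normal equations (X^T X) b = X^T y instead of forming the inverse
b_star = np.linalg.solve(X.T @ X, X.T @ y)

# Loss J(b*) = 1/2 (y - X b*)^T (y - X b*)
residual = y - X @ b_star
J = 0.5 * residual @ residual
```

Solving the linear system rather than computing $(\mathbf{X}^\top \mathbf{X})^{-1}$ explicitly is a common numerical choice; both give the same $\mathbf{b}^*$ when $\mathbf{X}^\top \mathbf{X}$ is well conditioned.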
14 Mathematical formulation 1D: Example #1
[Figure: training set, $y_i$ versus $x_i$]
$g(x) = b_0 + b_1 x$, $\mathbf{b}^* = [.96, 4.9]$ [CIV ML/Linear regression 1D.m]
15 Mathematical formulation 1D: Example #1
[Figure: training set, $y_i$ versus $x_i$, with the fitted model $g(x)$ and its training loss $J_{\text{train}}(\mathbf{b}^*)$]
$g(x) = b_0 + b_1 x$, $\mathbf{b}^* = [.96, 4.9]$ [CIV ML/Linear regression 1D.m]
16 Mathematical formulation 1D: What if the data is non-linear?
[Figure: training set, $y_i$ versus $x_i$]
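17 Mathematical formulation 1D: general case
$$g(x) = b_0 + b_1 \phi_1(x) + b_2 \phi_2(x) + \dots + b_\mathtt{B} \phi_\mathtt{B}(x)$$
where $\phi_j(x)$ can be any linear or non-linear function, e.g. $\phi_j(x) = x^2$, $\phi_j(x) = \sin(x)$, ...
For the entire dataset, the model predictions are $\mathbf{X}\mathbf{b}$, with
$$\mathbf{b} = \begin{bmatrix} b_0 \\ b_1 \\ \vdots \\ b_\mathtt{B} \end{bmatrix},\quad \mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_\mathtt{D} \end{bmatrix},\quad \mathbf{X} = \begin{bmatrix} 1 & \phi_1(x_1) & \phi_2(x_1) & \cdots & \phi_\mathtt{B}(x_1) \\ 1 & \phi_1(x_2) & \phi_2(x_2) & \cdots & \phi_\mathtt{B}(x_2) \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & \phi_1(x_\mathtt{D}) & \phi_2(x_\mathtt{D}) & \cdots & \phi_\mathtt{B}(x_\mathtt{D}) \end{bmatrix}$$
Linear regression: linear with respect to $\phi_j(x)$ and $\mathbf{b}$, not with respect to $x$.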
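To illustrate that the model stays linear in $\mathbf{b}$ even with non-linear basis functions, here is a short Python/NumPy sketch. The helper name `design_matrix`, the chosen basis functions and the data are assumptions made for illustration; they are not taken from the course files.

```python
import numpy as np

def design_matrix(x, basis_functions):
    """Stack a column of ones and one column phi_j(x) per basis function."""
    columns = [np.ones_like(x)] + [phi(x) for phi in basis_functions]
    return np.column_stack(columns)

# Hypothetical data from g(x) = 0.5 + 2 x^2 - sin(x), without noise
x = np.linspace(-2.0, 2.0, 9)
y = 0.5 + 2.0 * x**2 - np.sin(x)

# phi_1(x) = x^2 and phi_2(x) = sin(x)
X = design_matrix(x, [np.square, np.sin])

# Least-squares estimate of b; same minimizer as (X^T X)^{-1} X^T y
b_star, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Only the columns of $\mathbf{X}$ change with the choice of $\phi_j$; the estimation step is the same as in the 1D special case.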
18 Mathematical formulation 1D: Example #2a
$g(x) = b_0 + b_1 x^2$
[Figure: training set, $y_i$ versus $x_i$, with the fitted model $g(x)$ and its training loss $J_{\text{train}}(\mathbf{b}^*)$] [CIV ML/Linear regression 1D.m]
19 Mathematical formulation 1D: Example #2b
$g(x) = b_0 + b_1 x^3 + b_2 \sin(1x)$
[Figure: training set, $y_i$ versus $x_i$, with the fitted model $g(x)$] [CIV ML/Linear regression 1D.m]
20 Cross-validation: What model structure is best?
Model 1: $g(x) = b_0 + b_1 x$
Model 2: $g(x) = b_0 + b_1 x + b_2 x^2 + b_3 x^3 + b_4 x^4 + b_5 x^5$
[Figure: the same training set, $y_i$ versus $x_i$, fitted with each model and annotated with $J_{\text{train}}(\mathbf{b}^*)$]
Linear regression is sensitive to over-fitting.
21 Cross-validation: Overfitting & bias-variance tradeoff
[Figure, four panels: (a) low bias & low variance; (b) high bias & low variance; (c) low bias & high variance; (d) high bias & high variance]
22 Cross-validation: Cross-validation methodology
Over-fitting is prevented by employing cross-validation.
Cross-validation: do not employ the same data to train (i.e. to estimate the optimal parameters) and to test the loss function.
N-fold cross-validation: split the data into N subsets that are employed, one at a time, as an independent test set.
[Figure: for each fold, the entire dataset is divided into a training subset and a testing subset]
23 Cross-validation: Algorithm 1, N-fold cross-validation
define x, y, g_m(x)
CV_idx = randperm(D)
ncv = round(D / N)
for m = 1, 2, ..., M do
    for i = 1, 2, ..., N do
        test_idx = CV_idx((i - 1) * ncv + 1 : min(D, i * ncv))
        x_train = x; x_train(test_idx, :) = []
        y_train = y; y_train(test_idx, :) = []
        x_test = x(test_idx, :)
        y_test = y(test_idx, :)
        Estimate b_i*: y_train = g_m(b, x_train)
        Compute J_i(b_i*, x_test, y_test)
    J_test^m = sum_{i=1}^{N} J_i
Select m*: J_test^{m*} = min_m J_test^m
Estimate b*: y = g_{m*}(b, x)
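The algorithm above can be sketched in Python with NumPy as follows. This is an illustrative version under stated assumptions: the candidate model structures are polynomials of different degrees, the data are hypothetical, the function names are invented here, and `np.array_split` is used to form the folds instead of the `round(D / N)` indexing of the slide.

```python
import numpy as np

def fit(x, y, degree):
    """Least-squares polynomial fit; returns b with the lowest order first."""
    X = np.vander(x, degree + 1, increasing=True)
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b

def loss(b, x, y):
    """J(b) = 1/2 * sum of squared prediction errors."""
    X = np.vander(x, len(b), increasing=True)
    r = y - X @ b
    return 0.5 * r @ r

def n_fold_cv(x, y, degrees, n_folds, seed=0):
    """Return the summed test loss of each candidate model structure."""
    idx = np.random.default_rng(seed).permutation(len(x))
    folds = np.array_split(idx, n_folds)
    J_test = []
    for degree in degrees:
        total = 0.0
        for test_idx in folds:
            # Train on everything except the current test fold
            train_idx = np.setdiff1d(idx, test_idx)
            b = fit(x[train_idx], y[train_idx], degree)
            total += loss(b, x[test_idx], y[test_idx])
        J_test.append(total)
    return J_test

# Hypothetical data: a straight line plus small noise
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 30)
y = 1.0 + 2.0 * x + 0.1 * rng.standard_normal(30)

degrees = [0, 1, 5]
J_test = n_fold_cv(x, y, degrees, n_folds=5)
best_degree = degrees[int(np.argmin(J_test))]
```

Once `best_degree` is selected, the final parameters would be re-estimated with the entire dataset, as in the last step of the algorithm.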
24 Cross-validation: Example #3, cross-validation
Model 1: $g(x) = b_0 + b_1 x$. Current fold: $J_{\text{test}}(\mathbf{b}^*) = 1.56$. Totals: $J_{\text{train}} = 1.97$, $J_{\text{test}} = 2.2$
Model 2: $g(x) = b_0 + b_1 x + b_2 x^2 + b_3 x^3 + b_4 x^4 + b_5 x^5$. Totals: $J_{\text{train}} = 1.14$, $J_{\text{test}} = $
[Figure: training set and test set, $y_i$ versus $x_i$, with the fitted model $g(x)$ for each model structure] [CIV ML/Linear regression 1D.m]
25 Cross-validation: Example #3, cross-validation
Same models and totals as the previous slide, for another fold: Model 1, $J_{\text{test}}(\mathbf{b}^*) = .461$ [CIV ML/Linear regression 1D.m]
26 Cross-validation: N-fold cross-validation
The greater the number of folds N, the more accurate the results.
If N = D, it is called leave-one-out cross-validation (LOOCV).
In practice, LOOCV is too computationally expensive when D is large because it requires as many calibration loops as there are data points.
Instead, it is common in practice to employ 5- to 10-fold cross-validation.
27 Mathematical formulation >1D: Regression in > 1D
Data: $\mathcal{D} = \{(\mathbf{x}_i, y_i),\ i = 1 : \mathtt{D}\}$
Covariates (attributes, regressors): $\mathbf{x}_i = [x_1, x_2]_i^\top \in \mathbb{R}^2$
Observation: $y_i \in \mathbb{R}^1$
Model: $g(\mathbf{x}) \equiv \text{fct}(\mathbf{x}) \in \mathbb{R}^1$
General case: $\mathbf{x}_i = [x_1, x_2, \dots, x_\mathtt{X}]_i^\top \in \mathbb{R}^\mathtt{X}$, $y_i \in \mathbb{R}^1$, $g(\mathbf{x}_i) \in \mathbb{R}^1$
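28 Mathematical formulation >1D: general case >1D
$$g(\mathbf{x}) = b_0 + b_1 \phi_1(x_1) + b_2 \phi_2(x_2) + \dots$$
For the entire dataset, the model predictions are $\mathbf{X}\mathbf{b}$, with
$$\mathbf{b} = \begin{bmatrix} b_0 \\ b_1 \\ \vdots \\ b_\mathtt{B} \end{bmatrix},\quad \mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_\mathtt{D} \end{bmatrix},\quad \mathbf{X} = \begin{bmatrix} 1 & \phi_1(x_{1,1}) & \phi_2(x_{1,2}) & \cdots \\ 1 & \phi_1(x_{2,1}) & \phi_2(x_{2,2}) & \cdots \\ \vdots & \vdots & \vdots & \\ 1 & \phi_1(x_{\mathtt{D},1}) & \phi_2(x_{\mathtt{D},2}) & \cdots \end{bmatrix}$$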
29 Mathematical formulation >1D: Example #4
$g(\mathbf{x}) = b_0 + b_1 x_1^{1/2} + b_2 x_2^2$
[Figure: training set and fitted surface $g(\mathbf{x})$, $y_i$ versus $x_{1,i}$ and $x_{2,i}$] [CIV ML/Linear regression 2D.m]
30 Limitation: Limitations
Does not differentiate between interpolating and extrapolating
Cannot handle autocorrelation, which is common in time series
Sensitive to outliers
Assumes homoscedastic errors
The performance depends on the capacity to hand-pick the correct transformation functions (difficult in more than one dimension)
31 Limitation: Summary
In short: linear regression is simple and quick, yet it is outperformed by modern techniques such as Gaussian Process Regression.
Note: cross-validation is not limited to linear regression and can be employed with any regression technique.
32 Introduction: Example, temperature distribution along a pipeline
Given that we know the pipeline temperature to be y = 8 °C at x = 52 km:
What is the temperature at x = 52.1 km?
What is the temperature at x = 7 km?
We will take advantage of the conditional dependency between a system response y and covariate values x.
[Figure: pipeline photograph, credit Bloomberg]
33 Section outline: Gaussian Process Regression (GPR)
3.1 Introduction
3.2 Definition
3.3 Correlation functions
3.4 Updating a GP using exact observations
3.5 Updating a GP using noise-contaminated observations
3.6 Multiple covariates: $\mathbf{x}_i = [x_1, x_2, \dots, x_\mathtt{X}]_i^\top$
3.7 Parameter estimation
3.8 Evaluating GPR predictive capacity
3.9 GPML
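34 Definition
Given a system response such that the observation equals reality: $g_i = g(\mathbf{x}_i)$, with covariates $\mathbf{x}_i = [x_1, x_2, \dots, x_\mathtt{X}]_i^\top$
Data: $\mathcal{D} = \{(\mathbf{x}_i, g_i),\ i = 1 : \mathtt{D}\}$, $\mathbf{g} = [g_1, g_2, \dots, g_\mathtt{D}]^\top$, $\mathbf{x} = [\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_\mathtt{D}]$ of size $\mathtt{X} \times \mathtt{D}$
Gaussian process: $g(\mathbf{x}) : \mathbf{G} \sim \mathcal{N}(\mathbf{m}_G, \boldsymbol{\Sigma}_G)$
A Gaussian process is a set of discrete Gaussian random variables for which the pairwise correlation between $G_i$ and $G_j$ is a function of the distance between attributes:
$$[\boldsymbol{\Sigma}_G]_{ij} = \rho(\mathbf{x}_i, \mathbf{x}_j)\, \sigma_{G_i} \sigma_{G_j},\qquad \rho(\mathbf{x}_i, \mathbf{x}_j) = \text{fct}(\|\mathbf{x}_i - \mathbf{x}_j\|)$$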
35 Correlation functions: Example, temperature distribution along a pipeline
[Figure: pipeline photograph, credit Bloomberg]
36 Correlation functions: Example, temperature distribution along a pipeline
Prior knowledge: $\mu_{G_i} = \mu_G = 0$ °C, $\sigma_{G_i} = \sigma_G = 2.5$ °C
Gaussian correlation function:
$$\rho(x_i, x_j) = \exp\left(-\frac{1}{2} \frac{(x_i - x_j)^2}{\ell^2}\right)$$
For $\ell = 25$ km: $\rho(x_1, x_2) = 0.999$, $\rho(x_1, x_3) = 0.278$, $\rho(x_2, x_3) = 0.296$
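The correlation coefficients quoted above can be reproduced with a few lines of Python. The locations of $x_1$, $x_2$ and $x_3$ on the slide's figure are not recoverable from the transcript, so the values below are hypothetical: they are simply spacings of 1 km and 40 km, which happen to give the same three coefficients for $\ell = 25$ km.

```python
import math

def gaussian_correlation(x_i, x_j, length):
    """rho(x_i, x_j) = exp(-1/2 * (x_i - x_j)^2 / length^2)."""
    return math.exp(-0.5 * (x_i - x_j) ** 2 / length ** 2)

length = 25.0  # correlation length [km], as on the slide
# Hypothetical locations [km]: x1 and x2 are 1 km apart, x3 is 40 km from x1
x1, x2, x3 = 0.0, 1.0, 40.0

rho_12 = gaussian_correlation(x1, x2, length)
rho_13 = gaussian_correlation(x1, x3, length)
rho_23 = gaussian_correlation(x2, x3, length)
print(round(rho_12, 3), round(rho_13, 3), round(rho_23, 3))  # 0.999 0.278 0.296
```

Nearby points are almost perfectly correlated, while points separated by more than one correlation length are only weakly correlated.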
37 Correlation functions: Example, temperature distribution along a pipeline
For $\mathbf{x} = [0, 1, \dots, 100]$ km
Prior knowledge: $\mathbf{G} \sim \mathcal{N}(\mathbf{m}_G, \boldsymbol{\Sigma}_G)$, $[\boldsymbol{\Sigma}_G]_{ij} = \rho(x_i, x_j)\, \sigma_G^2$
$\mathbf{g}_i$: realizations of our prior knowledge $\mathbf{G}$
[Figure: realizations of the prior, g [°C] versus x [km]] [CIV ML/M14 GP pipeline.m]
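39 Updating a GP using exact observations
Given $\mathcal{D} = \{(\mathbf{x}_i, g_i),\ i = 1 : \mathtt{D}\}$, a set of $\mathtt{D}$ observations, and $\mathbf{x}_*$, a set of $\mathtt{X}_*$ covariates for which we want to predict $f(\mathbf{g}_* | \mathbf{x}_*, \mathcal{D})$.
Reminder: Gaussian conditionals are also Gaussian.
Prior knowledge:
$$\begin{bmatrix} \mathbf{G} \\ \mathbf{G}_* \end{bmatrix},\quad \mathbf{m} = \begin{bmatrix} \mathbf{m}_G \\ \mathbf{m}_* \end{bmatrix},\quad \boldsymbol{\Sigma} = \begin{bmatrix} \boldsymbol{\Sigma}_G & \boldsymbol{\Sigma}_{G*} \\ \boldsymbol{\Sigma}_{G*}^\top & \boldsymbol{\Sigma}_* \end{bmatrix}$$
$$[\boldsymbol{\Sigma}_G]_{ij} = \rho(\mathbf{x}_i, \mathbf{x}_j)\, \sigma_G^2,\quad [\boldsymbol{\Sigma}_{G*}]_{ij} = \rho(\mathbf{x}_i, \mathbf{x}_{*j})\, \sigma_G^2,\quad [\boldsymbol{\Sigma}_*]_{ij} = \rho(\mathbf{x}_{*i}, \mathbf{x}_{*j})\, \sigma_G^2$$
Posterior knowledge:
$$f_{\mathbf{G}_* | \mathbf{x}_*, \mathcal{D}}(\mathbf{g}_* | \mathbf{x}_*, \mathcal{D}) = \mathcal{N}(\mathbf{g}_*; \mathbf{m}_{*|\mathcal{D}}, \boldsymbol{\Sigma}_{*|\mathcal{D}})$$
$$\mathbf{m}_{*|\mathcal{D}} = \mathbf{m}_* + \boldsymbol{\Sigma}_{G*}^\top \boldsymbol{\Sigma}_G^{-1} (\mathbf{g} - \mathbf{m}_G),\qquad \boldsymbol{\Sigma}_{*|\mathcal{D}} = \boldsymbol{\Sigma}_* - \boldsymbol{\Sigma}_{G*}^\top \boldsymbol{\Sigma}_G^{-1} \boldsymbol{\Sigma}_{G*}$$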
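The update equations can be sketched in Python with NumPy as below. This is an illustrative implementation under stated assumptions: a single covariate, a constant prior mean, the Gaussian correlation function with the prior values used in the pipeline example ($\sigma_G = 2.5$, $\ell = 25$ km), and hypothetical observations. The function names are invented for this sketch.

```python
import numpy as np

def sq_exp_cov(xa, xb, sigma_g, length):
    """[Sigma]_ij = sigma_g^2 * exp(-1/2 (xa_i - xb_j)^2 / length^2)."""
    d = xa[:, None] - xb[None, :]
    return sigma_g**2 * np.exp(-0.5 * d**2 / length**2)

def gp_update_exact(x_obs, g_obs, x_star, mean, sigma_g, length):
    """Posterior mean and covariance at x_star given exact observations."""
    S_G = sq_exp_cov(x_obs, x_obs, sigma_g, length)
    S_Gs = sq_exp_cov(x_obs, x_star, sigma_g, length)
    S_s = sq_exp_cov(x_star, x_star, sigma_g, length)
    # Solve linear systems rather than forming the explicit inverse of S_G
    m_post = mean + S_Gs.T @ np.linalg.solve(S_G, g_obs - mean)
    S_post = S_s - S_Gs.T @ np.linalg.solve(S_G, S_Gs)
    return m_post, S_post

# Hypothetical observations (locations in km, temperatures in deg C)
x_obs = np.array([30.0, 70.0])
g_obs = np.array([3.7, 3.6])
x_star = np.array([30.0, 50.0, 150.0])

m_post, S_post = gp_update_exact(x_obs, g_obs, x_star,
                                 mean=0.0, sigma_g=2.5, length=25.0)
# Clip tiny negative values caused by round-off before taking the square root
std_post = np.sqrt(np.clip(np.diag(S_post), 0.0, None))
```

At an observed location the posterior standard deviation collapses to zero, whereas far from the data it returns to the prior value $\sigma_G$, which is how the method distinguishes interpolation from extrapolation.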
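40 Updating a GP using exact observations: Example, multivariate Gaussian conditional
Let $\mu_G = 0$ °C, $\sigma_G = 2.5$ °C, $\rho(x_1, x_2) = 0.8$ (prior knowledge) and $g_2 = -2$ °C (observation).
$$f_{G_1 G_2}(g_1, g_2) = \mathcal{N}(\mathbf{m}_G, \boldsymbol{\Sigma}_G),\quad \mathbf{m}_G = \begin{bmatrix} 0 \\ 0 \end{bmatrix},\quad \boldsymbol{\Sigma}_G = \begin{bmatrix} \sigma_G^2 & \rho\, \sigma_G^2 \\ \rho\, \sigma_G^2 & \sigma_G^2 \end{bmatrix}$$
$$f_{G_1 | g_2}(g_1 | g_2) = \frac{f_{G_1 G_2}(g_1, g_2)}{f_{G_2}(g_2)} = \mathcal{N}(g_1; \mu_{1|2}, \sigma_{1|2}^2)$$
$$\mu_{1|2} = \mu_1 + \rho\, \sigma_1 \frac{g_2 - \mu_2}{\sigma_2} = -1.6\ \text{°C},\qquad \sigma_{1|2} = \sigma_1 \sqrt{1 - \rho^2} = 1.5\ \text{°C}$$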
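As a quick check of the arithmetic, the two conditional formulas can be evaluated directly in Python with the values of this example. The function name is chosen here for illustration only.

```python
import math

def conditional_gaussian(mu_1, mu_2, sigma_1, sigma_2, rho, g_2):
    """Mean and standard deviation of G_1 given the observation G_2 = g_2."""
    mu_cond = mu_1 + rho * sigma_1 * (g_2 - mu_2) / sigma_2
    sigma_cond = sigma_1 * math.sqrt(1.0 - rho**2)
    return mu_cond, sigma_cond

# Prior: mu = 0, sigma = 2.5, rho = 0.8; observation g_2 = -2
mu_cond, sigma_cond = conditional_gaussian(0.0, 0.0, 2.5, 2.5, 0.8, -2.0)
print(round(mu_cond, 2), round(sigma_cond, 2))  # -1.6 1.5
```

The conditional standard deviation depends only on $\rho$ and $\sigma_1$, not on the observed value $g_2$.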
41 Updating a GP using exact observations: Example, multivariate Gaussian conditional
$$f_{G_1 | g_2}(g_1 | g_2) = \frac{f_{G_1 G_2}(g_1, g_2)}{f_{G_2}(g_2)} = \mathcal{N}(g_1; \mu_{1|2}, \sigma_{1|2}^2),\qquad \mu_{1|2} = \mu_1 + \rho\, \sigma_1 \frac{g_2 - \mu_2}{\sigma_2},\quad \sigma_{1|2} = \sigma_1 \sqrt{1 - \rho^2}$$
For $g_2 = -2$: $\mu_{1|2} = -1.6$, $\sigma_{1|2} = 1.5$
[Figure: joint density $f_{G_1 G_2}(g_1, g_2)$ and conditional density $f_{G_1 | g_2}(g_1 | g_2)$ over $g_1$ and $g_2$] [CIV ML/M14 Gauss conditionnelle.m]
42 Updating a GP using exact observations: Example, temperature distribution along a pipeline
[Figure: g [°C] versus x [km]] Observations: $\mathcal{D} = \{([\ ], [\ ])\}$ (no observation yet) [CIV ML/M14 GP pipeline obs.m]
44 Updating a GP using exact observations: Example, temperature distribution along a pipeline
[Figure: g [°C] versus x [km]] Observations: $\mathcal{D} = \{(\mathbf{x}, \mathbf{g})\} = \{([31], [4.13])\}$ [CIV ML/M14 GP pipeline obs.m]
46 Updating a GP using exact observations: Example, temperature distribution along a pipeline
[Figure: g [°C] versus x [km]] Observations: $\mathcal{D} = \{(\mathbf{x}, \mathbf{g})\} = \{([31, 7], [4.13, 3.62])\}$ [CIV ML/M14 GP pipeline obs.m]
48 Updating a GP using exact observations: Example, temperature distribution along a pipeline
[Figure: g [°C] versus x [km]] Observations: $\mathcal{D} = \{(\mathbf{x}, \mathbf{g})\} = \{([31, 7, 3], [4.13, 3.62, 3.71])\}$ [CIV ML/M14 GP pipeline obs.m]
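49 Updating a GP using noise-contaminated observations: Noise-contaminated observations
Given a system response contaminated by measurement noise, such that
$$y_i = g(\mathbf{x}_i) + v,\quad v : V \sim \mathcal{N}(v; 0, \sigma_V^2)$$
where $y_i$ is the observation, $g(\mathbf{x}_i)$ is reality and $v$ is the measurement error.
$$[\boldsymbol{\Sigma}_Y]_{ij} = \rho(\mathbf{x}_i, \mathbf{x}_j)\, \sigma_G^2 + \sigma_V^2\, \delta_{ij},\quad \delta_{ij} = 1 \text{ if } i = j,\ \text{else } \delta_{ij} = 0$$
Prior knowledge:
$$\begin{bmatrix} \mathbf{Y} \\ \mathbf{G}_* \end{bmatrix},\quad \mathbf{m} = \begin{bmatrix} \mathbf{m}_Y \\ \mathbf{m}_* \end{bmatrix},\quad \boldsymbol{\Sigma} = \begin{bmatrix} \boldsymbol{\Sigma}_Y & \boldsymbol{\Sigma}_{Y*} \\ \boldsymbol{\Sigma}_{Y*}^\top & \boldsymbol{\Sigma}_* \end{bmatrix}$$
Posterior knowledge:
$$f_{\mathbf{G}_* | \mathbf{x}_*, \mathcal{D}}(\mathbf{g}_* | \mathbf{x}_*, \mathcal{D}) = \mathcal{N}(\mathbf{g}_*; \mathbf{m}_{*|\mathcal{D}}, \boldsymbol{\Sigma}_{*|\mathcal{D}})$$
$$\mathbf{m}_{*|\mathcal{D}} = \mathbf{m}_* + \boldsymbol{\Sigma}_{Y*}^\top \boldsymbol{\Sigma}_Y^{-1} (\mathbf{y} - \mathbf{m}_Y),\qquad \boldsymbol{\Sigma}_{*|\mathcal{D}} = \boldsymbol{\Sigma}_* - \boldsymbol{\Sigma}_{Y*}^\top \boldsymbol{\Sigma}_Y^{-1} \boldsymbol{\Sigma}_{Y*}$$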
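The only change with respect to exact observations is the noise variance added on the diagonal of $\boldsymbol{\Sigma}_Y$. A minimal Python/NumPy sketch, assuming a single covariate, a constant prior mean and one hypothetical noisy reading, could look like this (the function name is invented for the sketch):

```python
import numpy as np

def gp_update_noisy(x_obs, y_obs, x_star, mean, sigma_g, sigma_v, length):
    """Posterior mean and std of g(x_star) given noisy observations y_obs."""
    def cov(xa, xb):
        d = xa[:, None] - xb[None, :]
        return sigma_g**2 * np.exp(-0.5 * d**2 / length**2)
    # Sigma_Y = Sigma_G + sigma_V^2 I: noise only adds to the diagonal
    S_Y = cov(x_obs, x_obs) + sigma_v**2 * np.eye(len(x_obs))
    S_Ys = cov(x_obs, x_star)
    m_post = mean + S_Ys.T @ np.linalg.solve(S_Y, y_obs - mean)
    S_post = cov(x_star, x_star) - S_Ys.T @ np.linalg.solve(S_Y, S_Ys)
    return m_post, np.sqrt(np.diag(S_post))

x_obs = np.array([30.0])  # hypothetical observation location [km]
y_obs = np.array([3.4])   # hypothetical noisy reading [deg C]

# Predict g at the observed location itself
m_post, std_post = gp_update_noisy(x_obs, y_obs, x_obs,
                                   mean=0.0, sigma_g=2.5, sigma_v=0.5, length=25.0)
```

Unlike the exact-observation case, the posterior mean at the observed location is pulled toward the prior mean and the posterior standard deviation stays above zero, because the reading itself is uncertain.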
50 Updating a GP using noise-contaminated observations: Example, temperature distribution (V)
[Figure: y [°C] versus x [km]] Measurement error: $\sigma_V = .5$. Observations: $\mathcal{D} = \{([\ ], [\ ])\}$ (no observation yet) [CIV ML/M14 GP pipeline obs.m]
51 Updating a GP using noise-contaminated observations: Example, temperature distribution (V)
[Figure: y [°C] versus x [km]] Measurement error: $\sigma_V = .5$. Observations: $\mathcal{D} = \{(\mathbf{x}, \mathbf{y})\} = \{([31], [3.42])\}$ [CIV ML/M14 GP pipeline obs.m]
52 Updating a GP using noise-contaminated observations: Example, temperature distribution (V)
[Figure: y [°C] versus x [km]] Measurement error: $\sigma_V = .5$. Observations: $\mathcal{D} = \{(\mathbf{x}, \mathbf{y})\} = \{([31, 7], [3.42, 2.92])\}$ [CIV ML/M14 GP pipeline obs.m]
53 Updating a GP using noise-contaminated observations: Example, temperature distribution (V)
[Figure: y [°C] versus x [km]] Measurement error: $\sigma_V = .5$. Observations: $\mathcal{D} = \{(\mathbf{x}, \mathbf{y})\} = \{([31, 7, 3], [3.42, 2.92, 3.38])\}$ [CIV ML/M14 GP pipeline obs.m]
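54 Multiple covariates: $\mathbf{x}_i = [x_1, x_2, \dots, x_\mathtt{X}]_i^\top$
$$y_i = g(\mathbf{x}_i) + v_i,\quad \mathbf{x}_i = [x_1, x_2, \dots, x_\mathtt{X}]_i^\top,\quad \text{for } i = 1, \dots, \mathtt{D}$$
where $y_i$ is the observation, $g(\mathbf{x}_i)$ is reality and $\mathbf{x}_i$ are the covariates.
$\mathbf{y} = [y_1, y_2, \dots, y_\mathtt{D}]^\top$, $\mathbf{x} = [\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_\mathtt{D}]$ of size $\mathtt{X} \times \mathtt{D}$, $\boldsymbol{\ell} = [\ell_1, \ell_2, \dots, \ell_\mathtt{X}]^\top$ of size $\mathtt{X} \times 1$ (one $\ell_k$ per covariate $x_k$)
$$\rho(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\frac{1}{2} \sum_{k=1}^{\mathtt{X}} \frac{([x_k]_i - [x_k]_j)^2}{\ell_k^2}\right) = \exp\left(-\frac{1}{2} (\mathbf{x}_i - \mathbf{x}_j)^\top \text{diag}(\boldsymbol{\ell}^2)^{-1} (\mathbf{x}_i - \mathbf{x}_j)\right)$$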
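The effect of having one length $\ell_k$ per covariate can be seen with a small Python/NumPy sketch. The function name and the two length values are hypothetical; they are chosen so that one covariate has a short correlation length and the other a very long one.

```python
import numpy as np

def ard_correlation(x_i, x_j, lengths):
    """exp(-1/2 * sum_k (x_i[k] - x_j[k])^2 / lengths[k]^2)."""
    z = (np.asarray(x_i) - np.asarray(x_j)) / np.asarray(lengths)
    return float(np.exp(-0.5 * z @ z))

lengths = [1.0, 100.0]  # hypothetical: short length for x_1, long for x_2

# Move one unit along x_1 only, then one unit along x_2 only
rho_move_x1 = ard_correlation([0.0, 0.0], [1.0, 0.0], lengths)
rho_move_x2 = ard_correlation([0.0, 0.0], [0.0, 1.0], lengths)
```

A unit step along the covariate with the short length reduces the correlation noticeably, while the same step along the covariate with the long length leaves it almost unchanged, so the response is nearly insensitive to that covariate.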
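55 Multiple covariates: Multivariate correlation function
$$\rho(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\frac{1}{2} \sum_{k=1}^{\mathtt{X}} \frac{([x_i]_k - [x_j]_k)^2}{\ell_k^2}\right) = \exp\left(-\frac{1}{2} (\mathbf{x}_i - \mathbf{x}_j)^\top \text{diag}(\boldsymbol{\ell})^{-2} (\mathbf{x}_i - \mathbf{x}_j)\right)$$
If $\ell_k$ is large compared with $\sigma_{X_k}$, the covariate $X_k$ has little importance for predicting $y$; if $\ell_k < \sigma_{X_k}$, the covariate $X_k$ is important for predicting $y$.
[Figure: $\rho(\mathbf{x}_i, \mathbf{x}_j)$ as a function of $[x_i]_1 - [x_j]_1$ and $[x_i]_2 - [x_j]_2$] [CIV ML/M14 multivariate corr fct.m]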
56 Parameter estimation: $\sigma_G$ & $\boldsymbol{\ell}$
Up to now, we made the hypothesis that we knew $\sigma_G$ and $\boldsymbol{\ell} = [\ell_1, \ell_2, \dots, \ell_\mathtt{X}]^\top$. In practice, we need to learn the hyperparameters $\boldsymbol{\theta} = [\sigma_G, \boldsymbol{\ell}]$ using training data $\mathcal{D} = \{\mathcal{D}_x, \mathcal{D}_y\} = \{(\mathbf{x}_i, y_i),\ i = 1 : \mathtt{D}\}$.
Bayes rule (posterior = likelihood × prior / constant):
$$p(\boldsymbol{\theta} | \mathcal{D}_x, \mathcal{D}_y) = \frac{p(\mathcal{D}_y | \mathcal{D}_x, \boldsymbol{\theta})\, p(\boldsymbol{\theta})}{p(\mathcal{D}_y)}$$
Warning: in practice, evaluating $p(\boldsymbol{\theta} | \mathbf{x}, \mathbf{y})$ is computationally expensive.
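57 Parameter estimation: MLE approximation
$$p(\boldsymbol{\theta} | \mathcal{D}_x, \mathcal{D}_y) = \frac{p(\mathcal{D}_y | \mathcal{D}_x, \boldsymbol{\theta})\, p(\boldsymbol{\theta})}{p(\mathcal{D}_y)}$$
If $p(\boldsymbol{\theta}) = \text{cte.}$, then $p(\boldsymbol{\theta} | \mathcal{D}_x, \mathcal{D}_y) \propto p(\mathcal{D}_y | \mathcal{D}_x, \boldsymbol{\theta})$ and
$$\boldsymbol{\theta}^* = \arg\max_{\boldsymbol{\theta}}\ p(\mathcal{D}_y | \mathcal{D}_x, \boldsymbol{\theta}),\qquad p(\mathcal{D}_y | \mathcal{D}_x, \boldsymbol{\theta}) = \mathcal{N}(\mathcal{D}_y; \mathbf{m}_Y, \boldsymbol{\Sigma}_Y)$$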
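58 Parameter estimation: MLE approximation (cont.)
When the size of $\boldsymbol{\Sigma}_Y$ increases, $p(\mathbf{y} | \mathbf{x}, \boldsymbol{\theta}) \to 0$, which causes numerical instabilities.
Solution: maximize the log-likelihood instead of the likelihood,
$$\boldsymbol{\theta}^* = \arg\max_{\boldsymbol{\theta}}\ p(\mathbf{y} | \mathbf{x}, \boldsymbol{\theta}) \equiv \arg\max_{\boldsymbol{\theta}}\ \log p(\mathbf{y} | \mathbf{x}, \boldsymbol{\theta})$$
The maximum of $p(\mathbf{y} | \mathbf{x}, \boldsymbol{\theta})$ corresponds to the maximum of $\log p(\mathbf{y} | \mathbf{x}, \boldsymbol{\theta})$, and there are no more instabilities.
$$\log p(\mathbf{y} | \mathbf{x}, \boldsymbol{\theta}) = \log \mathcal{N}(\mathbf{y}; \mathbf{m}_Y, \boldsymbol{\Sigma}_Y) = -\frac{1}{2} (\mathbf{y} - \mathbf{m}_Y)^\top \boldsymbol{\Sigma}_Y^{-1} (\mathbf{y} - \mathbf{m}_Y) - \frac{1}{2} \log |\boldsymbol{\Sigma}_Y| - \frac{\mathtt{D}}{2} \log 2\pi$$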
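The underflow problem and its remedy can be illustrated with a short Python/NumPy sketch. The Cholesky-based evaluation is one common way of computing the three terms of the log-likelihood and is an implementation choice made here, not something prescribed by the slides; the data are hypothetical.

```python
import numpy as np

def log_likelihood(y, m_y, S_y):
    """log N(y; m_y, S_y) computed from a Cholesky factor of S_y."""
    L = np.linalg.cholesky(S_y)                 # S_y = L L^T
    alpha = np.linalg.solve(L, y - m_y)         # alpha = L^{-1} (y - m_y)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))  # log|S_y|
    return (-0.5 * alpha @ alpha
            - 0.5 * log_det
            - 0.5 * len(y) * np.log(2.0 * np.pi))

# Hypothetical case: 1000 independent observations with unit variance
D = 1000
y = np.ones(D)
log_p = log_likelihood(y, np.zeros(D), np.eye(D))

# The likelihood itself underflows to exactly 0.0 in double precision
p = np.exp(log_p)
```

Here `log_p` is a perfectly representable number (about -1419), whereas `p` is rounded to zero, so any optimizer working directly on the likelihood would see a flat function.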
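59 Parameter estimation: Maximization
How to find $\boldsymbol{\theta}^*$?
$$\frac{\partial}{\partial \theta_j} \log p(\mathbf{y} | \mathbf{x}, \boldsymbol{\theta}) = \frac{1}{2} \mathbf{y}^\top \boldsymbol{\Sigma}_Y^{-1} \frac{\partial \boldsymbol{\Sigma}_Y}{\partial \theta_j} \boldsymbol{\Sigma}_Y^{-1} \mathbf{y} - \frac{1}{2} \text{tr}\left(\boldsymbol{\Sigma}_Y^{-1} \frac{\partial \boldsymbol{\Sigma}_Y}{\partial \theta_j}\right) = 0$$
We employ iterative optimization techniques (such as Newton-Raphson) to find $\boldsymbol{\theta}^*$. Efficient algorithms are already implemented in GPML.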
60 Parameter estimation: Interactive example, temperature distribution along a pipeline
[Figure: g [°C] versus x [km]] [CIV ML/GPR example.m]
61 Evaluating GPR predictive capacity
We should employ leave-one-out cross-validation (LOO-CV) to estimate the GPR predictive capacity:
1. Set up the GPR model.
2. Estimate the hyper-parameters $\{\boldsymbol{\ell}, \sigma_G, \sigma_V\}$ using the entire dataset (see note).
3. Use LOO-CV to predict each data point while using only the remaining data as the training set.
4. Plot the histogram of normalized prediction errors: $\dfrac{m_{G|\mathcal{D},i} - y_i}{\sigma_{G|\mathcal{D},i}}$
5. Plot a scatter of predicted and observed values.
Note: step 2 can be done because GPR's hyper-parameters $\boldsymbol{\ell}$ and $\sigma_V$ are not sensitive to over-fitting (this is not true for linear regression or for more advanced formulations).
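Steps 3 and 4 can be sketched in Python with NumPy as follows. This is an illustration under stated assumptions: a zero prior mean, a single covariate, hyper-parameters treated as already estimated, and hypothetical data. As on the slide, the error is normalized by the posterior standard deviation of $g$; the function names are invented here.

```python
import numpy as np

def gp_predict(x_tr, y_tr, x_te, sigma_g, sigma_v, length):
    """Zero-mean GPR prediction (mean, std) of g at a single location x_te."""
    def cov(xa, xb):
        d = xa[:, None] - xb[None, :]
        return sigma_g**2 * np.exp(-0.5 * d**2 / length**2)
    S_Y = cov(x_tr, x_tr) + sigma_v**2 * np.eye(len(x_tr))
    s = cov(x_tr, np.array([x_te]))[:, 0]   # covariance with the test point
    w = np.linalg.solve(S_Y, s)
    return w @ y_tr, np.sqrt(sigma_g**2 - w @ s)

def loo_normalized_errors(x, y, sigma_g, sigma_v, length):
    """Leave one point out at a time and return (m_i - y_i) / sigma_i."""
    errors = []
    for i in range(len(x)):
        keep = np.arange(len(x)) != i       # all points except the i-th
        m_i, s_i = gp_predict(x[keep], y[keep], x[i], sigma_g, sigma_v, length)
        errors.append((m_i - y[i]) / s_i)
    return np.array(errors)

# Hypothetical data and hyper-parameters (assumed already estimated)
rng = np.random.default_rng(0)
x = np.linspace(0.0, 100.0, 20)
y = 3.0 * np.sin(x / 20.0) + 0.5 * rng.standard_normal(20)
errors = loo_normalized_errors(x, y, sigma_g=2.5, sigma_v=0.5, length=25.0)
```

A histogram of `errors` can then be compared with the theoretical distribution, and the predicted means can be plotted against the observed values for the scatter plot of step 5.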
62 Evaluating GPR predictive capacity: Example
[Figure, two panels: histogram of the normalized prediction errors $(M_{C|\mathcal{D},i} - y_i)/\sigma_{C|\mathcal{D},i}$, predictions versus theoretical distribution; scatter of $(y_i, M_{C|\mathcal{D},i})$ with $\pm \sigma_{C|\mathcal{D},i}$ bars and the line $y_i = M_{C|\mathcal{D},i}$]
63 GPML: GPML package
%% Data
x_obs;                      % Attribute values associated with observations
y_obs;                      % Observed / simulated values
s_v;                        % Measurement noise std
%% GPML
initial_guess_l = 1;        % initial guess for l
initial_guess_s_g = 1;      % initial guess for s_g
covfunc = @covSEiso;        % isotropic squared-exponential covariance
hyp2.cov = log([initial_guess_l; initial_guess_s_g]);
hyp2.lik = log(s_v);
likfunc = @likGauss;        % Gaussian likelihood
% Negative run length: maximum number of function evaluations
hyp2 = minimize(hyp2, @gp, -5, @infExact, [], covfunc, likfunc, x_obs, y_obs);
%% Estimated values
l_gpml = exp(hyp2.cov(1));      % estimated value for l
s_g_gpml = exp(hyp2.cov(2));    % estimated value for s_g
[CIV ML/M14 demogmpl.m]
64 GPML: Cross-validation
%% GPML set-up
meanfunc = @meanConst;      % GP mean fct.
hyp.mean = [?];             % Prior mean
likfunc = @likGauss;        % GP likelihood fct.
hyp.lik = log([?]);         % Measurement std
covfunc = {'covSEard'};     % Gaussian cov function
ell = log([?]);             % GP correlation length
sg = log([?]);              % GP prior standard deviation
hyp.cov = [ell; sg];        % Covariance function hyperparameters
%% Parameter optimization ([?]: optimizer run length)
[hyp, fX] = minimize(hyp, @gp, [?], @infExact, meanfunc, covfunc, likfunc, x_train, y_train);
%% Leave-one-out cross-validation
for cv_loop = 1:opt.nb_train
    idx_cv_loop = idx_cv;           % re-initialize CV indices
    idx_cv_loop(cv_loop) = [];      % remove current CV test sample
    [~, ~, GP_m, GP_s2] = gp(hyp, @infExact, meanfunc, covfunc, likfunc, x_train(idx_cv_loop,:), y_train(idx_cv_loop), x_train(cv_loop,:));
    LL_cv = log(normpdf(y_train(cv_loop), GP_m, sqrt(GP_s2)));   % log-likelihood
    residual = y_train(cv_loop) - GP_m;                          % prediction error
    cv_results(cv_loop,:) = [GP_m, GP_s2, LL_cv, residual];      % store results
end
%% Normalized prediction error histogram
histogram(cv_results(:,4) ./ sqrt(cv_results(:,2)))
%% Scatter of predicted versus observed values
scatter(y_train, cv_results(:,1))
65 Section outline: Examples
4.1 Soil contamination characterization
4.2 Metamodel
66 Soil contamination characterization
[Figure: observed concentration, Pb [mg/kg], as a function of x [m], y [m] and z [dm]] [CIV ML/Soil contamination data.fig]
67 Soil contamination characterization: Problem set-up
Covariates and observations: $\mathbf{l}_i^{\{Y\}} = [x, y, z]_i$ (geodesic coordinates), $\mathcal{D} = \{(\mathbf{l}_i^{\{Y\}}, y_i),\ i = 1 : 116\}$
Observation model:
$$\log y_i = g(\mathbf{l}_i^{\{Y\}}) + v_i,\quad v : V \sim \mathcal{N}(0, \sigma_V^2)$$
where $\log y_i$ is the observation, $g(\mathbf{l}_i^{\{Y\}})$ is the true contaminant concentration in log space and $v_i$ is the measurement error in log space.
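68 Soil contamination characterization: Problem set-up (cont.)
Prior knowledge:
$$\mathbf{m} = \begin{bmatrix} \mathbf{m}_Y \\ \mathbf{m}_G \end{bmatrix},\quad \boldsymbol{\Sigma} = \begin{bmatrix} \boldsymbol{\Sigma}_Y & \boldsymbol{\Sigma}_{YG} \\ \boldsymbol{\Sigma}_{YG}^\top & \boldsymbol{\Sigma}_G \end{bmatrix}$$
$$[\boldsymbol{\Sigma}_Y]_{ij} = \rho(\mathbf{l}_i^{\{Y\}}, \mathbf{l}_j^{\{Y\}})\, \sigma_G^2 + \sigma_V^2\, \delta_{ij},\quad \delta_{ij} = 1 \text{ if } i = j,\ \text{else } \delta_{ij} = 0$$
$$[\boldsymbol{\Sigma}_G]_{kl} = \rho(\mathbf{l}_k^{\{G\}}, \mathbf{l}_l^{\{G\}})\, \sigma_G^2,\qquad [\boldsymbol{\Sigma}_{YG}]_{ik} = \rho(\mathbf{l}_i^{\{Y\}}, \mathbf{l}_k^{\{G\}})\, \sigma_G^2$$
$$\rho(\mathbf{l}_i, \mathbf{l}_j) = \exp\left(-\frac{1}{2} (\mathbf{l}_i - \mathbf{l}_j)^\top \text{diag}(\boldsymbol{\ell}^2)^{-1} (\mathbf{l}_i - \mathbf{l}_j)\right)$$
Posterior knowledge:
$$\mathbf{m}_{G|\mathcal{D}} = \mathbf{m}_G + \boldsymbol{\Sigma}_{YG}^\top \boldsymbol{\Sigma}_Y^{-1} (\mathbf{y} - \mathbf{m}_Y),\qquad \boldsymbol{\Sigma}_{G|\mathcal{D}} = \boldsymbol{\Sigma}_G - \boldsymbol{\Sigma}_{YG}^\top \boldsymbol{\Sigma}_Y^{-1} \boldsymbol{\Sigma}_{YG}$$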
69 Soil contamination characterization: Problem set-up (cont.)
Hyper-parameter estimation: $\mathcal{P} = \{\ell_x, \ell_y, \ell_z, \sigma_G, \sigma_V\}$
$$\mathcal{P}^* = \arg\max_{\mathcal{P}}\ \log p(\mathbf{y} | \mathbf{l}^{\{Y\}}, \mathcal{P}) \quad \text{(log-likelihood)}$$
Estimated values: $\ell_x = 56.4$ m, $\ell_y = 53.7$ m, $\ell_z = .64$ m, $\sigma_G = 2.86$, $\sigma_V = .18$
70 Soil contamination characterization: Results, contaminant concentration
[Figure: estimated contaminant concentration] [CIV ML/Soil contamination concentration.fig]
71 Soil contamination characterization: Results, prediction quality
[Figure, two panels: histogram of the normalized prediction errors $(\log M_{G|\mathcal{D},i} - \log y_i)/\sigma_{G|\mathcal{D},i}$, predictions versus theoretical distribution; scatter of $(\log y_i, \log M_{G|\mathcal{D},i})$ with $\pm \sigma_{G|\mathcal{D},i}$ bars and the line $\log y_i = \log M_{G|\mathcal{D},i}$]
Warning: overall, the predictive capacity is poor; more data are needed.
72 Soil contamination characterization: Soil contamination concentration
All the details of this study are presented in: Quach, A., Tabor, L., Dumont, D., Courcelles, B., and Goulet, J.-A. (2017). A machine learning approach for characterizing soil contamination in the presence of physical site discontinuities and aggregated samples. Submitted to Advanced Engineering Informatics.
73 Metamodel: GPR & Metamodels
$y = g(\mathbf{x})$, $\mathbf{x} = [x_1, x_2, \dots, x_\mathtt{N}]^\top$
$g(\mathbf{x})$: mathematical model taking the parameters $\mathbf{x}$ as inputs
$\mathbf{x}$: vector containing the model parameters (e.g. boundary conditions, material properties, etc.)
$y$: model response (e.g. displacement, strain, temperature, etc.)
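74 Metamodel: Metamodel GPR set-up
$$y_i = g(\mathbf{x}_i),\quad \mathbf{x}_i = [x_1, x_2, \dots, x_\mathtt{X}]_i^\top,\quad \text{for } i = 1, \dots, \mathtt{D}$$
where $y_i$ is a simulation, $g(\mathbf{x}_i)$ is the model, $\mathbf{x}_i$ are the covariates and $\mathtt{D}$ is the number of simulations generated.
$\mathbf{y} = [y_1, y_2, \dots, y_\mathtt{D}]^\top$, $\mathbf{x} = [\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_\mathtt{D}]$ of size $\mathtt{X} \times \mathtt{D}$, $\boldsymbol{\ell} = [\ell_1, \ell_2, \dots, \ell_\mathtt{X}]^\top$ of size $\mathtt{X} \times 1$ (one $\ell_k$ per covariate $x_k$)
$m = \text{cte.}$
$\sigma_V = 0$ (no observation errors)
$$\rho(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\frac{1}{2} (\mathbf{x}_i - \mathbf{x}_j)^\top \text{diag}(\boldsymbol{\ell})^{-2} (\mathbf{x}_i - \mathbf{x}_j)\right)$$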
75 Metamodel: Example, metamodel of the Tamar Bridge
[Figure: Tamar Bridge]
76 Metamodel: Tamar Bridge, response $y_i$
We are interested in modelling the first natural frequency of the bridge.
We have a set of 1 model simulations in which the model parameters are randomly sampled.
77 Metamodel: Tamar Bridge, covariates $\mathbf{x}_i = [x_1 : x_5]_i$
[Figure: bridge elevation from the Saltash side to the Plymouth side, showing the sidespan cables initial strains, the main cable initial strains, the Saltash-side deck expansion joint, the Saltash tower deck expansion joint and the Plymouth-side deck expansion joint]
1. Main-cable initial strain: (5E-4, 3E-3) mm/mm
2. Sidespan cable initial strains: (5E-4, 3E-3) mm/mm
3. Plymouth-side support longitudinal stiffness: $10^{(4, 11)}$ kN/mm
4. Saltash-side support longitudinal stiffness: $10^{(4, 11)}$ kN/mm
5. Saltash tower deck expansion joint longitudinal stiffness: $10^{(4, 11)}$ kN/mm
78 Metamodel: Problem set-up
Covariates and observations: $\mathbf{x}_i^{\{Y\}} = [x_1, x_2, x_3, x_4, x_5]_i$, $\mathcal{D} = \{(\mathbf{x}_i, y_i),\ i = 1 : 1\}$
Observation model: $y_i = g(\mathbf{x}_i)$, where the observation $y_i$ is the model prediction $g(\mathbf{x}_i)$.
79 Metamodel: Tamar Bridge, hyper-parameter estimation
Hyper-parameter estimation: $\mathcal{P} = \{\ell_1, \ell_2, \ell_3, \ell_4, \ell_5, \sigma_G\}$
$$\mathcal{P}^* = \arg\max_{\mathcal{P}}\ \log p(\mathbf{y} | \mathbf{x}^{\{Y\}}, \mathcal{P}) \quad \text{(log-likelihood)}$$
Estimated values: $\ell_1 = .6$ mm/mm, $\ell_2 = .4$ mm/mm, $\ell_3 = 2.21$ kN/mm, $\ell_4 = 2.31$ kN/mm, $\ell_5 = 1.76$ kN/mm, $\sigma_G = .7$ Hz
80 Metamodel: Tamar Bridge, prediction quality
[Figure, two panels: histogram of the normalized prediction errors $(M_{C|\mathcal{D},i} - y_i)/\sigma_{C|\mathcal{D},i}$, predictions versus theoretical distribution; scatter of $(y_i, M_{C|\mathcal{D},i})$ with $\pm \sigma_{C|\mathcal{D},i}$ bars and the line $y_i = M_{C|\mathcal{D},i}$] [CIV ML/Meta model tamar.m]
81 Section outline: Advanced topics
5.1 Prior knowledge mean values
5.2 Covariance functions
5.3 Homo/Heteroscedasticity
5.4 Other methods
82 Prior knowledge mean values
So far, we have modelled our prior mean as being either 0 or a constant.
When predicting in the neighbourhood of observed data, the prior knowledge has a negligible impact. However, the impact of the prior increases as the distance from the observed data increases.
Solution: the prior knowledge can be parameterized by a function that depends on the covariates, e.g. $m(x) = ax + b$, where $\{a, b\}$ are hyper-parameters that need to be estimated using data.
83 Prior knowledge mean values: Interactive example, temperature distribution along a pipeline
[Figure: g [°C] versus x [km]] [CIV ML/GPR example.m]
84 Prior knowledge mean values: GPML package, mean functions
%% Zero prior mean
meanfunc = @meanZero;
%% Constant prior mean
meanfunc = @meanConst;      % GP mean fct.
hyp.mean = .39;             % Prior mean
%% Linear prior mean
meanfunc = @meanLinear;     % GP mean fct.
hyp.mean = .1;              % Prior mean slope
%% Composite prior mean
msu = {'meanSum', {@meanConst, @meanLinear}};
hyp.mean = [.39, .1];
Note: the parameters in hyp.mean will be estimated using the minimize function.
85 Covariance functions: Square-exponential covariance function
$$\rho(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\frac{1}{2} (\mathbf{x}_i - \mathbf{x}_j)^\top \text{diag}(\boldsymbol{\ell})^{-2} (\mathbf{x}_i - \mathbf{x}_j)\right) = \exp\left(-\frac{1}{2} \sum_{k=1}^{\mathtt{X}} \frac{([x_i]_k - [x_j]_k)^2}{[\boldsymbol{\ell}]_k^2}\right)$$
86 Covariance functions: Matérn covariance function
$$\rho(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\sum_{k=1}^{\mathtt{X}} \frac{|[x_i]_k - [x_j]_k|}{[\boldsymbol{\ell}]_k}\right)$$
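The two correlation functions above can be compared side by side with a short Python sketch. The function names and the correlation length are hypothetical; the second function implements the exponential form written on this slide.

```python
import math

def squared_exponential(x_i, x_j, lengths):
    """exp(-1/2 * sum_k ((x_i[k] - x_j[k]) / l_k)^2)."""
    return math.exp(-0.5 * sum(((a - b) / l) ** 2
                               for a, b, l in zip(x_i, x_j, lengths)))

def exponential(x_i, x_j, lengths):
    """exp(-sum_k |x_i[k] - x_j[k]| / l_k), the form shown on the Matern slide."""
    return math.exp(-sum(abs(a - b) / l
                         for a, b, l in zip(x_i, x_j, lengths)))

lengths = [25.0]  # hypothetical correlation length
for distance in (1.0, 25.0, 50.0):
    se = squared_exponential([0.0], [distance], lengths)
    ex = exponential([0.0], [distance], lengths)
    print(distance, round(se, 3), round(ex, 3))
```

For short distances the exponential form loses correlation faster than the square-exponential one, which translates into rougher realizations of the process.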
87 Covariance functions: Covariance function and cross-validation
How to choose between two covariance function structures? Cross-validation or Bayesian model selection.
88 Homo/Heteroscedasticity
[Figure, three panels of y versus x showing observations, mean and 90% CI: (a) homoscedastic and independent; (b) heteroscedastic and independent; (c) heteroscedastic and dependent]
a) Standard GPR, GPML
b) Tolvanen et al. (2014). Expectation propagation for non-stationary heteroscedastic Gaussian process regression. IEEE-MLSP, GPStuff
c) Tabor et al. (2017). Probabilistic modeling of heteroscedastic laboratory experiments using Gaussian process regression. Journal of Engineering Mechanics
89 Other methods: Regression methods
Linear regression and Gaussian process regression are not the only approaches available:
Neural networks (deep learning)
Support vector machine (SVM)
Decision tree (random forest)
...
Most current research focuses on deep learning.
90 Summary
Linear regression:
- Quick and easy, yet outperformed by modern methods
Gaussian process regression:
- Quick and easy
- Allows interpolating and extrapolating (with uncertainty estimates)
- l allows quantifying the relevance of each covariate
Cross-validation:
- Prevents over-fitting
- The standard method for quantifying prediction accuracy
- Allows comparing model structures
91 Gaussian process limitations
- The GPR method covered here only allows modelling deterministic phenomena
- More advanced methods are required to handle heteroscedasticity
- Results depend on the selection of proper prior mean and covariance functions
- P is a point estimate for the model hyper-parameters; a full Bayesian approach is required if the amount of data is small
- The GP is assumed to be stationary, i.e., the same l applies for any value of the covariate x