Study of classes of kernels suited to the simplification and interpretation of approximation models (Étude de classes de noyaux adaptées à la simplification et à l'interprétation des modèles d'approximation).


Slide 1: Study of classes of kernels suited to the simplification and interpretation of approximation models

PhD defense of N. Durrande, École des Mines de Saint-Étienne, 9 November 2011

Supervisors: Laurent Carraro, Rodolphe Le Riche
Reviewers: Béatrice Laurent, Henry Wynn
Co-advisors: David Ginsbourger, Olivier Roustant
Examiners: Yves Grandvalet, Alberto Pasanisi

Slide 2: Introduction to Gaussian Process models

Outline:
1. Introduction: the Gaussian process modeling framework, and the issues when d is large (lack of interpretability, large number of observations required)
2. Simplified models for high-dimensional modeling
3. Gaussian processes and the ANOVA representation

Slide 3: Introduction to Gaussian Process models

Let f : D → R be a function whose value is known only on a limited set of points X = (X_1, ..., X_n). [Figure: f plotted against x with the observed points.] This situation arises in many fields: engineering, geostatistics, numerical simulation.

Slide 4: Introduction to Gaussian Process models

Given a kernel K and its associated centered Gaussian process Z, one can look at the sample paths satisfying Z_ω(X_i) = f(X_i). [Figure: conditional sample paths interpolating the observations.] We can compute the conditional law of Z given the observations.

Slide 5: Introduction to Gaussian Process models

We approximate f(x) using the conditional expectation m(x) and the conditional variance v(x) of Z:

m(x) = k(x)^T K^{-1} F
v(x) = K(x, x) - k(x)^T K^{-1} k(x)

where k is the functional vector (k(·))_i = K(X_i, ·) and K is the covariance matrix (K)_{ij} = K(X_i, X_j).
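These equations translate directly into a few lines of linear algebra. Below is a minimal numerical sketch, assuming a one-dimensional squared exponential kernel; the function names and the toy data are illustrative, not taken from the talk.

```python
import numpy as np

def se_kernel(x, y, sigma2=1.0, theta=0.2):
    # Squared exponential kernel K(x, y), working on scalars or arrays
    return sigma2 * np.exp(-((x - y) ** 2) / (2 * theta ** 2))

X = np.array([0.1, 0.4, 0.6, 0.9])      # design points X_1, ..., X_n
F = np.sin(2 * np.pi * X)               # observations f(X_i)
K = se_kernel(X[:, None], X[None, :])   # covariance matrix (K)_ij = K(X_i, X_j)

def predict(x):
    k = se_kernel(X, x)                                  # (k(x))_i = K(X_i, x)
    m = k @ np.linalg.solve(K, F)                        # conditional mean m(x)
    v = se_kernel(x, x) - k @ np.linalg.solve(K, k)      # conditional variance v(x)
    return m, v

m05, v05 = predict(0.5)   # prediction at x = 0.5
```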

Slide 6: Introduction to Gaussian Process models

The best predictor m(x) can be seen as a linear combination of the basis functions K(X_i, ·). [Figure: the predictor together with the functions K(X_i, ·).]

Slide 7: Introduction to Gaussian Process models

The choice of the covariance kernel K has a great impact on the model. [Figure: models obtained from the same data with a squared exponential kernel with (σ², θ) = (1, 0.2), a Brownian kernel, and a squared exponential kernel with (σ², θ) = (1, 0.5).]

How can we choose the most appropriate kernel?

Slide 8: Introduction to Gaussian Process models

Definition: a kernel is a symmetric function of positive type:

K(x, y) = K(y, x)
∀n ∈ N, ∀a_1, ..., a_n ∈ R and ∀x_1, ..., x_n ∈ D,  Σ_{i=1}^n Σ_{j=1}^n a_i a_j K(x_i, x_j) ≥ 0.

Verifying the second condition directly is often intractable.

Slide 9: Introduction to Gaussian Process models

However, if K_1 and K_2 are kernels and f is a real-valued function, then K_1 × K_2, K_1 + K_2 and f(x) K_1(x, y) f(y) are covariance kernels. It's easy to create new kernels from old!

Example: construction of multidimensional kernels, K(x, y) = Π_i K_i(x_i, y_i).
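A sketch of these closure properties, assuming two simple univariate kernels; all names are illustrative:

```python
import numpy as np

def k1(x, y):
    return np.exp(-((x - y) ** 2))   # squared exponential kernel

def k2(x, y):
    return np.minimum(x, y)          # Brownian motion kernel on [0, 1]

# New kernels from old: product, sum, and rescaling f(x) K(x, y) f(y)
k_prod = lambda x, y: k1(x, y) * k2(x, y)
k_sum  = lambda x, y: k1(x, y) + k2(x, y)
f = lambda x: 1.0 + x ** 2
k_resc = lambda x, y: f(x) * k1(x, y) * f(y)

def k_tensor(x, y):
    # Multidimensional tensor product kernel: K(x, y) = prod_i K_i(x_i, y_i)
    return np.prod([k1(xi, yi) for xi, yi in zip(x, y)])

value = k_tensor((0.2, 0.7), (0.3, 0.5))
```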

Slide 10: Limitations of tensor product kernels

Most of the time, multidimensional kernels are based on a product of univariate kernels. For example, the squared exponential kernel over [0, 1]² is:

K(x, y) = exp(-||x - y||²) = exp(-(x_1 - y_1)² - (x_2 - y_2)²) = K_u(x_1, y_1) × K_u(x_2, y_2)

With such kernels, an observation f(X_i) may influence the model only in a neighborhood of X_i.

Slide 11: Limitations of tensor product kernels

For such kernels, the basis function K(X_i, ·) associated with an observation at X_i only has a local influence. [Figure: basis function centered at X_1 = (0.25, 0.25).] Conversely to other methods where the basis functions have a global meaning, kriging models are therefore difficult to interpret away from the observations.

Slide 12: Limitations of tensor product kernels

If the neighborhood influenced by an observation is a domain of size ε for d = 1, the proportion of the input space covered by one observation shrinks as the dimension increases. [Figure: the ε-neighborhoods for d = 1, 2, 3.]

This phenomenon is known as the curse of dimensionality: the number of observations has to grow exponentially with d. A quick order-of-magnitude check appears below.
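A one-line computation illustrates the growth; assuming each observation informs a hypercube of side ε = 0.1, covering [0, 1]^d takes on the order of (1/ε)^d points:

```python
eps = 0.1
for d in (1, 2, 3, 10):
    print(d, round((1 / eps) ** d))   # 10, 100, 1000, 10_000_000_000 observations
```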

Slide 13: Limitations of tensor product kernels

When using usual kernels in high dimension, kriging emulators face two issues: they require a large number of observations, and they are difficult to interpret. The aim of this presentation is to get round those issues.

Outline:
1. Additive models
2. Simplified models with interaction terms
3. Interpretation of high-dimensional models

Slide 14: Additive kernels

A popular approach to get round the curse of dimensionality is to consider simplified models such as additive models [Stone 85]:

f(x) ≈ m(x) = Σ_{i=1}^d m_i(x_i)

Examples of such models: regression without interaction terms, generalized additive models [Hastie 90].

Slide 15: Additive kernels

In order to obtain additive kriging models, we consider kernels of the form

K(x, y) = Σ_{i=1}^d K_i(x_i, y_i)

As seen previously, such sums of kernels are symmetric and of positive type. We will call them additive kernels.

Slide 16: Additive kernels

[Figure: three sample paths of a GP Y over (x_1, x_2) with an additive kernel.] The paths of Y are additive (up to a modification).
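A sketch of how such surfaces can be simulated, assuming squared exponential univariate kernels (the grid size and θ are arbitrary choices):

```python
import numpy as np

def k_se(x, y, theta=0.3):
    return np.exp(-((x - y) ** 2) / (2 * theta ** 2))

g = np.linspace(0, 1, 20)
grid = np.array([(a, b) for a in g for b in g])   # 400 points in [0, 1]^2

def k_add(x, y):
    return k_se(x[0], y[0]) + k_se(x[1], y[1])    # additive kernel K_1 + K_2

C = np.array([[k_add(p, q) for q in grid] for p in grid])
jitter = 1e-8 * np.eye(len(grid))                 # numerical regularization
path = np.random.multivariate_normal(np.zeros(len(grid)), C + jitter)
surface = path.reshape(20, 20)                    # surface[i, j] = Y(g[i], g[j])
```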

Slide 17: Additive kernels

Now, let us consider a GP model with an additive kernel. As an additive kernel is still a kernel, the kriging equations do not change:

m(x) = k(x)^T K^{-1} F
v(x) = K(x, x) - k(x)^T K^{-1} k(x)

Slide 18: Interpretability of additive kriging models

When the input space is high-dimensional, usual kriging models cannot easily be interpreted; they can be seen as black boxes. On the other hand, additive kriging models are easily interpretable:

m(x) = (k_1(x_1) + k_2(x_2))^T (K_1 + K_2)^{-1} F
     = k_1(x_1)^T (K_1 + K_2)^{-1} F + k_2(x_2)^T (K_1 + K_2)^{-1} F
     = m_1(x_1) + m_2(x_2)

For the sub-model m_1, the covariance matrix K_2 plays the role of observation noise.
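A minimal sketch of this decomposition for d = 2, reusing the kriging equations from slide 17; the kernel, design and test function are illustrative:

```python
import numpy as np

def k_se(x, y, theta=0.3):
    return np.exp(-((x - y) ** 2) / (2 * theta ** 2))

rng = np.random.default_rng(0)
X = rng.random((30, 2))                        # design points in [0, 1]^2
F = np.sin(4 * X[:, 0]) + X[:, 1] ** 2         # an additive test function

K1 = k_se(X[:, 0][:, None], X[:, 0][None, :])  # (K_1)_ij = K_1(X_i1, X_j1)
K2 = k_se(X[:, 1][:, None], X[:, 1][None, :])
alpha = np.linalg.solve(K1 + K2, F)            # (K_1 + K_2)^{-1} F

def m1(x1):
    return k_se(x1, X[:, 0]) @ alpha           # univariate sub-model m_1(x_1)

def m2(x2):
    return k_se(x2, X[:, 1]) @ alpha           # univariate sub-model m_2(x_2)
```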

Slide 19: Interpretability of additive kriging models

[Figure: the univariate sub-models m_1(x_1) and m_2(x_2) obtained from the decomposition.]

Slide 20: Additive models and linear budget

As previously, consider the neighborhood of a point X_i. [Figure: the region influenced by an observation under an additive kernel.] We observe that:
- the observations do not only have a local influence;
- the number of observations can grow linearly with the dimension.

Slide 21: Additive models and linear budget

Let:
- Z_p be a centered GP over [0, 1]^d with a tensor product kernel,
- Z_a be a centered GP over [0, 1]^d with an additive kernel,
- Z = Z_a + Z_p, with Z_a and Z_p assumed independent.

We compare, as d increases, the predictivity of the approximation of a path Z_ω by an additive kriging model and by a kriging model based on a tensor product kernel.

Slide 22: Additive models and linear budget

For d = 1, ..., 30, we choose n = 10d observations (a linear budget). [Figure: percentage of variance explained as a function of the dimension, for the additive kernel and the tensor product kernel.]

Slide 23: Additive models and linear budget

Those results depend on the value of θ. [Figure: percentage of variance explained vs. dimension for the additive and tensor product kernels, for θ = 0.25, θ = 0.5 and θ = 1.]

With a limited linear budget, simple models can outperform more complex models.

Slide 24: Additive models and linear budget

We have seen two advantages of additive models:
- they are easily interpretable;
- they require a reasonable number of observations.

However, those models are usually too basic for modeling real-life phenomena. How can we increase the complexity of the models?

Slide 25: ANOVA kernels

In order to take the interactions between variables into account, one can consider ANOVA kernels [Stitson 97]:

K(x, y) = Π_{i=1}^d (1 + K_i(x_i, y_i))
        = 1 + Σ_{i=1}^d K_i(x_i, y_i)  [additive part]
          + Σ_{i<j} K_i(x_i, y_i) K_j(x_j, y_j)  [2nd order interactions]
          + ... + Π_{i=1}^d K_i(x_i, y_i)  [full interaction]
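A direct sketch of this construction (the univariate kernel is illustrative):

```python
import numpy as np

def k_se(x, y, theta=0.3):
    return np.exp(-((x - y) ** 2) / (2 * theta ** 2))

def k_anova(x, y):
    # ANOVA kernel: K(x, y) = prod_i (1 + K_i(x_i, y_i)); expanding the
    # product yields 1 + the additive part + all interaction terms.
    return np.prod([1.0 + k_se(xi, yi) for xi, yi in zip(x, y)])
```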

Slide 26: ANOVA kernels

A decomposition of the best predictor is naturally associated with those kernels. Example: in 2D we have K = 1 + K_1 + K_2 + K_1 K_2, so the best predictor can be written as

m(x) = (1 + k_1(x_1) + k_2(x_2) + k_1(x_1) k_2(x_2))^T K^{-1} F
     = m_0 + m_1(x_1) + m_2(x_2) + m_12(x)

This decomposition looks like the ANOVA representation of m, but the m_I do not satisfy ∫_{D_i} m_I(x_I) dx_i = 0.

Slide 27: ANOVA kernels

The ANOVA representation is based on a functional decomposition of L²: if D = D_1 × ... × D_d and μ = μ_1 ⊗ ... ⊗ μ_d, we have

L²(D, μ) = ⊗_{i=1}^d (1_{D_i} ⊕ L²_0(D_i, μ_i))

If we can build a RKHS with the same structure, the ANOVA representation of m can be obtained naturally. How can we build a RKHS of zero-mean functions? One example is given in [Wahba 97].

Slide 28: Kernel ANOVA decomposition

Using the RKHS framework, we showed that from any usual one-dimensional kernel K one can extract a kernel K_0 associated with a RKHS of zero-mean functions. Let R be the Riesz representer of the integral functional f ↦ ∫ f(x) dx for the inner product ⟨·, ·⟩_H. We define H_1 = span(R) and H_0 as the orthogonal complement of R in H. [Diagram: H decomposed into span(R) and H_0.]

Slide 29: Kernel ANOVA decomposition

The expression of R(x) can be obtained easily:

R(x) = ⟨R, K(x, ·)⟩_H = ∫_D K(x, s) ds

[Figure: R(x) for the Brownian kernel, the Gaussian kernel with θ = 1, and the Gaussian kernel with θ = 0.25.]

Slide 30: Kernel ANOVA decomposition

Finally, we have H = H_1 ⊕ H_0 with H_1 = span(R) a one-dimensional RKHS and H_0 a RKHS of zero-mean functions. Those spaces have kernels:

K_1(x, y) = ∫ K(x, s) ds ∫ K(y, s) ds / ∫∫ K(s, t) ds dt
K_0(x, y) = K(x, y) - ∫ K(x, s) ds ∫ K(y, s) ds / ∫∫ K(s, t) ds dt
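These expressions only involve integrals of K, so K_0 is easy to approximate numerically. A sketch assuming D = [0, 1] and a simple Riemann-sum quadrature; the grid size and kernel are illustrative:

```python
import numpy as np

def k_se(x, y, theta=0.3):
    return np.exp(-((x - y) ** 2) / (2 * theta ** 2))

s = np.linspace(0, 1, 200)                    # quadrature grid on D = [0, 1]

def int_k(x):
    # int_D K(x, s) ds, approximated elementwise by a Riemann sum
    x = np.asarray(x, dtype=float)
    return k_se(x[..., None], s).mean(axis=-1)

kk = np.mean([int_k(si) for si in s])         # int_D int_D K(s, t) ds dt

def k1(x, y):
    return int_k(x) * int_k(y) / kk           # kernel of H_1 = span(R)

def k0(x, y):
    return k_se(x, y) - k1(x, y)              # kernel of the zero-mean RKHS H_0
```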

Slide 31: Kernel ANOVA decomposition

As for the ANOVA representation in L², we can build a RKHS

H = ⊗_{i=1}^d (1_{D_i} ⊕ H_i^0)  with kernel  K(x, y) = Π_{i=1}^d (1 + K_i^0(x_i, y_i))

With this space, the ANOVA representation is obtained naturally:

m(x) = (1 + k_1^0(x_1) + k_2^0(x_2) + k_1^0(x_1) k_2^0(x_2))^T K^{-1} F
     = m_0 + m_1(x_1) + m_2(x_2) + m_12(x)

Slide 32: Application 1: interpretation

Let us consider the random test function f : [0, 1]^10 → R:

x ↦ 10 sin(π x_1 x_2) + 20 (x_3 - 0.5)² + 10 x_4 + 5 x_5 + N(0, 1)

The steps for approximating f with a GP model are:
1. learn f on a DoE (here, a 180-point maximin LHS), as sketched below;
2. get the optimal values ψ* of the kernel parameters using MLE;
3. build the kriging predictor f̂ based on the ANOVA kernel built from the K_i^0.

As f̂ is a function of 10 variables, the model cannot easily be represented: it is usually considered as a black box. However, the structure of K allows us to split m into sub-models.
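A sketch of the setup for step 1, assuming the test function above; scipy's qmc module gives a plain LHS (maximin optimization of the design would be an additional step), and the model fit itself is omitted:

```python
import numpy as np
from scipy.stats import qmc

rng = np.random.default_rng(0)
X = qmc.LatinHypercube(d=10, seed=0).random(n=180)   # 180-point LHS in [0, 1]^10

def f(x):
    # noisy 10-dimensional test function (only x_1, ..., x_5 are active)
    return (10 * np.sin(np.pi * x[:, 0] * x[:, 1])
            + 20 * (x[:, 2] - 0.5) ** 2
            + 10 * x[:, 3] + 5 * x[:, 4]
            + rng.normal(size=len(x)))

F = f(X)   # observations used to fit the ANOVA-kernel GP model
```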

Slide 33: Application 1: interpretation

[Figure: the univariate sub-models m_i(x_i).] Recall that f(x) = 10 sin(π x_1 x_2) + 20 (x_3 - 0.5)² + 10 x_4 + 5 x_5 + N(0, 1), so the recovered sub-models can be compared with the true additive terms.

Slide 34: Application 2: computation of Sobol indices

Using K, the sensitivity indices S_I can be computed analytically:

S_I = var(m_I(X_I)) / var(m(X))
    = [F^T K^{-1} (⊙_{i∈I} Γ_i) K^{-1} F] / [F^T K^{-1} (⊙_{i=1}^d (1_{n×n} + Γ_i) - 1_{n×n}) K^{-1} F]

where Γ_i is the matrix Γ_i = ∫_{D_i} k_i^0(s_i) k_i^0(s_i)^T ds_i, 1_{n×n} is the n × n matrix of ones, and ⊙ is the term-wise (Hadamard) product.

Conversely to other methods, computing S_I does not require computing all the S_J for J ⊂ I.
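A sketch of the Γ_i matrices and of the index computation, reusing the k0 kernel and the Riemann-sum quadrature from slide 30; all names are illustrative, and alpha = K^{-1} F is assumed precomputed from the ANOVA-kernel covariance matrix:

```python
import numpy as np

def gamma(k0, Xi, s):
    # Gamma_i = int_{D_i} k_i^0(s) k_i^0(s)^T ds, approximated on the grid s;
    # Xi is the i-th column of the design matrix, k0 the univariate kernel K_i^0.
    k0s = k0(Xi[:, None], s[None, :])     # (n, len(s)) matrix of k_i^0(s) values
    return (k0s @ k0s.T) / len(s)

def sobol_index(I, gammas, alpha):
    # S_I with alpha = K^{-1} F; np.prod over a list of matrices with axis=0
    # performs the term-wise (Hadamard) product of the formula above.
    num = np.prod([gammas[i] for i in I], axis=0)
    ones = np.ones_like(gammas[0])
    den = np.prod([ones + g for g in gammas], axis=0) - ones
    return (alpha @ num @ alpha) / (alpha @ den @ alpha)
```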

Slide 35: Conclusion

Additive models:
- are useful for modeling high-dimensional phenomena;
- can be used for extracting an additive trend.

Kernels for sensitivity analysis:
- the kernels built from the K_i^0 correspond to a particular class of ANOVA kernels;
- they allow us to obtain the terms of the ANOVA representation efficiently, which is useful for showing the first terms for model interpretation and for computing the sensitivity indices.

Slide 36: Conclusion

Perspectives:
- confidence intervals for the sub-models;
- RKHS orthogonal to operators other than the integral f ↦ ∫ f(x) dx.

Future work:
- models with very high-dimensional inputs;
- kernels taking specific features into account.

Slide 37: Conclusion

Thank you for your attention.
