Building an Automatic Statistician


1 Building an Automatic Statistician
Zoubin Ghahramani, Department of Engineering, University of Cambridge
ALT-Discovery Science Conference, October 2014
With James Robert Lloyd (Cambridge), David Duvenaud (Cambridge, now Harvard), Roger Grosse (MIT, now Toronto), Josh Tenenbaum (MIT)

2 THERE IS A GROWING NEED FOR DATA ANALYSIS
- We live in an era of abundant data
- The McKinsey Global Institute claims: "The United States alone faces a shortage of 140,000 to 190,000 people with analytical expertise and 1.5 million managers and analysts with the skills to understand and make decisions based on the analysis of big data."
- Diverse fields increasingly rely on expert statisticians, machine learning researchers and data scientists, e.g.
  - computational sciences (e.g. biology, astronomy, ...)
  - online advertising
  - quantitative finance
  - ...

3 GOALS OF THE AUTOMATIC STATISTICIAN PROJECT
- Provide a set of tools for understanding data that require minimal expert input
- Uncover challenging research problems in e.g.
  - automated inference
  - model construction and comparison
  - data visualisation and interpretation
- Advance the field of machine learning in general

4 Background
- Probabilistic modelling
- Model selection and marginal likelihoods
- Bayesian nonparametrics
- Gaussian processes

5 Probabilistic Modelling
- A model describes data that one could observe from a system
- If we use the mathematics of probability theory to express all forms of uncertainty and noise associated with our model...
- ...then inverse probability (i.e. Bayes rule) allows us to infer unknown quantities, adapt our models, make predictions and learn from data.

6 Bayesian Machine Learning
Everything follows from two simple rules:
Sum rule: $P(x) = \sum_y P(x, y)$
Product rule: $P(x, y) = P(x)\, P(y \mid x)$
Learning:
$$P(\theta \mid D, m) = \frac{P(D \mid \theta, m)\, P(\theta \mid m)}{P(D \mid m)}$$
where $P(D \mid \theta, m)$ is the likelihood of parameters $\theta$ in model $m$, $P(\theta \mid m)$ is the prior probability of $\theta$, and $P(\theta \mid D, m)$ is the posterior of $\theta$ given data $D$.
Prediction: $P(x \mid D, m) = \int P(x \mid \theta, D, m)\, P(\theta \mid D, m)\, d\theta$
Model comparison: $P(m \mid D) = \frac{P(D \mid m)\, P(m)}{P(D)}$, where $P(D \mid m) = \int P(D \mid \theta, m)\, P(\theta \mid m)\, d\theta$
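As a minimal aside (not from the slides), the two rules can be checked numerically on a small discrete joint distribution; the numbers below are invented for illustration:

```python
import numpy as np

# Hypothetical 2x2 joint distribution P(x, y); rows index x, columns index y.
P_joint = np.array([[0.10, 0.30],
                    [0.20, 0.40]])

P_x = P_joint.sum(axis=1)               # sum rule: P(x) = sum_y P(x, y)
P_y_given_x = P_joint / P_x[:, None]    # product rule: P(x, y) = P(x) P(y|x)

assert np.allclose(P_joint, P_x[:, None] * P_y_given_x)
print(P_x)          # [0.4 0.6]
print(P_y_given_x)  # each row is a conditional distribution summing to 1
```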

7 Model Comparison
[Figure: data fitted with polynomials of order M = 0, 1, 2, ..., 7]

8 Bayesian Occam's Razor and Model Comparison
Compare model classes, e.g. $m$ and $m'$, using posterior probabilities given $D$:
$$p(m \mid D) = \frac{p(D \mid m)\, p(m)}{p(D)}, \qquad p(D \mid m) = \int p(D \mid \theta, m)\, p(\theta \mid m)\, d\theta$$
Interpretations of the marginal likelihood ("model evidence"):
- The probability that randomly selected parameters from the prior would generate $D$.
- Probability of the data under the model, averaging over all possible parameter values.
- $\log_2 \frac{1}{p(D \mid m)}$ is the number of bits of surprise at observing data $D$ under model $m$.
Model classes that are too simple are unlikely to generate the data set. Model classes that are too complex can generate many possible data sets, so again, they are unlikely to generate that particular data set at random.
[Figure: $P(D \mid m)$ plotted over all possible data sets of size $n$, for models that are too simple, "just right", and too complex]

9 Bayesian Model Comparison: Occam's Razor at Work
[Figure: polynomial fits for M = 0, ..., 7, alongside the model evidence $P(Y \mid M)$ as a function of M]
For example, for quadratic polynomials ($m = 2$): $y = a_0 + a_1 x + a_2 x^2 + \epsilon$, where $\epsilon \sim \mathcal{N}(0, \sigma^2)$ and parameters $\theta = (a_0\ a_1\ a_2\ \sigma)$.
demo: polybayes
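A rough sketch of the idea behind the polybayes demo (my reconstruction, not the original demo code; the data, noise level and prior variance are invented): for each polynomial order the marginal likelihood is available in closed form, and it typically peaks near the true order.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 25)
y = 0.5 - x + 2 * x**2 + 0.1 * rng.standard_normal(x.size)  # true order is 2

sigma2, alpha = 0.1**2, 1.0   # noise variance, prior variance on weights
for M in range(8):
    Phi = np.vander(x, M + 1, increasing=True)   # features 1, x, ..., x^M
    # With w ~ N(0, alpha I): marginally y ~ N(0, alpha Phi Phi^T + sigma2 I)
    cov = alpha * Phi @ Phi.T + sigma2 * np.eye(x.size)
    log_ev = multivariate_normal(mean=np.zeros(x.size), cov=cov).logpdf(y)
    print(f"M = {M}: log evidence = {log_ev:.1f}")
# The evidence peaks near the true order: Occam's razor at work.
```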

10 Parametric vs Nonparametric Models
Parametric models assume some finite set of parameters $\theta$. Given the parameters, future predictions $x$ are independent of the observed data $D$:
$$P(x \mid \theta, D) = P(x \mid \theta)$$
so $\theta$ captures everything there is to know about the data. The complexity of the model is therefore bounded even if the amount of data is unbounded. This makes parametric models not very flexible.
Non-parametric models assume that the data distribution cannot be defined in terms of such a finite set of parameters. But they can often be defined by assuming an infinite-dimensional $\theta$. Usually we think of $\theta$ as a function. The amount of information that $\theta$ can capture about the data $D$ can grow as the amount of data grows. This makes non-parametric models more flexible.

11 Bayesian Nonparametrics
A simple framework for modelling complex data. Nonparametric models can be viewed as having infinitely many parameters.
Examples of non-parametric models:

| Parametric | Non-parametric | Application |
|---|---|---|
| polynomial regression | Gaussian processes | function approx. |
| logistic regression | Gaussian process classifiers | classification |
| mixture models, k-means | Dirichlet process mixtures | clustering |
| hidden Markov models | infinite HMMs | time series |
| factor analysis / pPCA / PMF | infinite latent factor models | feature discovery |
| ... | | |

12 Nonlinear Regression and Gaussian Processes
Consider the problem of nonlinear regression: you want to learn a function $f$ with error bars from data $D = \{X, y\}$.
[Figure: noisy data points with a smooth fit and error bars]
A Gaussian process defines a distribution over functions $p(f)$ which can be used for Bayesian regression:
$$p(f \mid D) = \frac{p(f)\, p(D \mid f)}{p(D)}$$
Let $f = (f(x_1), f(x_2), \ldots, f(x_n))$ be an $n$-dimensional vector of function values evaluated at $n$ points $x_i \in X$. Note, $f$ is a random variable.
Definition: $p(f)$ is a Gaussian process if for any finite subset $\{x_1, \ldots, x_n\} \subset X$, the marginal distribution over that subset, $p(f)$, is multivariate Gaussian.
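A minimal numerical sketch of GP regression (assuming an SE kernel and made-up data); the posterior mean and variance follow from conditioning a multivariate Gaussian:

```python
import numpy as np

def se_kernel(a, b, ell=0.5, sf2=1.0):
    """Squared-exponential kernel matrix between 1-D input vectors a and b."""
    d = a[:, None] - b[None, :]
    return sf2 * np.exp(-0.5 * (d / ell) ** 2)

X = np.array([-1.5, -0.5, 0.3, 1.1])        # hypothetical training inputs
y = np.sin(3 * X)                           # hypothetical targets
Xs = np.linspace(-2, 2, 100)                # test inputs
sn2 = 0.01                                  # noise variance

K = se_kernel(X, X) + sn2 * np.eye(len(X))  # k(X, X) + noise
Ks = se_kernel(Xs, X)                       # k(X*, X)
mean = Ks @ np.linalg.solve(K, y)           # posterior mean of f(X*)
cov = se_kernel(Xs, Xs) - Ks @ np.linalg.solve(K, Ks.T)
std = np.sqrt(np.clip(np.diag(cov), 0, None))  # error bars (clip numerical noise)
```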

13 A Picture
[Diagram: a cube relating Linear Regression, Logistic Regression, Bayesian Linear Regression, Bayesian Logistic Regression, Kernel Regression, Kernel Classification, GP Regression and GP Classification, along three axes: Classification, Kernel, Bayesian]

14 Gaussian Process Covariance Functions (Kernels)
$p(f)$ is a Gaussian process if for any finite subset $\{x_1, \ldots, x_n\} \subset X$, the marginal distribution over that finite subset $p(f)$ has a multivariate Gaussian distribution.
Gaussian processes (GPs) are parameterized by a mean function, $\mu(x)$, and a covariance function, or kernel, $K(x, x')$.
$$p(f(x), f(x')) = N(\mu, \Sigma), \quad \text{where} \quad \mu = \begin{bmatrix} \mu(x) \\ \mu(x') \end{bmatrix}, \quad \Sigma = \begin{bmatrix} K(x, x) & K(x, x') \\ K(x', x) & K(x', x') \end{bmatrix}$$
and similarly for $p(f(x_1), \ldots, f(x_n))$, where now $\mu$ is an $n \times 1$ vector and $\Sigma$ is an $n \times n$ matrix.

15 Gaussian Process Covariance Functions
Gaussian processes (GPs) are parameterized by a mean function, $\mu(x)$, and a covariance function, $K(x, x')$, where $\mu(x) = E(f(x))$ and $K(x, x') = \mathrm{Cov}(f(x), f(x'))$.
An example covariance function:
$$K(x, x') = v_0 \exp\left( - \left( \frac{|x - x'|}{r} \right)^{\alpha} \right) + v_1 + v_2\, \delta_{ij}$$
with parameters $(v_0, v_1, v_2, r, \alpha)$. These kernel parameters are interpretable and can be learned from data:

| $v_0$ | signal variance |
| $v_1$ | variance of bias |
| $v_2$ | noise variance |
| $r$ | lengthscale |
| $\alpha$ | roughness |

Once the mean and covariance functions are defined, everything else about GPs follows from the basic rules of probability applied to multivariate Gaussians.
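For concreteness, a direct transcription of this example covariance into code (the parameter values are arbitrary, and the Kronecker delta is approximated by coincident inputs):

```python
import numpy as np

def example_kernel(x, xp, v0=1.0, v1=0.1, v2=0.01, r=0.5, alpha=2.0):
    """K(x, x') = v0 exp(-(|x - x'| / r)^alpha) + v1 + v2 * delta."""
    d = np.abs(x[:, None] - xp[None, :])
    K = v0 * np.exp(-(d / r) ** alpha) + v1   # signal variance + bias variance
    K = K + v2 * (d == 0.0)                   # noise only on coincident inputs
    return K
```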

16 Samples from GPs with Different $K(x, x')$
[Figure: twelve sample paths $f(x)$ drawn from GP priors with different covariance functions]
demo: gpdemogen
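In the spirit of gpdemogen (a sketch, not the original demo), prior sample paths can be drawn by Cholesky-factoring the covariance matrix; the SE kernel and the lengthscales below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 200)
d = np.abs(x[:, None] - x[None, :])

samples = {}
for ell in [0.2, 0.5, 1.0, 2.0]:                       # different lengthscales
    K = np.exp(-0.5 * (d / ell) ** 2)                  # SE covariance matrix
    L = np.linalg.cholesky(K + 1e-9 * np.eye(x.size))  # jitter for stability
    samples[ell] = L @ rng.standard_normal(x.size)     # one sample path each
```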

17 Prediction Using GPs with Different $K(x, x')$
A sample from the prior for each covariance function:
[Figure: three prior sample paths $f(x)$]
Corresponding predictions, mean with two standard deviations:
[Figure: posterior means with two-standard-deviation error bars]
demo: gpdemo

18 The Automatic Statistician

19 WHAT WOULD AN AUTOMATIC STATISTICIAN DO?
[Flowchart of the system: Language of models, Translation, Data, Search, Model, Prediction, Report, Evaluation, Checking]

20 GOALS OF THE AUTOMATIC STATISTICIAN PROJECT
- Provide a set of tools for understanding data that require minimal expert input
- Uncover challenging research problems in e.g.
  - automated inference
  - model construction and comparison
  - data visualisation and interpretation
- Advance the field of machine learning in general

21 INGREDIENTS OF AN AUTOMATIC STATISTICIAN
[Flowchart of the system: Language of models, Translation, Data, Search, Model, Prediction, Report, Evaluation, Checking]
- An open-ended language of models
  - expressive enough to capture real-world phenomena...
  - ...and the techniques used by human statisticians
- A search procedure
  - to efficiently explore the language of models
- A principled method of evaluating models
  - trading off complexity and fit to data
- A procedure to automatically explain the models
  - making the assumptions of the models explicit...
  - ...in a way that is intelligible to non-experts

22 PREVIEW: AN ENTIRELY AUTOMATIC ANALYSIS
[Figure: raw data; full model posterior with extrapolations]
Four additive components have been identified in the data:
- A linearly increasing function.
- An approximately periodic function with a period of 1.0 years and with linearly increasing amplitude.
- A smooth function.
- Uncorrelated noise with linearly increasing standard deviation.

23 DEFINING A LANGUAGE OF MODELS
[Flowchart of the system: Language of models, Translation, Data, Search, Model, Prediction, Report, Evaluation, Checking]

24 DEFINING A LANGUAGE OF REGRESSION MODELS
- Regression consists of learning a function $f : X \to Y$ from example input/output pairs
- The language should include simple parametric forms...
  - e.g. linear functions, polynomials, exponential functions
- ...as well as functions specified by high-level properties
  - e.g. smoothness, periodicity
- Inference should be tractable for all models in the language

25 WE CAN BUILD REGRESSION MODELS WITH GAUSSIAN PROCESSES
- GPs are distributions over functions such that any finite subset of function evaluations, $(f(x_1), f(x_2), \ldots, f(x_N))$, has a joint Gaussian distribution
- A GP is completely specified by
  - a mean function, $\mu(x) = E(f(x))$
  - a covariance / kernel function, $k(x, x') = \mathrm{Cov}(f(x), f(x'))$
- Denoted $f \sim GP(\mu, k)$
[Figure: GP posterior mean and posterior uncertainty, updated as data points are added]

26 A LANGUAGE OF GAUSSIAN PROCESS KERNELS
- It is common practice to use a zero mean function since the mean can be marginalised out
- Suppose $f(x) \mid a \sim GP(a\,\mu(x), k(x, x'))$ where $a \sim \mathcal{N}(0, 1)$
- Then, equivalently, $f(x) \sim GP(0,\ \mu(x)\mu(x') + k(x, x'))$
- We therefore define a language of GP regression models by specifying a language of kernels

27 THE ATOMS OF OUR LANGUAGE
Five base kernels, encoding the following types of functions:

| Kernel | Encodes |
|---|---|
| Squared exp. (SE) | smooth functions |
| Periodic (PER) | periodic functions |
| Linear (LIN) | linear functions |
| Constant (C) | constant functions |
| White noise (WN) | Gaussian noise |

28 THE COMPOSITION RULES OF OUR LANGUAGE
- Two main operations: addition and multiplication

| Expression | Encodes |
|---|---|
| LIN × LIN | quadratic functions |
| SE × PER | locally periodic functions |
| LIN + PER | periodic plus linear trend |
| SE + PER | periodic plus smooth trend |
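A small sketch of these composition rules as code (the kernel forms and hyperparameters are my own simplified choices): kernels are functions returning matrices, and addition and multiplication act elementwise:

```python
import numpy as np

def LIN(x, xp):
    return x[:, None] * xp[None, :]

def PER(x, xp, p=1.0, ell=1.0):
    d = np.abs(x[:, None] - xp[None, :])
    return np.exp(-2 * np.sin(np.pi * d / p) ** 2 / ell**2)

def SE(x, xp, ell=1.0):
    return np.exp(-0.5 * ((x[:, None] - xp[None, :]) / ell) ** 2)

def add(k1, k2):
    return lambda x, xp: k1(x, xp) + k2(x, xp)

def mul(k1, k2):
    return lambda x, xp: k1(x, xp) * k2(x, xp)

quadratic        = mul(LIN, LIN)   # LIN x LIN: quadratic functions
locally_periodic = mul(SE, PER)    # SE x PER: locally periodic
periodic_trend   = add(LIN, PER)   # LIN + PER: periodic plus linear trend
```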

29 MODELING CHANGEPOINTS
Assume $f_1(x) \sim GP(0, k_1)$ and $f_2(x) \sim GP(0, k_2)$. Define:
$$f(x) = (1 - \sigma(x))\, f_1(x) + \sigma(x)\, f_2(x)$$
where $\sigma$ is a sigmoid function between 0 and 1. Then $f \sim GP(0, k)$, where
$$k(x, x') = (1 - \sigma(x))\, k_1(x, x')\, (1 - \sigma(x')) + \sigma(x)\, k_2(x, x')\, \sigma(x')$$
We define the changepoint operator $k = CP(k_1, k_2)$.
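The changepoint operator translates directly into code under the same kernel-function conventions as above (the sigmoid location and steepness are free parameters):

```python
import numpy as np

def sigmoid(x, x0=0.0, s=1.0):
    """Sigmoid in the input; x0 is the changepoint location, s the steepness."""
    return 1.0 / (1.0 + np.exp(-(x - x0) / s))

def CP(k1, k2, x0=0.0, s=1.0):
    """k(x,x') = (1-sig(x)) k1(x,x') (1-sig(x')) + sig(x) k2(x,x') sig(x')."""
    def k(x, xp):
        sx = sigmoid(x, x0, s)[:, None]
        sxp = sigmoid(xp, x0, s)[None, :]
        return (1 - sx) * k1(x, xp) * (1 - sxp) + sx * k2(x, xp) * sxp
    return k
```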

30 AN EXPRESSIVE LANGUAGE OF MODELS

| Regression model | Kernel |
|---|---|
| GP smoothing | SE + WN |
| Linear regression | C + LIN + WN |
| Multiple kernel learning | Σ SE + WN |
| Trend, cyclical, irregular | Σ SE + Σ PER + WN |
| Fourier decomposition | C + Σ cos + WN |
| Sparse spectrum GPs | Σ cos + WN |
| Spectral mixture | Σ SE × cos + WN |
| Changepoints | e.g. CP(SE, SE) + WN |
| Heteroscedasticity | e.g. SE + LIN × WN |

Note: cos is a special case of our version of PER.

31 DISCOVERING A GOOD MODEL VIA SEARCH
[Flowchart of the system: Language of models, Translation, Data, Search, Model, Prediction, Report, Evaluation, Checking]

32 DISCOVERING A GOOD MODEL VIA SEARCH
- Language defined as the arbitrary composition of five base kernels (WN, C, LIN, SE, PER) via three operators (+, ×, CP)
- The space spanned by this language is open-ended and can have a high branching factor, requiring a judicious search
- We propose a greedy search for its simplicity and similarity to human model-building
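A structural sketch of the greedy search (my own skeleton, not the authors' code); `score` stands in for fitting the GP and returning the BIC-penalised evidence described on the evaluation slides:

```python
BASE = ["WN", "C", "LIN", "SE", "PER"]

def expand(expr):
    """One-step expansions of a kernel expression, written as strings."""
    out = [f"({expr} + {b})" for b in BASE]
    out += [f"({expr} * {b})" for b in BASE]
    out += [f"CP({expr}, {b})" for b in BASE]
    return out

def greedy_search(score, depth=3):
    best = max(BASE, key=score)                 # best base kernel to start
    for _ in range(depth):
        champ = max(expand(best), key=score)    # best one-step expansion
        if score(champ) <= score(best):         # stop when nothing improves
            return best
        best = champ
    return best

# Toy run with a stand-in score that just prefers shorter expressions:
print(greedy_search(lambda e: -len(e)))   # -> "C"
```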

33 EXAMPLE: MAUNA LOA KEELING CURVE
[Figure: the data fit under the current best model, RQ, shown over the search tree: Start → {SE, RQ, LIN, PER}; RQ → {SE + RQ, ..., PER + RQ, ..., PER × RQ}; PER + RQ → {SE + PER + RQ, ..., SE × (PER + RQ)}]

34 EXAMPLE: MAUNA LOA KEELING CURVE
[Figure: the fit under (PER + RQ), one level deeper in the same search tree]

35 EXAMPLE: MAUNA LOA KEELING CURVE
[Figure: the fit under SE × (PER + RQ), one level deeper in the same search tree]

36 EXAMPLE: MAUNA LOA KEELING CURVE
[Figure: the fit under (SE + SE × (PER + RQ)), the final model in the same search tree]

37 MODEL EVALUATION
[Flowchart of the system: Language of models, Translation, Data, Search, Model, Prediction, Report, Evaluation, Checking]

38 MODEL EVALUATION
- After proposing a new model, its kernel parameters are optimised by conjugate gradients
- We evaluate each optimised model, $M$, using the model evidence (marginal likelihood), which can be computed analytically for GPs
- We penalise the marginal likelihood for the optimised kernel parameters using the Bayesian Information Criterion (BIC):
$$\mathrm{BIC}(M) = \log p(D \mid M) - \frac{p}{2} \log n$$
where $p$ is the number of kernel parameters, $D$ represents the data, and $n$ is the number of data points.
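Both quantities on this slide are short to implement; a minimal sketch (my own, assuming a precomputed kernel matrix K over the training inputs):

```python
import numpy as np

def gp_log_evidence(K, y):
    """log p(D|M) = -1/2 y^T K^{-1} y - 1/2 log|K| - (n/2) log(2 pi)."""
    L = np.linalg.cholesky(K)                     # K = L L^T
    a = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ a
            - np.log(np.diag(L)).sum()            # equals 1/2 log|K|
            - 0.5 * len(y) * np.log(2 * np.pi))

def bic(log_evidence, p, n):
    """BIC(M) = log p(D|M) - (p/2) log n."""
    return log_evidence - 0.5 * p * np.log(n)

# e.g. rank a kernel with 5 free parameters fitted to n = 144 points:
# bic(gp_log_evidence(K, y), p=5, n=144)
```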

39 AUTOMATIC TRANSLATION OF MODELS
[Flowchart of the system: Language of models, Translation, Data, Search, Model, Prediction, Report, Evaluation, Checking]

40 AUTOMATIC TRANSLATION OF MODELS
- Search can produce arbitrarily complicated models from an open-ended language, but two main properties allow description to be automated
  - Kernels can be decomposed into a sum of products
    - a sum of kernels corresponds to a sum of functions
    - therefore, we can describe each product of kernels separately
  - Each kernel in a product modifies a model in a consistent way
    - each kernel roughly corresponds to an adjective
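A toy illustration (mine, with phrase fragments taken from the examples later in this deck) of the "each kernel is an adjective" idea: one kernel in a product contributes the noun phrase, the others contribute modifiers:

```python
NOUN_PRIORITY = ["PER", "WN", "SE", "LIN", "C"]   # which kernel names the function
NOUN = {"PER": "periodic function", "WN": "uncorrelated noise",
        "SE": "smooth function", "LIN": "linear function",
        "C": "constant function"}
MODIFIER = {"SE": "approximately", "LIN": "with linearly growing amplitude"}

def describe(product):
    """Describe a product of base kernels as a noun phrase with modifiers."""
    head = min(product, key=NOUN_PRIORITY.index)  # highest-priority noun
    mods = [MODIFIER[k] for k in product if k != head and k in MODIFIER]
    pre = [m for m in mods if not m.startswith("with")]
    post = [m for m in mods if m.startswith("with")]
    return " ".join(pre + [NOUN[head]] + post)

print(describe(["SE", "PER", "LIN"]))
# -> "approximately periodic function with linearly growing amplitude"
```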

41 SUM OF PRODUCTS NORMAL FORM
Suppose the search finds the following kernel:
SE × (WN × LIN + CP(C, PER))

42 SUM OF PRODUCTS NORMAL FORM
Suppose the search finds the following kernel:
SE × (WN × LIN + CP(C, PER))
The changepoint can be converted into a sum of products (σ and σ̄ denote the sigmoid factors from the changepoint):
SE × (WN × LIN + C × σ + PER × σ̄)

43 SUM OF PRODUCTS NORMAL FORM
Suppose the search finds the following kernel:
SE × (WN × LIN + CP(C, PER))
The changepoint can be converted into a sum of products:
SE × (WN × LIN + C × σ + PER × σ̄)
Multiplication can be distributed over addition:
SE × WN × LIN + SE × C × σ + SE × PER × σ̄

44 SUM OF PRODUCTS NORMAL FORM
Suppose the search finds the following kernel:
SE × (WN × LIN + CP(C, PER))
The changepoint can be converted into a sum of products:
SE × (WN × LIN + C × σ + PER × σ̄)
Multiplication can be distributed over addition:
SE × WN × LIN + SE × C × σ + SE × PER × σ̄
Simplification rules (e.g. SE × WN → WN and SE × C → SE) are applied:
WN × LIN + SE × σ + SE × PER × σ̄
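The normal-form conversion on slides 41-44 is ordinary symbolic algebra; a sketch with sympy (my own, not the authors' implementation), treating kernel names as commuting symbols and starting after the changepoint has been rewritten with sigmoid factors:

```python
import sympy as sp

SE, WN, LIN, C, PER, s, sbar = sp.symbols("SE WN LIN C PER sigma sigmabar")

expr = SE * (WN * LIN + C * s + PER * sbar)
expanded = sp.expand(expr)      # distribute multiplication over addition
print(expanded)                 # C*SE*sigma + LIN*SE*WN + PER*SE*sigmabar

# Apply simplification rules like SE*WN -> WN and SE*C -> SE:
simplified = expanded.subs({SE * WN: WN, SE * C: SE})
print(simplified)               # LIN*WN + SE*sigma + PER*SE*sigmabar
```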

45 SUMS OF KERNELS ARE SUMS OF FUNCTIONS
If $f_1 \sim GP(0, k_1)$ and independently $f_2 \sim GP(0, k_2)$, then
$$f_1 + f_2 \sim GP(0, k_1 + k_2)$$
[Figure: a posterior decomposed into the posteriors of its additive components]
We can therefore describe each component separately.
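In code, this means each additive component can be predicted separately against the shared posterior; a minimal sketch (mine) under the kernel-function conventions used in the earlier snippets, with k = k1 + ... + kJ:

```python
import numpy as np

def component_posterior_mean(kj, kernels, X, y, Xs, sn2=0.01):
    """Posterior mean of component j at Xs: K*_j (K + sn2 I)^{-1} y."""
    K = sum(k(X, X) for k in kernels) + sn2 * np.eye(len(X))
    return kj(Xs, X) @ np.linalg.solve(K, y)
```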

46 PRODUCTS OF KERNELS
PER: "periodic function"
On their own, each kernel is described by a standard noun phrase.

47 PRODUCTS OF KERNELS - SE
SE × PER: "approximately periodic function"
Multiplication by SE removes long-range correlations from a model, since SE(x, x') decreases monotonically to 0 as |x - x'| increases.

48 PRODUCTS OF KERNELS - LIN
SE × PER × LIN: "approximately periodic function with linearly growing amplitude"
Multiplication by LIN is equivalent to multiplying the function being modeled by a linear function. If $f(x) \sim GP(0, k)$, then $x f(x) \sim GP(0, k \times \mathrm{LIN})$. This causes the standard deviation of the model to vary linearly without affecting the correlation.

49 PRODUCTS OF KERNELS - CHANGEPOINTS
SE × PER × LIN × σ: "approximately periodic function with linearly growing amplitude until 1700"
Multiplication by σ is equivalent to multiplying the function being modeled by a sigmoid.

50 AUTOMATICALLY GENERATED REPORTS
[Flowchart of the system: Language of models, Translation, Data, Search, Model, Prediction, Report, Evaluation, Checking]

51 EXAMPLE: AIRLINE PASSENGER VOLUME
[Figure: raw data; full model posterior with extrapolations]
Four additive components have been identified in the data:
- A linearly increasing function.
- An approximately periodic function with a period of 1.0 years and with linearly increasing amplitude.
- A smooth function.
- Uncorrelated noise with linearly increasing standard deviation.

52 EXAMPLE: AIRLINE PASSENGER VOLUME
This component is linearly increasing.
[Figure: posterior of component 1; sum of components up to component 1]

53 EXAMPLE: AIRLINE PASSENGER VOLUME
This component is approximately periodic with a period of 1.0 years and varying amplitude. Across periods the shape of this function varies very smoothly. The amplitude of the function increases linearly. The shape of this function within each period has a typical lengthscale of 6.0 weeks.
[Figure: posterior of component 2; sum of components up to component 2]

54 EXAMPLE: AIRLINE PASSENGER VOLUME
This component is a smooth function with a typical lengthscale of 8.1 months.
[Figure: posterior of component 3; sum of components up to component 3]

55 EXAMPLE: AIRLINE PASSENGER VOLUME
This component models uncorrelated noise. The standard deviation of the noise increases linearly.
[Figure: posterior of component 4; sum of components up to component 4]

56 EXAMPLE: SOLAR IRRADIANCE
This component is constant.
[Figure: posterior of component 1; sum of components up to component 1]

57 EXAMPLE: SOLAR IRRADIANCE
This component is constant. This component applies from 1643 until 1716.
[Figure: posterior of component 2; sum of components up to component 2]

58 EXAMPLE: SOLAR IRRADIANCE
This component is a smooth function with a typical lengthscale of 23.1 years. This component applies until 1643 and from 1716 onwards.
[Figure: posterior of component 3; sum of components up to component 3]

59 EXAMPLE: SOLAR IRRADIANCE
This component is approximately periodic with a period of 10.8 years. Across periods the shape of this function varies smoothly with a typical lengthscale of 36.9 years. The shape of this function within each period is very smooth and resembles a sinusoid. This component applies until 1643 and from 1716 onwards.
[Figure: posterior of component 4; sum of components up to component 4]

60 OTHER EXAMPLES
See the project website for other examples.

61 GOOD PREDICTIVE PERFORMANCE AS WELL
[Figure: standardised RMSE over 13 data sets for ABCD (accuracy), ABCD (interpretability), spectral kernels, trend-cyclical-irregular, Bayesian MKL, Eureqa, changepoints, squared exponential, and linear regression]
- Tweaks can be made to the algorithm to improve the accuracy or the interpretability of the models produced...
- ...but both variants are highly competitive at extrapolation (shown above) and interpolation

62 MODEL CHECKING AND CRITICISM
- Good statistical modelling should include model criticism:
  - Does the data match the assumptions of the model?
  - For example, if the model assumed Gaussian noise, does a Q-Q plot reveal non-Gaussian residuals?
- Our automatic statistician performs posterior predictive checks, dependence tests and residual tests
- We have also been developing more systematic nonparametric approaches to model criticism using kernel two-sample testing with MMD
Lloyd, J. R. and Ghahramani, Z. (2014) Statistical Model Criticism using Kernel Two Sample Tests.
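For reference, the statistic behind such kernel two-sample tests is short to sketch (a biased MMD² estimate with an RBF kernel; the lengthscale and test data below are arbitrary choices, and a real test would calibrate the statistic against a null distribution):

```python
import numpy as np

def mmd2(X, Y, ell=1.0):
    """Biased MMD^2 between 1-D samples X and Y under an RBF kernel."""
    k = lambda a, b: np.exp(-0.5 * ((a[:, None] - b[None, :]) / ell) ** 2)
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

rng = np.random.default_rng(0)
print(mmd2(rng.normal(size=200), rng.normal(size=200)))       # near 0: same dist.
print(mmd2(rng.normal(size=200), rng.normal(1.0, 1.0, 200)))  # clearly > 0
```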

63 CHALLENGES
- Interpretability / accuracy trade-off
- Increasing the expressivity of the language
  - e.g. monotonicity, positive functions, symmetries
- Computational complexity of searching through a huge space of models
- Extending the automatic reports to multidimensional datasets
  - Search and descriptions naturally extend to multiple dimensions, but automatically generating relevant visual summaries is harder

64 CURRENT AND FUTURE DIRECTIONS
- Automatic statistician for:
  - one-dimensional time series
  - linear regression (classical)
  - multivariate nonlinear regression (cf. Duvenaud, Lloyd et al., ICML 2013)
  - multivariate classification (w/ Nikola Mrksic)
  - completing and interpreting tables and databases (w/ Kee Chong Tan)
- Probabilistic programming
  - Probabilistic models are expressed in a general (Turing-complete) programming language
  - A universal inference engine can then be used to infer unobserved variables given observed data
  - This can be used to implement search over the model space in an automatic statistician

65 SUMMARY
- We have presented the beginnings of an automatic statistician
- Our system
  - defines an open-ended language of models
  - searches greedily through this space
  - produces detailed reports describing patterns in data
  - performs automatic model criticism
- Extrapolation and interpolation performance is highly competitive
- We believe this line of research has the potential to make powerful statistical model-building techniques accessible to non-experts

66 REFERENCES
Duvenaud, D., Lloyd, J. R., Grosse, R., Tenenbaum, J. B. and Ghahramani, Z. (2013) Structure Discovery in Nonparametric Regression through Compositional Kernel Search. ICML 2013.
Lloyd, J. R., Duvenaud, D., Grosse, R., Tenenbaum, J. B. and Ghahramani, Z. (2014) Automatic Construction and Natural-language Description of Nonparametric Regression Models. AAAI 2014.
Lloyd, J. R. and Ghahramani, Z. (2014) Statistical Model Criticism using Kernel Two Sample Tests.
Ghahramani, Z. (2013) Bayesian nonparametrics and the probabilistic approach to modelling. Philosophical Trans. Royal Society A 371.
Looking for postdocs!
