Learning Mixtures of Truncated Basis Functions from Data


Helge Langseth, Thomas D. Nielsen, and Antonio Salmerón
PGM
This work is supported by an Abel grant from Iceland, Liechtenstein, and Norway through the EEA Financial Mechanism (Nils mobility project), supported and coordinated by Universidad Complutense de Madrid; by the Spanish Ministry of Science and Innovation through projects TIN-9-C--; and by ERDF (FEDER) funds.
Learning MoTBFs from data

Background: Approximations

Geometry of approximations
A quick recap of how to do approximations in $\mathbb{R}^n$:
- We want to approximate the vector $\mathbf{f}$ with a vector along $\mathbf{e}_1$. The best choice is $\langle \mathbf{f}, \mathbf{e}_1 \rangle\, \mathbf{e}_1$.
- Now, add a vector along $\mathbf{e}_2$. The best choice is $\langle \mathbf{f}, \mathbf{e}_2 \rangle\, \mathbf{e}_2$, independently of the choice made for $\mathbf{e}_1$. Also, the choice we made for $\mathbf{e}_1$ is still optimal, since $\mathbf{e}_1 \perp \mathbf{e}_2$.
- In general, the best approximation is $\sum_l \langle \mathbf{f}, \mathbf{e}_l \rangle\, \mathbf{e}_l$.
All of this carries over to approximations of functions! We only need a definition of the inner product and the equivalent of orthonormal basis vectors.
Inner product for functions: for two functions $u(\cdot)$ and $v(\cdot)$ defined on $\Omega \subseteq \mathbb{R}$, we use $\langle u, v \rangle = \int_\Omega u(x)\, v(x)\, dx$.
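The projection argument can be checked numerically. The sketch below uses a hypothetical vector f = (2, 3, 5) and a hypothetical rotated orthonormal basis of R^3 (the slide's own numbers are not recoverable). Each coefficient ⟨f, e_l⟩ is computed without reference to the other basis vectors, and adding directions never increases the squared error.

```python
import math

def dot(u, v):
    """Euclidean inner product <u, v>."""
    return sum(ui * vi for ui, vi in zip(u, v))

def project(f, basis):
    """Best approximation of f within span(basis), assuming the basis vectors
    are orthonormal: sum_l <f, e_l> e_l."""
    approx = [0.0] * len(f)
    for e in basis:
        c = dot(f, e)  # depends only on f and e, not on the other e_l
        approx = [a + c * ei for a, ei in zip(approx, e)]
    return approx

def sq_error(f, g):
    """Squared Euclidean distance between f and g."""
    return sum((fi - gi) ** 2 for fi, gi in zip(f, g))

s = 1.0 / math.sqrt(2.0)
e1, e2, e3 = (s, s, 0.0), (s, -s, 0.0), (0.0, 0.0, 1.0)  # hypothetical orthonormal basis
f = (2.0, 3.0, 5.0)                                      # hypothetical vector

for basis in ([e1], [e1, e2], [e1, e2, e3]):
    g = project(f, basis)
    print(len(basis), [round(v, 6) for v in g], round(sq_error(f, g), 6))
```

With e1 alone the approximation is (2.5, 2.5, 0) with squared error 25.5; adding e2 leaves the e1 coefficient untouched and gives (2, 3, 0) with error 25; all three vectors recover f exactly.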

Generalised Fourier series
Definition (Legal set of basis functions): Let $\Psi = \{\psi_i\}_{i=0}^{\infty}$ be an indexed set of basis functions, and let $Q$ be the set of all linear combinations of functions in $\Psi$. $\Psi$ is a legal set of basis functions if:
- $\psi_0$ is constant;
- $u \in Q$ and $v \in Q$ implies that $(u \cdot v) \in Q$;
- for any pair of real numbers $s$ and $t$ with $s \neq t$, there exists a function $\psi_i \in \Psi$ such that $\psi_i(s) \neq \psi_i(t)$.
Legal basis functions:
- $\{1, x, x^2, x^3, \ldots\}$ is a legal set of basis functions.
- $\{1, \exp(-x), \exp(x), \exp(-2x), \exp(2x), \ldots\}$ is also legal.
- $\{1, \log(x), \log(2x), \log(3x), \ldots\}$ is not a legal set of basis functions.
Generalized Fourier series: Assume $\Psi$ is legal and contains orthonormal basis functions (if not, they can be made orthonormal through a Gram-Schmidt process). Then the generalized Fourier series approximation to a function $f$ is defined as $\hat{f}(\cdot) = \sum_{l} \langle f, \psi_l \rangle\, \psi_l(\cdot)$.
Important properties:
- Any function, including density functions, can be approximated arbitrarily well by this approach.
- For any real coefficients $c_0, \ldots, c_k$,
  $\int_\Omega \big(f(x) - \sum_{l=0}^{k} c_l \psi_l(x)\big)^2 dx \;\ge\; \int_\Omega \big(f(x) - \sum_{l=0}^{k} \langle f, \psi_l \rangle\, \psi_l(x)\big)^2 dx$,
  so the generalized Fourier series approximation is optimal in the $L^2$ sense.
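A legal set that is not orthonormal can be orthonormalised by Gram-Schmidt, as noted above. The following is a minimal sketch for the polynomial set {1, x, ..., x^k}. Each polynomial is stored as a coefficient list (a representation chosen here only for illustration), and the inner product ⟨u, v⟩ = ∫ u(x) v(x) dx is evaluated exactly. On [-1, 1] the result reproduces the normalised Legendre polynomials, for example ψ_1(x) = √(3/2) x.

```python
import math

def poly_mul(p, q):
    """Product of polynomials given as coefficient lists [c0, c1, ...] (c_i * x**i)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def poly_integral(p, a, b):
    """Exact integral of the polynomial p over [a, b]."""
    return sum(c * (b ** (i + 1) - a ** (i + 1)) / (i + 1) for i, c in enumerate(p))

def inner(p, q, a, b):
    """Function-space inner product <p, q> = int_a^b p(x) q(x) dx."""
    return poly_integral(poly_mul(p, q), a, b)

def axpy(alpha, p, q):
    """Return q + alpha * p, padding the shorter coefficient list with zeros."""
    n = max(len(p), len(q))
    p = p + [0.0] * (n - len(p))
    q = q + [0.0] * (n - len(q))
    return [qi + alpha * pi for pi, qi in zip(p, q)]

def gram_schmidt_monomials(k, a, b):
    """Orthonormalise the legal set {1, x, ..., x^k} on [a, b] by Gram-Schmidt."""
    basis = []
    for n in range(k + 1):
        v = [0.0] * n + [1.0]                        # the monomial x^n
        for psi in basis:
            v = axpy(-inner(v, psi, a, b), psi, v)   # remove the component along psi
        norm = math.sqrt(inner(v, v, a, b))
        basis.append([c / norm for c in v])
    return basis

psi = gram_schmidt_monomials(3, -1.0, 1.0)
print(psi[1])  # coefficients of psi_1: roughly [0, 1.2247], i.e. sqrt(3/2) * x
```

Orthonormalising monomials becomes ill-conditioned as k grows. For larger k it is safer to generate the Legendre polynomials directly from their three-term recurrence.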

MoTBFs

The marginal MoTBF potential
Definition: Let $\Psi = \{\psi_i\}_{i=0}^{\infty}$, with $\psi_i : \mathbb{R} \to \mathbb{R}$, define a legal set of basis functions on $\Omega \subseteq \mathbb{R}$. Then $g_k : \Omega \to \mathbb{R}_0^+$ is an MoTBF potential at level $k$ with respect to $\Psi$ if
- $g_k(x) = \sum_{i=0}^{k} a_i \psi_i(x)$ for all $x \in \Omega$, where the $a_i$ are real constants; or
- there is a partition of $\Omega$ into intervals $I_1, \ldots, I_m$ such that $g_k$ is defined as above on each $I_j$.
Special cases: An MoTBF potential at level $k = 0$ is simply a standard discretisation. MoPs (original definition) and MTEs are also special cases of MoTBFs.
Simplification: We do not use the option to split the domain into subdomains here.
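The definition translates almost directly into code. In this minimal sketch the basis is passed as a list of callables. The coefficients for the exponential set are hypothetical. Each interval is assumed to contain its left endpoint, because the slide does not say how boundary points are assigned. The second example shows why level k = 0 on a partition is just a discretisation: every piece is a constant.

```python
import bisect
import math

def motbf(coeffs, basis):
    """g_k(x) = sum_{i=0}^{k} a_i psi_i(x); `basis` lists callables psi_0, ..., psi_k."""
    def g(x):
        return sum(a * psi(x) for a, psi in zip(coeffs, basis))
    return g

def piecewise(split_points, pieces):
    """Potential defined separately on the intervals I_1, ..., I_m of a partition
    of Omega. `split_points` holds the m - 1 interior boundaries in increasing
    order; pieces[j] is used on I_{j+1}, which contains its left endpoint."""
    def g(x):
        return pieces[bisect.bisect_right(split_points, x)](x)
    return g

# Hypothetical coefficients for the legal exponential set {1, exp(-x), exp(x)}.
exp_basis = [lambda x: 1.0, lambda x: math.exp(-x), lambda x: math.exp(x)]
g2 = motbf([0.5, 0.1, 0.1], exp_basis)

# Level k = 0 on each of the intervals [0, 1), [1, 2), [2, 3]: a histogram.
const = [lambda x: 1.0]
hist = piecewise([1.0, 2.0], [motbf([p], const) for p in (0.2, 0.5, 0.3)])
print(g2(0.0), hist(0.5), hist(1.5), hist(2.5))
```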

Example: Polynomials vs. the Std. Gaussian
Figure: MoTBF approximations $g_k$ of the standard Gaussian of increasing level, up to $g_8 = \sum_{i=0}^{8} a_i \psi_i$.
- We use orthonormal polynomials (shifted and scaled Legendre polynomials).
- The approximation always integrates to unity.
- Direct computation gives the $g_k$ closest in $L^2$-norm.
- Adding a positivity constraint and minimising the KL divergence gives a convex optimization problem.
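The "direct computation" of the $L^2$-closest $g_k$ is the generalized Fourier series. This sketch rests on assumptions that do not come from the slides: the Gaussian is truncated to [-3, 3] and renormalised there, the basis is the orthonormal shifted/scaled Legendre set built from the standard recurrence, and the inner products are evaluated by composite Simpson's rule.

```python
import math

A, B = -3.0, 3.0  # assumed domain Omega: the Gaussian is truncated to [-3, 3]

def legendre(n, t):
    """Legendre polynomial P_n(t) on [-1, 1] via the three-term recurrence."""
    if n == 0:
        return 1.0
    p_prev, p = 1.0, t
    for m in range(1, n):
        p_prev, p = p, ((2 * m + 1) * t * p - m * p_prev) / (m + 1)
    return p

def psi(n, x):
    """Shifted and scaled Legendre polynomial with unit L2 norm on [A, B]."""
    t = (2.0 * x - A - B) / (B - A)
    return math.sqrt((2 * n + 1) / (B - A)) * legendre(n, t)

def simpson(fn, m=2000):
    """Composite Simpson's rule for int_A^B fn(x) dx with m (even) subintervals."""
    h = (B - A) / m
    total = fn(A) + fn(B)
    total += sum((4 if i % 2 else 2) * fn(A + i * h) for i in range(1, m))
    return total * h / 3

Z = math.erf(B / math.sqrt(2.0))  # N(0, 1) mass on the symmetric interval [A, B]

def f(x):
    """Standard Gaussian density, renormalised to integrate to one on [A, B]."""
    return math.exp(-0.5 * x * x) / (math.sqrt(2.0 * math.pi) * Z)

def fourier_approx(k):
    """Coefficients a_i = <f, psi_i>, i = 0..k, and the level-k approximation g_k."""
    coeffs = [simpson(lambda x, i=i: f(x) * psi(i, x)) for i in range(k + 1)]
    def g(x):
        return sum(c * psi(i, x) for i, c in enumerate(coeffs))
    return coeffs, g

def l2_error(g):
    """L2 distance between f and g on [A, B]."""
    return math.sqrt(simpson(lambda x: (f(x) - g(x)) ** 2))

for k in (0, 2, 4, 8):
    coeffs, g = fourier_approx(k)
    print(f"k={k}  a_0={coeffs[0]:.4f}  integral={simpson(g):.6f}  L2 error={l2_error(g):.4f}")
```

The coefficient $a_0$ stays at $1/\sqrt{6}$ for every $k$ and the integral stays at one, because every $\psi_i$ with $i \ge 1$ integrates to zero. The $L^2$ error shrinks as $k$ grows. The projection does nothing to keep $g_k$ non-negative, however. In this setup $g_2$ is negative near $\pm 3$, which is the problem the positivity constraint addresses.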

Learning Univariate Distributions

Relationship between KL and ML
Idea for learning MoTBFs from data: Generate a kernel density estimate for a (marginal) probability distribution, and use the translation scheme to approximate it with an MoTBF.
Setup:
- Let $f(x)$ be the density generating $\{x_1, \ldots, x_N\}$.
- Let $g_k(x \mid \boldsymbol{\theta}) = \sum_{i=0}^{k} \theta_i \psi_i(x)$ be an MoTBF of order $k$.
- Let $h_N(x)$ be a kernel density estimator.
Result (KL minimization is likelihood maximization in the limit): Let $\hat{\boldsymbol{\theta}}_N = \arg\min_{\boldsymbol{\theta}} D\big(h_N(\cdot) \,\|\, g_k(\cdot \mid \boldsymbol{\theta})\big)$. Then $\hat{\boldsymbol{\theta}}_N$ converges to the maximum likelihood estimator of $\boldsymbol{\theta}$ as $N \to \infty$ (given certain regularity conditions).
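The KL-guided criterion can be sketched in a few dozen lines. Since $D(h_N \| g_k) = \int h_N \log h_N - \int h_N \log g_k$, only the cross-entropy term depends on $\boldsymbol{\theta}$. Several choices here are assumptions, not taken from the slides:
- a Gaussian kernel with Silverman's rule-of-thumb bandwidth;
- support truncated to [-3, 3];
- the orthonormal Legendre basis, with $\theta_0$ fixed so that $g_k$ integrates to one;
- midpoint-rule integrals on a grid;
- plain gradient descent with backtracking as the optimizer. This is a swapped-in solver, not the authors' routine.

Positivity is enforced only at the grid points.

```python
import math
import random

A, B = -3.0, 3.0  # assumed support Omega; data outside it are discarded
M = 300           # grid cells for the midpoint-rule integrals

def psi(n, x):
    """Orthonormal (shifted and scaled) Legendre polynomial psi_n on [A, B]."""
    t = (2.0 * x - A - B) / (B - A)
    p_prev, p = 1.0, t
    if n == 0:
        p = 1.0
    for m in range(1, n):  # three-term Legendre recurrence
        p_prev, p = p, ((2 * m + 1) * t * p - m * p_prev) / (m + 1)
    return math.sqrt((2 * n + 1) / (B - A)) * p

def kde(data, grid, dx):
    """Gaussian kernel density estimate h_N on the grid (Silverman's bandwidth),
    renormalised to integrate to one over [A, B]."""
    n = len(data)
    mean = sum(data) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    bw = 1.06 * sd * n ** -0.2
    h = [sum(math.exp(-0.5 * ((x - xi) / bw) ** 2) for xi in data) for x in grid]
    total = sum(h) * dx
    return [v / total for v in h]

def fit_kl(data, k, iters=300):
    """Approximately minimise D(h_N || g_k(. | theta)) over theta_1..theta_k;
    theta_0 stays at 1/sqrt(B - A), so g_k integrates to one."""
    dx = (B - A) / M
    grid = [A + (j + 0.5) * dx for j in range(M)]
    h = kde(data, grid, dx)
    P = [[psi(i, x) for x in grid] for i in range(k + 1)]

    def values(theta):
        return [sum(theta[i] * P[i][j] for i in range(k + 1)) for j in range(M)]

    def cross_entropy(g):  # -int h_N log g_k, the theta-dependent part of the KL
        return -dx * sum(hj * math.log(gj) for hj, gj in zip(h, g))

    theta = [1.0 / math.sqrt(B - A)] + [0.0] * k  # uniform density: feasible start
    g = values(theta)
    J = cross_entropy(g)
    for _ in range(iters):
        grad = [-dx * sum(h[j] * P[i][j] / g[j] for j in range(M)) for i in range(1, k + 1)]
        gnorm2 = sum(d * d for d in grad)
        step = 1.0
        while step > 1e-10:  # backtracking line search
            cand = theta[:1] + [t - step * d for t, d in zip(theta[1:], grad)]
            g_new = values(cand)
            if min(g_new) > 0:  # positivity, checked on the grid only
                J_new = cross_entropy(g_new)
                if J_new <= J - 1e-4 * step * gnorm2:  # Armijo condition
                    theta, g, J = cand, g_new, J_new
                    break
            step *= 0.5
        else:
            break  # no acceptable step found: stop iterating
    return theta, grid, g

random.seed(1)
data = [x for x in (random.gauss(0.0, 1.0) for _ in range(500)) if A <= x <= B]
theta, grid, g = fit_kl(data, k=4)
print([round(t, 4) for t in theta])
```

Because the objective is convex in $\boldsymbol{\theta}$ and the uniform start is feasible, any accepted step both keeps $g_k$ positive on the grid and lowers the KL divergence to $h_N$.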

Example: Learning the standard Gaussian
Figure: a kernel density estimate from the samples, followed by MoTBF approximations $g_k$ of increasing level, each annotated with its BIC score; the level with the best BIC score is selected.
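Selecting the level by BIC amounts to a small loop over candidate models. This is a sketch under stated assumptions. It uses the "higher is better" form log L − (d/2) log N. It counts k free parameters for a level-k model, with θ_0 fixed by normalisation. The log-likelihoods are hypothetical; the slides' own BIC values are not reproduced.

```python
import math

def bic(loglik, n_params, n_samples):
    """BIC in 'higher is better' form: log-likelihood minus (d / 2) log N."""
    return loglik - 0.5 * n_params * math.log(n_samples)

def select_level(logliks, n_samples):
    """Return the level k with the best BIC, plus all scores. logliks[k] is the
    log-likelihood of the fitted level-k MoTBF, assumed to have k free
    parameters (theta_0 being fixed by the normalisation)."""
    scores = [bic(ll, k, n_samples) for k, ll in enumerate(logliks)]
    best = max(range(len(scores)), key=lambda k: scores[k])
    return best, scores

# Hypothetical log-likelihoods for levels k = 0..3 on N = 500 samples.
best, scores = select_level([-900.0, -820.0, -800.0, -798.0], 500)
print(best, [round(s, 1) for s in scores])
```

Here level 2 wins. The extra parameter at level 3 raises the log-likelihood by only 2, which is less than its penalty of about 3.1.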

Comparison to State-of-the-art
Direct ML optimization: At PGM 8/IJAR we presented ML learning of univariate MTEs:
- Divides the support of the function into intervals.
- Direct ML optimization inside each interval.
- Computationally difficult.
Summary of results:
- In terms of log-likelihood, the precision of the new method is comparable to (but slightly poorer than) previous results.
- Speedup factor from to 5.
- The BIC selection criterion chooses fewer parameters.

Conditional Distributions

Definition of conditional distributions
Assume we have $\mathbf{x} \in I_m$, and want to define $g_k^{(m)}(y \mid \mathbf{x})$ there. We define conditional MoTBFs to depend on their conditioning variable(s) only through the relevant hypercube, and not through the numerical value:
$g_k^{(m)}(y \mid \mathbf{x}) = \sum_{j=0}^{k} \theta_j^{(m)} \psi_j(y)$ for $\mathbf{x} \in I_m$.
Figure: the space of two conditioning variables split into a grid of hypercubes, each with its own univariate MoTBF $g^{(i,j)}(y)$.
The conditioning hypercubes are learned by optimizing the BIC score.
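Because the parents enter only through the hypercube that contains them, evaluating a conditional MoTBF reduces to two steps: look up the cell, then evaluate that cell's univariate MoTBF in y. This is a minimal sketch. The class, the per-parent split points and the level-1 potentials are all hypothetical illustrations. The cells here come from splitting each parent separately, which is one simple way to obtain hypercubes.

```python
import bisect
import math

class ConditionalMoTBF:
    """g_k^{(m)}(y | x): the parents x only select the hypercube I_m; inside it,
    y follows its own univariate MoTBF, whatever the exact value of x."""

    def __init__(self, splits, potentials):
        self.splits = splits          # splits[p]: sorted interior split points of parent p
        self.potentials = potentials  # maps a tuple of interval indices to a callable g(y)

    def cell(self, x):
        """Index of the hypercube containing the parent configuration x."""
        return tuple(bisect.bisect_right(s, xp) for s, xp in zip(self.splits, x))

    def __call__(self, y, x):
        return self.potentials[self.cell(x)](y)

def level1(theta1):
    """Hypothetical level-1 MoTBF on y in [0, 1] with orthonormal basis
    psi_0(y) = 1, psi_1(y) = sqrt(3) (2y - 1); non-negative if |theta1| <= 1/sqrt(3)."""
    return lambda y: 1.0 + theta1 * math.sqrt(3.0) * (2.0 * y - 1.0)

# Two parents X1, X2, each split once at 0, giving four hypercubes.
cond = ConditionalMoTBF(
    splits=[[0.0], [0.0]],
    potentials={(0, 0): level1(-0.5), (0, 1): level1(0.0),
                (1, 0): level1(0.2), (1, 1): level1(0.5)},
)
print(cond(0.3, (-1.0, 2.0)), cond(0.3, (-5.0, 0.1)))  # same hypercube: prints 1.0 1.0
```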

Results: X ~ N(,), Y | {X = x} ~ N(x/,)
Figure: learned conditional MoTBFs for four training-set sizes (panels labelled by number of cases).

Concluding Remarks

Summary
Conclusions:
- KL-guided learning is much faster than the current implementations of direct ML optimization. There is, however, a loss in precision.
- The KL-guided learning results do not use split points for the head variable. This can be exploited by inference algorithms.
Future work:
- Look for improvements with respect to computational speed and numerical stability of the learning algorithm.
- Investigate the formal properties of the estimators.
- Compare our approach to López-Cruz et al. (): Learning mixtures of polynomials from data using B-spline interpolation.


More information

Probabilistic Graphical Models

Probabilistic Graphical Models Parameter Estimation December 14, 2015 Overview 1 Motivation 2 3 4 What did we have so far? 1 Representations: how do we model the problem? (directed/undirected). 2 Inference: given a model and partially

More information

Announcements. Proposals graded

Announcements. Proposals graded Announcements Proposals graded Kevin Jamieson 2018 1 Bayesian Methods Machine Learning CSE546 Kevin Jamieson University of Washington November 1, 2018 2018 Kevin Jamieson 2 MLE Recap - coin flips Data:

More information

Fourier Series. ,..., e ixn ). Conversely, each 2π-periodic function φ : R n C induces a unique φ : T n C for which φ(e ix 1

Fourier Series. ,..., e ixn ). Conversely, each 2π-periodic function φ : R n C induces a unique φ : T n C for which φ(e ix 1 Fourier Series Let {e j : 1 j n} be the standard basis in R n. We say f : R n C is π-periodic in each variable if f(x + πe j ) = f(x) x R n, 1 j n. We can identify π-periodic functions with functions on

More information

Statistical Data Mining and Machine Learning Hilary Term 2016

Statistical Data Mining and Machine Learning Hilary Term 2016 Statistical Data Mining and Machine Learning Hilary Term 2016 Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml Naïve Bayes

More information

INFERENCE IN HYBRID BAYESIAN NETWORKS

INFERENCE IN HYBRID BAYESIAN NETWORKS INFERENCE IN HYBRID BAYESIAN NETWORKS Helge Langseth, a Thomas D. Nielsen, b Rafael Rumí, c and Antonio Salmerón c a Department of Information and Computer Sciences, Norwegian University of Science and

More information

MATH 590: Meshfree Methods

MATH 590: Meshfree Methods MATH 590: Meshfree Methods Chapter 2 Part 3: Native Space for Positive Definite Kernels Greg Fasshauer Department of Applied Mathematics Illinois Institute of Technology Fall 2014 fasshauer@iit.edu MATH

More information

Support Vector Machine (SVM) & Kernel CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012

Support Vector Machine (SVM) & Kernel CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012 Support Vector Machine (SVM) & Kernel CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Linear classifier Which classifier? x 2 x 1 2 Linear classifier Margin concept x 2

More information

Using Multiple Kernel-based Regularization for Linear System Identification

Using Multiple Kernel-based Regularization for Linear System Identification Using Multiple Kernel-based Regularization for Linear System Identification What are the Structure Issues in System Identification? with coworkers; see last slide Reglerteknik, ISY, Linköpings Universitet

More information

Pattern Recognition and Machine Learning. Bishop Chapter 9: Mixture Models and EM

Pattern Recognition and Machine Learning. Bishop Chapter 9: Mixture Models and EM Pattern Recognition and Machine Learning Chapter 9: Mixture Models and EM Thomas Mensink Jakob Verbeek October 11, 27 Le Menu 9.1 K-means clustering Getting the idea with a simple example 9.2 Mixtures

More information

Lecture 2 Machine Learning Review

Lecture 2 Machine Learning Review Lecture 2 Machine Learning Review CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago March 29, 2017 Things we will look at today Formal Setup for Supervised Learning Things

More information

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Cheng Soon Ong & Christian Walder. Canberra February June 2018 Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 254 Part V

More information

Efficient Solvers for Stochastic Finite Element Saddle Point Problems

Efficient Solvers for Stochastic Finite Element Saddle Point Problems Efficient Solvers for Stochastic Finite Element Saddle Point Problems Catherine E. Powell c.powell@manchester.ac.uk School of Mathematics University of Manchester, UK Efficient Solvers for Stochastic Finite

More information

CSE446: Clustering and EM Spring 2017

CSE446: Clustering and EM Spring 2017 CSE446: Clustering and EM Spring 2017 Ali Farhadi Slides adapted from Carlos Guestrin, Dan Klein, and Luke Zettlemoyer Clustering systems: Unsupervised learning Clustering Detect patterns in unlabeled

More information

Support'Vector'Machines. Machine(Learning(Spring(2018 March(5(2018 Kasthuri Kannan

Support'Vector'Machines. Machine(Learning(Spring(2018 March(5(2018 Kasthuri Kannan Support'Vector'Machines Machine(Learning(Spring(2018 March(5(2018 Kasthuri Kannan kasthuri.kannan@nyumc.org Overview Support Vector Machines for Classification Linear Discrimination Nonlinear Discrimination

More information

Variational Inference. Sargur Srihari

Variational Inference. Sargur Srihari Variational Inference Sargur srihari@cedar.buffalo.edu 1 Plan of discussion We first describe inference with PGMs and the intractability of exact inference Then give a taxonomy of inference algorithms

More information

Lecture 4: Probabilistic Learning. Estimation Theory. Classification with Probability Distributions

Lecture 4: Probabilistic Learning. Estimation Theory. Classification with Probability Distributions DD2431 Autumn, 2014 1 2 3 Classification with Probability Distributions Estimation Theory Classification in the last lecture we assumed we new: P(y) Prior P(x y) Lielihood x2 x features y {ω 1,..., ω K

More information

Adaptive Monte Carlo methods

Adaptive Monte Carlo methods Adaptive Monte Carlo methods Jean-Michel Marin Projet Select, INRIA Futurs, Université Paris-Sud joint with Randal Douc (École Polytechnique), Arnaud Guillin (Université de Marseille) and Christian Robert

More information

Non-Intrusive Solution of Stochastic and Parametric Equations

Non-Intrusive Solution of Stochastic and Parametric Equations Non-Intrusive Solution of Stochastic and Parametric Equations Hermann G. Matthies a Loïc Giraldi b, Alexander Litvinenko c, Dishi Liu d, and Anthony Nouy b a,, Brunswick, Germany b École Centrale de Nantes,

More information

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014 UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014 Exam policy: This exam allows two one-page, two-sided cheat sheets (i.e. 4 sides); No other materials. Time: 2 hours. Be sure to write

More information

5. Orthogonal matrices

5. Orthogonal matrices L Vandenberghe EE133A (Spring 2017) 5 Orthogonal matrices matrices with orthonormal columns orthogonal matrices tall matrices with orthonormal columns complex matrices with orthonormal columns 5-1 Orthonormal

More information

Variational Inference (11/04/13)

Variational Inference (11/04/13) STA561: Probabilistic machine learning Variational Inference (11/04/13) Lecturer: Barbara Engelhardt Scribes: Matt Dickenson, Alireza Samany, Tracy Schifeling 1 Introduction In this lecture we will further

More information

Gaussian Mixture Models

Gaussian Mixture Models Gaussian Mixture Models Pradeep Ravikumar Co-instructor: Manuela Veloso Machine Learning 10-701 Some slides courtesy of Eric Xing, Carlos Guestrin (One) bad case for K- means Clusters may overlap Some

More information

CS Lecture 19. Exponential Families & Expectation Propagation

CS Lecture 19. Exponential Families & Expectation Propagation CS 6347 Lecture 19 Exponential Families & Expectation Propagation Discrete State Spaces We have been focusing on the case of MRFs over discrete state spaces Probability distributions over discrete spaces

More information

Sum-Product Networks. STAT946 Deep Learning Guest Lecture by Pascal Poupart University of Waterloo October 17, 2017

Sum-Product Networks. STAT946 Deep Learning Guest Lecture by Pascal Poupart University of Waterloo October 17, 2017 Sum-Product Networks STAT946 Deep Learning Guest Lecture by Pascal Poupart University of Waterloo October 17, 2017 Introduction Outline What is a Sum-Product Network? Inference Applications In more depth

More information

APPENDIX A. Background Mathematics. A.1 Linear Algebra. Vector algebra. Let x denote the n-dimensional column vector with components x 1 x 2.

APPENDIX A. Background Mathematics. A.1 Linear Algebra. Vector algebra. Let x denote the n-dimensional column vector with components x 1 x 2. APPENDIX A Background Mathematics A. Linear Algebra A.. Vector algebra Let x denote the n-dimensional column vector with components 0 x x 2 B C @. A x n Definition 6 (scalar product). The scalar product

More information

Linear Models for Regression CS534

Linear Models for Regression CS534 Linear Models for Regression CS534 Example Regression Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict

More information

Machine Learning Support Vector Machines. Prof. Matteo Matteucci

Machine Learning Support Vector Machines. Prof. Matteo Matteucci Machine Learning Support Vector Machines Prof. Matteo Matteucci Discriminative vs. Generative Approaches 2 o Generative approach: we derived the classifier from some generative hypothesis about the way

More information

Mixture Distributions for Modeling Lead Time Demand in Coordinated Supply Chains. Barry Cobb. Alan Johnson

Mixture Distributions for Modeling Lead Time Demand in Coordinated Supply Chains. Barry Cobb. Alan Johnson Mixture Distributions for Modeling Lead Time Demand in Coordinated Supply Chains Barry Cobb Virginia Military Institute Alan Johnson Air Force Institute of Technology AFCEA Acquisition Research Symposium

More information

TDT4173 Machine Learning

TDT4173 Machine Learning TDT4173 Machine Learning Lecture 3 Bagging & Boosting + SVMs Norwegian University of Science and Technology Helge Langseth IT-VEST 310 helgel@idi.ntnu.no 1 TDT4173 Machine Learning Outline 1 Ensemble-methods

More information

Exercises * on Linear Algebra

Exercises * on Linear Algebra Exercises * on Linear Algebra Laurenz Wiskott Institut für Neuroinformatik Ruhr-Universität Bochum, Germany, EU 4 February 7 Contents Vector spaces 4. Definition...............................................

More information