Learning Mixtures of Truncated Basis Functions from Data
1 Learning Mixtures of Truncated Basis Functions from Data
Helge Langseth, Thomas D. Nielsen, and Antonio Salmerón
PGM
This work is supported by an Abel grant from Iceland, Liechtenstein, and Norway through the EEA Financial Mechanism (Nils mobility project), supported and coordinated by Universidad Complutense de Madrid; by the Spanish Ministry of Science and Innovation through projects TIN-9-C--; and by ERDF (FEDER) funds.
Learning MoTBFs from data
2 Background: Approximations
6 Geometry of approximations
A quick recall of how to do approximations in $\mathbb{R}^n$:
- We want to approximate the vector $f$ = (,,5) with a vector along $e_1$ = (,,). The best choice is $\langle f, e_1\rangle e_1$ = (,,).
- Now add a vector along $e_2$. The best choice is $\langle f, e_2\rangle e_2$, independently of the choice made for $e_1$. Also, the choice we made for $e_1$ is still optimal since $e_1 \perp e_2$.
- The best approximation is in general $\sum_l \langle f, e_l\rangle e_l$.
All of this maps over to approximations of functions! We only need a definition of the inner product and the equivalent of orthonormal basis vectors.
7 Geometry of approximations
Inner product for functions: For two functions $u(\cdot)$ and $v(\cdot)$ defined on $\Omega \subseteq \mathbb{R}$, we use $\langle u, v\rangle = \int_\Omega u(x)\,v(x)\,dx$.
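The projection rule recalled on these slides can be checked numerically. The following is a minimal NumPy sketch; the helper name `project` and the vector values are illustrative assumptions (they are not taken from the slide), and the rows of `basis` are assumed to be orthonormal.

```python
import numpy as np

def project(f, basis):
    """Best approximation of f in the span of the orthonormal rows of `basis`."""
    coeffs = basis @ f        # inner products <f, e_l>, one per basis vector
    return coeffs @ basis     # sum_l <f, e_l> e_l

# Illustrative values only (assumed for this sketch).
f = np.array([3.0, 2.0, 5.0])
e1 = np.array([1.0, 0.0, 0.0])
e2 = np.array([0.0, 1.0, 0.0])

approx_1 = project(f, np.array([e1]))       # approximation along e1 only
approx_2 = project(f, np.array([e1, e2]))   # adding e2 leaves the e1 coefficient unchanged
residual = f - approx_2                     # orthogonal to both e1 and e2
```

Because the basis vectors are orthonormal, each coefficient is computed independently of the others, which is the property the function-space construction relies on.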
8 Generalised Fourier series
Definition (Legal set of basis functions): Let $\Psi = \{\psi_i\}_{i=0}^{\infty}$ be an indexed set of basis functions, and let $Q$ be the set of all linear combinations of functions in $\Psi$. $\Psi$ is a legal set of basis functions if:
- $\psi_0$ is constant;
- $u \in Q$ and $v \in Q$ implies that $(u \cdot v) \in Q$;
- for any pair of real numbers $s$ and $t$, $s \neq t$, there exists a function $\psi_i \in \Psi$ s.t. $\psi_i(s) \neq \psi_i(t)$.
Legal basis functions:
- $\{1, x, x^2, x^3, \ldots\}$ is a legal set of basis functions.
- $\{1, \exp(-x), \exp(x), \exp(-2x), \exp(2x), \ldots\}$ is also legal.
- $\{1, \log(x), \log(2x), \log(3x), \ldots\}$ is not a legal set of basis functions.
9 Generalised Fourier series
Generalized Fourier series: Assume $\Psi$ is legal and contains orthonormal basis functions (if not, they can be made orthonormal through a Gram-Schmidt process). Then the generalized Fourier series approximation to a function $f$ is defined as $\hat{f}(\cdot) = \sum_l \langle f, \psi_l\rangle\, \psi_l(\cdot)$.
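The Gram-Schmidt step mentioned on this slide can be sketched for the polynomial basis $\{1, x, x^2, \ldots\}$ under the function inner product $\langle u, v\rangle = \int_a^b u(x)v(x)\,dx$. This is a sketch under stated assumptions: the helper names `inner` and `gram_schmidt` and the interval $[-1, 1]$ are illustrative choices, not taken from the slides.

```python
from math import sqrt

import numpy as np
from numpy.polynomial import Polynomial

def inner(u, v, a, b):
    """<u, v> = integral of u(x) v(x) over [a, b], computed exactly for polynomials."""
    antiderivative = (u * v).integ()
    return float(antiderivative(b) - antiderivative(a))

def gram_schmidt(k, a, b):
    """Orthonormalise 1, x, ..., x**k on [a, b] (modified Gram-Schmidt)."""
    basis = []
    for i in range(k + 1):
        p = Polynomial([0.0] * i + [1.0])      # the monomial x**i
        for q in basis:
            p = p - inner(p, q, a, b) * q      # remove the component along q
        basis.append(p / sqrt(inner(p, p, a, b)))
    return basis

psi = gram_schmidt(3, -1.0, 1.0)
# Gram matrix of the result; it should be close to the identity.
gram = np.array([[inner(p, q, -1.0, 1.0) for q in psi] for p in psi])
```

On $[-1, 1]$ the resulting functions are the Legendre polynomials up to scaling, which is why a shifted and scaled Legendre family can be used directly instead of running the orthogonalisation.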
10 Generalised Fourier series
Important properties:
- Any function, including density functions, can be approximated arbitrarily well by this approach.
- For any coefficients $c_0, \ldots, c_k$, $\int_\Omega \big(f(x) - \sum_{l=0}^{k} c_l \psi_l(x)\big)^2 dx \;\geq\; \int_\Omega \big(f(x) - \sum_{l=0}^{k} \langle f, \psi_l\rangle \psi_l(x)\big)^2 dx$, so the generalized Fourier series approximation is optimal in the $L^2$ sense.
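One way to see the optimality claim is to expand the squared error using orthonormality of the $\psi_l$. The derivation below is a standard argument added for illustration, not reproduced from the slides; $c_0, \ldots, c_k$ denote arbitrary real coefficients.

```latex
\begin{align*}
\int_\Omega \Big(f(x) - \sum_{l=0}^{k} c_l \psi_l(x)\Big)^2 dx
  &= \int_\Omega f(x)^2\,dx - 2\sum_{l=0}^{k} c_l \langle f, \psi_l\rangle + \sum_{l=0}^{k} c_l^2 \\
  &= \int_\Omega f(x)^2\,dx - \sum_{l=0}^{k} \langle f, \psi_l\rangle^2
     + \sum_{l=0}^{k} \big(c_l - \langle f, \psi_l\rangle\big)^2 .
\end{align*}
```

Only the last sum depends on the $c_l$; it is non-negative and vanishes exactly when $c_l = \langle f, \psi_l\rangle$ for every $l$.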
11 MoTBFs
12 The marginal MoTBF potential
Definition: Let $\Psi = \{\psi_i\}_{i=0}^{\infty}$ with $\psi_i : \mathbb{R} \to \mathbb{R}$ define a legal set of basis functions on $\Omega \subseteq \mathbb{R}$. Then $g_k : \Omega \to \mathbb{R}^+$ is an MoTBF potential at level $k$ wrt. $\Psi$
- if $g_k(x) = \sum_{i=0}^{k} a_i \psi_i(x)$ for all $x \in \Omega$, where the $a_i$ are real constants;
- or if there is a partition of $\Omega$ into intervals $I_1, \ldots, I_m$ s.t. $g_k$ is defined as above on each $I_j$.
Special cases:
- An MoTBF potential at level $k = 0$ is simply a standard discretisation.
- MoPs (original definition) and MTEs are also special cases of MoTBFs.
13 The marginal MoTBF potential
Simplification: We do not utilize the option to split the domain into subdomains here.
14 Example: Polynomials vs. the Std. Gaussian
[Figure: the standard Gaussian density with polynomial MoTBF approximations of increasing level, up to $g_8$.]
- Use orthonormal polynomials (shifted and scaled Legendre polynomials).
- The approximation always integrates to unity.
- Direct computations give the $g_k$ closest in $L^2$-norm.
- Positivity constraint and KL minimisation lead to a convex optimization problem.
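The "direct computations" for the $L^2$-closest $g_k$ can be sketched as follows. This is a minimal sketch under stated assumptions: the interval $[-3, 3]$, the renormalised (truncated) Gaussian target, the quadrature size, and the helper names `orthonormal_legendre` and `fourier_coefficients` are all illustrative choices. The positivity constraint and the KL refinement from the slide are not implemented here.

```python
from math import erf, pi, sqrt

import numpy as np
from numpy.polynomial import legendre

def orthonormal_legendre(x, k, a, b):
    """Columns psi_0..psi_k: shifted and scaled Legendre polynomials, orthonormal on [a, b]."""
    t = (2.0 * np.asarray(x, dtype=float) - a - b) / (b - a)    # map [a, b] to [-1, 1]
    scale = np.sqrt((2.0 * np.arange(k + 1) + 1.0) / (b - a))
    return legendre.legvander(t, k) * scale                      # shape (len(x), k + 1)

def fourier_coefficients(f, k, a, b, n_nodes=200):
    """theta_i = <f, psi_i>, approximated with Gauss-Legendre quadrature."""
    t, w = legendre.leggauss(n_nodes)
    x = 0.5 * (b - a) * t + 0.5 * (a + b)
    wx = 0.5 * (b - a) * w
    return (wx * f(x)) @ orthonormal_legendre(x, k, a, b)

a, b = -3.0, 3.0                      # assumed support for this sketch
mass = erf(b / sqrt(2.0))             # P(-3 <= X <= 3) for X ~ N(0, 1)

def target(x):
    """Standard Gaussian density truncated to [a, b] and renormalised."""
    return np.exp(-0.5 * x**2) / (sqrt(2.0 * pi) * mass)

theta = fourier_coefficients(target, 8, a, b)
total_mass = theta[0] * sqrt(b - a)   # integral of g_8 over [a, b]
```

Only $\psi_0$ has a non-zero integral over $[a, b]$, so the approximation inherits the total mass of the target at every level $k$; this is one reading of why the approximation always integrates to unity when the target is a density on $\Omega$.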
15 Learning Univariate Distributions
17 Relationship between KL and ML
Idea for learning MoTBFs from data: Generate a kernel density estimate for a (marginal) probability distribution, and use the translation scheme to approximate it with an MoTBF.
Setup:
- Let $f(x)$ be the density generating $\{x_1, \ldots, x_N\}$.
- Let $g_k(x \mid \theta) = \sum_{i=0}^{k} \theta_i \psi_i(x)$ be an MoTBF of order $k$.
- Let $h_N(x)$ be a kernel density estimator.
Result (KL minimization is likelihood maximization in the limit): Let $\hat{\theta}_N = \arg\min_\theta D\big(h_N(\cdot) \,\|\, g_k(\cdot \mid \theta)\big)$. Then $\hat{\theta}_N$ converges to the maximum likelihood estimator of $\theta$ as $N \to \infty$ (given certain regularity conditions).
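A heuristic way to see why this result is plausible (a sketch, not the authors' proof) is to split the KL divergence into a term that does not depend on $\theta$ and a cross term. The kernel $K_b$ with bandwidth $b$ is notation assumed here for illustration.

```latex
\begin{align*}
D\big(h_N \,\|\, g_k(\cdot \mid \theta)\big)
  &= \int_\Omega h_N(x) \log h_N(x)\,dx \;-\; \int_\Omega h_N(x) \log g_k(x \mid \theta)\,dx, \\
\hat{\theta}_N
  &= \arg\max_\theta \int_\Omega h_N(x) \log g_k(x \mid \theta)\,dx, \\
h_N(x) = \frac{1}{N}\sum_{i=1}^{N} K_b(x - x_i)
  &\;\Longrightarrow\;
  \int_\Omega h_N(x) \log g_k(x \mid \theta)\,dx \;\to\; \frac{1}{N}\sum_{i=1}^{N} \log g_k(x_i \mid \theta)
  \quad \text{as } b \to 0 .
\end{align*}
```

If the bandwidth shrinks as $N$ grows, the objective therefore approaches the average log-likelihood, whose maximiser is the maximum likelihood estimator.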
22 Example: Learning the standard Gaussian
Density estimate; 5 samples. g : BIC = 9.5. g : BIC = 8.. g : BIC = 76.. g : BIC = Best BIC score.
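The selection of the level $k$ by BIC can be sketched as follows. This is a simplified sketch, not the procedure behind the numbers on the slide: it projects a Gaussian kernel density estimate onto the Legendre basis in the $L^2$ sense and then clips and renormalises, as a crude stand-in for the positivity-constrained KL minimisation. The sample size, seed, bandwidth rule, support, BIC convention (log-likelihood minus half the number of coefficients times $\log N$, higher is better), and all function names are assumptions.

```python
import numpy as np
from numpy.polynomial import legendre

def basis(x, k, a, b):
    """psi_0..psi_k (orthonormal shifted and scaled Legendre polynomials) at x."""
    t = (2.0 * np.asarray(x, dtype=float) - a - b) / (b - a)
    scale = np.sqrt((2.0 * np.arange(k + 1) + 1.0) / (b - a))
    return legendre.legvander(t, k) * scale

def kde(x, data, bandwidth):
    """Gaussian kernel density estimate h_N evaluated at the 1-D array x."""
    z = (np.asarray(x)[:, None] - data[None, :]) / bandwidth
    return np.exp(-0.5 * z**2).sum(axis=1) / (len(data) * bandwidth * np.sqrt(2.0 * np.pi))

def fit_and_score(data, k, a, b, bandwidth, n_nodes=200, floor=1e-12):
    """L2-project the KDE onto psi_0..psi_k and return (theta, BIC)."""
    t, w = legendre.leggauss(n_nodes)
    x = 0.5 * (b - a) * t + 0.5 * (a + b)
    wx = 0.5 * (b - a) * w
    theta = (wx * kde(x, data, bandwidth)) @ basis(x, k, a, b)   # <h_N, psi_i>

    def density(y):
        # Crude positivity fix (assumption): clip negative values at a small floor.
        return np.maximum(basis(y, k, a, b) @ theta, floor)

    norm = wx @ density(x)                                       # renormalise after clipping
    loglik = np.sum(np.log(density(data) / norm))
    bic = loglik - 0.5 * (k + 1) * np.log(len(data))             # assumed BIC convention
    return theta, bic

rng = np.random.default_rng(0)
data = rng.standard_normal(50)                                   # illustrative sample
bw = 1.06 * data.std(ddof=1) * len(data) ** (-0.2)               # normal-reference rule of thumb
a, b = data.min() - 3.0 * bw, data.max() + 3.0 * bw              # assumed support
scores = {k: fit_and_score(data, k, a, b, bw)[1] for k in range(7)}
best_k = max(scores, key=scores.get)
```

The clipping step can distort the fit badly when the projected polynomial is negative near observed data, which is one reason a constrained optimisation is preferable in practice.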
23 Comparison to State-of-the-art
Direct ML optimization: At PGM 8/IJAR we presented ML-learning of univariate MTEs:
- Divides the support of the function into intervals.
- Direct ML optimization inside each interval.
- Computationally difficult.
Summary of results:
- Precision of the new method in terms of log likelihood is comparable to (but slightly poorer than) previous results.
- Speedup factor from to 5.
- Fewer parameters chosen by the BIC selection criterion.
24 Conditional Distributions
25 Definition of conditional distributions
Assume we have $x \in I_m$, and want to define $g_k^{(m)}(y \mid x)$ there. We define conditional MoTBFs to depend on their conditioning variable(s) only through the relevant hypercube, and not through the numerical value: $g_k^{(m)}(y \mid x) = \sum_{j=0}^{k} \theta_j^{(m)} \psi_j(y)$ for $x \in I_m$.
[Figure: the domain of the conditioning variables X partitioned into hypercubes, each labelled with its own potential over y.]
Conditioning hypercubes are learned by optimizing the BIC score.
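The definition can be read as a lookup: the value of $x$ selects an interval, and the interval selects a coefficient vector. The sketch below covers a single conditioning variable (intervals rather than general hypercubes); the class name `ConditionalMoTBF`, the split point, and the coefficient values are hypothetical and chosen only so that each piece is a valid density on $[0, 1]$.

```python
import numpy as np
from numpy.polynomial import legendre

def basis(y, k, a, b):
    """psi_0..psi_k (orthonormal shifted and scaled Legendre polynomials) at y."""
    t = (2.0 * np.asarray(y, dtype=float) - a - b) / (b - a)
    scale = np.sqrt((2.0 * np.arange(k + 1) + 1.0) / (b - a))
    return legendre.legvander(t, k) * scale

class ConditionalMoTBF:
    """g_k^(m)(y | x): one coefficient vector theta^(m) per interval I_m of the parent X."""

    def __init__(self, split_points, thetas, a, b):
        self.split_points = np.asarray(split_points, dtype=float)  # sorted interval boundaries
        self.thetas = np.asarray(thetas, dtype=float)              # shape (m, k + 1)
        self.a, self.b = a, b                                      # support of Y

    def density(self, y, x):
        # x enters only through the index of the interval that contains it.
        m = int(np.searchsorted(self.split_points, x, side="right"))
        k = self.thetas.shape[1] - 1
        return basis(y, k, self.a, self.b) @ self.thetas[m]

# Hypothetical model: Y is uniform on [0, 1] when x < 0 and tilted towards 1 when x >= 0.
model = ConditionalMoTBF(split_points=[0.0], thetas=[[1.0, 0.0], [1.0, 0.5]], a=0.0, b=1.0)
y = np.linspace(0.0, 1.0, 5)
left = model.density(y, -2.0)
right = model.density(y, 0.5)
```

Two values of $x$ in the same interval give exactly the same density over $y$, which is the property that lets inference algorithms treat the conditioning variable as if it were discretised.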
26 Results: X ~ N(,), Y | {X = x} ~ N(x/,)
[Figure: four panels of results, each for a different number of cases.]
27 Concluding Remarks
28 Summary
Conclusions:
- KL-guided learning is much faster than the current implementations of direct ML optimization. There is, however, a loss in precision.
- The KL-guided learning results do not use split points for the head variable. This can be exploited by inference algorithms.
Future work:
- Look for improvements with respect to computational speed and numerical stability of the learning algorithm.
- Investigate the formal properties of the estimators.
- Compare our approach to López-Cruz et al. (): Learning mixtures of polynomials from data using B-spline interpolation.
Adaptive Monte Carlo methods Jean-Michel Marin Projet Select, INRIA Futurs, Université Paris-Sud joint with Randal Douc (École Polytechnique), Arnaud Guillin (Université de Marseille) and Christian Robert
More informationNon-Intrusive Solution of Stochastic and Parametric Equations
Non-Intrusive Solution of Stochastic and Parametric Equations Hermann G. Matthies a Loïc Giraldi b, Alexander Litvinenko c, Dishi Liu d, and Anthony Nouy b a,, Brunswick, Germany b École Centrale de Nantes,
More informationUNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014
UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014 Exam policy: This exam allows two one-page, two-sided cheat sheets (i.e. 4 sides); No other materials. Time: 2 hours. Be sure to write
More information5. Orthogonal matrices
L Vandenberghe EE133A (Spring 2017) 5 Orthogonal matrices matrices with orthonormal columns orthogonal matrices tall matrices with orthonormal columns complex matrices with orthonormal columns 5-1 Orthonormal
More informationVariational Inference (11/04/13)
STA561: Probabilistic machine learning Variational Inference (11/04/13) Lecturer: Barbara Engelhardt Scribes: Matt Dickenson, Alireza Samany, Tracy Schifeling 1 Introduction In this lecture we will further
More informationGaussian Mixture Models
Gaussian Mixture Models Pradeep Ravikumar Co-instructor: Manuela Veloso Machine Learning 10-701 Some slides courtesy of Eric Xing, Carlos Guestrin (One) bad case for K- means Clusters may overlap Some
More informationCS Lecture 19. Exponential Families & Expectation Propagation
CS 6347 Lecture 19 Exponential Families & Expectation Propagation Discrete State Spaces We have been focusing on the case of MRFs over discrete state spaces Probability distributions over discrete spaces
More informationSum-Product Networks. STAT946 Deep Learning Guest Lecture by Pascal Poupart University of Waterloo October 17, 2017
Sum-Product Networks STAT946 Deep Learning Guest Lecture by Pascal Poupart University of Waterloo October 17, 2017 Introduction Outline What is a Sum-Product Network? Inference Applications In more depth
More informationAPPENDIX A. Background Mathematics. A.1 Linear Algebra. Vector algebra. Let x denote the n-dimensional column vector with components x 1 x 2.
APPENDIX A Background Mathematics A. Linear Algebra A.. Vector algebra Let x denote the n-dimensional column vector with components 0 x x 2 B C @. A x n Definition 6 (scalar product). The scalar product
More informationLinear Models for Regression CS534
Linear Models for Regression CS534 Example Regression Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict
More informationMachine Learning Support Vector Machines. Prof. Matteo Matteucci
Machine Learning Support Vector Machines Prof. Matteo Matteucci Discriminative vs. Generative Approaches 2 o Generative approach: we derived the classifier from some generative hypothesis about the way
More informationMixture Distributions for Modeling Lead Time Demand in Coordinated Supply Chains. Barry Cobb. Alan Johnson
Mixture Distributions for Modeling Lead Time Demand in Coordinated Supply Chains Barry Cobb Virginia Military Institute Alan Johnson Air Force Institute of Technology AFCEA Acquisition Research Symposium
More informationTDT4173 Machine Learning
TDT4173 Machine Learning Lecture 3 Bagging & Boosting + SVMs Norwegian University of Science and Technology Helge Langseth IT-VEST 310 helgel@idi.ntnu.no 1 TDT4173 Machine Learning Outline 1 Ensemble-methods
More informationExercises * on Linear Algebra
Exercises * on Linear Algebra Laurenz Wiskott Institut für Neuroinformatik Ruhr-Universität Bochum, Germany, EU 4 February 7 Contents Vector spaces 4. Definition...............................................
More information