1 Data mining with sparse grids
Lena Leitenmaier & Parikshit Upadhyaya, KTH - Royal Institute of Technology

2 Classification problem
Given: training data
- a d-dimensional data set (feature space); d is typically relatively big, so there is a danger of the curse of dimensionality
- a class label, e.g. in {-1, 1}, for each data point
Goal: based on the training data, construct a classifier that can predict the class of any given new data point.
Strategies to solve the problem: neural networks, support vector machines, regularization networks, ...
Approach here: regularization network based, with the minimization carried out on sparse grids.

3 More mathematical problem description
Given: training set $S = \{(x_i, y_i)\}_{i=1}^M \subset \mathbb{R}^d \times \mathbb{R}$.
Assume the data were obtained by sampling an unknown function $f \in V$ defined over $\mathbb{R}^d$, where the sampling process was disturbed by noise.
Goal: recover $f$ from the data as well as possible.
To make the problem well-posed, add an additional smoothness constraint.

4 More mathematical problem description
Variational problem:
$$ \min_{f \in V} R(f), \qquad R(f) = \frac{1}{M} \sum_{i=1}^M C(f(x_i), y_i) + \lambda \Phi(f) $$
$C(\cdot,\cdot)$: cost function measuring the interpolation error
$\Phi(f)$: smoothness functional
$\lambda$: regularization parameter, balances the two terms
Example: $C(x, y) = (x - y)^2$, $\Phi(f) = \|\nabla f\|_2^2$.
$\lambda$ has to be chosen appropriately, e.g. by cross-validation.
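To make the objective concrete, here is a minimal Python sketch of $R(f)$ in 1D for the example choices above; the smoothness term $\|f'\|_2^2$ is approximated by a finite-difference quadrature, and the interval $[a, b]$, the sample data, and the candidate functions are illustrative assumptions, not part of the slides.

```python
import numpy as np

def regularized_risk(f, x, y, lam, a=0.0, b=1.0, K=1000):
    """R(f) = (1/M) sum_i (f(x_i) - y_i)^2 + lam * Phi(f), with the example
    choices C(s, t) = (s - t)^2 and Phi(f) ~ integral of f'(t)^2 over [a, b]
    (approximated by forward differences on K subintervals)."""
    data_fit = np.mean((f(x) - y) ** 2)         # (1/M) sum_i C(f(x_i), y_i)
    t = np.linspace(a, b, K + 1)
    df = np.diff(f(t)) / np.diff(t)             # finite-difference f'
    smoothness = np.sum(df ** 2 * np.diff(t))   # quadrature of integral f'^2
    return data_fit + lam * smoothness

# Example: noisy samples of sin(2*pi*x), scored for two candidate fits
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 50)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(50)
print(regularized_risk(np.sin, x, y, lam=1e-3))                    # poor fit
print(regularized_risk(lambda s: np.sin(2 * np.pi * s), x, y, lam=1e-3))
```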

5 Discretization
Restriction to a finite-dimensional subspace $V_N \subset V$:
$$ f_N(x) = \sum_{j=1}^N \alpha_j \varphi_j(x) $$
$\{\alpha_j\}_{j=1}^N$: degrees of freedom
$\{\varphi_j\}_{j=1}^N$: a basis of $V_N$
Choose $C(f_N(x_i), y_i) = (f_N(x_i) - y_i)^2$ and $\Phi(f_N) = \|P f_N\|_{L_2}^2$.

6 Linear system
Minimizing
$$ R(f_N) = \frac{1}{M} \sum_{i=1}^M (f_N(x_i) - y_i)^2 + \lambda \|P f_N\|_{L_2}^2 $$
over $f_N \in V_N$ is equivalent to solving, for $k = 1, \ldots, N$,
$$ \frac{1}{M} \sum_{j=1}^N \alpha_j \sum_{i=1}^M \varphi_j(x_i)\, \varphi_k(x_i) + \lambda \sum_{j=1}^N \alpha_j (P\varphi_j, P\varphi_k)_{L_2} = \frac{1}{M} \sum_{i=1}^M y_i\, \varphi_k(x_i). $$
In matrix form: $(B B^T + \lambda C)\alpha = B y$, with $C \in \mathbb{R}^{N \times N}$, $B \in \mathbb{R}^{N \times M}$.
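As a sanity check of this system, the following sketch assembles $B$ and $C$ for 1D hat functions on $[0, 1]$ (so $P = d/dx$ and $C$ is the tridiagonal stiffness matrix) and solves $(BB^T + \lambda C)\alpha = By$ with a dense solver; the function name, the test data, and the dense solve are our illustrative choices, not the presenters' code.

```python
import numpy as np

def train_1d(x, y, n, lam):
    """Solve (B B^T + lam*C) alpha = B y for 1D hat functions on [0, 1]
    at level n (nodes j*h, h = 2^-n). C is the stiffness matrix for P = d/dx."""
    h = 2.0 ** (-n)
    nodes = np.arange(2 ** n + 1) * h
    N = len(nodes)
    # B[j, i] = phi_j(x_i), with phi_j the hat function centered at nodes[j]
    B = np.maximum(0.0, 1.0 - np.abs(x[None, :] - nodes[:, None]) / h)
    # C[j, k] = (phi_j', phi_k')_{L2}: tridiagonal, 2/h on the interior diagonal
    C = (2.0 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)) / h
    C[0, 0] = C[-1, -1] = 1.0 / h
    alpha = np.linalg.solve(B @ B.T + lam * C, B @ y)
    return nodes, alpha

# Fit noisy samples of a step-like function
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 200)
y = np.sign(x - 0.5) + 0.2 * rng.standard_normal(200)
nodes, alpha = train_1d(x, y, n=4, lam=1e-3)   # f_N(nodes[j]) == alpha[j] for hats
```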

7 Direct FE approximation
Naive approach: direct application of the finite element method, solving the arising linear system.
Assume an equidistant grid $\Omega_n$ with mesh size $h_n = 2^{-n}$ in each coordinate direction.
The number of grid points would be of the order $O(h_n^{-d}) = O(2^{nd})$: not feasible! For example, with $n = 5$ and $d = 10$ this is already $(2^5 + 1)^{10} \approx 1.5 \cdot 10^{15}$ points.
Matrix sizes: $C$ is $(2^n + 1)^d \times (2^n + 1)^d$, $B$ is $(2^n + 1)^d \times M$.

8 Sparse grids
$$ V_n = \mathrm{span}\{\varphi_{n,j} : j_t = 0, \ldots, 2^n,\; t = 1, \ldots, d\} $$
With level and index vectors $l = (l_1, \ldots, l_d)$, $j = (j_1, \ldots, j_d)$, $j_t = 0, \ldots, 2^{l_t}$:
$$ \varphi_{l,j}(x) = \prod_{t=1}^d \varphi_{l_t, j_t}(x_t), $$
where $\varphi_{l_t, j_t}(x_t)$ is the scaled and translated hat function.
$$ V_l = \mathrm{span}\{\varphi_{l,j} : j_t = 0, \ldots, 2^{l_t},\; t = 1, \ldots, d\} $$
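A direct transcription of the tensor-product basis into Python; the scaling of the 1D hat to level $l$ and index $j$ follows the definition above, and the helper names are ours.

```python
import numpy as np

def hat(l, j, x):
    """1D hat function phi_{l,j} on [0, 1]: peak 1 at x = j * 2^-l,
    support of width 2 * 2^-l."""
    return np.maximum(0.0, 1.0 - np.abs(x * 2.0 ** l - j))

def tensor_hat(l, j, x):
    """d-dimensional basis phi_{l,j}(x) = prod_t phi_{l_t, j_t}(x_t)."""
    return np.prod([hat(lt, jt, xt) for lt, jt, xt in zip(l, j, x)])

print(tensor_hat((2, 1), (1, 1), (0.25, 0.5)))  # both 1D factors peak: 1.0
```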

10 Sparse grids
The difference spaces are
$$ W_l = V_l \setminus \bigoplus_{t=1}^d V_{l - e_t}. $$
Then
$$ V_n = \bigoplus_{|l|_\infty \le n} W_l, \qquad V_n^{(s)} = \bigoplus_{|l|_1 \le n + d - 1} W_l, \qquad \dim(V_n^{(s)}) = O(2^n n^{d-1}). $$
Interpolation error: $\|f - f_n^{(s)}\|_{L_p} = O(h_n^2 \log(h_n^{-1})^{d-1})$.
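The gain from the truncation $|l|_1 \le n + d - 1$ can be checked by counting points; a sketch under the simplifying assumptions $l_t \ge 1$ and no boundary points, so only the interior hierarchical points of each $W_l$ are counted.

```python
from itertools import product

def sparse_grid_size(n, d):
    """Interior points of V_n^(s): sum over |l|_1 <= n + d - 1, l_t >= 1,
    of the prod_t 2^(l_t - 1) hierarchical points of each difference space W_l."""
    return sum(2 ** (sum(l) - d)
               for l in product(range(1, n + 1), repeat=d)
               if sum(l) <= n + d - 1)

def full_grid_size(n, d):
    """Interior points of the full grid at level n."""
    return (2 ** n - 1) ** d

print(full_grid_size(5, 10))    # 819628286980801 ~ 8.2e14
print(sparse_grid_size(5, 10))  # 13441
```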

11 Sparse grids
(Figure: illustration of sparse grids.)

12 Sparse grids
Sparse grid linear system: $\lambda C^s + B^s (B^s)^T$.
$C^s$ is $O(2^n n^{d-1}) \times O(2^n n^{d-1})$ instead of $O(2^{nd}) \times O(2^{nd})$.
However, $C^s$ is more densely populated than $C$, and its assembly is a difficult problem.
Remedy: the sparse grid combination technique.

15 Sparse grid combination technique
Idea: write the sparse grid interpolant as a linear combination of full grid interpolants, and discretize the problem independently on these full grids.
In 2D:
$$ \hat u^c_{(n,n)} = \sum_{i+j=n+1} u_{i,j} - \sum_{i+j=n} u_{i,j} $$
In 3D:
$$ \hat u^c_{(n,n,n)} = \sum_{i+j+k=n+2} u_{i,j,k} - 2 \sum_{i+j+k=n+1} u_{i,j,k} + \sum_{i+j+k=n} u_{i,j,k} $$
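The pattern behind the 2D and 3D formulas (coefficient $(-1)^q \binom{d-1}{q}$ on the level diagonal $|l|_1 = n + d - 1 - q$) can be enumerated generically; this small helper, with names of our choosing, is reused in the training and testing sketches further below.

```python
from itertools import product
from math import comb

def combination_terms(n, d):
    """(level vector, coefficient) pairs of the combination technique:
    u_hat = sum_{q=0}^{d-1} (-1)^q * C(d-1, q) * sum_{|l|_1 = n+d-1-q} u_l."""
    terms = []
    for q in range(d):
        coeff = (-1) ** q * comb(d - 1, q)
        for l in product(range(1, n + 1), repeat=d):
            if sum(l) == n + d - 1 - q:
                terms.append((l, coeff))
    return terms

# 2D, n = 3: grids (1,3), (2,2), (3,1) with +1 and (1,2), (2,1) with -1
print(combination_terms(3, 2))
```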

17 Sparse grid combination technique
(Figure: combination technique illustration, from "Hierarchization for the sparse grid combination technique" by Philip Hupp.)

18 An error estimate
Necessary assumptions:
- The exact solution is sufficiently smooth.
- There are pointwise asymptotic error expansions for the discrete solutions. In 2D, for $i, j = 1, 2, \ldots$:
$$ u - u_{i,j} = C_1(h_i)\, h_i^2 + C_2(h_j)\, h_j^2 + D(h_i, h_j)\, h_i^2 h_j^2 $$
- The coefficient functions $C_1, C_2, D$ are bounded: $|C_1(h_i)| \le \kappa$, $|C_2(h_j)| \le \kappa$, $|D(h_i, h_j)| \le \kappa$.

19 An error estimate
Difference between the exact and the SGC solution:
$$ u - \hat u^c_{n,n} = u - \Big( \sum_{i+j=n+1} u_{ij} - \sum_{i+j=n} u_{ij} \Big) = \sum_{i+j=n+1} (u - u_{ij}) - \sum_{i+j=n} (u - u_{ij}), $$
where the first sum has $n$ terms and the second $n - 1$, so the copies of $u$ balance out: $n \cdot u - (n-1) \cdot u = u$.
Inserting the error expansion:
$$ u - \hat u^c_{n,n} = \sum_{i+j=n+1} \big( C_1 h_i^2 + C_2 h_j^2 + D h_i^2 h_j^2 \big) - \sum_{i+j=n} \big( C_1 h_i^2 + C_2 h_j^2 + D h_i^2 h_j^2 \big) = C_1 h_n^2 + C_2 h_n^2 + \sum_{i+j=n+1} D h_i^2 h_j^2 - \sum_{i+j=n} D h_i^2 h_j^2 $$

20 An error estimate
Use the definition of the mesh width, $h_i = 2^{-i}$, $h_j = 2^{-j}$:
$$ h_i^2 h_j^2 = \begin{cases} 2^{-2i}\, 2^{-2(n+1-i)} = 2^{-2n-2} = \tfrac14 h_n^2 & i + j = n + 1 \\ 2^{-2i}\, 2^{-2(n-i)} = 2^{-2n} = h_n^2 & i + j = n \end{cases} $$
hence
$$ u - \hat u^c_{n,n} = \Big( C_1 + C_2 + \tfrac14 \sum_{i+j=n+1} D - \sum_{i+j=n} D \Big) h_n^2. $$
Use the boundedness of the coefficient functions:
$$ |u - \hat u^c_{n,n}| \le \big( \kappa + \kappa + \tfrac14 n \kappa + (n-1) \kappa \big) h_n^2 = \big( 1 + \tfrac54 n \big) \kappa\, h_n^2 $$

21 An error estimate
Error estimate in 2D:
$$ |u - \hat u^c_{n,n}| \le \big( 1 + \tfrac54 n \big) \kappa\, h_n^2 = O(h_n^2 \log_2(h_n^{-1})) \qquad (1) $$
since $n = \log_2(2^n) = \log_2(h_n^{-1})$.
The 3D error estimate
$$ |u - \hat u^c_{n,n,n}| = O(h_n^2 \log_2(h_n^{-1})^2) \qquad (2) $$
is obtained using the same steps, and the argument can be generalized to higher dimensions.
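For general dimension $d$, repeating the same telescoping argument level by level gives a bound consistent with the interpolation estimate quoted for $V_n^{(s)}$ above (the $d$- and $\kappa$-dependent constants are omitted here):
$$ |u - \hat u^c_n| = O\big( h_n^2 \, \log_2(h_n^{-1})^{d-1} \big). $$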

22 SGC for the classification problem
Solve the problem on a sequence of grids $\Omega_l$: $O(d n^{d-1})$ linear systems
$$ (\lambda C_l + B_l B_l^T)\, \alpha_l = B_l y, $$
where $(C_l)_{j,k} = M\, (P\varphi_{l,j}, P\varphi_{l,k})_{L_2}$ and $(B_l)_{j,i} = \varphi_{l,j}(x_i)$.
Each system has size $\dim(V_l) = O(2^n)$.

23 SGC for the classification problem
Put together the combination solution $f_n^{(c)}$:
$$ f_n^{(c)}(x) = \sum_{q=0}^{d-1} (-1)^q \binom{d-1}{q} \sum_{|l|_1 = n + (d-1) - q} f_l(x), $$
where $f_l$ is the solution on a full grid,
$$ f_l(x) = \sum_j \alpha_{l,j}\, \varphi_{l,j}(x) \in V_l. $$
Any linear operation $F$ on $f_n^{(c)}$ can be expressed by means of the combination formula:
$$ F(f_n^{(c)}) = \sum_{q=0}^{d-1} (-1)^q \binom{d-1}{q} \sum_{|l|_1 = n + (d-1) - q} F(f_l) $$

24 Some implementational aspects
Training: go through all grids and solve the small linear systems.
Algorithm 1 Training
1: for q = 0 : d-1
2:   for l_1 = 1 : n-q
3:     for ... (nested loops over l_2, ..., l_{d-1})
4:       l_d = n - q - (l_1 - 1) - ... - (l_{d-1} - 1)
5:       Assemble C_l, B_l
6:       Solve (λ C_l + B_l B_l^T) α_l = B_l y
7:       Save α_l
We use preconditioned CG for solving the systems; this is the slow part. A Python sketch follows below.
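A compact Python version of Algorithm 1, iterating over the same subgrids via combination_terms from the sketch above. assemble_C and assemble_B are assumed user-supplied assembly routines (the slides do not specify them), and plain scipy CG stands in for the preconditioned CG mentioned on the slide.

```python
import numpy as np
from scipy.sparse.linalg import cg

def train(X, y, n, d, lam, assemble_C, assemble_B):
    """Algorithm 1 (sketch): solve one small system per combination subgrid.
    assemble_C(l) -> C_l; assemble_B(l, X) -> B_l with B_l[j, i] = phi_{l,j}(x_i)."""
    alphas = {}
    for l, _coeff in combination_terms(n, d):
        C = assemble_C(l)
        B = assemble_B(l, X)
        A = lam * C + B @ B.T
        alpha, info = cg(A, B @ y)   # pass M=... here to add a preconditioner
        assert info == 0, f"CG did not converge on subgrid {l}"
        alphas[l] = alpha            # saved coefficients, reusable for any test set
    return alphas
```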

25 Some implementational aspects
Testing: use the saved $\alpha_l$'s to compute labels for given data.
Algorithm 2 Testing
1: for q = 0 : d-1
2:   for l_1 = 1 : n-q
3:     for ... (nested loops over l_2, ..., l_{d-1})
4:       Evaluate $f_l(x) = \sum_{j_1=0}^{2^{l_1}} \cdots \sum_{j_d=0}^{2^{l_d}} \alpha_{l,j}\, \varphi_{l,j}(x)$
5:       Add the weighted contribution to the overall solution
Function evaluation, too, takes place only on the subgrids. Once we have computed $\alpha_l$, we can reuse it for any test data; see the sketch below.
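The corresponding testing step, again reusing combination_terms; evaluate_basis(l, X) is an assumed helper returning the matrix of basis values on subgrid l.

```python
import numpy as np

def predict(X_test, alphas, n, d, evaluate_basis):
    """Algorithm 2 (sketch): evaluate each saved subgrid solution f_l on the
    test points and add its weighted contribution to the combination solution."""
    f = np.zeros(len(X_test))
    for l, coeff in combination_terms(n, d):
        Phi = evaluate_basis(l, X_test)   # Phi[j, i] = phi_{l,j}(x_i)
        f += coeff * (alphas[l] @ Phi)    # f_l(x_i) = sum_j alpha_{l,j} Phi[j, i]
    return np.sign(f)                     # predicted labels in {-1, +1}
```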

26 Computational complexity
(Theoretical) cost:
- Assembly of $C_l$: $O(3^d N)$ ($N = 2^n$: number of unknowns)
- Assembly of $B_l$: $O(NM)$ [$O(d 2^d M)$] ($M$: number of data points)
- Total number of subgrids: $O(d n^{d-1})$
- Overall assembly cost: $O(d n^{d-1} (3^d N + d 2^d M))$
Similar cost for solving the systems. (Function evaluation: $O(n^{d-1} N M)$.)
A small cost calculator is sketched below.
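Plugging numbers into the overall assembly estimate makes the trade-off tangible; a sketch with the constants hidden in the O-terms set to 1, and the example parameters chosen by us.

```python
def assembly_cost(n, d, M):
    """Slide's overall assembly estimate O(d n^(d-1) (3^d N + d 2^d M)), N = 2^n."""
    N = 2 ** n
    return d * n ** (d - 1) * (3 ** d * N + d * 2 ** d * M)

# e.g. an 8D problem with M = 700 data points, as in the diabetes example below
print(f"{assembly_cost(n=4, d=8, M=700):.2e}")   # ~2e11 operations
```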

27 Numerical results
Checkerboard example in 2D: 1000 data points, 750 for training and 250 for testing. (Figure)

28 Checkerboard example
# levels   training accuracy   test accuracy
   ...           ... %             82.4 %
   ...           ... %             84.4 %
   ...           ... %             85.6 %
   ...           ... %             90.8 %
   ...           ... %             92.0 %
Results for the checkerboard example with λ = ...

29 A second example
Our classifier (with λ = 0.001): test accuracy of 91% for data set 1, 89% for data set 2, 86% for data set 3.
For (a rough) comparison:
- Neural net: 82%, 89%, 88%
- Gaussian process: 90%, 89%, 88%, but slower

30 A second example
The influence of lambda: training and test accuracy on the first synthetic data set, for λ = ... and λ = 1e-5 and varying n.

31 Computational cost
Dependence on the number of levels. (Figure: scaling of the computational time with the number of levels.)

32 Computational cost
Dependence on the number of data points: the computational time scales linearly with the number of data points. (Figure)

33 8-dimensional example: Diabetes dataset
# levels   training accuracy   test accuracy      λ
   ...           ... %              79 %        1e-...
   ...           ... %             80.1 %       1e-...
   ...           ... %             82.3 %       1e-...
   ...           ... %             83.8 %        1e-4
Results for the 8D dataset: 768 instances with 8 features (such as body mass index and diastolic blood pressure) and a class label; 700 training points, 68 test points.
The results are highly sensitive to the choice of training and testing data.

34 Finally...
Questions and remarks?

35 References
J. Garcke, M. Griebel and M. Thess, Data Mining with Sparse Grids, Computing, 2001.
J. Garcke, Sparse Grids in a Nutshell, in Sparse Grids and Applications, Springer, 2013.
M. Griebel, M. Schneider and C. Zenger, A combination technique for the solution of sparse grid problems, 1990.
