Lena Leitenmaier & Parikshit Upadhyaya KTH - Royal Institute of Technology
1 Data mining with sparse grids Lena Leitenmaier & Parikshit Upadhyaya KTH - Royal Institute of Technology
2 Classification problem

Given: training data, a d-dimensional data set (feature space); d is typically relatively big (danger of the curse of dimensionality); a class label, e.g. in {-1, 1}, for each data point.

Goal: based on the training data, construct a classifier that can predict the class of any given new data point.

Strategies to solve the problem: neural networks, support vector machines, regularization networks, ...

Approach here: regularization network based, minimization on sparse grids.

Data mining with sparse grids P. Upadhyaya, L. Leitenmaier 2/31
3 More mathematical problem description

Given: a training set
\[ S = \{(x_i, y_i)\}_{i=1}^{M} \subset \mathbb{R}^d \times \mathbb{R}. \]

Assume the data were obtained by sampling an unknown function $f \in V$ defined over $\mathbb{R}^d$, and that the sampling process was disturbed by noise.

Goal: recover $f$ from the data as well as possible. To make the problem well-posed, add an additional smoothness constraint.
4 More mathematical problem description

Variational problem:
\[ \min_{f \in V} R(f), \qquad R(f) = \frac{1}{M} \sum_{i=1}^{M} C(f(x_i), y_i) + \lambda \Phi(f) \]

$C(\cdot,\cdot)$: cost function measuring the interpolation error
$\Phi(f)$: smoothness functional
$\lambda$: regularization parameter, balances the two terms

Example: $C(x, y) = (x - y)^2$, $\Phi(f) = \|\nabla f\|_2^2$.

$\lambda$ has to be chosen appropriately, e.g. according to cross-validation.
5 Discretization

Restriction to a finite dimensional subspace $V_N \subset V$:
\[ f_N = \sum_{j=1}^{N} \alpha_j \varphi_j(x) \]

$\{\alpha_j\}_{j=1}^{N}$: degrees of freedom
$\{\varphi_j\}_{j=1}^{N}$: a basis for $V_N$

Choose $C(f_N(x_i), y_i) = (f_N(x_i) - y_i)^2$ and $\Phi(f_N) = \|P f_N\|_{L^2}^2$.
6 Linear system

Minimizing
\[ R(f_N) = \frac{1}{M} \sum_{i=1}^{M} (f_N(x_i) - y_i)^2 + \lambda \|P f_N\|_{L^2}^2 \]
over $f_N \in V_N$ is equivalent to solving, for $k = 1, \dots, N$,
\[ \frac{1}{M} \sum_{j=1}^{N} \alpha_j \sum_{i=1}^{M} \varphi_j(x_i)\,\varphi_k(x_i) + \lambda \sum_{j=1}^{N} \alpha_j (P\varphi_j, P\varphi_k)_{L^2} = \frac{1}{M} \sum_{i=1}^{M} y_i\,\varphi_k(x_i). \]

In matrix form (after multiplying by $M$ and absorbing the factor into $\lambda$):
\[ (B B^T + \lambda C)\,\alpha = B y, \qquad C \in \mathbb{R}^{N \times N}, \quad B \in \mathbb{R}^{N \times M}. \]
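As a toy illustration, this system can be assembled and solved directly in one dimension (a minimal numpy sketch; the nodal hat basis, the choice $P = d/dx$, the scaling with $1/M$, and all parameter values are our assumptions for this example, not the setup used later in the talk):

```python
import numpy as np

def hat(x, h, j):
    """Nodal hat function centred at node j*h, with support [(j-1)h, (j+1)h]."""
    return np.maximum(0.0, 1.0 - np.abs(x / h - j))

def fit_1d(x, y, n=4, lam=1e-3):
    """Assemble and solve (B B^T / M + lam * C) alpha = B y / M on [0, 1]."""
    h = 2.0 ** -n
    N = 2 ** n + 1                       # grid nodes 0, h, 2h, ..., 1
    M = len(x)
    B = np.array([hat(x, h, j) for j in range(N)])   # B[j, i] = phi_j(x_i)
    # Stiffness matrix for P = d/dx: (1/h) * tridiag(-1, 2, -1), halved at the ends
    C = (2.0 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)) / h
    C[0, 0] = C[-1, -1] = 1.0 / h
    alpha = np.linalg.solve(B @ B.T / M + lam * C, B @ y / M)
    return alpha, h

# Fit noisy samples of a smooth function and recover its nodal coefficients.
rng = np.random.default_rng(0)
x = rng.random(200)
y = np.sin(2.0 * np.pi * x) + 0.1 * rng.standard_normal(200)
alpha, h = fit_1d(x, y)
```

The fitted function is then $f_N(x) = \sum_j \alpha_j \varphi_j(x)$; increasing $\lambda$ trades data fidelity for smoothness.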
7 Direct FE approximation

Naive approach: direct application of the finite element method, solving the arising linear system.

Assume an equidistant grid $\Omega_n$ with mesh size $h_n = 2^{-n}$ in each coordinate direction.

The number of grid points would be of the order $O(h_n^{-d}) = O(2^{nd})$: not feasible!

Matrix sizes: $C$ is $(2^n + 1)^d \times (2^n + 1)^d$, $B$ is $(2^n + 1)^d \times M$.
8 Sparse grids

Full grid space: $V_n = \mathrm{span}\{\varphi_{n,j} : j_t = 0, \dots, 2^n,\ t = 1, \dots, d\}$.

With multi-indices $l = (l_1, \dots, l_d)$ and $j = (j_1, \dots, j_d)$, $j_t = 0, \dots, 2^{l_t}$:
\[ \varphi_{l,j}(x) = \prod_{t=1}^{d} \varphi_{l_t, j_t}(x_t), \]
where $\varphi_{l_t, j_t}(x_t)$ is the scaled and translated hat function.

\[ V_l = \mathrm{span}\{\varphi_{l,j} : j_t = 0, \dots, 2^{l_t},\ t = 1, \dots, d\} \]
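The tensor-product hat basis can be written down directly (a small numpy sketch; the function names are ours):

```python
import numpy as np

def hat_1d(xt, lt, jt):
    """Scaled, translated hat function: phi_{l,j}(x) = max(0, 1 - |2^l x - j|)."""
    return np.maximum(0.0, 1.0 - np.abs(2.0 ** lt * xt - jt))

def phi(x, l, j):
    """d-dimensional tensor product phi_{l,j}(x) = prod_t phi_{l_t,j_t}(x_t)."""
    out = 1.0
    for xt, lt, jt in zip(x, l, j):
        out *= hat_1d(xt, lt, jt)
    return out

# A basis function equals 1 at its own grid point (j_1 2^-l_1, j_2 2^-l_2):
val_at_node = phi((0.25, 0.5), l=(2, 1), j=(1, 1))   # grid point (1/4, 1/2) -> 1.0
val_at_edge = phi((0.50, 0.5), l=(2, 1), j=(1, 1))   # edge of support in x_1 -> 0.0
```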
10 Sparse grids

Difference spaces (hierarchical increments):
\[ W_l = V_l \setminus \bigcup_{t=1}^{d} V_{l - e_t} \]

Full grid space: $V_n = \bigoplus_{|l|_\infty \le n} W_l$.

Sparse grid space:
\[ V_n^{(s)} = \bigoplus_{|l|_1 \le n + d - 1} W_l, \qquad \dim(V_n^{(s)}) = O(2^n n^{d-1}). \]

Interpolation error: $\|f - f_n^{(s)}\|_{L^p} = O(h_n^2 \log(h_n^{-1})^{d-1})$.
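The growth rates $O(2^n n^{d-1})$ versus $O(2^{nd})$ can be checked by counting points (a stdlib sketch; we count interior points only, so each increment $W_l$ with $l_t \ge 1$ contributes $\prod_t 2^{l_t - 1}$ points, an assumption matching the boundary-free hierarchical basis):

```python
from itertools import product
from math import prod

def full_grid_points(n, d):
    """Interior points of the full grid V_n (mesh width 2^-n per direction)."""
    return (2 ** n - 1) ** d

def sparse_grid_points(n, d):
    """Interior points of V_n^(s): sum |W_l| over |l|_1 <= n + d - 1, l_t >= 1,
    where the increment W_l contributes prod_t 2^(l_t - 1) points."""
    return sum(prod(2 ** (lt - 1) for lt in l)
               for l in product(range(1, n + 1), repeat=d)
               if sum(l) <= n + d - 1)

# Compare the two counts in 2D as the level increases.
counts = [(n, full_grid_points(n, 2), sparse_grid_points(n, 2)) for n in (2, 4, 6)]
```

In one dimension both counts coincide; the savings appear for $d \ge 2$.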
11 Sparse grids

(figure)
12 Sparse grids

Sparse grid linear system: $\lambda C_s + B_s B_s^T$.

$C_s$ is $O(2^n n^{d-1}) \times O(2^n n^{d-1})$ instead of $2^{nd} \times 2^{nd}$.

$C_s$ is more densely populated than $C$; assembly is a difficult problem.

Remedy: the sparse grid combination technique.
15 Sparse grid combination technique

Idea: write the sparse grid interpolant as a linear combination of full grid interpolants, and discretize the problem independently on these full grids.

In 2D:
\[ \hat u^c_{(n,n)} = \sum_{i+j=n+1} u_{i,j} - \sum_{i+j=n} u_{i,j} \]

In 3D:
\[ \hat u^c_{(n,n,n)} = \sum_{i+j+k=n+2} u_{i,j,k} - 2 \sum_{i+j+k=n+1} u_{i,j,k} + \sum_{i+j+k=n} u_{i,j,k} \]
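The 2D formula can be tried out directly with bilinear full-grid interpolants (a numpy sketch; the test function and evaluation point are arbitrary choices of ours):

```python
import numpy as np

def interp2(f, l1, l2, x, y):
    """Piecewise-bilinear interpolant of f on the full grid with mesh widths
    2^-l1 and 2^-l2, evaluated at a point (x, y) in [0, 1)^2."""
    n1, n2 = 2 ** l1, 2 ** l2
    g = np.array([[f(i / n1, j / n2) for j in range(n2 + 1)]
                  for i in range(n1 + 1)])
    i, s = divmod(x * n1, 1.0)          # cell index and local coordinate
    j, t = divmod(y * n2, 1.0)
    i, j = int(i), int(j)
    return ((1 - s) * (1 - t) * g[i, j] + s * (1 - t) * g[i + 1, j]
            + (1 - s) * t * g[i, j + 1] + s * t * g[i + 1, j + 1])

def combination_2d(f, n, x, y):
    """u^c_(n,n) = sum_{i+j=n+1} u_{i,j} - sum_{i+j=n} u_{i,j}."""
    up = sum(interp2(f, i, n + 1 - i, x, y) for i in range(1, n + 1))
    low = sum(interp2(f, i, n - i, x, y) for i in range(1, n))
    return up - low

# For a bilinear f every u_{i,j} is exact, so the combination reproduces f:
# n terms enter with +1 and n-1 with -1, and the coefficients sum to 1.
f = lambda u, v: u * v
approx = combination_2d(f, 3, 0.3, 0.7)
```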
17 Sparse grid combination technique

Combination technique illustration, from "Hierarchization for the sparse grid combination technique" by Philip Hupp.
18 An error estimate

Necessary assumptions:
- The exact solution is sufficiently smooth.
- There are pointwise asymptotic error expansions for the discrete solutions. In 2D, for $i, j = 1, 2, \dots$:
\[ u - u_{i,j} = C_1(h_i)\,h_i^2 + C_2(h_j)\,h_j^2 + D(h_i, h_j)\,h_i^2 h_j^2 \]
- The coefficient functions $C_1, C_2, D$ are bounded: $|C_1(h_i)| \le \kappa$, $|C_2(h_j)| \le \kappa$, $|D(h_i, h_j)| \le \kappa$.
19 An error estimate

Difference between the exact and the SGC solution:
\[ u - \hat u^c_{n,n} = u - \Big( \sum_{i+j=n+1} u_{i,j} - \sum_{i+j=n} u_{i,j} \Big) = \sum_{i+j=n+1} (u - u_{i,j}) - \sum_{i+j=n} (u - u_{i,j}), \]
where the first sum has $n$ terms and the second $n-1$, so the coefficients of $u$ add up to $1$.

Put in the error expansion:
\[ u - \hat u^c_{n,n} = \sum_{i+j=n+1} \big( C_1 h_i^2 + C_2 h_j^2 + D h_i^2 h_j^2 \big) - \sum_{i+j=n} \big( C_1 h_i^2 + C_2 h_j^2 + D h_i^2 h_j^2 \big) \]
\[ = C_1 h_n^2 + C_2 h_n^2 + \sum_{i+j=n+1} D h_i^2 h_j^2 - \sum_{i+j=n} D h_i^2 h_j^2, \]
since the $C_1$ and $C_2$ contributions telescope, leaving only the level-$n$ terms.
20 An error estimate

Use the definition of the mesh width, $h_i = 2^{-i}$, $h_j = 2^{-j}$:
\[ h_i^2 h_j^2 = \begin{cases} 2^{-2i}\,2^{-2(n+1-i)} = 2^{-2n-2} = \tfrac{1}{4} h_n^2 & i + j = n + 1, \\ 2^{-2i}\,2^{-2(n-i)} = 2^{-2n} = h_n^2 & i + j = n, \end{cases} \]
hence
\[ u - \hat u^c_{n,n} = \Big( C_1 + C_2 + \tfrac{1}{4} \sum_{i+j=n+1} D - \sum_{i+j=n} D \Big) h_n^2. \]

Use the boundedness of the coefficient functions:
\[ |u - \hat u^c_{n,n}| \le \Big( \kappa + \kappa + \tfrac{n}{4}\kappa + (n-1)\kappa \Big) h_n^2 = \big( 1 + \tfrac{5}{4} n \big) \kappa\, h_n^2. \]
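The two mesh-width identities are exact in binary floating point and easy to check numerically (a quick sketch of ours, not part of the derivation):

```python
# h_i = 2^-i; verify h_i^2 h_j^2 = h_n^2 / 4 on the diagonal i + j = n + 1
# and h_i^2 h_j^2 = h_n^2 on i + j = n (exact, since all values are powers of two).
def h(i):
    return 2.0 ** -i

n = 6
on_upper = [h(i) ** 2 * h(n + 1 - i) ** 2 for i in range(1, n + 1)]
on_lower = [h(i) ** 2 * h(n - i) ** 2 for i in range(1, n)]
```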
21 An error estimate

Error estimate in 2D:
\[ |u - \hat u^c_{n,n}| \le \big( 1 + \tfrac{5}{4} n \big) \kappa\, h_n^2 = O(h_n^2 \log_2(h_n^{-1})) \quad (1) \]
since $n = \log_2(2^n) = \log_2(h_n^{-1})$.

Error estimate in 3D:
\[ |u - \hat u^c_{n,n,n}| = O(h_n^2 \log_2(h_n^{-1})^2) \quad (2) \]
is obtained using the same steps; the prefactor is now a quadratic polynomial in $n$.

This can be generalized to higher dimensions.
22 SGC for the classification problem

Solve the problem on a sequence of grids $\Omega_l$: $O(d n^{d-1})$ linear systems
\[ (\lambda C_l + B_l B_l^T)\,\alpha_l = B_l y, \]
where $(C_l)_{j,k} = M\,(P\varphi_{l,j}, P\varphi_{l,k})_{L^2}$ and $(B_l)_{j,i} = \varphi_{l,j}(x_i)$.

Each system has size $\dim(V_l) = O(2^n)$.
23 SGC for the classification problem

Put together the combination solution $f_n^{(c)}$:
\[ f_n^{(c)}(x) = \sum_{q=0}^{d-1} (-1)^q \binom{d-1}{q} \sum_{|l|_1 = n + (d-1) - q} f_l(x), \]
where $f_l$ is the solution on a full grid,
\[ f_l(x) = \sum_j \alpha_{l,j}\,\varphi_{l,j}(x) \in V_l. \]

Any linear operation $F$ on $f_n^{(c)}$ can be expressed by means of the combination formula:
\[ F(f_n^{(c)}) = \sum_{q=0}^{d-1} (-1)^q \binom{d-1}{q} \sum_{|l|_1 = n + (d-1) - q} F(f_l). \]
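The levels and coefficients appearing in this formula can be enumerated with the standard library (a sketch; the function name is ours):

```python
from itertools import product
from math import comb

def combination_terms(n, d):
    """(level, coefficient) pairs in the combination formula
    f_n^(c) = sum_q (-1)^q binom(d-1, q) sum_{|l|_1 = n+(d-1)-q} f_l."""
    terms = []
    for q in range(d):
        coeff = (-1) ** q * comb(d - 1, q)
        for l in product(range(1, n + d), repeat=d):
            if sum(l) == n + (d - 1) - q:
                terms.append((l, coeff))
    return terms

# In 2D this reduces to the +1/-1 combination of the earlier slide; in any
# dimension the coefficients sum to 1, so constants are reproduced exactly.
coeff_sum_3d = sum(c for _, c in combination_terms(3, 3))
```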
24 Some implementational aspects

Training: go through all grids and solve the small linear systems.

Algorithm 1 Training
1: for q = 0 : d-1
2:   for l_1 = 1 : n-q
3:     for ...
4:       l_d = n - q - (l_1 - 1) - ... - (l_{d-1} - 1)
5:       Assemble C_l, B_l
6:       Solve (λ C_l + B_l B_l^T) α_l = B_l y
7:       Save α_l

We use preconditioned CG to solve the systems. This is the slow part.
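Steps 5-6 can be sketched with a conjugate-gradient solver (a numpy sketch of ours: plain CG without the preconditioning used in the talk, and random placeholder matrices rather than an actual subgrid system):

```python
import numpy as np

def cg(A, b, tol=1e-10, max_iter=1000):
    """Conjugate gradients for a symmetric positive definite system A x = b."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        step = rs / (p @ Ap)
        x += step * p
        r -= step * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:      # converged
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# One "subgrid" system (lam C_l + B_l B_l^T) alpha_l = B_l y with placeholders.
rng = np.random.default_rng(1)
B_l = rng.random((9, 50))              # N = 9 basis functions, M = 50 data points
y = rng.random(50)
C_l = np.eye(9)                        # stand-in for the SPD penalty matrix
lam = 0.1
A = lam * C_l + B_l @ B_l.T
alpha_l = cg(A, B_l @ y)
```

Since $\lambda C_l + B_l B_l^T$ is symmetric positive definite, CG is applicable to every subgrid system.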
25 Some implementational aspects

Testing: use the saved $\alpha_l$'s to compute labels for the given data.

Algorithm 2 Testing
1: for q = 0 : d-1
2:   for l_1 = 1 : n-q
3:     for ...
4:       Evaluate f_l(x) = Σ_{j_1=0}^{2^{l_1}} ... Σ_{j_d=0}^{2^{l_d}} α_{l,j} φ_{l,j}(x)
5:       Add the weighted contribution to the overall solution

Function evaluation, too, is needed only on the subgrids. Once $\alpha_l$ has been computed, it can be reused for any test data.
26 Computational complexity

(Theoretical) cost:
- Assembly of $C_l$: $O(3^d N)$, where $N = 2^n$ is the number of unknowns
- Assembly of $B_l$: $O(NM)$ [$O(d\,2^d M)$], where $M$ is the number of data points
- Total number of subgrids: $O(d n^{d-1})$
- Overall assembly cost: $O(d n^{d-1}(3^d N + d\,2^d M))$
- Similar cost for solving the systems
- (Function evaluation: $O(n^{d-1} N M)$)
27 Numerical results

Checkerboard example in 2D: 1000 data points, 750 training and 250 test points.
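A dataset of this kind is easy to generate (our own sketch; the number of checkerboard cells and the exact construction used on the slides are not specified, so these are guesses):

```python
import numpy as np

def checkerboard(n_points, n_cells=4, seed=0):
    """2D points in [0, 1)^2 with label +1 or -1 given by the parity of the
    checkerboard cell each point falls into."""
    rng = np.random.default_rng(seed)
    x = rng.random((n_points, 2))
    cells = np.floor(x * n_cells).astype(int)
    labels = np.where(cells.sum(axis=1) % 2 == 0, 1, -1)
    return x, labels

# 1000 points, split 750 training / 250 test as on the slide.
x, y = checkerboard(1000)
x_train, y_train = x[:750], y[:750]
x_test, y_test = x[750:], y[750:]
```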
28 Checkerboard example

# levels   training accuracy   test accuracy
                  %                82.4 %
                  %                84.4 %
                  %                85.6 %
                  %                90.8 %
                  %                92.0 %

Results for the checkerboard example with λ =
29 A second example

Our classifier (with λ = 0.001): test accuracy of 91 % for data set 1, 89 % for data set 2, 86 % for data set 3.

For (a rough) comparison:
- Neural net: 82 %, 89 %, 88 %
- Gaussian process: 90 %, 89 %, 88 % (slower)
30 A second example

The influence of λ. First synthetic data set:

        λ =                 λ = 1e-5
 n    train    test       train    test
31 Computational cost

Dependence on the number of levels.

(Figure: scaling of computational time with the number of levels.)
32 Computational cost

Dependence on the number of data points.

(Figure: computational time scales linearly with the number of data points.)
33 8-dimensional example: Diabetes dataset

# levels   training accuracy   test accuracy    λ
                  %                79 %         1e
                  %                80.1 %       1e
                  %                82.3 %       1e
                  %                83.8 %       1e-4

Results for the 8D dataset: 768 instances with 8 features (like body mass index, diastolic blood pressure, etc.) and a class label; 700 training points, 68 test points.

Highly sensitive to the choice of training and test data.
34 Finally... Questions and Remarks?
35 References

- J. Garcke, M. Griebel and M. Thess, Data Mining with Sparse Grids, Computing, 2001.
- J. Garcke, Sparse Grids in a Nutshell, Springer.
- M. Griebel, M. Schneider and C. Zenger, A combination technique for the solution of sparse grid problems, 1990.
More informationMINIMUM EXPECTED RISK PROBABILITY ESTIMATES FOR NONPARAMETRIC NEIGHBORHOOD CLASSIFIERS. Maya Gupta, Luca Cazzanti, and Santosh Srivastava
MINIMUM EXPECTED RISK PROBABILITY ESTIMATES FOR NONPARAMETRIC NEIGHBORHOOD CLASSIFIERS Maya Gupta, Luca Cazzanti, and Santosh Srivastava University of Washington Dept. of Electrical Engineering Seattle,
More informationEAD 115. Numerical Solution of Engineering and Scientific Problems. David M. Rocke Department of Applied Science
EAD 115 Numerical Solution of Engineering and Scientific Problems David M. Rocke Department of Applied Science Taylor s Theorem Can often approximate a function by a polynomial The error in the approximation
More informationMachine Learning 2nd Edition
INTRODUCTION TO Lecture Slides for Machine Learning 2nd Edition ETHEM ALPAYDIN, modified by Leonardo Bobadilla and some parts from http://www.cs.tau.ac.il/~apartzin/machinelearning/ The MIT Press, 2010
More informationMachine Learning Lecture 5
Machine Learning Lecture 5 Linear Discriminant Functions 26.10.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course Outline Fundamentals Bayes Decision Theory
More informationConsider the following example of a linear system:
LINEAR SYSTEMS Consider the following example of a linear system: Its unique solution is x + 2x 2 + 3x 3 = 5 x + x 3 = 3 3x + x 2 + 3x 3 = 3 x =, x 2 = 0, x 3 = 2 In general we want to solve n equations
More information4. Numerical Quadrature. Where analytical abilities end... continued
4. Numerical Quadrature Where analytical abilities end... continued Where analytical abilities end... continued, November 30, 22 1 4.3. Extrapolation Increasing the Order Using Linear Combinations Once
More informationAlgorithmisches Lernen/Machine Learning
Algorithmisches Lernen/Machine Learning Part 1: Stefan Wermter Introduction Connectionist Learning (e.g. Neural Networks) Decision-Trees, Genetic Algorithms Part 2: Norman Hendrich Support-Vector Machines
More informationThe Perceptron Algorithm
The Perceptron Algorithm Greg Grudic Greg Grudic Machine Learning Questions? Greg Grudic Machine Learning 2 Binary Classification A binary classifier is a mapping from a set of d inputs to a single output
More informationSolving PDEs with Multigrid Methods p.1
Solving PDEs with Multigrid Methods Scott MacLachlan maclachl@colorado.edu Department of Applied Mathematics, University of Colorado at Boulder Solving PDEs with Multigrid Methods p.1 Support and Collaboration
More informationAlgorithms for Scientific Computing
Algorithms for Scientific Computing Finite Element Methods Michael Bader Technical University of Munich Summer 2016 Part I Looking Back: Discrete Models for Heat Transfer and the Poisson Equation Modelling
More informationSemi-Supervised Learning in Gigantic Image Collections. Rob Fergus (New York University) Yair Weiss (Hebrew University) Antonio Torralba (MIT)
Semi-Supervised Learning in Gigantic Image Collections Rob Fergus (New York University) Yair Weiss (Hebrew University) Antonio Torralba (MIT) Gigantic Image Collections What does the world look like? High
More informationMining Classification Knowledge
Mining Classification Knowledge Remarks on NonSymbolic Methods JERZY STEFANOWSKI Institute of Computing Sciences, Poznań University of Technology SE lecture revision 2013 Outline 1. Bayesian classification
More informationNonlinear Models. Numerical Methods for Deep Learning. Lars Ruthotto. Departments of Mathematics and Computer Science, Emory University.
Nonlinear Models Numerical Methods for Deep Learning Lars Ruthotto Departments of Mathematics and Computer Science, Emory University Intro 1 Course Overview Intro 2 Course Overview Lecture 1: Linear Models
More informationSolving Boundary Value Problems (with Gaussians)
What is a boundary value problem? Solving Boundary Value Problems (with Gaussians) Definition A differential equation with constraints on the boundary Michael McCourt Division Argonne National Laboratory
More informationChapter 13. Eddy Diffusivity
Chapter 13 Eddy Diffusivity Glenn introduced the mean field approximation of turbulence in two-layer quasigesotrophic turbulence. In that approximation one must solve the zonally averaged equations for
More informationStatistical Methods for NLP
Statistical Methods for NLP Text Categorization, Support Vector Machines Sameer Maskey Announcement Reading Assignments Will be posted online tonight Homework 1 Assigned and available from the course website
More informationNumerical Solution I
Numerical Solution I Stationary Flow R. Kornhuber (FU Berlin) Summerschool Modelling of mass and energy transport in porous media with practical applications October 8-12, 2018 Schedule Classical Solutions
More information