From approximation theory to machine learning

1/40  From approximation theory to machine learning
New perspectives in the theory of function spaces and their applications, September 2017, Bedlewo, Poland
Jan Vybíral, Charles University / Czech Technical University, Prague, Czech Republic

2/40  Introduction
- Approximation Theory vs. Data Analysis (or Machine Learning, or Learning Theory)
- Example I: $\ell_1$-minimization
- Example II: Support vector machines
- Example III: Ridge functions and neural networks

3/40  Approximation theory
- $K$: class of objects of interest (vectors, functions, operators), approximated by the same or simpler objects, usually with only limited information about the unknown object
- ... with the similarity (approximation error) measured by some sort of distance
- worst case: supremum of the approximation error over $K$
- average case: mean value of the (square of the) approximation error over $K$ w.r.t. some probability measure on $K$

4/40  Approximation of multivariate functions
Let $f\colon \Omega \subset \mathbb{R}^d \to \mathbb{R}$ be a function of many ($d \gg 1$) variables. We want to approximate $f$ using only a (small number of) function values $f(x_1), \dots, f(x_n)$.
Questions:
- Which functions (assumptions on $f$)?
- How to measure the error?
- How to choose the sampling points?
- Decay of the error with growing number of sampling points?
- Algorithms and optimality?
- Dependence on $d$?

5/40  Data Analysis
Discovering structure in given data, allowing for predictions, conclusions, etc. The given data can have different (and also heterogeneous) formats. For inputs $x_1, \dots, x_n \in \mathbb{R}^d$ and outputs $y_1, \dots, y_n \in \mathbb{R}$ we would like to study the functional dependence $y_i \approx f(x_i)$. In practice, the performance is often tested by learning on a subset of the data and testing on the rest. To allow theoretical results, we often assume that the data is drawn independently from some (unknown) underlying probability distribution, i.e. $(x_i, y_i) \in A$ with probability $P(A)$.

6/40  Least squares & ridge regression
Least squares: $x_1, \dots, x_n \in \mathbb{R}^d$ are $n$ data points (inputs) in $\mathbb{R}^d$ and $y_1, \dots, y_n \in \mathbb{R}$ are $n$ real values (outputs). We look for a dependence $y_i \approx \langle \omega, x_i \rangle$ and minimize
$$\sum_{i=1}^n (y_i - \langle \omega, x_i \rangle)^2 = \|y - X\omega\|_2^2 \quad \text{over } \omega \in \mathbb{R}^d.$$
Ridge regression: adding the regularization term $\lambda \|\omega\|_2^2$.
LASSO: adding the regularization term $\lambda \|\omega\|_1$.
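
A minimal numerical sketch of plain least squares and of the closed-form ridge solution $(X^TX + \lambda I)^{-1}X^Ty$, using NumPy; the toy data, noise level, and value of $\lambda$ are illustrative assumptions (LASSO is sketched further below).

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 10
X = rng.standard_normal((n, d))          # n data points in R^d (rows of X)
omega_true = rng.standard_normal(d)
y = X @ omega_true + 0.1 * rng.standard_normal(n)

# Least squares: minimize ||y - X omega||_2^2
omega_ls = np.linalg.lstsq(X, y, rcond=None)[0]

# Ridge regression: minimize ||y - X omega||_2^2 + lam * ||omega||_2^2,
# closed form omega = (X^T X + lam * I)^{-1} X^T y
lam = 1.0
omega_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

print(np.linalg.norm(omega_ls - omega_true), np.linalg.norm(omega_ridge - omega_true))
```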

7/40  Principal Component Analysis (PCA)
PCA: classical dimensionality-reduction technique. Given data points $x_1, \dots, x_n \in \mathbb{R}^d$, minimize over subspaces $L$ with $\dim L = k$ the distance of the $x_i$'s to $L$, i.e.
$$\min_{L:\, \dim L = k} \sum_{i=1}^n \|x_i - P_L x_i\|_2^2;$$
the answer is given by the singular value decomposition of the $n \times d$ data matrix.
... the Swiss army knife of numerical linear algebra ...
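
A short sketch of PCA via the SVD of the centered data matrix, as on the slide; the toy data, the choice of $k$, and the centering step are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, k = 200, 5, 2
X = rng.standard_normal((n, k)) @ rng.standard_normal((k, d))   # data lying close to a k-dim subspace

Xc = X - X.mean(axis=0)                     # center the data
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

V_k = Vt[:k].T                              # orthonormal basis of the best-fit k-dim subspace L
coords = Xc @ V_k                           # coordinates of the projected points P_L x_i
residual = np.linalg.norm(Xc - coords @ V_k.T) ** 2   # minimal sum of squared distances to L
print(residual)
```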

8/40  Support Vector Machines
V. N. Vapnik and A. Ya. Chervonenkis (1963); B. E. Boser, I. M. Guyon and V. N. Vapnik (1992): non-linear; C. Cortes and V. N. Vapnik (1995): soft margin.
Given data points $x_1, \dots, x_n \in \mathbb{R}^d$ and labels $y_i \in \{-1, +1\}$, separate the sets $\{x_i : y_i = -1\}$ and $\{x_i : y_i = +1\}$ by a linear hyperplane, i.e. $f(x) = g(\langle a, x \rangle)$, where
$$g(t) = \begin{cases} 1, & \text{if } t \ge 0,\\ -1, & \text{if } t < 0 \end{cases}$$
is known.

9/40  Soft margin SVM:
$$\min_{\omega \in \mathbb{R}^d} \sum_{i=1}^n (1 - y_i \langle \omega, x_i \rangle)_+ + \lambda \|\omega\|_2^2$$
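
A possible subgradient-descent sketch for this soft-margin objective (hinge loss plus $\lambda\|\omega\|_2^2$); the synthetic data, step sizes, and iteration count are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, lam = 200, 5, 0.1
a_true = rng.standard_normal(d)
X = rng.standard_normal((n, d))
y = np.sign(X @ a_true)                  # labels in {-1, +1}

omega = np.zeros(d)
for t in range(1, 2001):
    margins = y * (X @ omega)
    active = margins < 1                 # points with positive hinge loss
    # subgradient of sum_i (1 - y_i <omega, x_i>)_+ + lam * ||omega||_2^2
    grad = -(y[active, None] * X[active]).sum(axis=0) + 2 * lam * omega
    omega -= (1.0 / t) * grad            # decreasing step size

print(np.mean(np.sign(X @ omega) == y))  # training accuracy of the learned linear separator
```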

10/40  Limits of the theory ...
"All theory, dear friend, is grey, but the golden tree of actual life springs ever green." (Johann Wolfgang von Goethe)
"It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts." (Sherlock Holmes, A Scandal in Bohemia)
Approximation theory:
- The class of functions of interest is unclear.
- The data points sometimes cannot be designed.
Data analysis - learning theory:
- The independence is not always ensured.
- The underlying probability (and its basic properties) is unknown.
- The success measured on left-out data differs from one case to the other.

11/40  Active choice of points!
A link between data analysis and approximation theory. Difference between regression and sampling? ... the active choice of the points ...
Nowadays, more and more data analysis is done on calculated data! In materials science, material properties (like color or conductivity) are calculated ab initio from the molecular information, not measured in the lab. Typically, these properties are governed by a PDE (like Schrödinger's equation), where the structure of the material comes in as initial data.

12/40  The calculation of one property for one material is then a numerical solution of a parametric PDE: $\mathrm{PDE}(u, p, x) = 0$.
The (unknown) functional dependence $f = f(p)$ of the material property on the input parameters is calculated for materials described by parameters $p_1, \dots, p_m$.
Sampling of $f$ at $p_j$ = querying the solution = running an expensive simulation.
Nearly noise-free sampling is possible! Trade-off: more noisy samples vs. fewer, less noisy samples.
The materials with $p_1, \dots, p_m$ can be actively chosen!

13/40  Summary of the introduction
- Approximation theory and Data Analysis often study similar problems from different points of view
- The choice of approximation algorithms is free in both areas
- Approximation theory (usually) assumes that the data can be designed/chosen
- The aim of the talk is to show the benefits of the combination of both approaches

14/40  $\ell_1$-norm minimization - Data Analysis
LASSO (least absolute shrinkage and selection operator): Tibshirani (1996). Least squares with the penalization term $\lambda \|\omega\|_1$.
$X = (x_1, \dots, x_n) \in \mathbb{R}^{n \times d}$: $n$ data points (inputs) in $\mathbb{R}^d$; $y = (y_1, \dots, y_n) \in \mathbb{R}^n$: $n$ real values (outputs).
LASSO: $\arg\min_{\omega \in \mathbb{R}^d} \|y - X\omega\|_2^2 + \lambda \|\omega\|_1$ weights the fit $\|y - X\omega\|_2^2$ against the number of non-zero coordinates of $\omega$.
Leaves open a number of questions: how many data points, and how distributed, are sufficient to identify/approximate $\omega$?
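
One standard way to solve the LASSO program above is proximal gradient descent (ISTA), where the $\ell_1$ term is handled by coordinatewise soft-thresholding; this solver choice, the step size, and the toy data are illustrative assumptions, not the specific method used in the references.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal map of tau * ||.||_1 (coordinatewise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    """Minimize ||y - X w||_2^2 + lam * ||w||_1 by proximal gradient descent."""
    step = 1.0 / (2 * np.linalg.norm(X, 2) ** 2)   # 1 / Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = 2 * X.T @ (X @ w - y)
        w = soft_threshold(w - step * grad, step * lam)
    return w

rng = np.random.default_rng(3)
n, d, s = 40, 100, 5
w_true = np.zeros(d)
w_true[:s] = rng.standard_normal(s)                # sparse ground truth
X = rng.standard_normal((n, d))
y = X @ w_true
print(np.linalg.norm(lasso_ista(X, y, lam=0.01) - w_true))
```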

15/40  $\ell_1$-norm minimization - Approximation theory
Compressed Sensing: Let $w \in \mathbb{R}^d$ be sparse (with a small number $s \ll d$ of non-zero coefficients at unknown positions) and consider the linear function $g(x) = \langle x, w \rangle$.
Design $n$ (as small as possible) sampling points $x_1, \dots, x_n$, take the function values $y_j = g(x_j) = \langle x_j, w \rangle$, and find a recovery mapping $\Delta\colon \mathbb{R}^n \to \mathbb{R}^d$ such that $\Delta(y)$ is equal/close to $w$.

15/40  $\ell_1$-norm minimization - Approximation theory (continued)
- Take the $x_j$'s independently at random, i.e. $y = Xw$
- Use $\ell_1$-minimization for recovery: $\Delta(y) = \arg\min_{\omega \in \mathbb{R}^d} \|\omega\|_1$ s.t. $X\omega = y$
- It is enough to take $n \sim s \log(d)$
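
A sketch of this recovery step: the equality-constrained $\ell_1$-minimization is rewritten as a linear program via $\omega = u - v$ with $u, v \ge 0$ and solved with scipy.optimize.linprog; the problem sizes are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(4)
d, s = 200, 5
n = 4 * s * int(np.log(d))                # n of the order s * log(d)

w = np.zeros(d)
w[rng.choice(d, s, replace=False)] = rng.standard_normal(s)   # s-sparse vector
X = rng.standard_normal((n, d))           # independent Gaussian sampling points (rows)
y = X @ w                                 # measurements y_j = <x_j, w>

# min ||omega||_1  s.t.  X omega = y,  with omega = u - v, u, v >= 0:
# min 1^T u + 1^T v  s.t.  [X, -X] [u; v] = y
c = np.ones(2 * d)
res = linprog(c, A_eq=np.hstack([X, -X]), b_eq=y, bounds=(0, None))
omega_hat = res.x[:d] - res.x[d:]
print(np.linalg.norm(omega_hat - w))      # recovery error, (numerically) zero here
```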

16/40  $\ell_1$-norm minimization - Approximation theory
Theorem (Donoho, Candès, Romberg, Tao). There is a constant $C > 0$ such that if $n \ge C s \log(d)$ and the components of the $x_i$ are independent Gaussian random variables, then $\Delta(y) = w$ (with high probability).
Many different aspects (other $x_i$'s, noisy sampling, nearly-sparse vectors $w \in \mathbb{R}^d$, etc.) were studied intensively.
Benefits for Data Analysis:
- Estimates of the minimal number of data points
- If possible, choose mostly uncorrelated data points
- Better information about the possibilities and limits of LASSO

17/40  $\ell_1$-SVM - Data Analysis
P. S. Bradley, O. L. Mangasarian (1998); J. Zhu, S. Rosset, T. Hastie, R. Tibshirani (2004): support vector machine with $\ell_1$-penalization term.
$X = (x_1, \dots, x_n) \in \mathbb{R}^{n \times d}$: $n$ data points (inputs) in $\mathbb{R}^d$; $y = (y_1, \dots, y_n) \in \{-1, +1\}^n$: $n$ labels.
$\ell_1$-SVM:
$$\min_{\omega \in \mathbb{R}^d} \sum_{i=1}^n (1 - y_i \langle \omega, x_i \rangle)_+ + \lambda \|\omega\|_1$$
Weights the quality of the classification against the number of non-zero coordinates of $\omega$. Leaves open a number of questions: how many data points, and how distributed, are sufficient to identify $\omega$?
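
A possible proximal-subgradient sketch of this $\ell_1$-SVM objective: a subgradient step for the hinge-loss sum followed by soft-thresholding for the $\lambda\|\omega\|_1$ term; step sizes, iteration count, and the synthetic sparse classifier are assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
n, d, s, lam = 100, 500, 5, 0.5
omega_true = np.zeros(d)
omega_true[:s] = 1.0 / np.sqrt(s)        # sparse, unit-norm classifier direction
X = rng.standard_normal((n, d))
y = np.sign(X @ omega_true)              # labels in {-1, +1}

omega = np.zeros(d)
for t in range(1, 3001):
    step = 0.1 / np.sqrt(t)
    margins = y * (X @ omega)
    active = margins < 1
    grad = -(y[active, None] * X[active]).sum(axis=0)          # subgradient of the hinge sum
    omega = omega - step * grad
    omega = np.sign(omega) * np.maximum(np.abs(omega) - step * lam, 0.0)  # prox of lam*||.||_1

omega_hat = omega / max(np.linalg.norm(omega), 1e-12)
print(np.linalg.norm(omega_hat - omega_true))   # how well the sparse direction is recovered
```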

18/40  $\ell_1$-SVM - Data Analysis
Standard technique of sparse classification:
- bioinformatics
- gene selection
- microarray classification
- cancer classification
- feature selection
- face recognition
- ...

19/40  $\ell_1$-SVM - Data Analysis
Results:
I. Steinwart, Support vector machines are universally consistent, J. Complexity (2002); I. Steinwart, Consistency of support vector machines and other regularized kernel classifiers, IEEE Trans. Inf. Theory (2005):
SVMs are consistent (in a general setting with kernels), i.e. they can learn the hidden classifier for $n \to \infty$ almost surely; the parameter of the SVM has to grow to infinity.

20/40  $\ell_1$-SVM - Approximation theory
- We want to identify a sparse (or nearly sparse) $\omega \in \mathbb{R}^d$
- We assume $\omega \in K = \{x \in \mathbb{R}^d : \|x\|_p \le 1\}$, $p < 1$
- We are allowed to take only 1-bit measurements $y_i = \mathrm{sign}(\langle \omega, x_i \rangle)$ - non-linear!
- We are allowed to design the sampling points $x_1, \dots, x_n \in \mathbb{R}^d$ and a (non-linear) recovery algorithm
Recent area of 1-bit Compressed Sensing

21/40  $\ell_1$-SVM - Approximation theory
1-bit Compressed Sensing: Boufounos & Baraniuk (2008), Y. Plan & R. Vershynin (2013)
$\omega \in K = \{x \in \mathbb{R}^d : \|x\|_2 \le 1,\ \|x\|_1 \le \sqrt{s}\}$
Result: For $n \ge C\delta^{-6} w(K)^2$, $x_i \sim \mathcal{N}(0, \mathrm{id})$, $y_i = \mathrm{sign}(\langle \omega, x_i \rangle)$, the estimator
$$\hat\omega := \arg\max_{w \in K} \sum_{i=1}^n y_i \langle x_i, w \rangle$$
satisfies $\|\hat\omega - \omega\|_2^2 \le \delta \sqrt{\log(e/\delta)}$ with probability at least $1 - 8\exp(-c\delta^2 n)$.
Mean Gaussian width: $w(K) = \mathbb{E} \sup_{x \in K - K} \langle g, x \rangle$
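
The estimator above maximizes a linear functional over $K$; in the simplest case where $K$ is just the Euclidean unit ball (dropping the $\ell_1$ constraint of Plan & Vershynin), the maximizer has the closed form $\hat\omega = \sum_i y_i x_i / \|\sum_i y_i x_i\|_2$. The sketch below illustrates only this simplified case of the 1-bit measurement model.

```python
import numpy as np

rng = np.random.default_rng(6)
d, n = 50, 5000
omega = rng.standard_normal(d)
omega /= np.linalg.norm(omega)               # unknown unit vector

X = rng.standard_normal((n, d))              # x_i ~ N(0, id), rows of X
y = np.sign(X @ omega)                       # 1-bit measurements y_i = sign(<omega, x_i>)

# argmax over the unit ball of sum_i y_i <x_i, w> is the normalized sum of y_i x_i
h = X.T @ y
omega_hat = h / np.linalg.norm(h)
print(np.linalg.norm(omega_hat - omega))     # error decays as n grows
```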

22/40  $\ell_1$-SVM - Approximation theory
What if we insist on the $\ell_1$-SVM as the recovery algorithm?
Setting: $\|\omega\|_2 = 1$, $\|\omega\|_1 \le R$, $x_i \sim \mathcal{N}(0, r^2\,\mathrm{id})$, $y_i = \mathrm{sign}(\langle x_i, \omega \rangle)$, $i = 1, \dots, n$; $\hat\omega \in \mathbb{R}^d$ a minimizer of the $\ell_1$-SVM
$$\min_{w \in \mathbb{R}^d} \sum_{i=1}^n [1 - y_i \langle x_i, w \rangle]_+ \quad \text{subject to } \|w\|_1 \le R.$$
Theorem (Kolleck, V. (2016)). Let $0 < \varepsilon < 0.18$, $r > 2\pi(0.57 - \pi\varepsilon)^{-1}$, $n \ge C\varepsilon^{-2} r^2 R^2 \ln(d)$. Then
$$\left\|\omega - \frac{\hat\omega}{\|\hat\omega\|_2}\right\|_2 \le C\left(\varepsilon + \frac{1}{r}\right)$$
with probability at least $1 - \exp(-C\ln(d))$.

23/40  $\ell_1$-SVM - Summary on SVMs
- Non-asymptotic analysis of ($\ell_1$-)SVMs allows us to predict the limits of the algorithm
- For random (i.e. highly uncorrelated) data, a small number $n$ of measurements/data points is sufficient
- Motivated by the analysis, we proposed an SVM which combines the $\ell_1$ and the $\ell_2$ penalty; this method performs better in the analysis, but also in numerical simulations!
- Use of the 1-bit CS algorithm in Machine Learning?

24/40  Ridge functions & Neural networks - Ridge functions
Let $g\colon \mathbb{R} \to \mathbb{R}$ and $a \in \mathbb{R}^d \setminus \{0\}$. The ridge function with ridge profile $g$ and ridge vector $a$ is the function $f(x) := g(\langle a, x \rangle)$. It is constant along the hyperplane $a^\perp = \{y \in \mathbb{R}^d : \langle y, a \rangle = 0\}$ and its translates.
More generally, if $g\colon \mathbb{R}^k \to \mathbb{R}$ and $A \in \mathbb{R}^{k \times d}$ with $k \le d$, then $f(x) := g(Ax)$ is a $k$-ridge function.

25/40  Ridge functions in mathematics
- Kernels of important transforms (Fourier, Radon)
- Plane waves in PDEs: solutions to $\prod_{i=1}^r \left(b_i \frac{\partial}{\partial x} - a_i \frac{\partial}{\partial y}\right) F = 0$ are of the form $F(x, y) = \sum_{i=1}^r f_i(a_i x + b_i y)$.
- Ridgelets, curvelets, shearlets, ...: wavelet-like frames capturing singularities along curves and edges (Candès, Donoho, Kutyniok, Labate, ...)

26/40  Ridge functions in approximation theory
Approximation of a function by functions from the dictionary $\mathcal{D}_{\mathrm{ridge}} = \{\varrho(\langle k, x \rangle - b) : k \in \mathbb{R}^d,\ b \in \mathbb{R}\}$: fundamentality, greedy algorithms.
- Lin & Pinkus, Fundamentality of ridge functions, J. Approx. Theory 75 (1993), no. 3
- Cybenko, Approximation by superpositions of a sigmoidal function, Math. Control Signals Systems 2 (1989)
- Leshno, Lin, Pinkus & Schocken, Multilayer feedforward networks with a nonpolynomial activation function can approximate any function, Neural Networks 6 (1993)

27/40  Neural networks
Motivated by biological research on the human brain and neurons: W. McCulloch, W. Pitts (1943); M. Minsky, S. Papert (1969).
Artificial neuron: ... gets activated if a linear combination of its inputs grows over a certain threshold ...
- Inputs $x = (x_1, \dots, x_n) \in \mathbb{R}^n$
- Weights $w = (w_1, \dots, w_n) \in \mathbb{R}^n$
- Comparing $\langle w, x \rangle$ with a threshold $b \in \mathbb{R}$
- Plugging the result into the activation function - a jump (or smoothed jump) function $\sigma$
An artificial neuron is a function $x \mapsto \sigma(\langle x, w \rangle - b)$, where $\sigma\colon \mathbb{R} \to \mathbb{R}$ might be $\sigma(x) = \mathrm{sign}(x)$ or $\sigma(x) = e^x/(1 + e^x)$, etc.
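
A single artificial neuron as described on the slide, with the logistic activation $\sigma(t) = e^t/(1+e^t)$; the concrete weights, threshold, and input are placeholders.

```python
import numpy as np

def sigmoid(t):
    """Smoothed jump function sigma(t) = e^t / (1 + e^t) = 1 / (1 + e^{-t})."""
    return 1.0 / (1.0 + np.exp(-t))

def neuron(x, w, b, activation=sigmoid):
    """Artificial neuron: x -> sigma(<x, w> - b)."""
    return activation(np.dot(x, w) - b)

x = np.array([0.5, -1.2, 3.0])   # inputs
w = np.array([1.0, 0.5, -0.3])   # weights
b = 0.1                          # threshold
print(neuron(x, w, b))
```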

28/40  Neural networks
An artificial neural network is a directed, acyclic graph of artificial neurons.
- Input: $x = (x_1, \dots, x_n) \in \mathbb{R}^n$
- First layer of neurons: $y_1 = \sigma(\langle x, w^1_1 \rangle - b^1_1), \dots, y_{n_1} = \sigma(\langle x, w^1_{n_1} \rangle - b^1_{n_1})$
- The outputs $y = (y_1, \dots, y_{n_1})$ become inputs for the next layer, ...; the last layer outputs $y \in \mathbb{R}$
Deep Learning relies on an artificial neural network with many layers.
Training the network: given inputs $x_1, \dots, x_N$ and outputs $y_1, \dots, y_N$, optimize over the weights $w$'s and $b$'s.
Non-convex minimization over a huge space ...???
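
A minimal forward pass through this layered architecture (two hidden layers of neurons feeding a scalar output); the layer widths and the random weights are illustrative, and no training loop is included.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def forward(x, layers):
    """layers is a list of (W, b); each hidden layer maps y -> sigma(W y - b)."""
    y = x
    for W, b in layers[:-1]:
        y = sigmoid(W @ y - b)
    W_last, b_last = layers[-1]
    return (W_last @ y - b_last)[0]      # last layer outputs a real number

rng = np.random.default_rng(7)
d, n1, n2 = 10, 16, 8
layers = [
    (rng.standard_normal((n1, d)), rng.standard_normal(n1)),    # first layer
    (rng.standard_normal((n2, n1)), rng.standard_normal(n2)),   # second layer
    (rng.standard_normal((1, n2)), rng.standard_normal(1)),     # output layer
]
x = rng.standard_normal(d)
print(forward(x, layers))
```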

29/40  Approximation of ridge functions (and their sums)
$k = 1$: $f(x) = g(\langle a, x \rangle)$, $\|a\|_2 = 1$, $g$ smooth.
The approximation has two parts: approximation of $g$ and of $a$.
Recovery of $a$ from $\nabla f(x)$: $\nabla f(x) = g'(\langle a, x \rangle)\,a$, so $\nabla f(0) = g'(0)\,a$.
After recovering $a$, the problem becomes essentially one-dimensional and one can use an arbitrary sampling method to approximate $g$.
We assume $g'(0) \ne 0$ ... say, $g'(0) = 1$.
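
A sketch of this recovery idea: approximate $\nabla f(0) = g'(0)\,a$ by first-order differences to get the ridge direction, then sample the one-dimensional profile along it; the step size $h$ and the particular $f$ are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(8)
d = 20
a = rng.standard_normal(d)
a /= np.linalg.norm(a)                    # ||a||_2 = 1
g = np.sin                                # smooth profile with g'(0) = 1
f = lambda x: g(x @ a)                    # ridge function f(x) = g(<a, x>)

h = 1e-5
grad0 = np.array([(f(h * e) - f(np.zeros(d))) / h     # first-order differences ~ g'(0) a_j
                  for e in np.eye(d)])
a_hat = grad0 / np.linalg.norm(grad0)     # estimated ridge vector

# the problem is now one-dimensional: sample g along a_hat
t = np.linspace(-1, 1, 5)
g_hat = np.array([f(ti * a_hat) for ti in t])
print(np.linalg.norm(a_hat - a), g_hat)
```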

30/40  Buhmann & Pinkus '99, Identifying linear combinations of ridge functions
Approximation of functions
$$f(x) = \sum_{i=1}^m g_i(\langle a_i, x \rangle), \quad x \in \mathbb{R}^d,$$
with $g_i \in C^{2m-1}(\mathbb{R})$ and $g_i^{(2m-1)}(0) \ne 0$ for $i = 1, \dots, m$, via
$$(D_u^{2m-1-k} D_v^k f)(0) = \sum_{i=1}^m \langle u, a_i \rangle^{2m-1-k} \langle v, a_i \rangle^k\, g_i^{(2m-1)}(0)$$
for $k = 0, \dots, 2m-1$ and $v_1, \dots, v_d \in \mathbb{R}^d$, and solving this system of equations.

31/40  A. Cohen, I. Daubechies, R. DeVore, G. Kerkyacharian, D. Picard '12, Capturing ridge functions in high dimensions from point queries
$k = 1$: $f(x) = g(\langle a, x \rangle)$, $f\colon [0,1]^d \to \mathbb{R}$, $g \in C^s([0,1])$, $s > 1$, $\|g\|_{C^s} \le M_0$, $\|a\|_{\ell_q^d} \le M_1$, $0 < q \le 1$, $a \ge 0$.
Then
$$\|f - \hat f\|_\infty \le C\left\{ M_0 L^{-s} + M_1 \left(\frac{1 + \log(d/L)}{L}\right)^{1/q - 1} \right\}$$
using $3L + 2$ sampling points.

32/40  First sampling along the diagonal $\frac{i}{L}\mathbf{1} = \frac{i}{L}(1, \dots, 1)$, $i = 0, \dots, L$:
$$f\left(\frac{i}{L}\mathbf{1}\right) = g\left(\left\langle \frac{i}{L}\mathbf{1}, a \right\rangle\right) = g(i\|a\|_1/L)$$
(since $a \ge 0$).
- Recovery of $g$ on a grid of $[0, \|a\|_1]$
- Finding $i_0$ with the largest difference $g((i_0 + 1)\|a\|_1/L) - g(i_0\|a\|_1/L)$
- Approximating $D_{\varphi_j} f\left(\frac{i_0}{L}\mathbf{1}\right) = g'(i_0\|a\|_1/L)\,\langle a, \varphi_j \rangle$ by first-order differences
- Then recovery of $a$ from $\langle a, \varphi_1 \rangle, \dots, \langle a, \varphi_m \rangle$ by methods of compressed sensing (CS)

33/40  M. Fornasier, K. Schnass, J. V., Learning functions of few arbitrary linear parameters in high dimensions (2012)
Put $f\colon B(0,1) \to \mathbb{R}$, $\|a\|_2 = 1$, $0 < q \le 1$, $\|a\|_q \le c$, $g \in C^2[-1,1]$, and
$$y_j := \frac{f(h\varphi_j) - f(0)}{h} \approx g'(0)\,\langle a, \varphi_j \rangle, \quad j = 1, \dots, m_\Phi,\ m_\Phi \ll d,$$
where $h > 0$ is small and $\varphi_{j,k} = \pm\frac{1}{\sqrt{m_\Phi}}$, $k = 1, \dots, d$.

34/40  The $y_j$ are the scalar products $\langle a, \varphi_j \rangle$ corrupted by deterministic noise.
$$\tilde a = \arg\min_{z \in \mathbb{R}^d} \|z\|_1 \quad \text{s.t. } \langle \varphi_j, z \rangle = y_j,\ j = 1, \dots, m_\Phi.$$
$\hat a = \tilde a / \|\tilde a\|_2$: approximation of $a$.
$\hat g$ is obtained by sampling $f$ along $\hat a$: $\hat g(t) := f(\hat a\, t)$, $t \in (-1, 1)$.
Then $\hat f(x) := \hat g(\langle \hat a, x \rangle)$ has the approximation property
$$\|f - \hat f\|_\infty \le C\left[\left(\frac{\log(d/m_\Phi) + 1}{m_\Phi}\right)^{\frac{1}{q} - \frac{1}{2}} + h\sqrt{m_\Phi}\right].$$
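
A compressed sketch of the procedure on the last two slides: Bernoulli directions $\varphi_j$, first-order differences $y_j$, $\ell_1$-minimization for $\tilde a$ (here via the same linear-programming reformulation as before, with the equality constraints kept despite the small deterministic noise, as a simplification), and then sampling $f$ along $\hat a$; the dimensions, $h$, and the test function are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(9)
d, s, m_phi, h = 200, 4, 60, 1e-4

a = np.zeros(d)
a[rng.choice(d, s, replace=False)] = rng.standard_normal(s)
a /= np.linalg.norm(a)                                    # ||a||_2 = 1, sparse (hence small ||a||_q)
g = np.tanh                                               # C^2 profile with g'(0) = 1
f = lambda x: g(x @ a)

Phi = rng.choice([-1.0, 1.0], size=(m_phi, d)) / np.sqrt(m_phi)   # phi_{j,k} = +-1/sqrt(m_phi)
y = np.array([(f(h * phi) - f(np.zeros(d))) / h for phi in Phi])  # y_j ~ g'(0) <a, phi_j>

# l1-minimization: min ||z||_1 s.t. <phi_j, z> = y_j  (LP with z = u - v, u, v >= 0)
c = np.ones(2 * d)
res = linprog(c, A_eq=np.hstack([Phi, -Phi]), b_eq=y, bounds=(0, None))
a_tilde = res.x[:d] - res.x[d:]
a_hat = a_tilde / np.linalg.norm(a_tilde)                 # approximation of a

g_hat = lambda t: f(a_hat * t)                            # sample f along a_hat
f_hat = lambda x: g_hat(x @ a_hat)                        # final approximation of f
print(np.linalg.norm(a_hat - a))
```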

35/40  Active coordinates: $f(x) = g(x_{i_1}, \dots, x_{i_k})$
- R. DeVore, G. Petrova, P. Wojtaszczyk, Approximation of functions of few variables in high dimensions, Constr. Approx. '11
- K. Schnass, J. V., Compressed learning of high-dimensional sparse functions, Proceedings of ICASSP '11
Use of low-rank matrix recovery:
- H. Tyagi, V. Cevher, Learning non-parametric basis independent models from point queries via low-rank methods, ACHA '14

36/40  Sums of ridge functions (I. Daubechies, M. Fornasier, J. V., Approximation of sums of ridge functions, in preparation)
Recovery of
$$f(x) = \sum_{j=1}^k g_j(\langle a_j, x \rangle).$$
We would like to identify $a_1, \dots, a_k$, then $g_1, \dots, g_k$.
Step 1: Sampling of $\nabla f(x) = \sum_{j=1}^k g_j'(\langle a_j, x \rangle)\,a_j$ at different points gives elements of $\mathrm{span}\{a_1, \dots, a_k\} \subset \mathbb{R}^d$.
Afterwards, we can reduce the dimension to $d = k$ ... one $k$-dimensional problem ...

37/40  Recovery of the individual $a_i$'s for $d = k$?
Step 2: Second-order derivatives:
$$\nabla^2 f(x) = \sum_{j=1}^k g_j''(\langle a_j, x \rangle)\,a_j a_j^T$$
We can recover $\hat L$, an approximation of $L = \mathrm{span}\{a_i a_i^T,\ i = 1, \dots, k\} \subset \mathbb{R}^{k \times k}$.
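
A sketch of this step under assumed smoothness: approximate the Hessian $\nabla^2 f$ by central second differences at a few random points, flatten the resulting symmetric matrices, and read off an approximation of their span from an SVD; $k$, the profiles $g_j$, and the step size are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(10)
k = 3
A = np.linalg.qr(rng.standard_normal((k, k)))[0]    # ridge vectors a_1, ..., a_k as rows
profiles = [np.sin, np.tanh, np.cos]
f = lambda x: sum(g(A[j] @ x) for j, g in enumerate(profiles))

def hessian(f, x, h=1e-3):
    """Central second differences: H[p, q] ~ d^2 f / dx_p dx_q."""
    d = len(x)
    E = np.eye(d)
    H = np.zeros((d, d))
    for p in range(d):
        for q in range(d):
            H[p, q] = (f(x + h*E[p] + h*E[q]) - f(x + h*E[p] - h*E[q])
                       - f(x - h*E[p] + h*E[q]) + f(x - h*E[p] - h*E[q])) / (4 * h**2)
    return H

# Hessians at several points span (approximately) L = span{a_j a_j^T}
samples = np.array([hessian(f, rng.standard_normal(k)).ravel() for _ in range(10)])
U, S, Vt = np.linalg.svd(samples)
print(np.round(S, 4))    # about k significant singular values, so dim(L_hat) = k
```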

38/40  Step 3: We try to find the $a_i a_i^T$ in $\hat L$.
We look for matrices in $\hat L$ with the smallest rank. We analyze the non-convex problem
$$\arg\max \|M\| \quad \text{s.t. } M \in \hat L,\ \|M\|_F \le 1.$$
Every algorithm which is able to find $a_1 a_1^T$ can also find $a_j a_j^T$, $j = 2, \dots, k$; hence it must be non-convex.

39/40  Summary
- Approximation theory and Machine Learning study similar problems from different points of view
- They can inspire/enrich each other
- Non-linear, non-convex classes; non-linear information
- Sparsity and other structural assumptions
Open problem: geometrical properties of the class of functions which can be modeled by neural networks,
$$\mathcal{N}_{d,L} = \{f\colon \mathbb{R}^d \to \mathbb{R} : f \text{ is a neural network with } L \text{ layers}\}$$

40/40  Thank you for your attention!
