Ventral Visual Stream and Deep Networks


1 Ma191b Winter 2017 Geometry of Neuroscience

2 References for this lecture:
Tomaso A. Poggio and Fabio Anselmi, Visual Cortex and Deep Networks, MIT Press, 2016.
F. Cucker, S. Smale, On the mathematical foundations of learning, Bulletin of the American Math. Society 39 (2001) N.1, 1–49.

3 Modeling the Ventral Visual Stream via Deep Neural Networks
The ventral visual stream is considered responsible for object recognition abilities. [Figure: dorsal (green) and ventral (purple) visual streams.] The ventral stream is responsible for the first ~100 ms of processing of visual information, from the initial visual stimulus to the activation of inferior temporal cortex neurons.

4 A mathematical model describing the learning of invariant representations in the ventral visual stream.
Working hypothesis: the main computational goal of the ventral visual stream is to compute neural representations of images that are invariant with respect to certain groups of transformations (mostly affine transformations: translations, rotations, scaling).
The model is based on unsupervised learning: far fewer examples are needed to train a classifier for recognition if an invariant representation is used.
Gabor functions and frames are the optimal templates for simultaneously maximizing invariance with respect to translations and scaling.
Architecture: a hierarchy of Hubel-Wiesel modules.

5 A significant difference between (supervised) learning algorithms and the functioning of the brain is that learning in the brain seems to require a very small number of labelled examples.
Conjecture: the key to reducing the sample complexity of object recognition is invariance under transformations.
Two aspects: recognition and categorization. For recognition it is clear that complexity is greatly increased by transformations (the same object seen from different perspectives, in different light conditions, etc.); for categorization as well (distinguishing between different classes of objects: cats/dogs, etc.), since transformations can hide intrinsic characteristics of an object.
Empirical evidence: the accuracy of a classifier as a function of the number of examples is greatly improved in the presence of an oracle that factors out the transformations (solid curve: rectified; dashed curve: non-rectified).

6 [Figure: classifier accuracy vs. number of training examples, rectified (solid) vs. non-rectified (dashed).]

7 The order of magnitude of the number of different object categorizations (e.g. distinguishable types of dogs) is smaller than the order of magnitude of the number of different viewpoints generated by group actions. Reducing the variability due to transformations greatly reduces the complexity of the learning task. Sample complexity refers to the number of examples needed to estimate a target function within an assigned error rate. The problem of distinguishing images is transformed into the problem of distinguishing orbits under a given group action.

8 Feedforward architecture in the ventral stream, in two main stages:
1. retinotopic areas computing a representation that is invariant under affine transformations;
2. approximate invariance to other object-specific dependencies, not described by group actions (parallel pathways).
The first stage is realized through Gabor frames analysis. The overall model relies on a mathematical model of learning (Cucker-Smale).

9 Architecture layers: each red circle is a vector computed by one of the modules, the double arrow indicates its receptive field; the image is at level zero (bottom), and the vector computed at the top layer consists of invariant features (fed as input to a supervised learning classifier).

10 A biologically plausible algorithm (Hubel-Wiesel modules), with two types of neurons:
simple cells: perform an inner product $\langle t, \cdot\rangle$ with a template $t \in \mathcal{H}$, a Hilbert space; a further nonlinear operation (a threshold) is also applied;
complex cells: aggregate the outputs of several simple cells.
Steps (assume $G$ is a finite subgroup of the affine transformations):
1. unsupervised learning of the group $G$ by storing in memory the orbits $G t = \{ g t : g \in G \}$ of a set of templates $t^k \in \mathcal{H}$, $k = 1, \dots, K$;
2. computation of the invariant representation: for a new image $I \in \mathcal{H}$ compute $\langle g t^k, I \rangle$ for $g \in G$ and $k = 1, \dots, K$, and
$$\mu^k_h(I) = \frac{1}{\# G} \sum_{g \in G} \sigma_h\big( \langle g t^k, I \rangle \big),$$
with $\sigma_h$ a set of nonlinear functions (e.g. threshold functions).
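
A minimal numerical sketch of these two steps (not from the lecture): $G$ is taken to be the group of cyclic shifts of a 1D signal, $\sigma_h$ is a simple threshold nonlinearity, and the template orbits are stored explicitly; all names and parameter values are illustrative.

```python
import numpy as np

def orbit(t):
    """Stored orbit G t = {g t : g in G}, for G = cyclic shifts of a 1D signal."""
    return np.stack([np.roll(t, s) for s in range(len(t))])

def signature(I, templates, thresholds):
    """mu^k_h(I) = (1/#G) sum_g sigma_h(<g t^k, I>), with sigma_h(u) = max(u - h, 0)."""
    sig = np.zeros((len(templates), len(thresholds)))
    for k, t in enumerate(templates):
        dots = orbit(t) @ I                       # <g t^k, I> for all g in G (simple cells)
        for j, h in enumerate(thresholds):
            sig[k, j] = np.mean(np.maximum(dots - h, 0.0))   # pooling (complex cells)
    return sig

rng = np.random.default_rng(0)
d, K = 32, 5
templates = [rng.standard_normal(d) for _ in range(K)]   # unsupervised stage: templates t^k
thresholds = np.linspace(-1.0, 1.0, 7)                   # the family of sigma_h
I = rng.standard_normal(d)
# invariance check: a shifted copy of the image has the same signature
print(np.allclose(signature(I, templates, thresholds),
                  signature(np.roll(I, 11), templates, thresholds)))
```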

11 The quantities $\mu^k_h(I)$ are called the signature of $I$; the signature is clearly $G$-invariant.
Selectivity question: how well does $\mu^k_h(I)$ distinguish different objects? (Here "different" means $G I \neq G I'$.)
Main selectivity result (Poggio-Anselmi): suppose we want to distinguish images within a given set of $N$ images $\mathcal{I}$, with an error of at most a given $\epsilon > 0$. The signatures $\mu^k_h(I)$ can $\epsilon$-approximate the distance between pairs among the $N$ images, with probability $1 - \delta$, provided the number of templates used is at least
$$K > \frac{c}{\epsilon^2} \log \frac{N}{\delta}.$$
A more detailed discussion of this statement follows below; the main point is that of the order of $\log(N)$ templates suffice to distinguish $N$ images.

12 General problem: two sets of random variables $x$, $y$ are probabilistically related, the relation being described by a probability distribution $P(x, y)$. The square loss (minimization) problem is
$$E(f) = \int (y - f(x))^2 \, P(x, y)\, dx\, dy.$$
The distribution itself is unknown, but one can minimize the empirical error
$$E_N(f) = \frac{1}{N} \sum_{i=1}^N (y_i - f(x_i))^2$$
over a set of randomly sampled data points $\{(x_i, y_i)\}_{i=1,\dots,N}$. If $f_N$ minimizes the empirical error, we want the probability $P(|E(f_N) - E_N(f_N)| > \epsilon)$ to be sufficiently small. The problem depends on the function space where $f_N$ lives.
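
A small illustration (assumptions: a synthetic distribution $P(x,y)$ with Gaussian noise, a polynomial hypothesis space) of the gap between the empirical error $E_N(f_N)$ of the minimizer and its true error $E(f_N)$, the latter estimated on a large independent sample.

```python
import numpy as np

rng = np.random.default_rng(1)
f_true = lambda x: np.sin(2 * np.pi * x)

def sample(n):
    x = rng.uniform(0, 1, n)
    return x, f_true(x) + 0.3 * rng.standard_normal(n)       # y = f(x) + noise

def fit_and_errors(n_train, degree):
    x, y = sample(n_train)
    coeffs = np.polyfit(x, y, degree)                         # least-squares minimizer f_N
    empirical = np.mean((np.polyval(coeffs, x) - y) ** 2)     # E_N(f_N)
    xt, yt = sample(100_000)                                  # large sample, proxy for E(f_N)
    true_err = np.mean((np.polyval(coeffs, xt) - yt) ** 2)
    return empirical, true_err

for n in (10, 100, 1000):
    emp, true_err = fit_and_errors(n, degree=7)
    print(f"N={n:5d}  empirical={emp:.3f}  true={true_err:.3f}  gap={true_err - emp:.3f}")
```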

13 General setting (F. Cucker, S. Smale, On the mathematical foundations of learning, Bulletin of the American Math. Society 39 (2001) N.1): $X$ a compact manifold, $Y = \mathbb{R}^k$ (for simplicity $k = 1$), $Z = X \times Y$ with a Borel measure $\rho$. For a (real-valued) random variable $\xi$ on the probability space $(Z, \rho)$, the expectation value and variance are
$$E(\xi) = \int_Z \xi\, d\rho, \qquad \sigma^2(\xi) = E\big((\xi - E(\xi))^2\big) = E(\xi^2) - E(\xi)^2.$$
For a function $f : X \to Y$, the least squares error of $f$ is
$$E(f) = \int_Z (f(x) - y)^2\, d\rho,$$
which measures the average error incurred in using $f(x)$ as a model of the dependence of $y$ on $x$. Problem: how to minimize the error?

14 Conditional probability $\rho(y|x)$ (a probability measure on $Y$) and marginal probability $\rho_X(S) = \rho(\pi^{-1}(S))$ on $X$, with $\pi : Z = X \times Y \to X$ the projection. Relation between these measures:
$$\int_Z \varphi(x, y)\, d\rho = \int_X \left( \int_Y \varphi(x, y)\, d\rho(y|x) \right) d\rho_X.$$
The breaking of $\rho(x, y)$ into $\rho(y|x)$ and $\rho_X$ corresponds to the breaking of $Z$ into input $X$ and output $Y$.

15 Regression function $f_\rho : X \to Y$,
$$f_\rho(x) = \int_Y y\, d\rho(y|x).$$
Assumption: $f_\rho$ is bounded. For fixed $x \in X$, map $Y$ to $\mathbb{R}$ via $y \mapsto y - f_\rho(x)$; its expectation value is zero, so its variance is
$$\sigma^2(x) = \int_Y (y - f_\rho(x))^2\, d\rho(y|x).$$
The averaged variance
$$\sigma^2_\rho = \int_X \sigma^2(x)\, d\rho_X = E(f_\rho)$$
measures how well conditioned $\rho$ is. Note: in general $\rho$ and $f_\rho$ are not known, but $\rho_X$ is known.

16 Error, regression, and variance:
$$E(f) = \int_X (f(x) - f_\rho(x))^2\, d\rho_X + \sigma^2_\rho.$$
What this says: $\sigma^2_\rho$ is a lower bound for the error $E(f)$ for all $f$, and $f = f_\rho$ has the smallest possible error (which depends only on $\rho$).
Why the identity holds: write $(f(x) - y)^2 = (f(x) - f_\rho(x))^2 + 2 (f(x) - f_\rho(x))(f_\rho(x) - y) + (f_\rho(x) - y)^2$ and integrate first in $d\rho(y|x)$: the middle term vanishes because $f_\rho(x)$ is the mean of $y$ given $x$, and the last term gives $\sigma^2(x)$; integrating then in $d\rho_X$ yields the identity.

17 Goal: learn (= find a good approximation of) $f_\rho$ from random samples of $Z$. A sample is a set of points
$$z = ((x_1, y_1), \dots, (x_N, y_N)) \in Z^N,$$
with the $(x_i, y_i)$ independently drawn with probability $\rho$. The empirical error is
$$E_z(f) = \frac{1}{N} \sum_{i=1}^N (f(x_i) - y_i)^2,$$
and, for a random variable $\xi$, the empirical mean is
$$E_z(\xi) = \frac{1}{N} \sum_{i=1}^N \xi(x_i, y_i).$$
Given $f : X \to Y$, take $f_Y : Z \to Y$ to be $f_Y : (x, y) \mapsto f(x) - y$; then $E(f) = E(f_Y^2)$ and $E_z(f) = E_z(f_Y^2)$.

18 Facts of probability theory (quantitative versions of the law of large numbers). Let $\xi$ be a random variable on a probability space $Z$ with mean $E(\xi) = \mu$ and variance $\sigma^2(\xi) \le \sigma^2$.
Chebyshev: for all $\epsilon > 0$,
$$P\left\{ z \in Z^m : \Big| \frac{1}{m} \sum_{i=1}^m \xi(z_i) - \mu \Big| \ge \epsilon \right\} \le \frac{\sigma^2}{m \epsilon^2}.$$
Bernstein: if $|\xi(z) - E(\xi)| \le M$ for almost all $z \in Z$, then for all $\epsilon > 0$,
$$P\left\{ z \in Z^m : \Big| \frac{1}{m} \sum_{i=1}^m \xi(z_i) - \mu \Big| \ge \epsilon \right\} \le 2 \exp\left( - \frac{m \epsilon^2}{2 (\sigma^2 + \tfrac{1}{3} M \epsilon)} \right).$$
Hoeffding: under the same boundedness assumption,
$$P\left\{ z \in Z^m : \Big| \frac{1}{m} \sum_{i=1}^m \xi(z_i) - \mu \Big| \ge \epsilon \right\} \le 2 \exp\left( - \frac{m \epsilon^2}{2 M^2} \right).$$
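
A quick numerical sanity check of the Chebyshev and Hoeffding bounds (assumptions: $\xi$ uniform on $[0,1]$, so $|\xi - E(\xi)| \le M = 1$ certainly holds; the exceedance probability is estimated by Monte Carlo).

```python
import numpy as np

rng = np.random.default_rng(2)
m, M, eps, trials = 500, 1.0, 0.1, 20_000
mu, var = 0.5, 1.0 / 12.0                      # mean and variance of xi ~ Uniform[0, 1]

xi = rng.uniform(0, 1, size=(trials, m))
deviation = np.abs(xi.mean(axis=1) - mu)
empirical = np.mean(deviation >= eps)           # Monte Carlo estimate of P(|mean - mu| >= eps)
chebyshev = var / (m * eps**2)
hoeffding = 2 * np.exp(-m * eps**2 / (2 * M**2))
print(f"empirical={empirical:.4f}  Chebyshev={chebyshev:.4f}  Hoeffding={hoeffding:.4f}")
```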

19 Defect function of $f : X \to Y$:
$$L_z(f) := E(f) - E_z(f),$$
the discrepancy between the error and the empirical error (only $E_z(f)$ is measured directly).
Estimate of the defect: if $|f(x) - y| \le M$ almost everywhere, then for all $\epsilon > 0$, with $\sigma^2$ the variance of $f_Y^2$,
$$P\{ z \in Z^m : |L_z(f)| \le \epsilon \} \ge 1 - 2 \exp\left( - \frac{m \epsilon^2}{2 (\sigma^2 + \tfrac{1}{3} M^2 \epsilon)} \right),$$
obtained from the previous Bernstein estimate by taking $\xi = f_Y^2$.
When is $|f(x) - y| \le M$ a.e. satisfied? For example for $M = M_\rho + P$, with
$$M_\rho = \inf\{ M : \{ (x, y) \in Z : |y - f_\rho(x)| \ge M \} \text{ has measure zero} \}, \qquad P \ge \sup_{x \in X} |f(x) - f_\rho(x)|.$$

20 Hypothesis space: a learning process requires as a datum a class of functions (the hypothesis space) within which to look for the best approximation of $f_\rho$. Let $C(X)$ be the algebra of continuous functions on the topological space $X$ and $\mathcal{H} \subset C(X)$ a compact subset (not necessarily a subalgebra). One looks for a minimizer (not necessarily unique)
$$f_{\mathcal{H}} = \operatorname{argmin}_{f \in \mathcal{H}} \int_Z (f(x) - y)^2;$$
because $E(f) = \int_X (f - f_\rho)^2 + \sigma^2_\rho$, this is also a minimizer
$$f_{\mathcal{H}} = \operatorname{argmin}_{f \in \mathcal{H}} \int_X (f - f_\rho)^2.$$
Continuity: if for $f \in \mathcal{H}$ one has $|f(x) - y| \le M$ a.e., then the bound $|E(f_1) - E(f_2)| \le 2 M \| f_1 - f_2 \|_\infty$ holds, and similarly for $E_z$, so $E$ and $E_z$ are continuous. Compactness of $\mathcal{H}$ ensures the existence of the minimizer but not uniqueness (there is a uniqueness result when $\mathcal{H}$ is convex).

21 Empirical target function: $f_{\mathcal{H}, z}$ is a minimizer (non-unique in general)
$$f_{\mathcal{H}, z} = \operatorname{argmin}_{f \in \mathcal{H}} \frac{1}{m} \sum_{i=1}^m (f(x_i) - y_i)^2.$$
Normalized error: $E_{\mathcal{H}}(f) = E(f) - E(f_{\mathcal{H}})$, with $E_{\mathcal{H}}(f) \ge 0$, vanishing at $f_{\mathcal{H}}$. Sample error: $E_{\mathcal{H}}(f_{\mathcal{H}, z})$. Then
$$E(f_{\mathcal{H}, z}) = E_{\mathcal{H}}(f_{\mathcal{H}, z}) + E(f_{\mathcal{H}}) = \int_X (f_{\mathcal{H}, z} - f_\rho)^2 + \sigma^2_\rho.$$
Estimating $E(f_{\mathcal{H}, z})$ amounts to estimating the sample error $E_{\mathcal{H}}(f_{\mathcal{H}, z})$ and the approximation error $E(f_{\mathcal{H}})$, the first depending on $\mathcal{H}$ and the sample $z$, the second independent of the sample $z$.

22 Bias-variance trade-off (bias = approximation error; variance = sample error):
for fixed $\mathcal{H}$, the sample error $E_{\mathcal{H}}(f_{\mathcal{H}, z})$ decreases by increasing the number $m$ of samples;
for fixed $m$, the approximation error $E(f_{\mathcal{H}})$ decreases when enlarging $\mathcal{H}$.
Procedure:
1. estimate how close $f_{\mathcal{H}, z}$ and $f_{\mathcal{H}}$ are, as a function of $m$;
2. decide how to choose $\dim \mathcal{H}$ when $m$ is fixed.
First problem: how many examples does one need to draw to say with confidence $1 - \delta$ that $\int_X (f_{\mathcal{H}, z} - f_{\mathcal{H}})^2 \le \epsilon$?

23 Uniformity estimate (Vapnik's statistical learning theory).
Covering number: for $S$ a metric space and $s > 0$, $\mathcal{N}(S, s)$ is the minimal $\ell \in \mathbb{N}$ such that $S$ is covered by $\ell$ disks in $S$ of radius $s$; for $S$ compact, $\mathcal{N}(S, s)$ is finite.
Uniform estimate: for $\mathcal{H} \subset C(X)$ compact, if for all $f \in \mathcal{H}$ one has $|f(x) - y| \le M$ a.e., then for all $\epsilon > 0$
$$P\left\{ z \in Z^m : \sup_{f \in \mathcal{H}} |L_z(f)| \le \epsilon \right\} \ge 1 - \mathcal{N}\!\left( \mathcal{H}, \frac{\epsilon}{8 M} \right) 2 \exp\left( - \frac{m \epsilon^2}{4 (2 \sigma^2 + \tfrac{1}{3} M^2 \epsilon)} \right),$$
with $\sigma^2 = \sup_{f \in \mathcal{H}} \sigma^2(f_Y^2)$.
Main idea: like the previous estimate of the defect, but passing from a single function to a family of functions, using a uniformity based on the covering number.

24 Estimate of the sample error: for $\mathcal{H} \subset C(X)$ compact, with $|f(x) - y| \le M$ a.e. for all $f \in \mathcal{H}$, and $\sigma^2 = \sup_{f \in \mathcal{H}} \sigma^2(f_Y^2)$, then for all $\epsilon > 0$
$$P\{ z \in Z^m : E_{\mathcal{H}}(f_z) \le \epsilon \} \ge 1 - \mathcal{N}\!\left( \mathcal{H}, \frac{\epsilon}{16 M} \right) 2 \exp\left( - \frac{m \epsilon^2}{8 (4 \sigma^2 + \tfrac{1}{3} M^2 \epsilon)} \right),$$
obtained from the previous estimate using $L_z(f) = E(f) - E_z(f)$.
Answer to the first question: to ensure a probability above $1 - \delta$ one needs to take at least
$$m \ge \frac{8 (4 \sigma^2 + \tfrac{1}{3} M^2 \epsilon)}{\epsilon^2} \left( \log\Big( 2 \mathcal{N}\big( \mathcal{H}, \tfrac{\epsilon}{16 M} \big) \Big) + \log\Big( \frac{1}{\delta} \Big) \right)$$
samples, obtained by setting
$$\delta = \mathcal{N}\!\left( \mathcal{H}, \frac{\epsilon}{16 M} \right) 2 \exp\left( - \frac{m \epsilon^2}{8 (4 \sigma^2 + \tfrac{1}{3} M^2 \epsilon)} \right).$$
Various techniques are needed for estimating the covering numbers $\mathcal{N}(\mathcal{H}, s)$, depending on the choice of the compact set $\mathcal{H}$.
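
The sample-size bound above is easy to evaluate numerically once $M$, $\sigma^2$, $\delta$ and a covering-number estimate are fixed; in the sketch below every numerical value, and the covering-number estimate $(4R/\eta)^N$ for a ball of radius $R$ in an $N$-dimensional space, is used purely for illustration.

```python
import numpy as np

def samples_needed(eps, delta, M, sigma2, covering_number):
    """m >= 8(4 sigma^2 + M^2 eps / 3) / eps^2 * (log(2 N(H, eps/(16 M))) + log(1/delta))."""
    return (8 * (4 * sigma2 + M**2 * eps / 3) / eps**2
            * (np.log(2 * covering_number) + np.log(1 / delta)))

# illustrative values; covering number of a radius-R ball in an N-dimensional space taken as (4R/eta)^N
eps, delta, M, sigma2, R, N = 0.1, 0.01, 1.0, 0.25, 1.0, 10
covering = (4 * R / (eps / (16 * M))) ** N
print(f"m >= {samples_needed(eps, delta, M, sigma2, covering):,.0f}")
```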

25 Second question: estimating the approximation error. Since
$$E(f_{\mathcal{H}, z}) = E_{\mathcal{H}}(f_{\mathcal{H}, z}) + E(f_{\mathcal{H}}),$$
focus on $E(f_{\mathcal{H}})$, which depends on $\mathcal{H}$ and $\rho$:
$$E(f_{\mathcal{H}}) = \int_X (f_{\mathcal{H}} - f_\rho)^2 + \sigma^2_\rho.$$
The second term is independent of $\mathcal{H}$, so focus on the first; $f_\rho$ is bounded, but it is not in $\mathcal{H}$, nor necessarily in $C(X)$.
Main idea: use a finite-dimensional hypothesis space $\mathcal{H}$; estimate in terms of the growth of the eigenvalues of an operator.
Main technique: Fourier analysis; Hilbert spaces.

26 Fourier series: start with the case of the torus $X = T^n = (S^1)^n$.
Hilbert space $L^2(X)$ (Lebesgue measure), with complete orthonormal system
$$\varphi_\alpha(x) = (2\pi)^{-n/2} \exp(i \alpha \cdot x), \qquad \alpha = (\alpha_1, \dots, \alpha_n) \in \mathbb{Z}^n,$$
and Fourier series expansion $f = \sum_{\alpha \in \mathbb{Z}^n} c_\alpha \varphi_\alpha$.
Finite-dimensional subspaces $\mathcal{H}_N \subset L^2(X)$ spanned by the $\varphi_\alpha$ with $|\alpha| \le B$; the dimension $N(B)$ is the number of lattice points in the ball of radius $B$ in $\mathbb{R}^n$, $N(B) \approx (2B)^{n/2}$.
Hypothesis space $\mathcal{H}$: the ball $\mathcal{H}_{N, R}$ of radius $R$ in $\mathcal{H}_N$.
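
A one-dimensional illustration of the spaces $\mathcal{H}_N$ (the circle, $n = 1$): project a function onto the span of the frequencies $|\alpha| \le B$ and watch the $L^2$ approximation error decrease as $B$ grows; the target function is arbitrary.

```python
import numpy as np

M = 4096                                             # grid points on the circle [0, 2*pi)
x = 2 * np.pi * np.arange(M) / M
f = np.sign(np.sin(x)) * np.abs(np.sin(x)) ** 0.5    # some bounded target function

c = np.fft.fft(f) / M                                # Fourier coefficients c_alpha
for B in (1, 4, 16, 64):
    keep = np.zeros(M, dtype=bool)
    keep[:B + 1] = True                              # frequencies 0, 1, ..., B
    keep[-B:] = True                                 # and -1, ..., -B
    f_N = np.real(np.fft.ifft(np.where(keep, c, 0) * M))   # projection onto H_N
    err = np.sqrt(np.mean((f - f_N) ** 2))           # L^2 distance from f to H_N
    print(f"B = {B:3d}   ||f - f_N||_2 = {err:.4f}")
```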

27 Laplacian on the torus $X = T^n$: the Laplacian $\Delta : C^\infty(X) \to C^\infty(X)$,
$$\Delta(f) = \sum_{i=1}^n \frac{\partial^2 f}{\partial x_i^2}.$$
The Fourier basis functions $\varphi_\alpha$ are eigenfunctions of the Laplacian, with eigenvalue $|\alpha|^2$ for $-\Delta$.
More general $X$: a bounded domain $X \subset \mathbb{R}^n$ with smooth boundary $\partial X$, and a complete orthonormal system $\{ \varphi_k \}$ of $L^2(X)$ (Lebesgue measure) consisting of eigenfunctions of the Laplacian,
$$-\Delta(\varphi_k) = \zeta_k \varphi_k, \qquad \varphi_k|_{\partial X} = 0, \quad k \ge 1, \qquad 0 < \zeta_1 \le \zeta_2 \le \cdots \le \zeta_k \le \cdots.$$
Let $\mathcal{H}_N$ be the subspace of $L^2(X)$ generated by $\{ \varphi_1, \dots, \varphi_N \}$, and take as hypothesis space $\mathcal{H} = \mathcal{H}_{N, R}$, the ball of radius $R$ in $\mathcal{H}_N$.

28 Construction of $f_{\mathcal{H}}$: the Lebesgue measure $\mu$ on $X$ and the measure $\rho$ (with marginal probability $\rho_X$ induced on $X$ by the measure $\rho$ on $Z = X \times Y$). Consider the regression function $f_\rho(x) = \int_Y y\, d\rho(y|x)$; the assumption that $f_\rho$ is bounded on $X$ implies that it lies in $L^2_\rho(X)$ and in $L^2_\mu(X)$.
Choice of $R$: assume also that $R \ge \| f_\rho \|_\infty$, which implies $R \ge \| f_\rho \|_\rho$. Then $f_{\mathcal{H}}$ is the orthogonal projection of $f_\rho$ onto $\mathcal{H}_N$, using the inner product of $L^2_\rho(X)$.
Goal: estimate the approximation error $E(f_{\mathcal{H}})$ for this $f_{\mathcal{H}}$.

29 Distortion factor: the identity map on bounded functions extends to an operator $J : L^2_\mu(X) \to L^2_\rho(X)$. The distortion of $\rho$ with respect to $\mu$ is the operator norm $D_{\rho\mu} = \| J \|$: it measures how much $\rho$ distorts the ambient measure $\mu$. Reasonable assumption: the distortion is finite. In general $\rho$ is not known, but $\rho_X$ is known, so $D_{\rho\mu}$ can be computed.

30 Weyl law on the rate of growth of the eigenvalues of the Laplacian (acting on functions vanishing on the boundary of the domain $X \subset \mathbb{R}^n$):
$$\lim_{\lambda \to \infty} \frac{N(\lambda)}{\lambda^{n/2}} = (2\pi)^{-n} B_n \operatorname{Vol}(X),$$
where $B_n$ is the volume of the unit ball in $\mathbb{R}^n$ and $N(\lambda)$ is the number of eigenvalues (with multiplicity) up to $\lambda$.
Weyl law, Li-Yau version:
$$\zeta_k \ge \frac{n}{n+2}\, 4\pi^2 \left( \frac{k}{B_n \operatorname{Vol}(X)} \right)^{2/n}$$
(P. Li and S.-T. Yau, On the parabolic kernel of the Schrödinger operator, Acta Math. 156 (1986)). From this, using the explicit volume $B_n$, one gets the weaker estimate
$$\zeta_k \ge \left( \frac{k}{\operatorname{Vol}(X)} \right)^{2/n}.$$
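
In one dimension the Dirichlet eigenvalues are known explicitly ($\zeta_k = (\pi k / L)^2$ on an interval of length $L$), so the Li-Yau bound and the weaker estimate are easy to check numerically; a small sketch, with $B_1 = 2$.

```python
import numpy as np

n, L = 1, 3.0                        # dimension and Vol(X) for X = (0, L)
B_n = 2.0                            # volume of the unit ball in R^1
k = np.arange(1, 11)
zeta = (np.pi * k / L) ** 2                                            # exact Dirichlet eigenvalues
li_yau = (n / (n + 2)) * 4 * np.pi**2 * (k / (B_n * L)) ** (2 / n)     # Li-Yau lower bound
weaker = (k / L) ** (2 / n)                                            # weaker estimate from the notes
print(np.all(zeta >= li_yau), np.all(zeta >= weaker))                  # both should print True
```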

31 Approximation error and Weyl law. The norm $\| \cdot \|_K$: for $f = \sum_{k=1}^\infty c_k \varphi_k$, with $\varphi_k$ the eigenfunctions of the Laplacian,
$$\| f \|_K := \left( \sum_{k=1}^\infty c_k^2 \zeta_k \right)^{1/2},$$
like the $L^2$-norm, but with the $\ell^2$ measure of $c = (c_k)$ weighted by the eigenvalues of the Laplacian.
Approximation error estimate: for $\mathcal{H}$ and $f_{\mathcal{H}}$ as above,
$$E(f_{\mathcal{H}}) \le D_{\rho\mu}^2\, \| f_\rho \|_K^2 \left( \frac{\operatorname{Vol}(X)}{N} \right)^{2/n} + \sigma^2_\rho,$$
proved using the Weyl law and the estimates
$$\| f_\rho - f_{\mathcal{H}} \|_\rho = d_\rho(f_\rho, \mathcal{H}_N) \le \| J \|\, d_\mu(f_\rho, \mathcal{H}_N),$$
$$d_\mu(f_\rho, \mathcal{H}_N)^2 = \Big\| \sum_{k=N+1}^\infty c_k \varphi_k \Big\|_\mu^2 = \sum_{k=N+1}^\infty c_k^2 = \sum_{k=N+1}^\infty c_k^2\, \zeta_k\, \frac{1}{\zeta_k} \le \frac{1}{\zeta_{N+1}} \| f_\rho \|_K^2,$$
where $f_\rho = \sum_k c_k \varphi_k$.

32 Solution of the bias-variance problem: minimize $E(f_{\mathcal{H}, z})$ by minimizing both the sample error and the approximation error. Minimization as a function of $N \in \mathbb{N}$ (for the choice of hypothesis space $\mathcal{H} = \mathcal{H}_{N, R}$): select the integer $N \in \mathbb{N}$ that minimizes $A(N) + \epsilon(N)$, where $\epsilon = \epsilon(N)$ is as in the previous estimate of the sample error and
$$A(N) = D_{\rho\mu}^2 \left( \frac{\operatorname{Vol}(X)}{N} \right)^{2/n} \| f_\rho \|_K^2 + \sigma^2_\rho.$$
From the previous relation between $m$, $R = \| f_\rho \|$, $\delta$ and $\epsilon$ one obtains the constraint
$$\epsilon - \frac{288 M^2}{m} \left( N \log\Big( \frac{96 R M}{\epsilon} \Big) + \log\Big( \frac{1}{\delta} \Big) \right) \ge 0,$$
and one looks for the $N$ that minimizes $\epsilon$ subject to this constraint. There is no explicit closed-form solution for the $N$ minimizing $A(N) + \epsilon(N)$, but it can be estimated numerically in specific cases.
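
A sketch of the numerical minimization: $\epsilon(N)$ is obtained by solving the constraint above as an equality (here by fixed-point iteration), $A(N)$ uses the approximation-error bound, and the integer minimizing the sum is found by direct search; every numerical value below is made up for illustration.

```python
import numpy as np

m, M, R, delta = 1_000_000, 1.0, 1.0, 0.01
n, vol, D2, f_K2, sigma2 = 2, 1.0, 1.0, 1.0, 0.1   # dimension, Vol(X), D^2, ||f_rho||_K^2, sigma_rho^2

def eps_of_N(N, iters=200):
    """Smallest admissible eps: eps = (288 M^2 / m)(N log(96 R M / eps) + log(1/delta)), by fixed point."""
    eps = 1.0
    for _ in range(iters):
        eps = (288 * M**2 / m) * (N * np.log(96 * R * M / eps) + np.log(1 / delta))
    return eps

def A(N):
    return D2 * (vol / N) ** (2 / n) * f_K2 + sigma2

Ns = np.arange(1, 200)
total = np.array([A(N) + eps_of_N(N) for N in Ns])
print(f"best N = {Ns[np.argmin(total)]},  A(N) + eps(N) = {total.min():.3f}")
```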

33 Back to the model of the visual cortex (Poggio-Anselmi): the stored templates $t^k$, $k = 1, \dots, K$, and the new images $I$ live in some finite-dimensional approximation $\mathcal{H}_N$ of a Hilbert space, and the simple cells perform the inner products $\langle g t^k, I \rangle$ in $\mathcal{H}_N$.
Estimate in terms of 1D projections: $I \in \mathbb{R}^d$ for some (in general large) $d$; projections $\langle t^k, I \rangle$ for a set of normalized vectors $t^k \in S^{d-1}$ (the unit sphere). Set $Z : S^{d-1} \to \mathbb{R}_+$, $Z(t) = | \mu_t(I) - \mu_t(I') |$. The distance $d(I, I')$ between images is thought of as a distance between two probability distributions $P_I$, $P_{I'}$ on $\mathbb{R}^d$, measured in terms of
$$d(P_I, P_{I'}) = \int_{S^{d-1}} Z(t)\, d\mathrm{vol}(t).$$

34 Model this in terms of
$$\hat{d}(P_I, P_{I'}) := \frac{1}{K} \sum_{k=1}^K Z(t^k),$$
and evaluate the error incurred in using $\hat{d}(P_I, P_{I'})$ (1D projections onto the templates) to estimate $d(P_I, P_{I'})$. As in the Cucker-Smale setting, the error and the probability of error are evaluated in terms of the Hoeffding estimate:
$$d(P_I, P_{I'}) - \hat{d}(P_I, P_{I'}) = E(Z) - \frac{1}{K} \sum_{k=1}^K Z(t^k),$$
with probability of error
$$P\left( \Big| \frac{1}{K} \sum_{k=1}^K Z(t^k) - E(Z) \Big| > \epsilon \right) \le 2 e^{-\frac{K \epsilon^2}{2 M^2}}$$
if the bound $|Z(t) - E(Z)| \le M$ holds a.e.

35 We want this estimate to hold uniformly over a set of $N$ images, i.e. the same bound to hold for each pair, so the error probability is at most
$$N (N - 1) \exp\left( - \frac{K \epsilon^2}{2 M_{\min}^2} \right) \le N^2 \exp\left( - \frac{K \epsilon^2}{2 M_{\min}^2} \right),$$
with $M_{\min}$ the smallest constant $M$ that works for all pairs. This is at most a given $\delta^2$ whenever
$$K \ge \frac{4 M_{\min}^2}{\epsilon^2} \log \frac{N}{\delta}.$$
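
The bound $K \ge (4 M_{\min}^2 / \epsilon^2) \log(N/\delta)$ grows only logarithmically in the number $N$ of images, which is the point made on slide 11; a quick evaluation with made-up values:

```python
import numpy as np

def templates_needed(N, eps, delta, M_min):
    return 4 * M_min**2 / eps**2 * np.log(N / delta)

for N in (10, 1_000, 100_000):
    print(f"N = {N:6d}   K >= {templates_needed(N, eps=0.1, delta=0.01, M_min=1.0):.0f}")
```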

36 Group actions and orbits: $\{ t^k \}_{k=1,\dots,K}$ given templates, $G$ a finite subgroup of the affine group (translations, rotations, scaling). $G$ acts on the set of images $I$, with orbits $G I$. Consider the projection $P : \mathbb{R}^d \to \mathbb{R}^K$ of images $I$ onto the span of the templates $t^k$.
Johnson-Lindenstrauss lemma: low-distortion embeddings of sets of points from a high-dimensional into a low-dimensional Euclidean space (special case: the map is an orthogonal projection). Given $0 < \epsilon < 1$ and a finite set $X$ of $n$ points in $\mathbb{R}^d$, take $K > 8 \log(n) / \epsilon^2$; then there is a linear map $f$, given by a multiple of an orthogonal projection onto a (random) subspace of dimension $K$, such that for all $u, v \in X$
$$(1 - \epsilon)\, \| u - v \|^2_{\mathbb{R}^d} \le \| f(u) - f(v) \|^2_{\mathbb{R}^K} \le (1 + \epsilon)\, \| u - v \|^2_{\mathbb{R}^d}.$$
The result depends on the concentration of measure phenomenon.
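
A quick empirical look at the Johnson-Lindenstrauss statement (assumptions: a Gaussian random projection scaled by $1/\sqrt{K}$, which behaves like the multiple of an orthogonal projection in the lemma; the fraction of pairs within the distortion bounds is reported rather than proved).

```python
import numpy as np

rng = np.random.default_rng(3)
d, n, eps = 2_000, 50, 0.3
K = int(np.ceil(8 * np.log(n) / eps**2))         # K > 8 log(n) / eps^2, as in the lemma

X = rng.standard_normal((n, d))                   # a finite set of n points in R^d
P = rng.standard_normal((K, d)) / np.sqrt(K)      # Gaussian sketch, stand-in for the scaled projection
Y = X @ P.T                                       # images in R^K

i, j = np.triu_indices(n, k=1)
ratio = np.sum((Y[i] - Y[j])**2, axis=1) / np.sum((X[i] - X[j])**2, axis=1)
ok = np.mean((ratio >= 1 - eps) & (ratio <= 1 + eps))
print(f"K = {K}; fraction of pairs with squared-distance distortion in [1-eps, 1+eps]: {ok:.3f}")
```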

37 Up to a scaling, for a good choice of the subspace spanned by the templates, one can take $P$ to satisfy the Johnson-Lindenstrauss lemma. Starting from a finite set $X = \{ u \}$ of images, one can generate another set by including all the group translates, $X_G = \{ g \cdot u : g \in G,\ u \in X \}$; then the accuracy required by the Johnson-Lindenstrauss lemma for $X_G$ needs
$$K > 8\, \frac{\log(N \cdot \# G)}{\epsilon^2},$$
so one can estimate sufficiently well the distance between images in $\mathbb{R}^d$ using the distance between the projections $\langle t^k, g I \rangle$ of their group orbits onto the space of templates. Since $\langle t^k, g I \rangle = \langle g^{-1} t^k, I \rangle$ for unitary representations, it would seem that one needs to increase the number of templates by $K \mapsto \# G \cdot K$ to distinguish orbits, but in fact, by the argument above, an increase $K \mapsto K + 8 \log(\# G) / \epsilon^2$ suffices.

38 Given the $\langle t^k, g I \rangle = \langle g^{-1} t^k, I \rangle$ computed by the simple cells, the complex cells pool them by computing
$$\mu^k_h(I) = \frac{1}{\# G} \sum_{g \in G} \sigma_h\big( \langle g t^k, I \rangle \big),$$
with $\sigma_h$ a set of nonlinear functions. Examples:
$$\mu^k_{\mathrm{average}}(I) = \frac{1}{\# G} \sum_{g \in G} \langle g t^k, I \rangle, \qquad \mu^k_{\mathrm{energy}}(I) = \frac{1}{\# G} \sum_{g \in G} \langle g t^k, I \rangle^2, \qquad \mu^k_{\max}(I) = \max_{g \in G} \langle g t^k, I \rangle.$$
Other nonlinear functions can be used; an especially useful case is when $\sigma_h : \mathbb{R} \to \mathbb{R}_+$ is injective.
Note: the stored knowledge of $g t^k$ for $g \in G$ makes the system automatically invariant with respect to the $G$-action on the images $I$.
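
The three pooling choices written out explicitly, with the same cyclic-shift group used in the earlier sketch (illustrative only; `gdots` stands for the vector of $\langle g t^k, I \rangle$ over $g \in G$ produced by the simple cells):

```python
import numpy as np

pool_average = lambda gdots: np.mean(gdots)       # mu^k_average
pool_energy  = lambda gdots: np.mean(gdots**2)    # mu^k_energy
pool_max     = lambda gdots: np.max(gdots)        # mu^k_max

rng = np.random.default_rng(4)
d = 64
t, I = rng.standard_normal(d), rng.standard_normal(d)
gdots = lambda img: np.array([np.dot(np.roll(t, s), img) for s in range(d)])  # simple-cell outputs
for pool in (pool_average, pool_energy, pool_max):
    # invariance: pooling over the orbit gives the same value for I and for a shifted copy of I
    print(np.isclose(pool(gdots(I)), pool(gdots(np.roll(I, 17)))))
```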

39 Localization and the uncertainty principle: one would like the templates $t(x)$ to be localized in $x$ (small outside of some interval $\Delta x$), and would also like $\hat{t}$ to be localized in frequency (small outside an interval $\Delta \omega$). But, by the uncertainty principle, localization in $x$ forces delocalization in $\omega$:
$$\Delta x\, \Delta \omega \ge 1.$$

40 Optimal localization: the best possible localization, $\Delta x\, \Delta \omega = 1$, is realized by the Gabor functions
$$t(x) = e^{i \omega_0 x}\, e^{-\frac{x^2}{2 \sigma^2}}.$$
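
A small sketch of a (complex) Gabor template and its frequency localization: a Gaussian envelope centered at $x = 0$ and a spectrum concentrated around the carrier frequency $\omega_0$; the values of $\omega_0$ and $\sigma$ are arbitrary.

```python
import numpy as np

def gabor(x, omega0, sigma):
    """t(x) = exp(i omega0 x) exp(-x^2 / (2 sigma^2))."""
    return np.exp(1j * omega0 * x) * np.exp(-x**2 / (2 * sigma**2))

x = np.linspace(-20, 20, 4096)
t = gabor(x, omega0=3.0, sigma=1.5)
spectrum = np.abs(np.fft.fftshift(np.fft.fft(t)))
omega = 2 * np.pi * np.fft.fftshift(np.fft.fftfreq(x.size, d=x[1] - x[0]))
print("spectrum peaks near omega_0:", round(omega[np.argmax(spectrum)], 2))
```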

41 [Figure: each cell applies a Gabor filter; plotted are the $n_y / n_x$ anisotropy ratios.]
