MA 751 Part 3: Infinite Dimensional Vector Spaces

1. Motivation: Statistical machine learning and reproducing kernel Hilbert spaces

Microarray experiment. Question: Gene expression - when is the DNA in a gene transcribed and thus expressed (as RNA) in a cell?
One solution: measure RNA levels (the result of transcription).
Method: microarray.
"Probe": Single-stranded DNA with a defined identity tethered to a solid medium
"Target": DNA or RNA (labeled with a phosphorescent color to identify its presence) Two main types of Microarrays cdna arrays: Spotted onto surface oligonucleotide arrays: created on surface
[Figure: a population of cDNA, spotted onto a microarray chip]
Resulting Microarrays:
From: "Data Analysis Tools for DNA microarrays" by S. Draghici, published by Chapman and Hall/CRC Press.
[Figure: for each platform, panels of heart replicates and heart:liver comparisons]

Results by platform (count/total (%)):

                               Affymetrix          Agilent             Amersham          Mergen
Human liver vs. human heart    3904/22,283 (18%)   6963/14,159 (49%)   2595/9970 (26%)   8572/11,904 (72%)
Human liver vs. human liver    3875/22,283 (17%)   5129/14,159 (36%)   318/9778 (3%)     2811/11,904 (24%)
Human heart vs. human heart    4026/22,283 (18%)   1204/18,006 (6%)    454/9772 (5%)     3515/11,904 (30%)
Result: for each subject tissue sample $s$, obtain a feature vector $F(s) = \mathbf{x} = (x_1, \ldots, x_{30{,}000})$ consisting of the expression levels of 30,000 genes. Can we classify tissues this way? Goals: 1. Differentiate two different but similar cancers. 2. Understand genetic pathways of cancer.
Basic difficulties: few samples (e.g., 30-200); high dimension (e.g., 5,000-100,000). Curse of dimensionality: too few samples and too many parameters (dimensions) to fit them. Tool: support vector machine (SVM).
Procedure: look at the feature space $F$ in which $F(s)$ lives, and separate examples of one cancer from the other with a hyperplane (see the sketch below). Methods needed: machine learning; reproducing kernel Hilbert spaces (RKHS).
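A minimal sketch of this hyperplane procedure, assuming synthetic expression data and scikit-learn's linear SVC (the sample counts, gene counts, and parameters below are illustrative only, not from the microarray study above):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for microarray data: 60 tissue samples, 5000 genes,
# two classes whose mean expression differs in the first 20 genes.
n_samples, n_genes = 60, 5000
y = np.repeat([0, 1], n_samples // 2)
X = rng.normal(size=(n_samples, n_genes))
X[y == 1, :20] += 1.0

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# A linear SVM separates the two classes with a hyperplane in gene space.
clf = SVC(kernel="linear", C=1.0).fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```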
2. Statistical & machine Learning: More Motivation The role of learning theory has grown a great deal in: Mathematics Statistics Computational Biology Neurosciences, e.g., theory of plasticity, workings of visual cortex
[Figure omitted; source: University of Washington]
Computer science, e.g., vision theory, graphics, speech synthesis
[Figure omitted; source: T. Poggio/MIT]
[Figure: face identification (MIT)]
[Figure: people classification or detection (Poggio/MIT)]
We want the theory behind such learning algorithms.

3. The problem: Learning theory

Given an unknown function $f(\mathbf{x}): \mathbb{R}^d \to \mathbb{R}$, learn $f(\mathbf{x})$. We have already seen the problem of representing a known function $f(\mathbf{x})$ using neural networks. Now we wish to find a way to determine an unknown $f(\mathbf{x})$ from knowing its value at several inputs.
Example 1: $\mathbf{x}$ is a retinal activation pattern (i.e., $x_i$ = activation level of retinal neuron $i$), and $y = f(\mathbf{x}) > 0$ if the retinal pattern is a chair; $y = f(\mathbf{x}) < 0$ otherwise. [Thus: we want the concept of a chair.] Given: examples of chairs (and non-chairs) $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$, together with the proper outputs $y_1, \ldots, y_n$. This is the information: $Nf = (f(\mathbf{x}_1), \ldots, f(\mathbf{x}_n))$.
Goal: Give the best possible estimate of the unknown function $f$, i.e., try to learn the concept $f$ from the examples $Nf$. But: pointwise information about $f$ is not sufficient: which is the "right" $f(x)$ given the data $Nf$ below?
[Figure: two candidate functions (a) and (b) passing through the same data points. How to decide?]
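To make the ambiguity concrete, here is a small sketch (the sample values and the factor 3.0 are my own illustrative choices): two polynomials that agree with the same data $Nf$ exactly but differ elsewhere.

```python
import numpy as np

# The data Nf: five sample points (x_i, y_i) of an unknown f
x = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
y = np.array([0.0, 1.0, 0.0, -1.0, 0.0])

# (a) the degree-4 polynomial through the five points
a = np.polyfit(x, y, deg=4)

# (b) a different candidate: add a multiple of the degree-5
# polynomial that vanishes at every sample point x_i
b = np.polyadd(np.pad(a, (1, 0)), 3.0 * np.poly(x))

print(np.polyval(a, x) - y)   # ~0: (a) reproduces the data exactly
print(np.polyval(b, x) - y)   # ~0: so does (b)
print(np.polyval(a, 0.6), np.polyval(b, 0.6))  # yet they disagree at 0.6
```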
4. Infinite dimensional vector spaces

[This material is essentially the content of a course in real analysis; see me if you want more information on this.]

Let $H$ be a vector space with inner product. Recall by definition $(v, v) = \|v\|^2$. Recall also that $\|v\|$ = length of $v$.
Define the distance between two vectors $v_1, v_2$ to be $\|v_1 - v_2\|$. Consider an infinite collection of vectors $S = \{v_1, v_2, v_3, \ldots\}$. Define infinite linear combinations by:
$$\sum_{i=1}^{\infty} c_i v_i = a \quad \text{if} \quad \Big\| a - \sum_{i=1}^{n} c_i v_i \Big\| \underset{n \to \infty}{\longrightarrow} 0.$$
[Definitions of linear independence and spanning for infinite collections of vectors are the same, except we allow infinite sums. Henceforth we always allow infinite linear combinations.]

Def. 1: In an inner product vector space $H$, a Cauchy sequence is a sequence of vectors $v_1, v_2, \ldots$ for which the mutual distances go to 0 (so they behave as if converging to a limit), i.e., $\|v_n - v_m\| \to 0$ as $n, m \to \infty$.
Def. 2: An inner product vector space $H$ is complete if any infinite sequence of vectors $v_i$ which is a Cauchy sequence (i.e., behaves convergently) actually converges to some limiting vector $v$, i.e., $\|v_n - v\| \to 0$ as $n \to \infty$ for some $v$. A complete inner product space is called a Hilbert space.

Def. 3: An infinite collection of vectors $B = \{v_1, v_2, \ldots\}$ spans $H$ if every vector $w \in H$ can be written as a (possibly infinite) linear combination of vectors in $B$.
The collection $B$ is linearly independent if no vector in $B$ is a linear combination of the others. The collection $B$ is a basis for $H$ if it spans $H$ and is linearly independent. [I.e., the notions of span, linear independence, and basis work the same way as before, except we now allow infinite linear combinations.]

Ex 1: Not all inner product spaces are Hilbert spaces, since not all are complete. As an example, consider the space $P = \{$all polynomials on $[0,1]\}$. Define the inner product $(f, g) = \int_0^1 f(x) g(x)\, dx$.
# ' " #! Then resulting norm is ll0ðbñll œ Ø0ß 0Ù œ 0 ÐBÑ.BÞ This space is not complete: consider the vectors @ R defined by the partial sums of the Taylor series for B / À @ R R 8 B œ " = a polynomial. 8x 8œ!
Note that if $N > M$ then
$$\|v_N - v_M\| = \Big\| \sum_{n=0}^{N} \frac{x^n}{n!} - \sum_{n=0}^{M} \frac{x^n}{n!} \Big\| = \Big\| \sum_{n=M+1}^{N} \frac{x^n}{n!} \Big\| \le \sum_{n=M+1}^{N} \Big\| \frac{x^n}{n!} \Big\|.$$
But:
$$\Big\| \frac{x^n}{n!} \Big\| = \frac{1}{n!} \|x^n\| = \frac{1}{n!} \left( \int_0^1 x^{2n}\, dx \right)^{1/2} = \frac{1}{n!} \frac{1}{\sqrt{2n+1}}.$$
Since $\frac{1}{n!\sqrt{2n+1}} \le \frac{1}{n!}$ and $\sum_{n=0}^{\infty} \frac{1}{n!} = e < \infty$, it is easy to show that $\sum_{n=0}^{\infty} \big\| \frac{x^n}{n!} \big\| < \infty$. Thus it easily follows that $\|v_N - v_M\| \to 0$ as $N, M \to \infty$, so that the sequence $v_N$ is a Cauchy sequence in $P$.
But note that by the Taylor series,
$$|v_N(x) - e^x| = \Big| \sum_{n=N+1}^{\infty} \frac{x^n}{n!} \Big| \underset{N \to \infty}{\longrightarrow} 0$$
uniformly on $[0,1]$. Thus it is easy to show that $\|v_N(x) - e^x\| \underset{N \to \infty}{\longrightarrow} 0$.
So: $v_N(x) \to e^x$. But: one can show that a sequence of functions can't converge to two different functions. Thus there is no polynomial $p(x)$ (i.e., something in our space $P$) such that $v_N(x) \to p(x)$. Thus the $v_N$ do not converge to something in $P$, and thus $P$ is not complete! [Moral: intuitively, a complete space is one where any sequence $p_n$ that behaves convergently (i.e., is Cauchy) converges to an element $p$ of the original space.]
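A quick numerical sanity check of this example, as a sketch in Python (the grid-based norm below is only a Riemann approximation of the $L^2[0,1]$ integral):

```python
import numpy as np
from math import factorial

xs = np.linspace(0.0, 1.0, 2001)

def l2norm(f):
    # Riemann approximation of the L^2[0,1] norm (integral of f^2)^(1/2)
    return np.sqrt(np.mean(f(xs) ** 2))

def v(N):
    # v_N = partial sum of the Taylor series of e^x
    return lambda t: sum(t ** n / factorial(n) for n in range(N + 1))

# ||v_N - v_M|| shrinks as N, M grow: the sequence is Cauchy in P ...
for N, M in [(3, 6), (6, 12), (12, 24)]:
    print(N, M, l2norm(lambda t: v(N)(t) - v(M)(t)))

# ... and the limit in norm is e^x, which is not a polynomial:
for N in [2, 4, 8]:
    print(N, l2norm(lambda t: v(N)(t) - np.exp(t)))
```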
Theorem 4: If $B = \{v_1, v_2, v_3, \ldots\}$ is a collection of vectors which is orthonormal (i.e., unit lengths and pairwise inner products 0), then it is automatically linearly independent. If $B$ is a basis for $H$ and is orthonormal, it is called an orthonormal basis.
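Proof sketch (the standard one-line argument, in the notation above): suppose $\sum_i c_i v_i = 0$. Taking the inner product of both sides with $v_j$ gives
$$0 = \Big( \sum_i c_i v_i, \; v_j \Big) = \sum_i c_i (v_i, v_j) = c_j,$$
since $(v_i, v_j) = 0$ for $i \ne j$ and $(v_j, v_j) = 1$. Hence every $c_j = 0$, so no $v_j$ is a linear combination of the others.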
Ex 2: $H = \mathbb{R}^3 = \{v = (v_1, v_2, v_3) \mid v_i \in \mathbb{R}\}$ is a Hilbert space (i.e., it is not hard to show that it's complete). The inner product is the usual one for vectors: $(v, w) = v_1 w_1 + v_2 w_2 + v_3 w_3$. Orthonormal basis:
$$\mathbf{e}_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}; \quad \mathbf{e}_2 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}; \quad \mathbf{e}_3 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}.$$
Ex 3: $H = P_2 = \{$polynomials on $[0,1]$ of the form $a_0 + a_1 x + a_2 x^2$, with $a_i \in \mathbb{R}\}$ forms a Hilbert space. Inner product: $(p_1(x), p_2(x)) = \int_0^1 p_1(x) p_2(x)\, dx$. Note this is complete, since an infinite sum of second order polynomials which converges, converges to another second order polynomial (this can be proved). Thus $H$ is a Hilbert space.
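As an aside, one can run Gram-Schmidt on $\{1, x, x^2\}$ under this inner product to produce an orthonormal basis of $H = P_2$ (the result is the shifted Legendre polynomials, normalized on $[0,1]$). A sketch using sympy; the code itself is my own illustration, not from the notes:

```python
import sympy as sp

x = sp.symbols('x')

def ip(p, q):
    # the inner product (p1, p2) = integral_0^1 p1(x) p2(x) dx
    return sp.integrate(p * q, (x, 0, 1))

# Gram-Schmidt on {1, x, x^2} under this inner product
basis = [sp.Integer(1), x, x**2]
onb = []
for p in basis:
    for e in onb:
        p = p - ip(p, e) * e        # remove components along earlier vectors
    onb.append(sp.expand(p / sp.sqrt(ip(p, p))))

print(onb)
# [1, 2*sqrt(3)*x - sqrt(3), 6*sqrt(5)*x**2 - 6*sqrt(5)*x + sqrt(5)]
```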
Ex 4: $H = \mathbb{R}^{\infty} = \{v = (v_1, v_2, v_3, \ldots) \mid v_i \in \mathbb{R}\}$ is a Hilbert space, if we define the inner product
$$(v, w) = v_1 w_1 + v_2 w_2 + \cdots = \sum_{i=1}^{\infty} v_i w_i.$$
The length of a vector $v$ is $\|v\| = \sqrt{\sum_{i=1}^{\infty} v_i v_i} = \sqrt{\sum_{i=1}^{\infty} v_i^2}$.
Thus to have well-defined lengths we add the condition that $\|v\| < \infty$ to the definition of $H$. Then one can show that $H$ satisfies all the properties of a Hilbert space (in particular, it's complete).
One can show that the set of vectors
$$v_1 = (1, 0, 0, \ldots), \quad v_2 = (0, 1, 0, \ldots), \quad v_3 = (0, 0, 1, \ldots), \quad \ldots$$
is certainly orthonormal, and it spans $H$, so it is an orthonormal basis for $H$.
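This $H$ (with the finiteness condition) is the classical sequence space $\ell^2$. As a check on the finiteness condition, a tiny sketch; the particular vector $v$ is my own example, and the infinite sums are truncated:

```python
import numpy as np

# v = (1, 1/2, 1/3, ...): ||v||^2 = sum_i 1/i^2 = pi^2/6 < infinity,
# so v has a well-defined length in H.  Truncate the sum to check.
i = np.arange(1, 1_000_001)
v = 1.0 / i

print(np.sum(v**2), np.pi**2 / 6)   # truncated norm^2 vs exact value

# Coefficients of v in the orthonormal basis v_1, v_2, ...:
# (v, v_j) = the j-th entry of v, just as with a finite orthonormal basis.
print(v[:3])                        # [1.0, 0.5, 0.333...]
```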