MA 751, Part 3: Infinite-Dimensional Vector Spaces

1. Motivation: Statistical machine learning and reproducing kernel Hilbert spaces

Microarray experiment. Question: gene expression. When is the DNA in a gene transcribed, and thus expressed (as RNA), in a cell?

One solution: measure RNA levels (the result of transcription). Method: the microarray.

"Probe": Single-stranded DNA with a defined identity tethered to a solid medium

"Target": DNA or RNA (labeled with a phosphorescent color to identify its presence) Two main types of Microarrays cdna arrays: Spotted onto surface oligonucleotide arrays: created on surface

[Figure: a population of cDNA and the microarray chip.]

Resulting microarrays: [images omitted]

From: "Data Analysis Tools for DNA Microarrays" by S. Draghici, published by Chapman and Hall/CRC Press.

[Figure: for each platform (Affymetrix, Agilent, Amersham, Mergen), scatterplots of heart replicates against each other and of heart against liver.]

Genes reported as differentially expressed, by platform (human tissue comparisons and same-tissue replicates):

Platform      Liver vs. heart        Liver vs. liver        Heart vs. heart
Affymetrix    3904/22,283 (18%)      3875/22,283 (17%)      4026/22,283 (18%)
Agilent       6963/14,159 (49%)      5129/14,159 (36%)      1204/18,006 (6%)
Amersham      2595/9970 (26%)        318/9778 (3%)          454/9772 (5%)
Mergen        8572/11,904 (72%)      2811/11,904 (24%)      3515/11,904 (30%)

Result: for each subject tissue sample $s$, obtain a feature vector $\Phi(s) = \mathbf{x} = (x_1, \ldots, x_{30,000})$ consisting of the expression levels of 30,000 genes. Can we classify tissues this way?

Goals: 1. differentiate two different but similar cancers; 2. understand the genetic pathways of cancer.

Basic difficulties: few samples (e.g., 30-200); high dimension (e.g., 5,000-100,000). Curse of dimensionality: too few samples and too many parameters (dimensions) to fit them.

Tool: the support vector machine (SVM).

Procedure: look at the feature space $F$ in which $\Phi(s)$ lives, and separate examples of the one cancer from the other with a hyperplane.

Methods needed: machine learning; reproducing kernel Hilbert spaces (RKHS).
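To make the procedure concrete, here is a minimal sketch of hyperplane classification in a high-dimensional feature space, assuming scikit-learn and NumPy are available. The data are synthetic stand-ins for expression vectors (the sample size, dimension, and "pathway genes" below are invented for illustration, not taken from a real microarray study):

```python
# Toy version of the classification task: n samples, p "gene expression"
# levels per sample, two classes separated by a hyperplane (linear SVM).
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n, p = 60, 5000                          # few samples, high dimension
X = rng.normal(size=(n, p))              # synthetic expression vectors x
y = np.where(X[:, :10].sum(axis=1) > 0, 1, -1)   # labels set by 10 "pathway" genes

clf = SVC(kernel="linear", C=1.0)        # separating hyperplane in feature space
print(cross_val_score(clf, X, y, cv=5).mean())   # held-out accuracy estimate
```

Even with $p \gg n$ the linear SVM produces a separating hyperplane; the maximum-margin idea is what gives it a chance against the curse of dimensionality.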

2. Statistical & machine Learning: More Motivation The role of learning theory has grown a great deal in: Mathematics Statistics Computational Biology Neurosciences, e.g., theory of plasticity, workings of visual cortex

Source: University of Washington

Computer science, e.g., vision theory, graphics, speech synthesis Source: T. Poggio/M

Face identification: MIT

People classification or detection: Poggio/MIT

We want the theory behind such learning algorithms.

3. The problem: learning theory

Given an unknown function $f(\mathbf{x}) : \mathbb{R}^n \to \mathbb{R}$, learn $f(\mathbf{x})$. We have already seen the problem of representing a known function $f(\mathbf{x})$ using neural networks. Now we wish to find a way to determine an unknown $f(\mathbf{x})$ from knowing its values at several inputs.

Example 1: $\mathbf{x}$ is a retinal activation pattern (i.e., $x_i$ = activation level of retinal neuron $i$), and $y = f(\mathbf{x}) > 0$ if the retinal pattern is a chair, $y = f(\mathbf{x}) < 0$ otherwise. [Thus: we want the concept of a chair.] Given: examples of chairs (and non-chairs) $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$, together with the proper outputs $y_1, \ldots, y_n$. This is the information

$$Nf = (f(\mathbf{x}_1), \ldots, f(\mathbf{x}_n)).$$

Goal: give the best possible estimate of the unknown function $f$, i.e., try to learn the concept $f$ from the examples $Nf$. But pointwise information about $f$ is not sufficient: which is the "right" $f(x)$ given the data $Nf$ below?

[Figure: two candidate functions (a) and (b), both consistent with the data $Nf$. How to decide?]
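A quick numerical way to see the ambiguity (NumPy only; the sample points and candidate functions are invented for illustration): two functions can agree at every sample point, producing identical information $Nf$, yet differ greatly elsewhere.

```python
# Pointwise data Nf does not determine f: candidates (a) and (b) below
# agree at all six sample points but disagree between them.
import numpy as np

x = np.linspace(0, 1, 6)                          # sample points x_1, ..., x_6
f = lambda t: np.sin(2 * np.pi * t)               # candidate (a)
g = lambda t: f(t) + 3 * np.sin(10 * np.pi * t)   # candidate (b): same Nf

print(np.allclose(f(x), g(x)))                    # True -- identical data Nf
print(f(0.05), g(0.05))                           # very different off the sample
```

Any added term that vanishes at the sample points leaves $Nf$ unchanged; this is exactly why extra assumptions (smoothness, small norm in an RKHS) are needed to pick the "right" $f$.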

4. Infinite-dimensional vector spaces

[This material is essentially the content of a course in real analysis; see me if you want more information on this.]

Let $H$ be a vector space with inner product. Recall by definition $(v, v) = \|v\|^2$; recall also $\|v\|$ = length of $v$.

Define the distance between two vectors $v_1, v_2$ to be $\|v_1 - v_2\|$. Consider an infinite collection of vectors $S = \{v_1, v_2, v_3, \ldots\}$. Define infinite linear combinations by

$$\sum_{i=1}^{\infty} c_i v_i = a \quad \text{if} \quad \Bigl\| a - \sum_{i=1}^{n} c_i v_i \Bigr\| \xrightarrow[n \to \infty]{} 0.$$

[Definitions of linear independence and spanning for infinite collections of vectors are the same, except we allow infinite sums. Henceforth we always allow infinite linear combinations.]

Def. 1: In an inner product vector space $H$, a Cauchy sequence is a sequence of vectors $v_1, v_2, \ldots$ whose mutual distances go to 0 (so they behave as if converging to a limit), i.e.,

$$\|v_n - v_m\| \xrightarrow[n, m \to \infty]{} 0.$$

Def. 2: An inner product vector space $H$ is complete if any infinite sequence of vectors $v_i$ which is a Cauchy sequence (i.e., behaves convergently) actually converges to some limiting vector $v$, i.e., $\|v_n - v\| \xrightarrow[n \to \infty]{} 0$ for some $v$.

Def. 3: An infinite collection of vectors $B = \{v_1, v_2, \ldots\}$ spans $H$ if every vector $w \in H$ can be written as a (possibly infinite) linear combination of vectors in $B$.

The collection $B$ is linearly independent if no vector in $B$ is a linear combination of the others. The collection $B$ is a basis for $H$ if it spans $H$ and is linearly independent. [I.e., the notions of span, linear independence, and basis work the same way as before, except we now allow infinite linear combinations.]

A Hilbert space is a complete inner product space.

Ex: Not all inner product spaces are Hilbert spaces, since not all are complete. As an example, consider the space $P = \{\text{all polynomials on } [0,1]\}$. Define the inner product $(f, g) = \int_0^1 f(x)\, g(x)\, dx$.

# ' " #! Then resulting norm is ll0ðbñll œ Ø0ß 0Ù œ 0 ÐBÑ.BÞ This space is not complete: consider the vectors @ R defined by the partial sums of the Taylor series for B / À @ R R 8 B œ " = a polynomial. 8x 8œ!

Note that if $N > M$, then

$$\|v_N - v_M\| = \Bigl\| \sum_{n=0}^{N} \frac{x^n}{n!} - \sum_{n=0}^{M} \frac{x^n}{n!} \Bigr\| = \Bigl\| \sum_{n=M+1}^{N} \frac{x^n}{n!} \Bigr\| \le \sum_{n=M+1}^{N} \Bigl\| \frac{x^n}{n!} \Bigr\|.$$

But

$$\Bigl\| \frac{x^n}{n!} \Bigr\| = \frac{1}{n!} \|x^n\| = \frac{1}{n!} \Bigl( \int_0^1 x^{2n}\, dx \Bigr)^{1/2} = \frac{1}{n!} \cdot \frac{1}{(2n+1)^{1/2}}.$$

So it is easy to show that $\sum_{n=0}^{\infty} \bigl\| \frac{x^n}{n!} \bigr\| < \infty$. Thus it easily follows that $\|v_N - v_M\| \to 0$ as $N, M \to \infty$, so that the sequence $v_N$ is a Cauchy sequence in $P$.
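A numerical check of this computation (NumPy only; my own sketch, not from the notes), approximating the norm $\|f\| = (\int_0^1 f^2\, dx)^{1/2}$ by an average over a fine grid:

```python
# Check that the Taylor partial sums v_N of e^x form a Cauchy sequence in the
# integral norm on [0,1], and respect the bound sum 1/(n! sqrt(2n+1)).
import numpy as np
from math import factorial, sqrt

def v(N, x):
    """Partial sum v_N(x) = sum_{n=0}^{N} x^n / n!."""
    return sum(x**n / factorial(n) for n in range(N + 1))

x = np.linspace(0, 1, 10001)                       # fine grid on [0,1]
for M, N in [(2, 4), (4, 8), (8, 16)]:
    dist = sqrt(np.mean((v(N, x) - v(M, x))**2))   # grid approx of ||v_N - v_M||
    bound = sum(1 / (factorial(n) * sqrt(2*n + 1)) for n in range(M + 1, N + 1))
    print(f"M={M:2d}, N={N:2d}: ||v_N - v_M|| ~ {dist:.3e} <= {bound:.3e}")
```

The distances shrink rapidly as $M, N$ grow, exactly as the factorial bound predicts.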

But note that by Taylor series,

$$v_N(x) - e^x = \sum_{n=0}^{N} \frac{x^n}{n!} - e^x \xrightarrow[N \to \infty]{} 0$$

uniformly on $[0,1]$. Thus it is easy to show that $\|v_N(x) - e^x\| \xrightarrow[N \to \infty]{} 0$.

So $v_N(x) \to e^x$. But one can show that a sequence of functions cannot converge to two different limits. Thus, since $e^x$ is not a polynomial, there is no polynomial $p(x)$ (i.e., element of our space $P$) such that $v_N(x) \to p(x)$. So the $v_N$ do not converge to anything in $P$, and thus $P$ is not complete! [Moral: intuitively, a complete space is one where any Cauchy sequence $p_n$ converges to an element $p$ of the original space.]

Theorem 4: If $B = \{v_1, v_2, v_3, \ldots\}$ is a collection of vectors which is orthonormal (i.e., unit lengths and pairwise inner products 0), then it is automatically linearly independent. If $B$ is a basis for $H$ and is orthonormal, it is called an orthonormal basis.
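The proof of Theorem 4 is short; here is a sketch of the standard argument (my reconstruction, not spelled out in the notes):

```latex
% If sum_i c_i v_i = 0, take the inner product of both sides with a fixed v_j.
% Orthonormality ((v_i, v_j) = 1 if i = j, else 0) kills all terms but one:
\[
  0 \;=\; \Bigl(\sum_i c_i v_i ,\; v_j\Bigr)
    \;=\; \sum_i c_i \,(v_i, v_j)
    \;=\; c_j \qquad \text{for every } j,
\]
% so every coefficient c_j = 0, and B is linearly independent.
```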

Ex 2: $H = \mathbb{R}^3 = \{v = (v_1, v_2, v_3) \mid v_i \in \mathbb{R}\}$ is a Hilbert space (i.e., it is not hard to show that it is complete). The inner product is the usual one for vectors:

$$(v, w) = v_1 w_1 + v_2 w_2 + v_3 w_3.$$

Orthonormal basis:

$$e_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}; \quad e_2 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}; \quad e_3 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}.$$

Ex 3: $H = \{\text{polynomials of degree} \le 2 \text{ on } [0,1]\} = \{a_0 + a_1 x + a_2 x^2 : a_i \in \mathbb{R}\}$ forms a Hilbert space, with inner product

$$(p_1(x), p_2(x)) = \int_0^1 p_1(x)\, p_2(x)\, dx.$$

Note this is complete, since an infinite sum of second-order polynomials which converges, converges to another second-order polynomial (this can be proved). Thus $H$ is a Hilbert space.
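To find an orthonormal basis for this $H$, one can run Gram-Schmidt on $\{1, x, x^2\}$ under this inner product. A numerical sketch (NumPy only; the integral is approximated by a grid average; my own illustration):

```python
# Gram-Schmidt on {1, x, x^2} with (p, q) = integral_0^1 p(x) q(x) dx.
# The result matches the shifted, normalized Legendre polynomials (up to sign).
import numpy as np

x = np.linspace(0, 1, 100001)
inner = lambda p, q: np.mean(p * q)          # grid approximation of the integral

basis = []
for f in (np.ones_like(x), x, x**2):         # start from 1, x, x^2
    for e in basis:                          # remove components along earlier e's
        f = f - inner(f, e) * e
    basis.append(f / np.sqrt(inner(f, f)))   # normalize to unit length

gram = [[inner(a, b) for b in basis] for a in basis]
print(np.round(gram, 6))                     # ~ 3x3 identity: orthonormal
```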

Ex 4: Let $H = \{v = (v_1, v_2, v_3, \ldots) \mid v_i \in \mathbb{R}\}$. This is a Hilbert space if we define the inner product

$$(v, w) = v_1 w_1 + v_2 w_2 + \cdots = \sum_{i=1}^{\infty} v_i w_i.$$

The length of a vector $v$ is then

$$\|v\| = \sqrt{\sum_{i=1}^{\infty} v_i v_i} = \sqrt{\sum_{i=1}^{\infty} v_i^2}.$$

Thus, to have well-defined lengths, we add the condition $\|v\| < \infty$ to the definition of $H$. Then one can show that $H$ satisfies all the properties of a Hilbert space (in particular, it is complete).
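A quick numerical illustration of the condition $\|v\| < \infty$ (NumPy only; the example vector is my own): the sequence $v = (1, 1/2, 1/3, \ldots)$ belongs to this space, since $\|v\|^2 = \sum_{i=1}^{\infty} 1/i^2 = \pi^2/6 < \infty$.

```python
# The vector v = (1, 1/2, 1/3, ...) has finite length pi/sqrt(6) ~ 1.28255.
import numpy as np

i = np.arange(1, 1_000_001)          # truncate the infinite sequence
v = 1.0 / i
print(np.sqrt(np.sum(v * v)))        # partial ||v||: ~ 1.282549
print(np.pi / np.sqrt(6))            # exact limiting value of ||v||
```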

One can show that the set of vectors

$$v_1 = (1, 0, 0, \ldots), \quad v_2 = (0, 1, 0, \ldots), \quad v_3 = (0, 0, 1, \ldots), \; \ldots$$

is certainly orthonormal, and that it spans $H$, so it is an orthonormal basis for $H$.