6.867 Machine learning, lecture 7 (Jaakkola) 1

Size: px
Start display at page:

Download "6.867 Machine learning, lecture 7 (Jaakkola) 1"

Transcription

1 6.867 Machie learig, lecture 7 (Jaakkola) 1 Lecture topics: Kerel form of liear regressio Kerels, examples, costructio, properties Liear regressio ad kerels Cosider a slightly simpler model where we omit the offset parameter θ 0, reducig the model to y = θ T φ(x) + ɛ where φ(x) is a particular feature expasio (e.g., polyomial). Our goal here is to tur both the estimatio problem ad the subsequet predictio task ito forms that ivolve oly ier products betwee the feature vectors. We have already emphasized that regularizatio is ecessary i cojuctio with mappig examples to higher dimesioal feature vectors. The regularized least squares objective to be miimized, with parameter λ, is give by J(θ) = ( yt θ T φ(x t ) ) 2 + λ θ 2 This form ca be derived from pealized log-likelihood estimatio (see previous lecture otes). The effect of the regularizatio pealty is to pull all the parameters towards zero. So ay liear dimesios i the parameters that the traiig feature vectors do ot pertai to are set explicitly to zero. We would therefore expect the optimal parameters to lie i the spa of the feature vectors correspodig to the traiig examples. This is ideed the case. As before, the optimality coditio for θ follows from settig the gradiet to zero: dj(θ) dθ α t ( {}} = 2 y t θ T φ(x t ) ){ φ(x t ) + 2λθ = 0 (2) We ca therefore costruct the optimal θ i terms of predictio differeces α t ad the feature vectors: 1 λ (1) θ = α t φ(x t ) (3) The implicatio is that the optimal θ (however high dimesioal) will lie i the spa of the feature vectors correspodig to the traiig examples. This is due to the regularizatio

2 6.867 Machie learig, lecture 7 (Jaakkola) 2 pealty we added. But how do we set α t? The values for α t ca be foud by isistig that they ideed ca be iterpreted as predictio differeces: 1 λ t =1 α t = y t θ T φ(x t ) = y t α t φ(x t ) T φ(x t ) (4) Thus α t depeds oly o the actual resposes y t ad the ier products betwee the traiig examples, the Gram matrix : φ(x 1 ) T φ(x 1 ) φ(x 1 ) T φ(x ) K = (5) φ(x ) T φ(x 1 )... φ(x ) T φ(x ) I a vector form, a = [α 1,..., α ] T, (6) y = [y 1,..., y ] T, (7) a = 1 y Ka λ (8) the solutio is ( ) 1 â = λ λi + K y (9) Note that fidig the estimates ˆα t requires ivertig a matrix. This is the cost of dealig with ier products as opposed to hadig feature vectors directly. I some cases, the beefit is substatial sice the feature vectors i the ier products may be ifiite dimesioal but ever eeded explicitly. As a result of fidig ˆα t we ca cast the predictios for ew examples also i terms of ier products: y = θˆt φ(x) = (ˆα t /λ)φ(x t ) T φ(x) = αˆtk(x t, x) (10) where we view K(x t, x) as a kerel fuctio, a fuctio of two argumets x t ad x. Kerels So we have ow successfully tured a regularized liear regressio problem ito a kerel form. This meas that we ca simply substitute differet kerel fuctios K(x, x ) ito the estimatio/predictio equatios. This gives us a easy access to a wide rage of possible regressio fuctios. Here are a couple of stadard examples of kerels:

3 6.867 Machie learig, lecture 7 (Jaakkola) 3 Polyomial kerel K(x, x ) = (1 + x T x ) p, p = 1, 2,... (11) Radial basis kerel ( ) β K(x, x ) = exp x x 2, β > 0 (12) 2 We have already discussed the feature vectors correspodig to the polyomial kerel. The compoets of these feature vectors were polyomial terms up to degree p with specifically chose coefficiets. The restricted choice of coefficiets was ecessary i order to collapse the ier product calculatios. The feature vectors correspodig to the radial basis kerel are ifiite dimesioal! The compoets of these vectors are idexed by z R d where d is the dimesio of the origial iput x. More precisely, the feature vectors are fuctios: φ z (x) = c(β, d) N(z; x, 1/2β) (13) where N(z; x, (1/β)) is a ormal pdf over z ad c(β, d) is a costat. Roughly speakig, the radial basis kerel measures the probability that you would get the same sample z (i the same small regio) from two ormal distributios with meas x ad x ad a commo variace 1/2β. This is a reasoable measure of similarity betwee x ad x ad kerels are ofte defied from this perspective. The ier product givig rise to the radial basis kerel is defied through itegratio K(x, x ) = φ z (x)φ z (x )dz (14) We ca also costruct various types of kerels from simpler oes. Here are a few rules to guide us. Assume K 1 (x, x ) ad K 2 (x, x ) are valid kerels (correspod to ier products of some feature vectors), the 1. K(x, x ) = f(x)k 1 (x, x )f(x ) for ay fuctio f(x), 2. K(x, x ) = K 1 (x, x ) + K 2 (x, x ), 3. K(x, x ) = K 1 (x, x )K 2 (x, x )

4 6.867 Machie learig, lecture 7 (Jaakkola) 4 are all valid kerels. While simple, these rules are quite powerful. Let s first uderstad these rules from the poit of view of the implicit feature vectors. For each rule, let φ(x) be the feature vector correspodig to K ad φ (1) (x) ad φ (2) (x) the feature vectors associated with K 1 ad K 2, respectively. The feature mappig for the first rule is give simply by multiplyig with the scalar fuctio f(x): φ(x) = f(x)φ (1) (x) (15) so that φ(x) T φ(x ) = f(x)φ (1) (x) T φ (1) (x )f(x ) = f(x)k 1 (x, x )f(x ). The secod rule, addig kerels, correspods to just cocateatig the feature vectors [ ] φ (1) (x) φ(x) = φ (2) (16) (x) The third ad the last rule is a little more complicated but ot much. Suppose we use a double idex i, j to idex the compoets of φ(x) where i rages over the compoets of φ (1) (x) ad j refers to the compoets of φ (2) (x). The It is ow easy to see that (1) (2) φ i,j (x) = φ i (x)φ j (x) (17) K(x, x ) = φ(x) T φ(x ) (18) = φ i,j (x)φ i,j (x ) (19) i,j = φ (1) i (x)φ (2) j (x)φ (1) i (x )φ (2) j (x ) (20) i,j = [ φ (1) i (x)φ (1) i (x )][ φ (2) j (x)φ (2) j (x )] (21) i j = [φ (1) (x) T φ (1) (x )][φ (2) (x) T φ (2) (x )] (22) = K 1 (x, x )K 2 (x, x ) (23) These costructio rules ca also be used to verify that somethig is a valid kerel. As a example, let s figure out why a radial basis kerel K(x, x ) = exp{ 2 1 x x 2 } (24)

5 6.867 Machie learig, lecture 7 (Jaakkola) 5 is a valid kerel. exp{ 1 2 x x 2 } = exp{ 1 2 x T x + x T x 1 2 x T x } (25) f(x) f(x {}}{{ ) }}{ = exp{ 1 2 x T x} exp{x T x } exp{ 1 2 x T x } (26) Here exp{x T x } is a sum of simple products x T x ad is therefore a kerel based o the secod ad third rules; the first rule allows us to icorporate f(x) ad f(x ). Strig kerels. It is ofte ecessary to make predictios (classify, assess risk, determie user ratigs) o the basis of more complex objects such as variable legth sequeces or graphs that do ot ecessarily permit a simple descriptio as poits i R d. The idea of kerels exteds to such objects as well. Cosider, for example, the case where the iputs x are variable legth sequeces (e.g., documets or biosequeces) with elemets from some commo alphabet A (e.g., letters or protei residues). Oe way to compare such sequeces is to cosider subsequeces that they may share. Let u A k deote a legth k sequece from this alphabet ad i a sequece of k idexes. So, for example, we ca say that u = x[i] if u 1 = x i1, u 2 = x i2,..., u k = x ik. I other words, x cotais the elemets of u i positios i 1 < i 2 < < i k. If the elemets of u are foud i successive positios i x, the i k i 1 = k 1. A simple strig kerel correspods to feature vectors with couts of occureces of legth k subsequeces: φ u (x) = δ(i k i 1, k 1) (27) i:u=x[i] I other words, the compoets are idexed by subsequeces u ad the value of u- compoet is the umber of times x cotais u as a cotiguous subsequece. For example, φ o (the commo costruct) = 2 (28) The umber of compoets i such feature vectors is very large (expoetial i k). Yet, the ier product φ u (x)φ u (x ) (29) u A k ca be computed efficietly (there are oly a limited umber of possible cotiguous subsequeces i x ad x ). The reaso for this differece, ad the argumet i favor of kerels

6 6.867 Machie learig, lecture 7 (Jaakkola) 6 more geerally, is that the feature vectors have to aggregate the iformatio ecessary to compare ay two sequeces while the ier product is evaluated for two specific sequeces. We ca also relax the requiremet that matches must be cotiguous. To this ed, we defie the legth of the widow of x where u appears as l(i) = i k i 1. The feature vectors i a weighted gapped substrig kerel are give by φ u (x) = λ l(i) (30) i:u=x[i] where the parameter λ (0, 1) specifies the pealty for o-cotiguous matches to u. The resultig kerel K(x, x ) = φ u (x)φ u (x ) = λ l(i) λ l(i) (31) u A k u A k i:u=x[i] i:u=x [i] ca be computed recursively. It is ofte useful to ormalize such a kerel so as to remove ay immediate effect from the sequece legth: K (x, x K(x, x ) ) = K(x, x) K(x, x ) (32) Appedix (optioal): Kerel liear regressio with offset Give a feature expasio specified by φ(x) we try to miimize ( ) 2 J(θ, θ 0 ) = y t θ T φ(x t ) θ 0 + λ θ 2 (33) where we have chose ot to regularize θ 0 to preserve the similarity to classificatio discussed later o. Not regularizig θ 0 meas, e.g., that we do ot care whether all the resposes have a costat added to them; the value of the objective, after optimizig θ 0, would remai the same with or without such costat. Settig the derivatives with respect to θ 0 ad θ to zero gives the followig optimality coditios: dj(θ, θ 0 ) ( ) = 2 y t θ T φ(x t ) θ 0 = 0 (34) dθ 0 dj(θ, θ 0 ) dθ α t = 2λθ 2 { ( }} ) { yt θ T φ(x t ) θ 0 φ(x t ) = 0 (35)

7 6.867 Machie learig, lecture 7 (Jaakkola) 7 We ca therefore costruct the optimal θ i terms of predictio differeces α t ad the feature vectors as before: 1 λ θ = α t φ(x t ) (36) Usig this form of the solutio for θ ad Eq.(34) we ca also express the optimal θ 0 as a fuctio of the predictio differeces α t : ( ) 1 ( ) 1 1 θ 0 = y t θ T φ(x t ) = y t α t φ(x t ) T φ(x t ) (37) λ t =1 We ca ow costrai α t to take o values that ca ideed be iterpreted as predictio differeces: α i = y i θ T φ(x i ) θ 0 (38) 1 = y i α t φ(x t ) T φ(x i ) θ 0 λ (39) t =1 ( ) = y i α t φ(x t ) T φ(x i ) y t α t φ(x t ) T φ(x t ) (40) λ λ t =1 t ( =1 ) = y i y t α t φ(x t ) T φ(x i ) φ(x t ) T φ(x t ) (41) λ t =1 With the same matrix otatio as before, ad lettig 1 = [1,..., 1] T, we ca rewrite the above coditio as C {}}{ 1 a = (I 11 T /) y (I 11 T /)Ka (42) λ where C = I 11 T / is a ceterig matrix. Ay solutio to the above equatio has to satisfy 1 T a = 0 (just left multiply the equatio with 1 T ). Note that this is exactly the optimality coditio for θ 0 i Eq.(34). Usig this summig to zero property of the solutio we ca rewrite the above equatio as 1 a = Cy CKCa (43) λ

8 6.867 Machie learig, lecture 7 (Jaakkola) 8 where we have itroduced a additioal ceterig operatio o the right had side. This caot chage the solutio sice Ca = a wheever 1 T a = 0. The solutio â is the â = λ (λi + CKC) 1 Cy (44) Oce we have â we ca recostruct θˆ0 from Eq.(37). θˆt φ(x) reduces to the kerel form as before.

Support vector machine revisited

Support vector machine revisited 6.867 Machie learig, lecture 8 (Jaakkola) 1 Lecture topics: Support vector machie ad kerels Kerel optimizatio, selectio Support vector machie revisited Our task here is to first tur the support vector

More information

Machine Learning Theory Tübingen University, WS 2016/2017 Lecture 11

Machine Learning Theory Tübingen University, WS 2016/2017 Lecture 11 Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture Tolstikhi Ilya Abstract We will itroduce the otio of reproducig kerels ad associated Reproducig Kerel Hilbert Spaces (RKHS). We will cosider couple

More information

10-701/ Machine Learning Mid-term Exam Solution

10-701/ Machine Learning Mid-term Exam Solution 0-70/5-78 Machie Learig Mid-term Exam Solutio Your Name: Your Adrew ID: True or False (Give oe setece explaatio) (20%). (F) For a cotiuous radom variable x ad its probability distributio fuctio p(x), it

More information

6.867 Machine learning

6.867 Machine learning 6.867 Machie learig Mid-term exam October, ( poits) Your ame ad MIT ID: Problem We are iterested here i a particular -dimesioal liear regressio problem. The dataset correspodig to this problem has examples

More information

Machine Learning Theory Tübingen University, WS 2016/2017 Lecture 12

Machine Learning Theory Tübingen University, WS 2016/2017 Lecture 12 Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture Tolstikhi Ilya Abstract I this lecture we derive risk bouds for kerel methods. We will start by showig that Soft Margi kerel SVM correspods to miimizig

More information

Massachusetts Institute of Technology

Massachusetts Institute of Technology Massachusetts Istitute of Techology 6.867 Machie Learig, Fall 6 Problem Set : Solutios. (a) (5 poits) From the lecture otes (Eq 4, Lecture 5), the optimal parameter values for liear regressio give the

More information

Topics Machine learning: lecture 2. Review: the learning problem. Hypotheses and estimation. Estimation criterion cont d. Estimation criterion

Topics Machine learning: lecture 2. Review: the learning problem. Hypotheses and estimation. Estimation criterion cont d. Estimation criterion .87 Machie learig: lecture Tommi S. Jaakkola MIT CSAIL tommi@csail.mit.edu Topics The learig problem hypothesis class, estimatio algorithm loss ad estimatio criterio samplig, empirical ad epected losses

More information

Discrete-Time Systems, LTI Systems, and Discrete-Time Convolution

Discrete-Time Systems, LTI Systems, and Discrete-Time Convolution EEL5: Discrete-Time Sigals ad Systems. Itroductio I this set of otes, we begi our mathematical treatmet of discrete-time s. As show i Figure, a discrete-time operates or trasforms some iput sequece x [

More information

Linear Regression Demystified

Linear Regression Demystified Liear Regressio Demystified Liear regressio is a importat subject i statistics. I elemetary statistics courses, formulae related to liear regressio are ofte stated without derivatio. This ote iteds to

More information

Machine Learning Brett Bernstein

Machine Learning Brett Bernstein Machie Learig Brett Berstei Week 2 Lecture: Cocept Check Exercises Starred problems are optioal. Excess Risk Decompositio 1. Let X = Y = {1, 2,..., 10}, A = {1,..., 10, 11} ad suppose the data distributio

More information

Math 155 (Lecture 3)

Math 155 (Lecture 3) Math 55 (Lecture 3) September 8, I this lecture, we ll cosider the aswer to oe of the most basic coutig problems i combiatorics Questio How may ways are there to choose a -elemet subset of the set {,,,

More information

62. Power series Definition 16. (Power series) Given a sequence {c n }, the series. c n x n = c 0 + c 1 x + c 2 x 2 + c 3 x 3 +

62. Power series Definition 16. (Power series) Given a sequence {c n }, the series. c n x n = c 0 + c 1 x + c 2 x 2 + c 3 x 3 + 62. Power series Defiitio 16. (Power series) Give a sequece {c }, the series c x = c 0 + c 1 x + c 2 x 2 + c 3 x 3 + is called a power series i the variable x. The umbers c are called the coefficiets of

More information

Lecture 11: Pseudorandom functions

Lecture 11: Pseudorandom functions COM S 6830 Cryptography Oct 1, 2009 Istructor: Rafael Pass 1 Recap Lecture 11: Pseudoradom fuctios Scribe: Stefao Ermo Defiitio 1 (Ge, Ec, Dec) is a sigle message secure ecryptio scheme if for all uppt

More information

Linear Classifiers III

Linear Classifiers III Uiversität Potsdam Istitut für Iformatik Lehrstuhl Maschielles Lere Liear Classifiers III Blaie Nelso, Tobias Scheffer Cotets Classificatio Problem Bayesia Classifier Decisio Liear Classifiers, MAP Models

More information

The Growth of Functions. Theoretical Supplement

The Growth of Functions. Theoretical Supplement The Growth of Fuctios Theoretical Supplemet The Triagle Iequality The triagle iequality is a algebraic tool that is ofte useful i maipulatig absolute values of fuctios. The triagle iequality says that

More information

CS284A: Representations and Algorithms in Molecular Biology

CS284A: Representations and Algorithms in Molecular Biology CS284A: Represetatios ad Algorithms i Molecular Biology Scribe Notes o Lectures 3 & 4: Motif Discovery via Eumeratio & Motif Represetatio Usig Positio Weight Matrix Joshua Gervi Based o presetatios by

More information

Support Vector Machines and Kernel Methods

Support Vector Machines and Kernel Methods Support Vector Machies ad Kerel Methods Daiel Khashabi Fall 202 Last Update: September 26, 206 Itroductio I Support Vector Machies the goal is to fid a separator betwee data which has the largest margi,

More information

Linear regression. Daniel Hsu (COMS 4771) (y i x T i β)2 2πσ. 2 2σ 2. 1 n. (x T i β y i ) 2. 1 ˆβ arg min. β R n d

Linear regression. Daniel Hsu (COMS 4771) (y i x T i β)2 2πσ. 2 2σ 2. 1 n. (x T i β y i ) 2. 1 ˆβ arg min. β R n d Liear regressio Daiel Hsu (COMS 477) Maximum likelihood estimatio Oe of the simplest liear regressio models is the followig: (X, Y ),..., (X, Y ), (X, Y ) are iid radom pairs takig values i R d R, ad Y

More information

The Method of Least Squares. To understand least squares fitting of data.

The Method of Least Squares. To understand least squares fitting of data. The Method of Least Squares KEY WORDS Curve fittig, least square GOAL To uderstad least squares fittig of data To uderstad the least squares solutio of icosistet systems of liear equatios 1 Motivatio Curve

More information

Apply change-of-basis formula to rewrite x as a linear combination of eigenvectors v j.

Apply change-of-basis formula to rewrite x as a linear combination of eigenvectors v j. Eigevalue-Eigevector Istructor: Nam Su Wag eigemcd Ay vector i real Euclidea space of dimesio ca be uiquely epressed as a liear combiatio of liearly idepedet vectors (ie, basis) g j, j,,, α g α g α g α

More information

CHAPTER 5. Theory and Solution Using Matrix Techniques

CHAPTER 5. Theory and Solution Using Matrix Techniques A SERIES OF CLASS NOTES FOR 2005-2006 TO INTRODUCE LINEAR AND NONLINEAR PROBLEMS TO ENGINEERS, SCIENTISTS, AND APPLIED MATHEMATICIANS DE CLASS NOTES 3 A COLLECTION OF HANDOUTS ON SYSTEMS OF ORDINARY DIFFERENTIAL

More information

Sequences A sequence of numbers is a function whose domain is the positive integers. We can see that the sequence

Sequences A sequence of numbers is a function whose domain is the positive integers. We can see that the sequence Sequeces A sequece of umbers is a fuctio whose domai is the positive itegers. We ca see that the sequece 1, 1, 2, 2, 3, 3,... is a fuctio from the positive itegers whe we write the first sequece elemet

More information

Lecture 4. Hw 1 and 2 will be reoped after class for every body. New deadline 4/20 Hw 3 and 4 online (Nima is lead)

Lecture 4. Hw 1 and 2 will be reoped after class for every body. New deadline 4/20 Hw 3 and 4 online (Nima is lead) Lecture 4 Homework Hw 1 ad 2 will be reoped after class for every body. New deadlie 4/20 Hw 3 ad 4 olie (Nima is lead) Pod-cast lecture o-lie Fial projects Nima will register groups ext week. Email/tell

More information

Sequences. Notation. Convergence of a Sequence

Sequences. Notation. Convergence of a Sequence Sequeces A sequece is essetially just a list. Defiitio (Sequece of Real Numbers). A sequece of real umbers is a fuctio Z (, ) R for some real umber. Do t let the descriptio of the domai cofuse you; it

More information

The z-transform. 7.1 Introduction. 7.2 The z-transform Derivation of the z-transform: x[n] = z n LTI system, h[n] z = re j

The z-transform. 7.1 Introduction. 7.2 The z-transform Derivation of the z-transform: x[n] = z n LTI system, h[n] z = re j The -Trasform 7. Itroductio Geeralie the complex siusoidal represetatio offered by DTFT to a represetatio of complex expoetial sigals. Obtai more geeral characteristics for discrete-time LTI systems. 7.

More information

Complex Numbers Solutions

Complex Numbers Solutions Complex Numbers Solutios Joseph Zoller February 7, 06 Solutios. (009 AIME I Problem ) There is a complex umber with imagiary part 64 ad a positive iteger such that Fid. [Solutio: 697] 4i + + 4i. 4i 4i

More information

Optimally Sparse SVMs

Optimally Sparse SVMs A. Proof of Lemma 3. We here prove a lower boud o the umber of support vectors to achieve geeralizatio bouds of the form which we cosider. Importatly, this result holds ot oly for liear classifiers, but

More information

Economics 241B Relation to Method of Moments and Maximum Likelihood OLSE as a Maximum Likelihood Estimator

Economics 241B Relation to Method of Moments and Maximum Likelihood OLSE as a Maximum Likelihood Estimator Ecoomics 24B Relatio to Method of Momets ad Maximum Likelihood OLSE as a Maximum Likelihood Estimator Uder Assumptio 5 we have speci ed the distributio of the error, so we ca estimate the model parameters

More information

The multiplicative structure of finite field and a construction of LRC

The multiplicative structure of finite field and a construction of LRC IERG6120 Codig for Distributed Storage Systems Lecture 8-06/10/2016 The multiplicative structure of fiite field ad a costructio of LRC Lecturer: Keeth Shum Scribe: Zhouyi Hu Notatios: We use the otatio

More information

CHAPTER I: Vector Spaces

CHAPTER I: Vector Spaces CHAPTER I: Vector Spaces Sectio 1: Itroductio ad Examples This first chapter is largely a review of topics you probably saw i your liear algebra course. So why cover it? (1) Not everyoe remembers everythig

More information

September 2012 C1 Note. C1 Notes (Edexcel) Copyright - For AS, A2 notes and IGCSE / GCSE worksheets 1

September 2012 C1 Note. C1 Notes (Edexcel) Copyright   - For AS, A2 notes and IGCSE / GCSE worksheets 1 September 0 s (Edecel) Copyright www.pgmaths.co.uk - For AS, A otes ad IGCSE / GCSE worksheets September 0 Copyright www.pgmaths.co.uk - For AS, A otes ad IGCSE / GCSE worksheets September 0 Copyright

More information

Machine Learning for Data Science (CS 4786)

Machine Learning for Data Science (CS 4786) Machie Learig for Data Sciece CS 4786) Lecture & 3: Pricipal Compoet Aalysis The text i black outlies high level ideas. The text i blue provides simple mathematical details to derive or get to the algorithm

More information

Definition 4.2. (a) A sequence {x n } in a Banach space X is a basis for X if. unique scalars a n (x) such that x = n. a n (x) x n. (4.

Definition 4.2. (a) A sequence {x n } in a Banach space X is a basis for X if. unique scalars a n (x) such that x = n. a n (x) x n. (4. 4. BASES I BAACH SPACES 39 4. BASES I BAACH SPACES Sice a Baach space X is a vector space, it must possess a Hamel, or vector space, basis, i.e., a subset {x γ } γ Γ whose fiite liear spa is all of X ad

More information

Lecture 7: Density Estimation: k-nearest Neighbor and Basis Approach

Lecture 7: Density Estimation: k-nearest Neighbor and Basis Approach STAT 425: Itroductio to Noparametric Statistics Witer 28 Lecture 7: Desity Estimatio: k-nearest Neighbor ad Basis Approach Istructor: Ye-Chi Che Referece: Sectio 8.4 of All of Noparametric Statistics.

More information

Discrete Mathematics for CS Spring 2007 Luca Trevisan Lecture 22

Discrete Mathematics for CS Spring 2007 Luca Trevisan Lecture 22 CS 70 Discrete Mathematics for CS Sprig 2007 Luca Trevisa Lecture 22 Aother Importat Distributio The Geometric Distributio Questio: A biased coi with Heads probability p is tossed repeatedly util the first

More information

R is a scalar defined as follows:

R is a scalar defined as follows: Math 8. Notes o Dot Product, Cross Product, Plaes, Area, ad Volumes This lecture focuses primarily o the dot product ad its may applicatios, especially i the measuremet of agles ad scalar projectio ad

More information

b i u x i U a i j u x i u x j

b i u x i U a i j u x i u x j M ath 5 2 7 Fall 2 0 0 9 L ecture 1 9 N ov. 1 6, 2 0 0 9 ) S ecod- Order Elliptic Equatios: Weak S olutios 1. Defiitios. I this ad the followig two lectures we will study the boudary value problem Here

More information

Chapter 6 Principles of Data Reduction

Chapter 6 Principles of Data Reduction Chapter 6 for BST 695: Special Topics i Statistical Theory. Kui Zhag, 0 Chapter 6 Priciples of Data Reductio Sectio 6. Itroductio Goal: To summarize or reduce the data X, X,, X to get iformatio about a

More information

ECE-S352 Introduction to Digital Signal Processing Lecture 3A Direct Solution of Difference Equations

ECE-S352 Introduction to Digital Signal Processing Lecture 3A Direct Solution of Difference Equations ECE-S352 Itroductio to Digital Sigal Processig Lecture 3A Direct Solutio of Differece Equatios Discrete Time Systems Described by Differece Equatios Uit impulse (sample) respose h() of a DT system allows

More information

A sequence of numbers is a function whose domain is the positive integers. We can see that the sequence

A sequence of numbers is a function whose domain is the positive integers. We can see that the sequence Sequeces A sequece of umbers is a fuctio whose domai is the positive itegers. We ca see that the sequece,, 2, 2, 3, 3,... is a fuctio from the positive itegers whe we write the first sequece elemet as

More information

Outline. Linear regression. Regularization functions. Polynomial curve fitting. Stochastic gradient descent for regression. MLE for regression

Outline. Linear regression. Regularization functions. Polynomial curve fitting. Stochastic gradient descent for regression. MLE for regression REGRESSION 1 Outlie Liear regressio Regularizatio fuctios Polyomial curve fittig Stochastic gradiet descet for regressio MLE for regressio Step-wise forward regressio Regressio methods Statistical techiques

More information

Lecture Note 8 Point Estimators and Point Estimation Methods. MIT Spring 2006 Herman Bennett

Lecture Note 8 Point Estimators and Point Estimation Methods. MIT Spring 2006 Herman Bennett Lecture Note 8 Poit Estimators ad Poit Estimatio Methods MIT 14.30 Sprig 2006 Herma Beett Give a parameter with ukow value, the goal of poit estimatio is to use a sample to compute a umber that represets

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 19 11/17/2008 LAWS OF LARGE NUMBERS II THE STRONG LAW OF LARGE NUMBERS

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 19 11/17/2008 LAWS OF LARGE NUMBERS II THE STRONG LAW OF LARGE NUMBERS MASSACHUSTTS INSTITUT OF TCHNOLOGY 6.436J/5.085J Fall 2008 Lecture 9 /7/2008 LAWS OF LARG NUMBRS II Cotets. The strog law of large umbers 2. The Cheroff boud TH STRONG LAW OF LARG NUMBRS While the weak

More information

Geometry of LS. LECTURE 3 GEOMETRY OF LS, PROPERTIES OF σ 2, PARTITIONED REGRESSION, GOODNESS OF FIT

Geometry of LS. LECTURE 3 GEOMETRY OF LS, PROPERTIES OF σ 2, PARTITIONED REGRESSION, GOODNESS OF FIT OCTOBER 7, 2016 LECTURE 3 GEOMETRY OF LS, PROPERTIES OF σ 2, PARTITIONED REGRESSION, GOODNESS OF FIT Geometry of LS We ca thik of y ad the colums of X as members of the -dimesioal Euclidea space R Oe ca

More information

Regression and generalization

Regression and generalization Regressio ad geeralizatio CE-717: Machie Learig Sharif Uiversity of Techology M. Soleymai Fall 2016 Curve fittig: probabilistic perspective Describig ucertaity over value of target variable as a probability

More information

FIR Filters. Lecture #7 Chapter 5. BME 310 Biomedical Computing - J.Schesser

FIR Filters. Lecture #7 Chapter 5. BME 310 Biomedical Computing - J.Schesser FIR Filters Lecture #7 Chapter 5 8 What Is this Course All About? To Gai a Appreciatio of the Various Types of Sigals ad Systems To Aalyze The Various Types of Systems To Lear the Skills ad Tools eeded

More information

Algebra of Least Squares

Algebra of Least Squares October 19, 2018 Algebra of Least Squares Geometry of Least Squares Recall that out data is like a table [Y X] where Y collects observatios o the depedet variable Y ad X collects observatios o the k-dimesioal

More information

Math 113 Exam 3 Practice

Math 113 Exam 3 Practice Math Exam Practice Exam will cover.-.9. This sheet has three sectios. The first sectio will remid you about techiques ad formulas that you should kow. The secod gives a umber of practice questios for you

More information

NICK DUFRESNE. 1 1 p(x). To determine some formulas for the generating function of the Schröder numbers, r(x) = a(x) =

NICK DUFRESNE. 1 1 p(x). To determine some formulas for the generating function of the Schröder numbers, r(x) = a(x) = AN INTRODUCTION TO SCHRÖDER AND UNKNOWN NUMBERS NICK DUFRESNE Abstract. I this article we will itroduce two types of lattice paths, Schröder paths ad Ukow paths. We will examie differet properties of each,

More information

6.003 Homework #3 Solutions

6.003 Homework #3 Solutions 6.00 Homework # Solutios Problems. Complex umbers a. Evaluate the real ad imagiary parts of j j. π/ Real part = Imagiary part = 0 e Euler s formula says that j = e jπ/, so jπ/ j π/ j j = e = e. Thus the

More information

CHAPTER 10 INFINITE SEQUENCES AND SERIES

CHAPTER 10 INFINITE SEQUENCES AND SERIES CHAPTER 10 INFINITE SEQUENCES AND SERIES 10.1 Sequeces 10.2 Ifiite Series 10.3 The Itegral Tests 10.4 Compariso Tests 10.5 The Ratio ad Root Tests 10.6 Alteratig Series: Absolute ad Coditioal Covergece

More information

Infinite Sequences and Series

Infinite Sequences and Series Chapter 6 Ifiite Sequeces ad Series 6.1 Ifiite Sequeces 6.1.1 Elemetary Cocepts Simply speakig, a sequece is a ordered list of umbers writte: {a 1, a 2, a 3,...a, a +1,...} where the elemets a i represet

More information

Statistical Pattern Recognition

Statistical Pattern Recognition Statistical Patter Recogitio Classificatio: No-Parametric Modelig Hamid R. Rabiee Jafar Muhammadi Sprig 2014 http://ce.sharif.edu/courses/92-93/2/ce725-2/ Ageda Parametric Modelig No-Parametric Modelig

More information

Math Solutions to homework 6

Math Solutions to homework 6 Math 175 - Solutios to homework 6 Cédric De Groote November 16, 2017 Problem 1 (8.11 i the book): Let K be a compact Hermitia operator o a Hilbert space H ad let the kerel of K be {0}. Show that there

More information

Assignment 2 Solutions SOLUTION. ϕ 1 Â = 3 ϕ 1 4i ϕ 2. The other case can be dealt with in a similar way. { ϕ 2 Â} χ = { 4i ϕ 1 3 ϕ 2 } χ.

Assignment 2 Solutions SOLUTION. ϕ 1  = 3 ϕ 1 4i ϕ 2. The other case can be dealt with in a similar way. { ϕ 2 Â} χ = { 4i ϕ 1 3 ϕ 2 } χ. PHYSICS 34 QUANTUM PHYSICS II (25) Assigmet 2 Solutios 1. With respect to a pair of orthoormal vectors ϕ 1 ad ϕ 2 that spa the Hilbert space H of a certai system, the operator  is defied by its actio

More information

Chapter 4. Fourier Series

Chapter 4. Fourier Series Chapter 4. Fourier Series At this poit we are ready to ow cosider the caoical equatios. Cosider, for eample the heat equatio u t = u, < (4.) subject to u(, ) = si, u(, t) = u(, t) =. (4.) Here,

More information

Jacob Hays Amit Pillay James DeFelice 4.1, 4.2, 4.3

Jacob Hays Amit Pillay James DeFelice 4.1, 4.2, 4.3 No-Parametric Techiques Jacob Hays Amit Pillay James DeFelice 4.1, 4.2, 4.3 Parametric vs. No-Parametric Parametric Based o Fuctios (e.g Normal Distributio) Uimodal Oly oe peak Ulikely real data cofies

More information

REGRESSION WITH QUADRATIC LOSS

REGRESSION WITH QUADRATIC LOSS REGRESSION WITH QUADRATIC LOSS MAXIM RAGINSKY Regressio with quadratic loss is aother basic problem studied i statistical learig theory. We have a radom couple Z = X, Y ), where, as before, X is a R d

More information

Recursive Algorithms. Recurrences. Recursive Algorithms Analysis

Recursive Algorithms. Recurrences. Recursive Algorithms Analysis Recursive Algorithms Recurreces Computer Sciece & Egieerig 35: Discrete Mathematics Christopher M Bourke cbourke@cseuledu A recursive algorithm is oe i which objects are defied i terms of other objects

More information

PAPER : IIT-JAM 2010

PAPER : IIT-JAM 2010 MATHEMATICS-MA (CODE A) Q.-Q.5: Oly oe optio is correct for each questio. Each questio carries (+6) marks for correct aswer ad ( ) marks for icorrect aswer.. Which of the followig coditios does NOT esure

More information

Math 128A: Homework 1 Solutions

Math 128A: Homework 1 Solutions Math 8A: Homework Solutios Due: Jue. Determie the limits of the followig sequeces as. a) a = +. lim a + = lim =. b) a = + ). c) a = si4 +6) +. lim a = lim = lim + ) [ + ) ] = [ e ] = e 6. Observe that

More information

Ma 530 Infinite Series I

Ma 530 Infinite Series I Ma 50 Ifiite Series I Please ote that i additio to the material below this lecture icorporated material from the Visual Calculus web site. The material o sequeces is at Visual Sequeces. (To use this li

More information

REGRESSION (Physics 1210 Notes, Partial Modified Appendix A)

REGRESSION (Physics 1210 Notes, Partial Modified Appendix A) REGRESSION (Physics 0 Notes, Partial Modified Appedix A) HOW TO PERFORM A LINEAR REGRESSION Cosider the followig data poits ad their graph (Table I ad Figure ): X Y 0 3 5 3 7 4 9 5 Table : Example Data

More information

3.2 Properties of Division 3.3 Zeros of Polynomials 3.4 Complex and Rational Zeros of Polynomials

3.2 Properties of Division 3.3 Zeros of Polynomials 3.4 Complex and Rational Zeros of Polynomials Math 60 www.timetodare.com 3. Properties of Divisio 3.3 Zeros of Polyomials 3.4 Complex ad Ratioal Zeros of Polyomials I these sectios we will study polyomials algebraically. Most of our work will be cocered

More information

Fourier Series and the Wave Equation

Fourier Series and the Wave Equation Fourier Series ad the Wave Equatio We start with the oe-dimesioal wave equatio u u =, x u(, t) = u(, t) =, ux (,) = f( x), u ( x,) = This represets a vibratig strig, where u is the displacemet of the strig

More information

Regression with quadratic loss

Regression with quadratic loss Regressio with quadratic loss Maxim Ragisky October 13, 2015 Regressio with quadratic loss is aother basic problem studied i statistical learig theory. We have a radom couple Z = X,Y, where, as before,

More information

The picture in figure 1.1 helps us to see that the area represents the distance traveled. Figure 1: Area represents distance travelled

The picture in figure 1.1 helps us to see that the area represents the distance traveled. Figure 1: Area represents distance travelled 1 Lecture : Area Area ad distace traveled Approximatig area by rectagles Summatio The area uder a parabola 1.1 Area ad distace Suppose we have the followig iformatio about the velocity of a particle, how

More information

FIR Filter Design: Part II

FIR Filter Design: Part II EEL335: Discrete-Time Sigals ad Systems. Itroductio I this set of otes, we cosider how we might go about desigig FIR filters with arbitrary frequecy resposes, through compositio of multiple sigle-peak

More information

Lecture 19: Convergence

Lecture 19: Convergence Lecture 19: Covergece Asymptotic approach I statistical aalysis or iferece, a key to the success of fidig a good procedure is beig able to fid some momets ad/or distributios of various statistics. I may

More information

Lecture 2: Monte Carlo Simulation

Lecture 2: Monte Carlo Simulation STAT/Q SCI 43: Itroductio to Resamplig ethods Sprig 27 Istructor: Ye-Chi Che Lecture 2: ote Carlo Simulatio 2 ote Carlo Itegratio Assume we wat to evaluate the followig itegratio: e x3 dx What ca we do?

More information

4.3 Growth Rates of Solutions to Recurrences

4.3 Growth Rates of Solutions to Recurrences 4.3. GROWTH RATES OF SOLUTIONS TO RECURRENCES 81 4.3 Growth Rates of Solutios to Recurreces 4.3.1 Divide ad Coquer Algorithms Oe of the most basic ad powerful algorithmic techiques is divide ad coquer.

More information

Math 475, Problem Set #12: Answers

Math 475, Problem Set #12: Answers Math 475, Problem Set #12: Aswers A. Chapter 8, problem 12, parts (b) ad (d). (b) S # (, 2) = 2 2, sice, from amog the 2 ways of puttig elemets ito 2 distiguishable boxes, exactly 2 of them result i oe

More information

Topic 9: Sampling Distributions of Estimators

Topic 9: Sampling Distributions of Estimators Topic 9: Samplig Distributios of Estimators Course 003, 2016 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be

More information

Problem Set 4 Due Oct, 12

Problem Set 4 Due Oct, 12 EE226: Radom Processes i Systems Lecturer: Jea C. Walrad Problem Set 4 Due Oct, 12 Fall 06 GSI: Assae Gueye This problem set essetially reviews detectio theory ad hypothesis testig ad some basic otios

More information

We are mainly going to be concerned with power series in x, such as. (x)} converges - that is, lims N n

We are mainly going to be concerned with power series in x, such as. (x)} converges - that is, lims N n Review of Power Series, Power Series Solutios A power series i x - a is a ifiite series of the form c (x a) =c +c (x a)+(x a) +... We also call this a power series cetered at a. Ex. (x+) is cetered at

More information

5. Matrix exponentials and Von Neumann s theorem The matrix exponential. For an n n matrix X we define

5. Matrix exponentials and Von Neumann s theorem The matrix exponential. For an n n matrix X we define 5. Matrix expoetials ad Vo Neuma s theorem 5.1. The matrix expoetial. For a matrix X we defie e X = exp X = I + X + X2 2! +... = 0 X!. We assume that the etries are complex so that exp is well defied o

More information

Chapter 3. Strong convergence. 3.1 Definition of almost sure convergence

Chapter 3. Strong convergence. 3.1 Definition of almost sure convergence Chapter 3 Strog covergece As poited out i the Chapter 2, there are multiple ways to defie the otio of covergece of a sequece of radom variables. That chapter defied covergece i probability, covergece i

More information

1 Review of Probability & Statistics

1 Review of Probability & Statistics 1 Review of Probability & Statistics a. I a group of 000 people, it has bee reported that there are: 61 smokers 670 over 5 960 people who imbibe (drik alcohol) 86 smokers who imbibe 90 imbibers over 5

More information

Ada Boost, Risk Bounds, Concentration Inequalities. 1 AdaBoost and Estimates of Conditional Probabilities

Ada Boost, Risk Bounds, Concentration Inequalities. 1 AdaBoost and Estimates of Conditional Probabilities CS8B/Stat4B Sprig 008) Statistical Learig Theory Lecture: Ada Boost, Risk Bouds, Cocetratio Iequalities Lecturer: Peter Bartlett Scribe: Subhrasu Maji AdaBoost ad Estimates of Coditioal Probabilities We

More information

18.657: Mathematics of Machine Learning

18.657: Mathematics of Machine Learning 8.657: Mathematics of Machie Learig Lecturer: Philippe Rigollet Lecture 0 Scribe: Ade Forrow Oct. 3, 05 Recall the followig defiitios from last time: Defiitio: A fuctio K : X X R is called a positive symmetric

More information

Boosting. Professor Ameet Talwalkar. Professor Ameet Talwalkar CS260 Machine Learning Algorithms March 1, / 32

Boosting. Professor Ameet Talwalkar. Professor Ameet Talwalkar CS260 Machine Learning Algorithms March 1, / 32 Boostig Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machie Learig Algorithms March 1, 2017 1 / 32 Outlie 1 Admiistratio 2 Review of last lecture 3 Boostig Professor Ameet Talwalkar CS260

More information

Chapter 10: Power Series

Chapter 10: Power Series Chapter : Power Series 57 Chapter Overview: Power Series The reaso series are part of a Calculus course is that there are fuctios which caot be itegrated. All power series, though, ca be itegrated because

More information

SECTION 1.5 : SUMMATION NOTATION + WORK WITH SEQUENCES

SECTION 1.5 : SUMMATION NOTATION + WORK WITH SEQUENCES SECTION 1.5 : SUMMATION NOTATION + WORK WITH SEQUENCES Read Sectio 1.5 (pages 5 9) Overview I Sectio 1.5 we lear to work with summatio otatio ad formulas. We will also itroduce a brief overview of sequeces,

More information

2.4 - Sequences and Series

2.4 - Sequences and Series 2.4 - Sequeces ad Series Sequeces A sequece is a ordered list of elemets. Defiitio 1 A sequece is a fuctio from a subset of the set of itegers (usually either the set 80, 1, 2, 3,... < or the set 81, 2,

More information

Fortgeschrittene Datenstrukturen Vorlesung 11

Fortgeschrittene Datenstrukturen Vorlesung 11 Fortgeschrittee Datestruture Vorlesug 11 Schriftführer: Marti Weider 19.01.2012 1 Succict Data Structures (ctd.) 1.1 Select-Queries A slightly differet approach, compared to ra, is used for select. B represets

More information

is also known as the general term of the sequence

is also known as the general term of the sequence Lesso : Sequeces ad Series Outlie Objectives: I ca determie whether a sequece has a patter. I ca determie whether a sequece ca be geeralized to fid a formula for the geeral term i the sequece. I ca determie

More information

Frequency Domain Filtering

Frequency Domain Filtering Frequecy Domai Filterig Raga Rodrigo October 19, 2010 Outlie Cotets 1 Itroductio 1 2 Fourier Represetatio of Fiite-Duratio Sequeces: The Discrete Fourier Trasform 1 3 The 2-D Discrete Fourier Trasform

More information

ECON 3150/4150, Spring term Lecture 3

ECON 3150/4150, Spring term Lecture 3 Itroductio Fidig the best fit by regressio Residuals ad R-sq Regressio ad causality Summary ad ext step ECON 3150/4150, Sprig term 2014. Lecture 3 Ragar Nymoe Uiversity of Oslo 21 Jauary 2014 1 / 30 Itroductio

More information

Sequences, Mathematical Induction, and Recursion. CSE 2353 Discrete Computational Structures Spring 2018

Sequences, Mathematical Induction, and Recursion. CSE 2353 Discrete Computational Structures Spring 2018 CSE 353 Discrete Computatioal Structures Sprig 08 Sequeces, Mathematical Iductio, ad Recursio (Chapter 5, Epp) Note: some course slides adopted from publisher-provided material Overview May mathematical

More information

Machine Learning Assignment-1

Machine Learning Assignment-1 Uiversity of Utah, School Of Computig Machie Learig Assigmet-1 Chadramouli, Shridhara sdhara@cs.utah.edu 00873255) Sigla, Sumedha sumedha.sigla@utah.edu 00877456) September 10, 2013 1 Liear Regressio a)

More information

Efficient GMM LECTURE 12 GMM II

Efficient GMM LECTURE 12 GMM II DECEMBER 1 010 LECTURE 1 II Efficiet The estimator depeds o the choice of the weight matrix A. The efficiet estimator is the oe that has the smallest asymptotic variace amog all estimators defied by differet

More information

U8L1: Sec Equations of Lines in R 2

U8L1: Sec Equations of Lines in R 2 MCVU U8L: Sec. 8.9. Equatios of Lies i R Review of Equatios of a Straight Lie (-D) Cosider the lie passig through A (-,) with slope, as show i the diagram below. I poit slope form, the equatio of the lie

More information

1 1 2 = show that: over variables x and y. [2 marks] Write down necessary conditions involving first and second-order partial derivatives for ( x0, y

1 1 2 = show that: over variables x and y. [2 marks] Write down necessary conditions involving first and second-order partial derivatives for ( x0, y Questio (a) A square matrix A= A is called positive defiite if the quadratic form waw > 0 for every o-zero vector w [Note: Here (.) deotes the traspose of a matrix or a vector]. Let 0 A = 0 = show that:

More information

Ma 530 Introduction to Power Series

Ma 530 Introduction to Power Series Ma 530 Itroductio to Power Series Please ote that there is material o power series at Visual Calculus. Some of this material was used as part of the presetatio of the topics that follow. What is a Power

More information

Algorithms for Clustering

Algorithms for Clustering CR2: Statistical Learig & Applicatios Algorithms for Clusterig Lecturer: J. Salmo Scribe: A. Alcolei Settig: give a data set X R p where is the umber of observatio ad p is the umber of features, we wat

More information

Zeros of Polynomials

Zeros of Polynomials Math 160 www.timetodare.com 4.5 4.6 Zeros of Polyomials I these sectios we will study polyomials algebraically. Most of our work will be cocered with fidig the solutios of polyomial equatios of ay degree

More information

Machine Learning Brett Bernstein

Machine Learning Brett Bernstein Machie Learig Brett Berstei Week Lecture: Cocept Check Exercises Starred problems are optioal. Statistical Learig Theory. Suppose A = Y = R ad X is some other set. Furthermore, assume P X Y is a discrete

More information

Signals & Systems Chapter3

Signals & Systems Chapter3 Sigals & Systems Chapter3 1.2 Discrete-Time (D-T) Sigals Electroic systems do most of the processig of a sigal usig a computer. A computer ca t directly process a C-T sigal but istead eeds a stream of

More information

Chapter Vectors

Chapter Vectors Chapter 4. Vectors fter readig this chapter you should be able to:. defie a vector. add ad subtract vectors. fid liear combiatios of vectors ad their relatioship to a set of equatios 4. explai what it

More information

Entropy and Ergodic Theory Lecture 5: Joint typicality and conditional AEP

Entropy and Ergodic Theory Lecture 5: Joint typicality and conditional AEP Etropy ad Ergodic Theory Lecture 5: Joit typicality ad coditioal AEP 1 Notatio: from RVs back to distributios Let (Ω, F, P) be a probability space, ad let X ad Y be A- ad B-valued discrete RVs, respectively.

More information