6.867 Machine learning, lecture 7 (Jaakkola)
Lecture topics:
- Kernel form of linear regression
- Kernels: examples, construction, properties

Linear regression and kernels

Consider a slightly simpler model where we omit the offset parameter $\theta_0$, reducing the model to $y = \theta^T \phi(x) + \epsilon$, where $\phi(x)$ is a particular feature expansion (e.g., polynomial). Our goal here is to turn both the estimation problem and the subsequent prediction task into forms that involve only inner products between the feature vectors.

We have already emphasized that regularization is necessary in conjunction with mapping examples to higher dimensional feature vectors. The regularized least squares objective to be minimized, with parameter $\lambda$, is given by

$$J(\theta) = \sum_{t=1}^{n} \left( y_t - \theta^T \phi(x_t) \right)^2 + \lambda \|\theta\|^2 \tag{1}$$

This form can be derived from penalized log-likelihood estimation (see previous lecture notes). The effect of the regularization penalty is to pull all the parameters towards zero. So any linear dimensions in the parameters that the training feature vectors do not pertain to are set explicitly to zero. We would therefore expect the optimal parameters to lie in the span of the feature vectors corresponding to the training examples. This is indeed the case.

As before, the optimality condition for $\theta$ follows from setting the gradient to zero:

$$\frac{dJ(\theta)}{d\theta} = -2 \sum_{t=1}^{n} \underbrace{\left( y_t - \theta^T \phi(x_t) \right)}_{\alpha_t} \phi(x_t) + 2\lambda\theta = 0 \tag{2}$$

We can therefore construct the optimal $\theta$ in terms of prediction differences $\alpha_t$ and the feature vectors:

$$\theta = \frac{1}{\lambda} \sum_{t=1}^{n} \alpha_t \phi(x_t) \tag{3}$$

The implication is that the optimal $\theta$ (however high dimensional) will lie in the span of the feature vectors corresponding to the training examples. This is due to the regularization
penalty we added. But how do we set $\alpha_t$? The values for $\alpha_t$ can be found by insisting that they indeed can be interpreted as prediction differences:

$$\alpha_t = y_t - \theta^T \phi(x_t) = y_t - \frac{1}{\lambda} \sum_{t'=1}^{n} \alpha_{t'} \phi(x_{t'})^T \phi(x_t) \tag{4}$$

Thus $\alpha_t$ depends only on the actual responses $y_t$ and the inner products between the training examples, the Gram matrix:

$$K = \begin{bmatrix} \phi(x_1)^T \phi(x_1) & \cdots & \phi(x_1)^T \phi(x_n) \\ \vdots & \ddots & \vdots \\ \phi(x_n)^T \phi(x_1) & \cdots & \phi(x_n)^T \phi(x_n) \end{bmatrix} \tag{5}$$

In vector form, with

$$a = [\alpha_1, \ldots, \alpha_n]^T, \tag{6}$$

$$y = [y_1, \ldots, y_n]^T, \tag{7}$$

the condition in Eq.(4) reads

$$a = y - \frac{1}{\lambda} K a \tag{8}$$

and the solution is

$$\hat{a} = \lambda \left( \lambda I + K \right)^{-1} y \tag{9}$$

Note that finding the estimates $\hat{\alpha}_t$ requires inverting an $n \times n$ matrix. This is the cost of dealing with inner products as opposed to handling feature vectors directly. In some cases, the benefit is substantial since the feature vectors in the inner products may be infinite dimensional but never needed explicitly.

As a result of finding $\hat{\alpha}_t$ we can cast the predictions for new examples also in terms of inner products:

$$y = \hat{\theta}^T \phi(x) = \sum_{t=1}^{n} (\hat{\alpha}_t / \lambda)\, \phi(x_t)^T \phi(x) = \sum_{t=1}^{n} (\hat{\alpha}_t / \lambda)\, K(x_t, x) \tag{10}$$

where we view $K(x_t, x)$ as a kernel function, a function of two arguments $x_t$ and $x$.

Kernels

So we have now successfully turned a regularized linear regression problem into a kernel form. This means that we can simply substitute different kernel functions $K(x, x')$ into the estimation/prediction equations. This gives us easy access to a wide range of possible regression functions. Here are a couple of standard examples of kernels:
Polynomial kernel

$$K(x, x') = (1 + x^T x')^p, \quad p = 1, 2, \ldots \tag{11}$$

Radial basis kernel

$$K(x, x') = \exp\left( -\frac{\beta}{2} \|x - x'\|^2 \right), \quad \beta > 0 \tag{12}$$

We have already discussed the feature vectors corresponding to the polynomial kernel. The components of these feature vectors were polynomial terms up to degree $p$ with specifically chosen coefficients. The restricted choice of coefficients was necessary in order to collapse the inner product calculations.

The feature vectors corresponding to the radial basis kernel are infinite dimensional! The components of these vectors are indexed by $z \in \mathbb{R}^d$, where $d$ is the dimension of the original input $x$. More precisely, the feature vectors are functions:

$$\phi_z(x) = c(\beta, d)\, N(z; x, 1/(2\beta)) \tag{13}$$

where $N(z; x, 1/(2\beta))$ is a normal pdf over $z$ with mean $x$ and variance $1/(2\beta)$, and $c(\beta, d)$ is a constant. Roughly speaking, the radial basis kernel measures the probability that you would get the same sample $z$ (in the same small region) from two normal distributions with means $x$ and $x'$ and a common variance $1/(2\beta)$. This is a reasonable measure of similarity between $x$ and $x'$, and kernels are often defined from this perspective. The inner product giving rise to the radial basis kernel is defined through integration:

$$K(x, x') = \int \phi_z(x)\, \phi_z(x')\, dz \tag{14}$$

We can also construct various types of kernels from simpler ones. Here are a few rules to guide us. Assume $K_1(x, x')$ and $K_2(x, x')$ are valid kernels (correspond to inner products of some feature vectors), then

1. $K(x, x') = f(x) K_1(x, x') f(x')$ for any function $f(x)$,
2. $K(x, x') = K_1(x, x') + K_2(x, x')$,
3. $K(x, x') = K_1(x, x') K_2(x, x')$
are all valid kernels. While simple, these rules are quite powerful. Let's first understand these rules from the point of view of the implicit feature vectors. For each rule, let $\phi(x)$ be the feature vector corresponding to $K$, and $\phi^{(1)}(x)$ and $\phi^{(2)}(x)$ the feature vectors associated with $K_1$ and $K_2$, respectively. The feature mapping for the first rule is given simply by multiplying with the scalar function $f(x)$:

$$\phi(x) = f(x)\, \phi^{(1)}(x) \tag{15}$$

so that $\phi(x)^T \phi(x') = f(x)\, \phi^{(1)}(x)^T \phi^{(1)}(x')\, f(x') = f(x) K_1(x, x') f(x')$. The second rule, adding kernels, corresponds to just concatenating the feature vectors:

$$\phi(x) = \begin{bmatrix} \phi^{(1)}(x) \\ \phi^{(2)}(x) \end{bmatrix} \tag{16}$$

The third and last rule is a little more complicated, but not much. Suppose we use a double index $i, j$ to index the components of $\phi(x)$, where $i$ ranges over the components of $\phi^{(1)}(x)$ and $j$ refers to the components of $\phi^{(2)}(x)$. Then

$$\phi_{i,j}(x) = \phi^{(1)}_i(x)\, \phi^{(2)}_j(x) \tag{17}$$

It is now easy to see that

\begin{align}
K(x, x') &= \phi(x)^T \phi(x') \tag{18} \\
&= \sum_{i,j} \phi_{i,j}(x)\, \phi_{i,j}(x') \tag{19} \\
&= \sum_{i,j} \phi^{(1)}_i(x)\, \phi^{(2)}_j(x)\, \phi^{(1)}_i(x')\, \phi^{(2)}_j(x') \tag{20} \\
&= \Big[ \sum_i \phi^{(1)}_i(x)\, \phi^{(1)}_i(x') \Big] \Big[ \sum_j \phi^{(2)}_j(x)\, \phi^{(2)}_j(x') \Big] \tag{21} \\
&= \big[ \phi^{(1)}(x)^T \phi^{(1)}(x') \big] \big[ \phi^{(2)}(x)^T \phi^{(2)}(x') \big] \tag{22} \\
&= K_1(x, x')\, K_2(x, x') \tag{23}
\end{align}

These construction rules can also be used to verify that something is a valid kernel. As an example, let's figure out why a radial basis kernel

$$K(x, x') = \exp\{ -\tfrac{1}{2} \|x - x'\|^2 \} \tag{24}$$
is a valid kernel.

\begin{align}
\exp\{ -\tfrac{1}{2} \|x - x'\|^2 \} &= \exp\{ -\tfrac{1}{2} x^T x + x^T x' - \tfrac{1}{2} x'^T x' \} \tag{25} \\
&= \overbrace{\exp\{ -\tfrac{1}{2} x^T x \}}^{f(x)} \cdot \exp\{ x^T x' \} \cdot \overbrace{\exp\{ -\tfrac{1}{2} x'^T x' \}}^{f(x')} \tag{26}
\end{align}

Here $\exp\{x^T x'\}$ is a sum of simple products $x^T x'$ (through its Taylor series, $\sum_{k \geq 0} (x^T x')^k / k!$) and is therefore a kernel based on the second and third rules; the first rule allows us to incorporate $f(x)$ and $f(x')$.

String kernels. It is often necessary to make predictions (classify, assess risk, determine user ratings) on the basis of more complex objects such as variable length sequences or graphs that do not necessarily permit a simple description as points in $\mathbb{R}^d$. The idea of kernels extends to such objects as well. Consider, for example, the case where the inputs $x$ are variable length sequences (e.g., documents or biosequences) with elements from some common alphabet $A$ (e.g., letters or protein residues). One way to compare such sequences is to consider subsequences that they may share. Let $u \in A^k$ denote a length $k$ sequence from this alphabet and $i$ a sequence of $k$ indexes. So, for example, we can say that $u = x[i]$ if $u_1 = x_{i_1}, u_2 = x_{i_2}, \ldots, u_k = x_{i_k}$. In other words, $x$ contains the elements of $u$ in positions $i_1 < i_2 < \cdots < i_k$. If the elements of $u$ are found in successive positions in $x$, then $i_k - i_1 = k - 1$. A simple string kernel corresponds to feature vectors with counts of occurrences of length $k$ subsequences:

$$\phi_u(x) = \sum_{i:\, u = x[i]} \delta(i_k - i_1,\, k - 1) \tag{27}$$

In other words, the components are indexed by subsequences $u$, and the value of the $u$-component is the number of times $x$ contains $u$ as a contiguous subsequence. For example,

$$\phi_{\text{on}}(\text{the common construct}) = 2 \tag{28}$$

The number of components in such feature vectors is very large (exponential in $k$). Yet, the inner product

$$\sum_{u \in A^k} \phi_u(x)\, \phi_u(x') \tag{29}$$

can be computed efficiently (there are only a limited number of possible contiguous subsequences in $x$ and $x'$). The reason for this difference, and the argument in favor of kernels
more generally, is that the feature vectors have to aggregate the information necessary to compare any two sequences while the inner product is evaluated for two specific sequences.

We can also relax the requirement that matches must be contiguous. To this end, we define the length of the window of $x$ where $u$ appears as $l(i) = i_k - i_1$. The feature vectors in a weighted gapped substring kernel are given by

$$\phi_u(x) = \sum_{i:\, u = x[i]} \lambda^{l(i)} \tag{30}$$

where the parameter $\lambda \in (0, 1)$ specifies the penalty for non-contiguous matches to $u$. The resulting kernel

$$K(x, x') = \sum_{u \in A^k} \phi_u(x)\, \phi_u(x') = \sum_{u \in A^k} \Big( \sum_{i:\, u = x[i]} \lambda^{l(i)} \Big) \Big( \sum_{j:\, u = x'[j]} \lambda^{l(j)} \Big) \tag{31}$$

can be computed recursively. It is often useful to normalize such a kernel so as to remove any immediate effect from the sequence length:

$$\tilde{K}(x, x') = \frac{K(x, x')}{\sqrt{K(x, x)\, K(x', x')}} \tag{32}$$

Appendix (optional): Kernel linear regression with offset

Given a feature expansion specified by $\phi(x)$ we try to minimize

$$J(\theta, \theta_0) = \sum_{t=1}^{n} \left( y_t - \theta^T \phi(x_t) - \theta_0 \right)^2 + \lambda \|\theta\|^2 \tag{33}$$

where we have chosen not to regularize $\theta_0$ to preserve the similarity to classification discussed later on. Not regularizing $\theta_0$ means, e.g., that we do not care whether all the responses have a constant added to them; the value of the objective, after optimizing $\theta_0$, would remain the same with or without such a constant. Setting the derivatives with respect to $\theta_0$ and $\theta$ to zero gives the following optimality conditions:

$$\frac{dJ(\theta, \theta_0)}{d\theta_0} = -2 \sum_{t=1}^{n} \left( y_t - \theta^T \phi(x_t) - \theta_0 \right) = 0 \tag{34}$$

$$\frac{dJ(\theta, \theta_0)}{d\theta} = 2\lambda\theta - 2 \sum_{t=1}^{n} \underbrace{\left( y_t - \theta^T \phi(x_t) - \theta_0 \right)}_{\alpha_t} \phi(x_t) = 0 \tag{35}$$
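We can therefore construct the optimal $\theta$ in terms of prediction differences $\alpha_t$ and the feature vectors as before:

$$\theta = \frac{1}{\lambda} \sum_{t=1}^{n} \alpha_t \phi(x_t) \tag{36}$$

Using this form of the solution for $\theta$ and Eq.(34) we can also express the optimal $\theta_0$ as a function of the prediction differences $\alpha_t$:

$$\theta_0 = \frac{1}{n} \sum_{t=1}^{n} \left( y_t - \theta^T \phi(x_t) \right) = \frac{1}{n} \sum_{t=1}^{n} \Big( y_t - \frac{1}{\lambda} \sum_{t'=1}^{n} \alpha_{t'} \phi(x_{t'})^T \phi(x_t) \Big) \tag{37}$$

We can now constrain $\alpha_t$ to take on values that can indeed be interpreted as prediction differences:

\begin{align}
\alpha_i &= y_i - \theta^T \phi(x_i) - \theta_0 \tag{38} \\
&= y_i - \frac{1}{\lambda} \sum_{t'=1}^{n} \alpha_{t'} \phi(x_{t'})^T \phi(x_i) - \theta_0 \tag{39} \\
&= y_i - \frac{1}{\lambda} \sum_{t'=1}^{n} \alpha_{t'} \phi(x_{t'})^T \phi(x_i) - \frac{1}{n} \sum_{t=1}^{n} \Big( y_t - \frac{1}{\lambda} \sum_{t'=1}^{n} \alpha_{t'} \phi(x_{t'})^T \phi(x_t) \Big) \tag{40} \\
&= y_i - \frac{1}{n} \sum_{t=1}^{n} y_t - \frac{1}{\lambda} \sum_{t'=1}^{n} \alpha_{t'} \Big( \phi(x_{t'})^T \phi(x_i) - \frac{1}{n} \sum_{t=1}^{n} \phi(x_{t'})^T \phi(x_t) \Big) \tag{41}
\end{align}

With the same matrix notation as before, and letting $\mathbf{1} = [1, \ldots, 1]^T$, we can rewrite the above condition as

$$a = \overbrace{(I - \mathbf{1}\mathbf{1}^T / n)}^{C}\, y - \frac{1}{\lambda} (I - \mathbf{1}\mathbf{1}^T / n) K a \tag{42}$$

where $C = I - \mathbf{1}\mathbf{1}^T / n$ is a centering matrix. Any solution to the above equation has to satisfy $\mathbf{1}^T a = 0$ (just left multiply the equation with $\mathbf{1}^T$). Note that this is exactly the optimality condition for $\theta_0$ in Eq.(34). Using this "summing to zero" property of the solution we can rewrite the above equation as

$$a = C y - \frac{1}{\lambda} C K C a \tag{43}$$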
where we have introduced an additional centering operation on the right hand side. This cannot change the solution since $C a = a$ whenever $\mathbf{1}^T a = 0$. The solution $\hat{a}$ is then

$$\hat{a} = \lambda \left( \lambda I + C K C \right)^{-1} C y \tag{44}$$

Once we have $\hat{a}$ we can reconstruct $\hat{\theta}_0$ from Eq.(37). $\hat{\theta}^T \phi(x)$ reduces to the kernel form as before.
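As a rough illustration of the formulas above, the following NumPy sketch computes the estimates of Eq.(9) (no offset) and Eq.(44) together with Eq.(37) (with offset), and predicts with the kernel form of Eq.(10). It is not part of the original notes: the function names (`rbf_kernel`, `fit_no_offset`, `fit_with_offset`, `predict`), the toy data, and the settings β = 1 and λ = 0.1 are assumptions chosen only for this example.

```python
import numpy as np


def rbf_kernel(X1, X2, beta=1.0):
    """Radial basis kernel of Eq.(12): exp(-(beta/2) * ||x - x'||^2).

    X1 has shape (m, d), X2 has shape (n, d); the result has shape (m, n).
    """
    sq_dist = (
        np.sum(X1 ** 2, axis=1)[:, None]
        + np.sum(X2 ** 2, axis=1)[None, :]
        - 2.0 * X1 @ X2.T
    )
    # Clip tiny negative values caused by floating point round-off.
    return np.exp(-0.5 * beta * np.maximum(sq_dist, 0.0))


def fit_no_offset(K, y, lam):
    """Eq.(9): a_hat = lam * (lam * I + K)^{-1} y."""
    n = len(y)
    return lam * np.linalg.solve(lam * np.eye(n) + K, y)


def fit_with_offset(K, y, lam):
    """Eq.(44) for a_hat, then Eq.(37) for theta0_hat."""
    n = len(y)
    C = np.eye(n) - np.ones((n, n)) / n  # centering matrix C = I - 11^T / n
    a_hat = lam * np.linalg.solve(lam * np.eye(n) + C @ K @ C, C @ y)
    theta0_hat = np.mean(y - K @ a_hat / lam)
    return a_hat, theta0_hat


def predict(K_new, a_hat, lam, theta0_hat=0.0):
    """Kernel form of the prediction, Eq.(10), plus the optional offset.

    K_new[i, t] = K(x_new_i, x_t) for new inputs x_new_i and training inputs x_t.
    """
    return K_new @ a_hat / lam + theta0_hat


if __name__ == "__main__":
    # Toy one-dimensional data (assumed for illustration only).
    X = np.linspace(0.0, 3.0, 8).reshape(-1, 1)
    y = np.sin(X[:, 0]) + 2.0
    lam = 0.1

    K = rbf_kernel(X, X, beta=1.0)
    a_hat, theta0_hat = fit_with_offset(K, y, lam)

    X_new = np.array([[0.5], [1.5], [2.5]])
    y_new = predict(rbf_kernel(X_new, X, beta=1.0), a_hat, lam, theta0_hat)
    print("sum of a_hat:", a_hat.sum())
    print("theta0_hat:", theta0_hat)
    print("predictions:", y_new)
```

The sketch uses `np.linalg.solve` rather than forming the matrix inverse explicitly, which is a common numerical choice and not something the notes prescribe. Two properties from the derivation can be checked on the output: the entries of `a_hat` from `fit_with_offset` sum to zero (up to round-off), and adding a constant to every response changes only `theta0_hat`.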
More information2.4 - Sequences and Series
2.4 - Sequeces ad Series Sequeces A sequece is a ordered list of elemets. Defiitio 1 A sequece is a fuctio from a subset of the set of itegers (usually either the set 80, 1, 2, 3,... < or the set 81, 2,
More informationFortgeschrittene Datenstrukturen Vorlesung 11
Fortgeschrittee Datestruture Vorlesug 11 Schriftführer: Marti Weider 19.01.2012 1 Succict Data Structures (ctd.) 1.1 Select-Queries A slightly differet approach, compared to ra, is used for select. B represets
More informationis also known as the general term of the sequence
Lesso : Sequeces ad Series Outlie Objectives: I ca determie whether a sequece has a patter. I ca determie whether a sequece ca be geeralized to fid a formula for the geeral term i the sequece. I ca determie
More informationFrequency Domain Filtering
Frequecy Domai Filterig Raga Rodrigo October 19, 2010 Outlie Cotets 1 Itroductio 1 2 Fourier Represetatio of Fiite-Duratio Sequeces: The Discrete Fourier Trasform 1 3 The 2-D Discrete Fourier Trasform
More informationECON 3150/4150, Spring term Lecture 3
Itroductio Fidig the best fit by regressio Residuals ad R-sq Regressio ad causality Summary ad ext step ECON 3150/4150, Sprig term 2014. Lecture 3 Ragar Nymoe Uiversity of Oslo 21 Jauary 2014 1 / 30 Itroductio
More informationSequences, Mathematical Induction, and Recursion. CSE 2353 Discrete Computational Structures Spring 2018
CSE 353 Discrete Computatioal Structures Sprig 08 Sequeces, Mathematical Iductio, ad Recursio (Chapter 5, Epp) Note: some course slides adopted from publisher-provided material Overview May mathematical
More informationMachine Learning Assignment-1
Uiversity of Utah, School Of Computig Machie Learig Assigmet-1 Chadramouli, Shridhara sdhara@cs.utah.edu 00873255) Sigla, Sumedha sumedha.sigla@utah.edu 00877456) September 10, 2013 1 Liear Regressio a)
More informationEfficient GMM LECTURE 12 GMM II
DECEMBER 1 010 LECTURE 1 II Efficiet The estimator depeds o the choice of the weight matrix A. The efficiet estimator is the oe that has the smallest asymptotic variace amog all estimators defied by differet
More informationU8L1: Sec Equations of Lines in R 2
MCVU U8L: Sec. 8.9. Equatios of Lies i R Review of Equatios of a Straight Lie (-D) Cosider the lie passig through A (-,) with slope, as show i the diagram below. I poit slope form, the equatio of the lie
More information1 1 2 = show that: over variables x and y. [2 marks] Write down necessary conditions involving first and second-order partial derivatives for ( x0, y
Questio (a) A square matrix A= A is called positive defiite if the quadratic form waw > 0 for every o-zero vector w [Note: Here (.) deotes the traspose of a matrix or a vector]. Let 0 A = 0 = show that:
More informationMa 530 Introduction to Power Series
Ma 530 Itroductio to Power Series Please ote that there is material o power series at Visual Calculus. Some of this material was used as part of the presetatio of the topics that follow. What is a Power
More informationAlgorithms for Clustering
CR2: Statistical Learig & Applicatios Algorithms for Clusterig Lecturer: J. Salmo Scribe: A. Alcolei Settig: give a data set X R p where is the umber of observatio ad p is the umber of features, we wat
More informationZeros of Polynomials
Math 160 www.timetodare.com 4.5 4.6 Zeros of Polyomials I these sectios we will study polyomials algebraically. Most of our work will be cocered with fidig the solutios of polyomial equatios of ay degree
More informationMachine Learning Brett Bernstein
Machie Learig Brett Berstei Week Lecture: Cocept Check Exercises Starred problems are optioal. Statistical Learig Theory. Suppose A = Y = R ad X is some other set. Furthermore, assume P X Y is a discrete
More informationSignals & Systems Chapter3
Sigals & Systems Chapter3 1.2 Discrete-Time (D-T) Sigals Electroic systems do most of the processig of a sigal usig a computer. A computer ca t directly process a C-T sigal but istead eeds a stream of
More informationChapter Vectors
Chapter 4. Vectors fter readig this chapter you should be able to:. defie a vector. add ad subtract vectors. fid liear combiatios of vectors ad their relatioship to a set of equatios 4. explai what it
More informationEntropy and Ergodic Theory Lecture 5: Joint typicality and conditional AEP
Etropy ad Ergodic Theory Lecture 5: Joit typicality ad coditioal AEP 1 Notatio: from RVs back to distributios Let (Ω, F, P) be a probability space, ad let X ad Y be A- ad B-valued discrete RVs, respectively.
More information